HW1.1 Image Manipulation

HW1.1 Image Manipulation

A1.1. Load image and determine information of this image.

A1.2. Convert the data type of image to double and save it to imd, which is MATLAB's default data type required by many built-in functions. Data of this type lies in range [0,1].

  • There is no difference when displaying im and imd using imshow().

  • Because imshow() automatically recognizes the type of image and devises the correct range based on that for display.

Draw a 2 pixel wide black border without changing the size, save the new image to im_framed.

A1.3. Coursework platform accepts a tolerance of 0.01. Add Gaussian noise with σ=0.005 to the image. How many pixels do you expect to lie outside the toleranc range?

  • Given

  • We use


C1.2. Copy each second row and each third column of im_framed, save it to im_part.

A1.4. Draw a red frame around the image using plot() function.

  • plot() does not change the image data, and is not constrained to integer pixel positions.

C1.3. Use Kronecker product to create colorized copies of grayscale image im


Calculate each color channel separately using


Merge these channels to a color image.

HW1.2. Periodic Sequences

C2.1. Compute atomic image elements from im specified by the periodicity matrix pm, given the location loc_base of the topmost pixel (x,y).

Notice that index x in an image corresponds to the column/second index in a matrix, and index y in an image corresponds to the row/first index in a matrix.

C2.2. Implement an inverse operation: generate a periodic image given atomic element.?

C2.3. Calculate NCC between two equal-sized periodic images.


N is number of elements.

C2.4. Find the atomic element of a given periodic image using exhausive search and correlation.?

A2.1. Why do you observe several maxima in the NCC values of the whole image?

  • Because the considered image has partial repetitive parts, which can be matched to the whole.

HW2.1. Image Sampling

C1.1. Generate a regular and rectangular sampling grid of dimensions N×N, N=400, which starts at (1,1) and spans exactly the range [1,512] for x and y.

C1.2. Given gray scale image im, use interp2() to find image intensity at non-integer coordinates (gridX,gridY). Try different interpolation options.

A1.1. Describe how above images differ and compare the runtime.

  • Image interpolated using cubic can better preserve edges and other fine details, but takes more time.

  • linear interpolation loses some details but runs faster. F=TriScatteredInterp(gridX(:),gridY(:),im_scatter(:),'linear');

C1.3. Given image intensity at non-integer coordinates (gridX,gridY), reconstruct image with size sz from im_scatter using TriScatteredInterp().

C1.4. Generate a horizontally aligned hexagonal grid with the first sampling point located at (1,1). The horizontal and vertical sampling distance is dx and dy. The sampling locations must span horizontally x[1,511] and vertically y[1,512], given

  • First generate points of grid using meshgrid() with given dx and dy

  • Then shift each second row by 0.5dx to obtain the hexagonal grid

C1.5. Interpolate image im using interp2() at hexagonal sampling positions above, then reconstruct the image using TriScatteredInterp().

C1.6. To know the construction quality, compute PSNR, remove a border of 5 pixels to neglect border effect.


A1.2. Which image has better/higher PSNR?

  • Hexagonal sampling has better PSNR as it exhibits a better spectral efficiency.

HW2.2. Zone Plate

C2.1. Generate an N×N image matrix whose entries represent the Euclidean distance in pixel unit to the top-left pixel. Then calculate zone plate image using


Choose k such that the maximum frequency/phase change along last two diagonal pixels is π.


A2.1. Subsample Izp with 2:1 and 4:1, what effects do you observe?

  • Higher frequency components are projected to lower frequency after subsampling.

  • Images after subsampling have low frquency components on three corners, which are similar to the top-left corner.

HW2.3. Image Filters

A3.1. Analyze different filters, plot the filter using surf() and analyze its frequency response using freqz2().

  • Gaussian: low pass

  • Smaller Gaussian filters in space have a wider frequency spectrum,

  • Laplacian: high pass

  • Prewitt/Laplacian of Gaussian: high pass, and reduces sensitivity to noise.

C3.1. Find out how freqz2() generates its plot and implement it manually using fft2().

C3.2. Filter the image with a Gaussian filter (16×16,σ=3) and replicate border processing.

A3.2. ….

HW3.2. Separable Block Transforms

C2.1. Implement function block_splitter with special support for border blocks. Image is like .

C2.2. Write a simple block_func that flips the image block left to right and decreases the brightness by substracting 0.3 from each pixel.

A2.1. Above operation ensures that pixel value is non-negative, so it's nonlinear.

C2.3. Apply given Haar transform matrix to the image.

C2.4. Reorder the transformed image to make it meaningful for human.

C2.5. Implement calculate_statis to calculate the mean and variance for each coefficient over all blocks.

A2.2. What can be observed by analyzing im_var?

  • For natural image, variances of block elements are similar to each other, also similar to the global statistics. This may change once a transformation is applied.

  • After Haar transformation, variances decay considerably from (0,0) coefficient into higher coefficients such as (block_sz,block_sz), which means it packs most energy/variance into low-order coefficients.

C2.6. Implement an image blur using a symmetric impulse response of [0.25,0.5,0.25] to the image horizonally and vertically.

A2.3. Observe the blurred image and discuss the limitation of AXAT along the block borders.

  • Image is blurred, i.e., low-pass filtered so that fine details cannot be observed anymore.

  • The block processor AXAT does not have access to neighbor pixels outside the block, so border effect such as dark frame is observed around blocks, thus not "smooth".

  • In order to mitigate the border effect, one could use overlapping blocks at the cost of performance.

HW4.1. Discrete Cosine Transform

C1.1. Transform images to DCT domain using block processing.

A1.1. What can be observed by analyzing im_mean and im_var of all blocks?

  • The mean of the DC-coefficient is proportional to the mean of the untransformed image in the respective area.

  • The variance of the DC coefficient is the largest, and it decreases with increasing spatial frequency.

  • Variances in horizontal directions exhibit slightly higher amplitude compared to the vertical direction.

  • AC-coefficients exhibit zero mean.

C1.2. Perform compression of a DCT-transformed image by keeping only the first N coefficients of each block, others are set to zero. Return in image in spatial domain resulting from compression.

A1.2. Compare the compressed image and original image, what effects do you observe when using different compression level (N=9,16,64)?

  • If only 9 coefficients are kept, one can observe significant block effects in image regions with much details, such as hair and eyes.

  • If 64 coefficients are kept, one observes only minor block effects.

  • Different image regions require different numbers of coefficients to look well for human.

C1.3. Because of its separable property, DCT transformation can be efficiently extended to 3 dimensions. Implement apply_3ddct to a block along each dimension subsequently to get the 3D transformed DCT image.

C1.4. Apply above 3D DCT to any 3D sequence video. Ignore the padding. Return the DC and high-frequency components of the resulting sequence.

A1.3. Which DCT component of the 3D DCT transformed sequence contains the most energy? What is the best strategy for video compression?

  • The most energy is stored in DC component.

  • The energy decreases with increasing coefficient positions.

  • For sequences with dynamic content but with ares of same color, it is best to preserve more DCT coefficients corresponding to time dimension.

  • For sequences with static content but with high level of spatial detail, it is best to preserve the coefficients corresponding to the spatial domain.

HW4.2. Karhunen-Loeve-Transform

C2.1. Implement a block processor which reshapes a 32×32 block into a vector, and adds 0.1 to the vector elements [529:544,272:32:784] and reshapes the result back to a block. Perform above block processor to an image.

C2.2. Calculate the KLT basis functions for 8×8 blocks using following steps:

  • Reconstructure image into blocks of 8×8 and save results to blocks

  • Calculate the autocorrelation matrix for those blocks and save results to C_im

  • Perform EVD of C_im

  • Sort the resulting eigenvalues with descending order and correspondingly the eigenvectors

  • The resulting eigenvectors constitutes the KLT basis images. Reshape each eigenvector to an 8×8 block and pack them into a 64×64 image in column major order

A2.1. Compare KLT basis images when using different images for training.

  • KLT is the most efficient transform in terms of decorrelation and energy packing.

  • But it depends on image statistics. As shown above, autocorrelation of a specific image is calculated to derive KLT basis images.

  • For different images with different statistics, the resulting KLT basis images would be different.

C2.3. Project an image to the KLT basis above. Compare the back-transformed image with the original one and calculte the PSNR.

C2.4. Calculate the autocorrelation matrix of all blocks from KLT transformed image. (same as C2.2.)

A2.2. How should the autocorrelation matrix of KLT blocks look like?

  • When KLT learned from Lenna image is applied to Lenna image, it allows for perfect decorrelation (except for numerical errors).

  • Thus, the autocorrelation matrix exhibits only elements on its diagonal and zero elsewhere.

  • If this KLT is applied to other images, the diagonal of the autocorrelation matrix still dominates, but there are significant non-diagonal elements.

  • KLT is not suitable as a generic compression method.

A2.3. Calculate the statistics (mean and variance) of above KLT transformed image.

  • KLT of Lenna on its own base exhibits a monotomic decrease of the variance coefficients.

  • For other images, the energy compaction is not optimal, there is a noisy behavior in the variance plot.

A2.4. Recall the variance plot for Haar transform, plot the variance of DCT and KLT, what do you observe?

  • The KLT of Lenna on its own base has a monotonic decrease and higher energy compaction efficiency than DCT and Haar.

HW5.1. Autoregressive Models

C1.1. Calculate the autocovariance of a natural image I, using neighboring 11×11 for each pixel as follows:

(10)CII(k,l)=E[I(x,y)I(x+k,y+l)] with 5k,l5
  • Don't forget to substract the mean value of the image.

  • Take into account a 5 pixel border.

  • Extract the variance σx2 and the covariance for 1 pixel horizontal and vertical displacement.

A1.1. Discuss above values and explain the structure of the autocovariance matrix.

  • The center of autocovariance matrix is the average energy of pixels.

  • Since we measure all pixels, the autocovariance matrix should be point symmetrix around its center point.

  • (With increasing distance between pixels, the autocovariance drops almost linearly. A slightly higher covariance is observed in vertical direction)

C1.2. A separable AR(1) process can be used to generate a 2D signal, and has autocovariance

(11)Cxx(k,l)=σx2ρh|k|ρv|l|,  σx2=σz2(1ρh2)(1ρv2)

The model is implemented by a casual recursive filter according to


where nz is a zero-mean Gaussian noise with standard deviation σz.

Generate a 512×512 image im_out using above process. The process will need some pixels to start up, so add a temporary border of 100 pixels filled with zeros to the top and to the left. Notice that indexes of matrix and indexes of image are reversed, so ρh can be obtained by Cxx,10/Cxx,00=CII,67/CII,66=cov_h/var.

Compute the autocovariance of above generated image, compare it to the autocovariance obtained by the equation in C1.2.

  • The separable AR(1) process produces an autocovariance which is piecewise planar in four quadrants.

  • This results in a modelling error which increases from the center towards the border along k=0 or l=0.

  • There is a constant offset between the measured autocovairance and expected autocovariance.

  • This error can be minimized using a larger startup border.

A1.3. Repeat C1.1. with another image. Explain why this autocovariance matrix looks considerably different. How would an image generated by an AR(1) process with parameters measured from this image look like?

  • The resulting autocovariance matrix exhibits a significantly higher correlation in vertical direction, due to the dominant vartical structures in the image.

  • An AR(1) image created with corresponding parameters will exhibit vertically dominating structures.

HW5.2. Pyramid Representations

C2.1. A 3×3 approximation of Gaussian lowpass filter is given


Compute an appropriate 3×3 reconstruction filter G for 1:2 upsampling using linear interpolation.

  • First, zeros are filled between pixels

  • The reconstructed matrix should be

  • We begin from the block at top-left corner, this determines two entries of the filter

  • Continuing with this technique, we get the filter


C2.2. Generate a Laplacian pyramid of given image im using above filter with 5 levels. Level 1 holds the finest detail at full resolution 512×512, while level 5 corresponds to the lowest level of detail 32×32. The scaling factor is 2:1 horizontally and vertically.

C2.3. Obtain the 2D filter kernel for analysis and synthesis steps given 1D lowpass filter for 9-7 Cohen-Daubechies-Feauveau wavelet.

C2.4. Calculate the variances of each level of the pyramid. To neglect border effects, remove a border of 4 pixels.

A2.1. Explain the difference you observe for var_pyr when using 9-7 Cohen-Daubechies-Feauveau wavelet and Gaussian filters.

  • In both cases, the variance/energy of levels 1-4 are lower than the last level.

  • With CDF filters, a little more energy is packed into level 5.

C2.5. Resonstruct image from lap_pyr. Calculate the PSNR, neglect a border of 4 pixels.


您的电子邮箱地址不会被公开。 必填项已用*标注