EIE4435
Image and Audio Processing
Semester 1, 2022/23
Q1. (a) A color image has a frame resolution of 640×480 with a 4:2:0 color sub-sampling format and 8 bits for each component.
(i) Draw a diagram to illustrate the 4:2:0 color sub-sampling format. (2 marks)
(ii) Explain why the color components can be sub-sampled. (2 marks)
(iii) Calculate the size for this image in bytes. (2 marks)
(b) What is the significance of K in the CMYK color model? (2 marks)
(c) What color would magenta paper appear to be when exposed to yellow light? Why? (2 marks)
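The storage arithmetic behind Q1(a)(iii) can be sketched as follows. This is a minimal illustration, assuming an uncompressed frame with no header and assuming that 4:2:0 halves each chroma plane both horizontally and vertically; the function name `yuv420_size_bytes` is hypothetical.

```python
def yuv420_size_bytes(width, height, bits_per_component=8):
    """Raw size of one 4:2:0 frame (assumes no header and no padding)."""
    luma_samples = width * height
    # Assumption: each chroma plane is sub-sampled by 2 in both directions.
    chroma_samples = (width // 2) * (height // 2)
    total_bits = (luma_samples + 2 * chroma_samples) * bits_per_component
    return total_bits // 8


size = yuv420_size_bytes(640, 480)
print(size)
```

Under these assumptions the two chroma planes together add half the luma size, so the frame needs 1.5 bytes per pixel on average.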
Q2. (a) Derive the histogram of the image shown in Fig. Q2(a). The bit-depth of the image is 4. Judge if the contrast of the image is poor. Give your reason based on the histogram. (4 marks)
(b) Fig. Q2(b) shows an image corrupted with salt and pepper noise. You are requested to design a filtering technique to locate the pixels that are corrupted with pepper noise. Suggest a method to achieve this. You should specify the filter used and describe each step in detail. (6 marks)
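The two tasks in Q2 can be sketched in code as below. This is only one possible approach, not a model answer: the histogram counts the 16 levels of a 4-bit image, and pepper pixels are flagged as minimum-valued pixels whose 3×3 median is brighter. The image `demo` is made-up data (Fig. Q2 is not reproduced here), and the names `histogram` and `find_pepper` are hypothetical.

```python
from statistics import median


def histogram(image, bit_depth=4):
    """Count how many pixels take each of the 2**bit_depth gray levels."""
    counts = [0] * (2 ** bit_depth)
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def find_pepper(image, pepper_value=0):
    """Flag pixels equal to pepper_value whose 3x3 median is brighter.

    Assumption: pepper noise sits at the minimum gray level, and a
    3x3 median (window clipped at the borders) estimates the clean value.
    """
    h, w = len(image), len(image[0])
    hits = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != pepper_value:
                continue
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            if median(window) > pepper_value:
                hits.append((r, c))
    return hits


# Hypothetical 4-bit test image: one pepper pixel (0) and one salt pixel (15).
demo = [
    [7, 7, 8, 8],
    [7, 0, 8, 15],
    [7, 7, 8, 8],
]
counts = histogram(demo)
pepper = find_pepper(demo)
print(counts, pepper)
```

A histogram whose non-zero bins occupy only a narrow part of the 0 to 15 range would suggest poor contrast; the median test keeps genuinely dark regions from being reported as noise.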
Q3. (a) The four 8×8 images shown in Fig. Q3(a) are encoded with JPEG coding. Based on their expected DCT coefficients, answer the following questions:
(i) Which image has the smallest DC coefficient? (1 mark)
(ii) Which image has large high-frequency AC coefficients? (1 mark)
(iii) Which image(s) has/have no non-zero AC coefficient? (1 mark)
(b) Suppose a 256-gray-level image is divided into 8×8 blocks and encoded using a JPEG encoder. The 64 quantized Discrete Cosine Transform (DCT) coefficients of the three 8×8 blocks in the image are given in Fig. Q3(b). Each block consists of a DC coefficient (the top-left corner) and 63 AC coefficients.
(i) Draw a block diagram to show the major steps in DCT-based coding. (4 marks)
(ii) Describe the pattern of pixel intensities in Block 3. (1 mark)
(iii) Using the DC Coding Table and the AC Coding Table in Table Q3(c) and Table Q3(d) respectively, find the output bitstreams for Block 1, Block 2 and Block 3. (11 marks)
(iv) Determine the compression ratio for Block 1, Block 2 and Block 3. Comment on your results. (3 marks)
Q4. Fig. Q4 shows the intensity map of an image.
(a) Segment the image with region growing by pixel aggregation. The two marked pixels are seeds. A region grows by including its eight-connected neighbors (i, j) in the region when either of the following criteria is satisfied:
(i) g(i, j) ≤ 5 and s ≤ 5 or
(ii) g(i, j) > 5 and s > 5
where s is the intensity value of the seed of a region and g(i,j) is the intensity value of an eight-connected neighbor of the region. (5 marks)
(b) State three problems of region growing by pixel aggregation. (3 marks)
(c) Split and merge segmentation is another image processing technique used to segment an image. Briefly discuss the key steps in this approach. (6 marks)
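The growth rule in Q4(a) can be sketched as a breadth-first search over eight-connected neighbors. This is a minimal illustration, assuming the threshold of 5 stated in the criterion; the intensity map `demo` and the seed positions are made up (Fig. Q4 is not reproduced here), and the names `same_class` and `region_grow` are hypothetical.

```python
from collections import deque


def same_class(g, s, threshold=5):
    """Criterion from Q4(a): neighbor g and seed s lie on the same side of 5."""
    return (g <= threshold and s <= threshold) or (g > threshold and s > threshold)


def region_grow(image, seeds):
    """Label pixels reachable from each seed; 0 means unassigned."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    queue = deque()
    for label, (r, c) in enumerate(seeds, start=1):
        labels[r][c] = label
        queue.append((r, c, label))
    while queue:
        r, c, label = queue.popleft()
        seed_r, seed_c = seeds[label - 1]
        s = image[seed_r][seed_c]  # compare against the seed value, not the parent
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < h and 0 <= cc < w \
                        and labels[rr][cc] == 0 \
                        and same_class(image[rr][cc], s):
                    labels[rr][cc] = label
                    queue.append((rr, cc, label))
    return labels


# Hypothetical intensity map with seeds at the top-left and top-right corners.
demo = [
    [1, 2, 8, 9],
    [2, 3, 7, 8],
    [1, 6, 9, 2],
]
labels = region_grow(demo, seeds=[(0, 0), (0, 3)])
for row in labels:
    print(row)
```

In this made-up example the bottom-right pixel stays unlabeled because it satisfies the dark-region criterion but is not connected to the dark seed's region, which hints at one of the weaknesses asked about in Q4(b).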
Q5. (a) Sketch the medial axis of the shape shown in Fig. Q5(a). You should indicate which portions of the medial axis are straight lines and which are not.
(Use the attached diagram provided in the appendix to answer this question.) (5 marks)
(b) The 8-directional chain code of the contour of an object is 076666553321212. Fig. Q5(b) shows the 8-directional code definition.
(i) Sketch the contour of the object. (2 marks)
(ii) Normalize the 8-directional chain code with respect to the orientation of the object. (2 marks)
(iii) Now the 8-directional chain code of an input object is 067676553322221. Determine whether it is the same object as in Q5(b)(i) with a different orientation. Show your steps clearly. (3 marks)
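The normalization and comparison steps in Q5(b)(ii) and (iii) can be sketched as below. This is a minimal illustration under stated assumptions: the contour is closed, the direction codes advance in 45° steps in one rotational sense (Fig. Q5(b) is not reproduced here), and the first difference is taken as the current code minus the previous code modulo 8. The function names are hypothetical, and the example codes are simple squares rather than the codes given in the question.

```python
def first_difference(code):
    """Circular first difference of an 8-directional chain code.

    Rotating the object by a multiple of 45 degrees adds a constant to
    every code, which cancels in the difference.
    """
    digits = [int(ch) for ch in code]
    # digits[i - 1] with i == 0 wraps to the last element (closed contour).
    return "".join(str((digits[i] - digits[i - 1]) % 8)
                   for i in range(len(digits)))


def min_rotation(code):
    """Start-point normalization: smallest of all circular shifts."""
    return min(code[i:] + code[:i] for i in range(len(code)))


def same_shape(code_a, code_b):
    """Compare two codes after orientation and start-point normalization."""
    return (min_rotation(first_difference(code_a))
            == min_rotation(first_difference(code_b)))


# Hypothetical example: a square with sides of length 2, and the same
# square rotated by 90 degrees (every code increased by 2).
print(first_difference("00224466"))
print(same_shape("00224466", "22446600"))
```

The comparison only recognizes rotations by multiples of 45°, and it treats codes of different lengths as different shapes.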
Q6. (a) What is the Nyquist Sampling Theorem, and why is it important for audio processing? (3 marks)
(b) What is "aliasing" and what causes it? (3 marks)
(c) The frequency range of human hearing is usually given as 20 Hz to 20,000 Hz, meaning that we can hear sounds in that range. With the aid of Q6(a) and Q6(b), explain why 44.1 kHz is commonly chosen as the sampling rate for recording. (5 marks)
(d) If a piece of stereo music is sampled at 44.1 kHz with 12 bits per sample for 3 minutes, what is the total file size of the piece of music in bits? (2 marks)
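The arithmetic in Q6 can be sketched as follows. This is a minimal illustration, assuming uncompressed PCM with no header or metadata and an ideal sampler with no anti-aliasing filter; the function names `pcm_size_bits` and `alias_frequency` are hypothetical.

```python
def pcm_size_bits(sample_rate_hz, bits_per_sample, channels, duration_s):
    """Raw PCM size in bits (assumes no header and no compression)."""
    return sample_rate_hz * bits_per_sample * channels * duration_s


def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling.

    The tone folds to its distance from the nearest multiple of the
    sampling rate, so anything above sample_rate_hz / 2 is misrepresented.
    """
    return abs(f_hz - sample_rate_hz * round(f_hz / sample_rate_hz))


# Stereo (2 channels), 44.1 kHz, 12 bits per sample, 3 minutes.
size_bits = pcm_size_bits(44_100, 12, 2, 3 * 60)
print(size_bits)

# A 20 kHz tone is below the 22.05 kHz Nyquist frequency and is preserved;
# a 25 kHz tone is above it and folds back into the audible band.
print(alias_frequency(20_000, 44_100), alias_frequency(25_000, 44_100))
```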
Q7. A perceptual audio codec is used to compress an audio signal. From the result of the spectrum analysis, the codec groups every 8 barks into a subband and then allocates bits to different subbands based on a psychoacoustic model. All samples in the same subband are quantized with the same quantizer, whose bit resolution is allocated by the codec.
Fig. Q7(a) shows the frequency spectrum of a windowed segment of an audio signal. The psychoacoustic model shown in Fig. Q7(b) is used in the audio codec to derive the masking threshold for the audio segment.
(a) Locate the potential maskers. (2 marks)
(b) Based on the given psychoacoustic model, show the masking threshold in the attached diagram provided in the appendix. (6 marks)
(c) Determine the Signal-to-Mask level of each subband. (3 marks)
(d) Suppose that allocating one additional bit to a subband results in a 6 dB drop of the noise floor in that subband. Allocate an appropriate number of bits to all subbands. (3 marks)
(e) Hence, briefly explain how the results in Q7(b) and Q7(d) can be used in perceptual coding. (5 marks)
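The bit-allocation rule implied by Q7(c) and (d) can be sketched as below. This is a minimal illustration, assuming each bit lowers the quantization noise by 6 dB and that a subband needs just enough bits to push the noise to or below its masking threshold; the Signal-to-Mask values in `demo_smr` are made up (Fig. Q7 is not reproduced here), and the function name `allocate_bits` is hypothetical.

```python
import math


def allocate_bits(smr_db, db_per_bit=6.0):
    """Smallest bit count per subband that covers its Signal-to-Mask level.

    Subbands whose signal lies at or below the mask (level <= 0 dB) get
    0 bits, since their content is assumed to be inaudible.
    """
    return [max(0, math.ceil(smr / db_per_bit)) for smr in smr_db]


# Hypothetical Signal-to-Mask levels in dB, one per subband.
demo_smr = [13.0, 0.0, -3.0, 6.0, 25.0]
bits = allocate_bits(demo_smr)
print(bits)
```

Rounding up leaves a small safety margin: 13 dB needs 3 bits because 2 bits would only cover 12 dB.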