EIE4435

Image and Audio Processing

Semester 1, 2021/22

Q1. (a) Given that the Hue-Saturation subspace (a cross section of the HSI space) shown in Fig. Q1 is a perfect circle and colors A, B and C can be represented as the three points shown in the subspace, answer the following questions.

(i) Sort the colors according to their luminance in descending order. (2 marks)

(ii) Which color has the largest saturation coefficient? (1 mark)

(b) What color would a green paper appear when exposed to cyan light? (2 marks)

(i) Change it to the YUV 4:2:0 format by using the average technique. (2 marks)

(ii) Give one advantage of using the YUV 4:2:0 color format in digital images. (2 marks)
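The averaging technique in part (i) can be illustrated with a minimal sketch, assuming it means replacing each 2×2 block of a chroma plane with its mean while leaving the Y plane at full resolution. The 4×4 U plane below is hypothetical and is not the data from the exam paper; `subsample_420` is an illustrative helper name.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (assumes even width and height)."""
    rows, cols = len(plane), len(plane[0])
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (plane[r][c] + plane[r][c + 1]
                         + plane[r + 1][c] + plane[r + 1][c + 1])
            out_row.append(block_sum / 4)
        out.append(out_row)
    return out

# Hypothetical 4x4 U plane (not taken from the exam paper).
u_plane = [
    [100, 102, 50, 54],
    [104, 106, 52, 56],
    [10, 20, 200, 200],
    [30, 40, 200, 204],
]
u_420 = subsample_420(u_plane)
print(u_420)  # [[103.0, 53.0], [25.0, 201.0]]
```

The same function would be applied to the V plane, so each chroma plane carries one quarter of the samples of the Y plane.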

(d) Compute the median filtering output with 3×3 window of the following Y plane:

Hence, explain the usage of the median filter. (4 marks)
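As an illustration of the 3×3 median filtering in part (d), the sketch below uses a hypothetical 4×4 Y plane, since the plane from the exam paper is not reproduced here. How border pixels are handled is an assumption: this version simply copies them unchanged.

```python
from statistics import median

def median_filter_3x3(plane):
    """3x3 median filter; border pixels are copied unchanged (an assumption)."""
    rows, cols = len(plane), len(plane[0])
    out = [row[:] for row in plane]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Collect the 9 values in the window centred on (r, c).
            window = [plane[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

# Hypothetical Y plane with two impulse-like outliers (255 and 0).
y_plane = [
    [10, 10, 10, 10],
    [10, 255, 12, 10],
    [10, 11, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(y_plane)
print(filtered)
```

In this example both outliers are replaced by 10, which shows why the median filter is commonly used against impulse (salt-and-pepper) noise.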

Q2. Fig. Q2 shows the results of applying the Discrete Cosine Transform (DCT) and quantization to an 8x8 image block.

(a) Why is the JPEG format generally better than GIF for photographic images? (3 marks)

(b) Explain the physical meaning of the DC coefficient and the AC coefficients in the DCT block. (3 marks)

(d) Using the DC and AC coding tables as shown in Table Q2(a) and Table Q2(b) respectively, code the bitstream of this block. Assume that the DC coefficient of the previous block is 34. (12 marks)

Determine the compression ratio for the block. (2 marks)

Q3. Suppose that a 3-bit image of size 64 x 64 pixels has the intensity distribution in Table Q3, where the intensity levels r_k (k = 0, 1, ..., 7) are integers in the range [0, 7]. In Table Q3, n_k denotes the number of pixels that have intensity r_k.

(a) Plot the histogram of the image. Judge if the contrast of the image is poor. Give your reason based on the histogram. (3 marks)

By using Table Q3, derive the discrete form of the global histogram equalization transformation function. (10 marks)
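A minimal sketch of the discrete equalization mapping, assuming the usual form s_k = (L - 1) * (n_0 + n_1 + ... + n_k) / (M * N) followed by rounding to the nearest integer level. The pixel counts below are hypothetical stand-ins chosen to sum to 64 x 64 = 4096; they are not claimed to be the values in Table Q3, and `equalization_map` is an illustrative helper name.

```python
def equalization_map(counts, levels):
    """Map each level r_k to s_k = round((levels - 1) * CDF(r_k))."""
    total = sum(counts)
    mapping = []
    cumulative = 0
    for n_k in counts:
        cumulative += n_k
        s_k = (levels - 1) * cumulative / total
        mapping.append(int(s_k + 0.5))  # round half up to an integer level
    return mapping

# Hypothetical counts n_k for k = 0..7 (assumed, not from Table Q3).
counts = [790, 1023, 850, 656, 329, 245, 122, 81]
mapping = equalization_map(counts, levels=8)
print(mapping)  # [1, 3, 5, 6, 6, 7, 7, 7]
```

Several input levels map to the same output level, so the equalized histogram is spread over the range but is not perfectly flat.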

(d) What is the advantage of the local histogram equalization over the global histogram equalization? (4 marks)

Q4. (a) Sketch the medial axis of the shape shown in Fig. Q4(a). You should indicate which portions of the medial axis are straight lines and which are not.

(Use the attached diagram provided in the appendix to answer this question.) (5 marks)

(b) Fig. Q4(b) shows an image with a white object, and the 4-directional code definition. The contour of the object is represented by the red line in Fig. Q4(b).

(i) Represent the contour of the white object in a clockwise direction with a 4-direction chain code. Use the rightmost pixel in the first row of the object as the starting point. (2 marks)

(ii) What are the purposes of the first difference code and the shape number? (4 marks)

(iii) Hence, determine the first difference code and the shape number of the result obtained in (b)(i). (4 marks)
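The two quantities in part (iii) can be sketched as follows, assuming the common convention in which the first difference counts counterclockwise direction changes modulo 4, the chain code is treated as circular, and the shape number is the rotation of the first difference that forms the smallest integer. The chain code below is hypothetical and is not the contour in Fig. Q4(b).

```python
def first_difference(chain, directions=4):
    """Circular first difference: (c[i] - c[i-1]) mod directions."""
    # chain[i - 1] with i = 0 wraps to the last element, closing the contour.
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]

def shape_number(diff):
    """Rotation of the first difference with the smallest magnitude."""
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    # For equal-length digit lists, lexicographic order equals numeric order.
    return min(rotations)

# Hypothetical 4-direction chain code (not the one from Fig. Q4(b)).
chain = [0, 0, 3, 2, 2, 1]
diff = first_difference(chain)
shape = shape_number(diff)
print(diff)   # [3, 0, 3, 3, 0, 3]
print(shape)  # [0, 3, 3, 0, 3, 3]
```

Changing the starting point only rotates `diff`, so `shape` stays the same, which is the point of normalizing to the smallest rotation.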

(iv) Derive the occurrence probabilities of individual index values in the shape number. (2 marks)

(v) The shape number of the object contour is now further encoded with Huffman coding. Derive a codeword table for its implementation. (4 marks)
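Parts (iv) and (v) can be sketched together: count how often each index value occurs, then build a Huffman table from those counts. The digit sequence below is hypothetical and only stands in for a shape number; the exact 0/1 labels depend on tie-breaking, so only the codeword lengths should be read as meaningful. `huffman_table` is an illustrative helper name.

```python
import heapq
from collections import Counter

def huffman_table(freqs):
    """Build a prefix code from {symbol: weight} by repeated merging."""
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)  # least probable subtree
        w1, _, t1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical digit sequence standing in for a shape number.
sequence = [0, 3, 3, 0, 3, 3, 1, 3, 0, 3]
freqs = Counter(sequence)
probabilities = {s: n / len(sequence) for s, n in freqs.items()}
table = huffman_table(freqs)
total_bits = sum(len(table[s]) for s in sequence)
print(probabilities, table, total_bits)
```

With these counts the most frequent index gets a 1-bit codeword and the other two get 2-bit codewords, for 14 bits in total instead of the 20 bits needed at a fixed 2 bits per index.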

Q5. (a) The data sampling rate of a Compact Disc-Digital Audio (CD-DA) is 44.1 kHz.

(i) Explain why this sampling rate is sufficient for audio signals. (2 marks)

(ii) Why is dither needed in analog to digital audio conversion? Explain its working principle. (4 marks)

(iii) If a piece of stereo music is sampled at 44.1 kHz with 16 bits per sample for 30 seconds, what is the total file size of the piece of music in bytes? (2 marks)
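The arithmetic in part (iii) can be checked with a short calculation, assuming uncompressed PCM samples and ignoring any file header or container overhead.

```python
sample_rate_hz = 44_100   # samples per second per channel
bits_per_sample = 16
channels = 2              # stereo
duration_s = 30

bytes_per_sample = bits_per_sample // 8
total_bytes = sample_rate_hz * bytes_per_sample * channels * duration_s
print(total_bytes)  # 5292000
```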

(b) A perceptual audio codec is used to compress an audio signal. From the result of the spectrum analysis, the codec groups every 8 barks into a subband and then allocates bits to different subbands based on a psychoacoustic model. All samples in the same subband are quantized with the same quantizer, whose bit resolution is allocated by the codec.

Fig. Q5(a) shows the frequency spectrum of a windowed segment of audio signal. The psychoacoustic model shown in Fig. Q5(b) is used in the audio codec to derive the masking threshold for the audio segment.

(i) Based on the given psychoacoustic model, show the masking threshold in the attached diagram provided in the appendix. (4 marks)

(ii) Determine the Signal-to-Mask levels of each subband. (4 marks)

(iii) Suppose that allocating one additional bit to a subband results in a 6dB drop of the noise floor in that subband. Allocate an appropriate number of bits to all subbands. (2 marks)
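One way to read part (iii) is sketched below, assuming the aim is to give each subband just enough bits to push the quantization noise to or below the masking threshold, at 6 dB per bit. The Signal-to-Mask values are hypothetical and are not read from Fig. Q5(a); `allocate_bits` is an illustrative helper name.

```python
import math

def allocate_bits(smr_db, db_per_bit=6.0):
    """Smallest bit count whose noise reduction covers each subband's SMR."""
    # A subband whose signal lies below the mask (SMR <= 0) needs no bits.
    return [math.ceil(s / db_per_bit) if s > 0 else 0 for s in smr_db]

# Hypothetical Signal-to-Mask ratios in dB for four subbands.
smr_db = [20.0, 8.0, -3.0, 12.0]
bits = allocate_bits(smr_db)
print(bits)  # [4, 2, 0, 2]
```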

(iv) From a perceptual quality point of view, is it advantageous to increase the number of subbands from 4 to 8? Explain why. (2 marks)

(v) Suppose you are now asked to design an audio codec for the elderly. Suggest two ways to adjust the hearing threshold shown in Fig. Q5(a) such that a higher compression ratio can be achieved without lowering the perceptual audio quality. (2 marks)