CS6114 Assignment
Assignment 03
Video Coding
Video coding is the process of preparing digital video content for storage or transmission, such as in a data file or bitstream. The H.265 standard (MPEG-H Part 2 High Efficiency Video Coding, HEVC), which ffmpeg supports through the libx265 encoder, was designed to achieve
25% to 50% better data compression than the previous H.264 standard
(MPEG-4 Part 10 Advanced Video Coding, supported through libx264) at the same
level of video quality, or substantially improved video quality at the same bit
rate, for a variety of different applications. In turn, H.264 was designed to
achieve good video quality at substantially lower bit rates than previous MPEG standards.
One application area where H.264 and H.265 are currently in use is the delivery
of streamed content over data networks using HTTP. In this application,
MPEG-DASH is used to describe the media content and H.264/H.265 to encode it
for delivery using standard web infrastructure (web server/CDN/HTTPS/TCP/IP).
MPEG-DASH partitions content into a sequence of short, usually fixed-duration
segments. These segments are encoded at a variety of different bitrates, giving
alternative versions of the content. These alternative segments are time-aligned
so that, while the content is being played back, an MPEG-DASH client can use an
adaptive bitrate (ABR) algorithm to select, as the next segment, the one with the
highest bitrate (quality) that can be downloaded in time for playback without
causing stalls or rebuffering.
[Figure: Slice (source: https://www.vcodex.com/hevc-an-introduction-to-high-efficiency-coding/)]
Adaptive streaming requires that no segment of these alternative versions of
the content exceeds (or significantly falls short of) the nominal bitrate for
that version of the content. To achieve this objective, the encoding must
employ constrained bitrate encoding techniques.
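A minimal sketch of this requirement as a check, written in Python (the language of the supplied Jupyter notebook). It assumes you already know each segment's size in bytes and that all segments share one fixed duration; the function names and the ±10% tolerance are hypothetical choices for illustration, not values required by MPEG-DASH.

```python
def segment_bitrates_kbps(segment_sizes_bytes, segment_duration_s):
    """Average bitrate of each segment in kbit/s (bytes -> bits, divided by duration)."""
    return [size * 8 / segment_duration_s / 1000 for size in segment_sizes_bytes]


def nonconforming_segments(segment_sizes_bytes, segment_duration_s,
                           nominal_kbps, tolerance=0.10):
    """Return (index, kbps) for each segment outside nominal_kbps +/- tolerance.

    The default 10% tolerance is an illustrative assumption.
    """
    low = nominal_kbps * (1 - tolerance)
    high = nominal_kbps * (1 + tolerance)
    rates = segment_bitrates_kbps(segment_sizes_bytes, segment_duration_s)
    return [(i, rate) for i, rate in enumerate(rates) if not low <= rate <= high]


if __name__ == "__main__":
    # Hypothetical sizes of four 2-second segments from a 500 kbit/s representation.
    sizes = [125_000, 130_000, 180_000, 90_000]
    print(nonconforming_segments(sizes, 2.0, 500))
```

Here the third segment overshoots and the fourth undershoots; both kinds of deviation matter to an ABR client.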
Rate Control
Rate control is the process used by the encoder in deciding how to allocate
bits to encode each picture. The goal of (lossy) video coding is to reduce the
bitrate while retaining as much quality as possible. Rate control is a crucial
step in determining that tradeoff between size and quality.
Constant bitrate (CBR) and variable bitrate (VBR) encoding set a target data
rate, and the encoding application applies a rate control technique to achieve
it. It can be difficult to choose an appropriate data rate for constrained
connections, and the quality of experience (QoE) for viewers can be impacted if
the VBR range is too wide or, in the case of CBR, if the nature of the content
varies greatly. Constrained VBR, with the maximum rate set to between 110% and
150% of the target, is often used; however, this assumes that the target bitrate
needed to achieve an acceptable level of quality is known before the content is encoded.
Not all video content is equally compressible. Low motion and smooth
gradients compress well (few bits for high perceived quality), whereas high
motion and fine spatial detail are less compressible (more bits to preserve
quality). Often it is easier to specify a target quality and let the encoding
application vary the data rate to achieve this target. However, the data rate
required to achieve the target quality is unknown in advance.
Constant Rate Factor (CRF) encoding specifies a quality level, and the
encoding application adjusts the data rate to achieve that target quality. The
result is content with an (approximately) constant quality level, but the data
rate is not known in advance. If quality is the objective, this is not a concern,
but if the data rate varies significantly over the duration of the content, it
may have implications for deliverability.
Capped CRF uses the data rate necessary to achieve a target quality, subject
to a maximum data rate that ensures deliverability.
In this assignment, you will encode source material using both the libx264
(H.264) and libx265 (H.265) codecs and compare the resulting bitstreams in
terms of coding efficiency and quality. The expected use of the encoded
material is delivery via MPEG-DASH. However, you do not have to create an
MPD file or fragment the representations.
Encoding
In ffmpeg, the Video Buffering Verifier (VBV) constrains the bitrate to a
maximum rate. This is essential for content that will be streamed, as it
ensures that no segment exceeds the nominal bitrate for that version of the
content. VBV can be used with either 2-pass VBR (apply it in both passes) or
CRF encoding (capped CRF). For example, using libx264 (H.264):
ffmpeg -i source.mov \
  -c:v libx264 -crf 23 -maxrate 500K -bufsize 256K \
  source-at-0.5M.mp4
Using the above parameters will probably not result in full use of the available
bitrate (i.e. the average bitrate will be below 500 kbps). It can be advantageous
to specify a target average bitrate and allow the maximum rate to exceed it by
a small amount.
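The approach above (a target average bitrate with a slightly higher maximum) can be sketched as a helper that builds both ffmpeg passes with the same VBV settings. This is an illustration under stated assumptions: the 110% peak ratio echoes the constrained VBR range mentioned earlier, bufsize is assumed to equal one second at maxrate, and for libx265 the pass number is passed through -x265-params because support for ffmpeg's generic -pass option with that encoder may vary between ffmpeg versions. The function name is hypothetical.

```python
def two_pass_vbr_commands(src, dst, codec, target_kbps, peak_ratio=1.1, buf_seconds=1.0):
    """Build pass-1 and pass-2 ffmpeg argument lists for 2-pass VBR with VBV.

    peak_ratio (maxrate / target) and buf_seconds (bufsize / maxrate) are
    illustrative assumptions, not recommended values.
    """
    maxrate = round(target_kbps * peak_ratio)
    bufsize = round(maxrate * buf_seconds)
    # The same VBV settings are applied in both passes.
    rate_args = ["-b:v", f"{target_kbps}k", "-maxrate", f"{maxrate}k",
                 "-bufsize", f"{bufsize}k"]
    commands = []
    for n in (1, 2):
        if codec == "libx264":
            pass_args = ["-pass", str(n)]
        elif codec == "libx265":
            pass_args = ["-x265-params", f"pass={n}"]
        else:
            raise ValueError(f"unsupported codec: {codec}")
        # Pass 1 only needs to write the statistics log, so its video is discarded.
        output = ["-an", "-f", "null", "-"] if n == 1 else [dst]
        commands.append(["ffmpeg", "-y", "-i", src, "-c:v", codec,
                         *rate_args, *pass_args, *output])
    return commands


if __name__ == "__main__":
    for cmd in two_pass_vbr_commands("source.mov", "source-at-0.5M.mp4", "libx264", 500):
        print(" ".join(cmd))
```

Each list can be run with subprocess.run(cmd, check=True). Pass 1 writes a statistics log to the working directory that pass 2 reads, so run both passes from the same directory. If you also set GOP options for libx265, merge them into the same -x265-params string (for example pass=2:keyint=50).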
Applying VBV to CRF encoding requires determining the CRF value that, on
average, results in the maximum bitrate but does not exceed it. If the encoder is
constantly held back by the maximum bitrate, the CRF value is too low. However,
if the bitrate does not consistently reach the maximum, extra quality may be
gained by lowering the CRF value. As a rule of thumb, increasing the CRF by 6
roughly halves the bitrate (the exact effect is content dependent). The buffer
size (bufsize) parameter determines how strictly the encoder limits the
variability of the bitrate: a smaller buffer enforces the maximum rate over a
shorter window. For streaming you will want to ensure that the bitrate of each
segment conforms closely to the nominal bitrate.
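The +6 rule of thumb can be turned into a first guess for the next trial encode: measure the average bitrate of a trial at some CRF, then shift the CRF by 6 × log2(measured / target). This is only an approximation (the relationship is content dependent), so treat the result as the starting point for another trial rather than a final value. The function name is hypothetical, and the clamp to 0..51 assumes 8-bit encoding.

```python
import math


def estimate_crf(trial_crf, measured_kbps, target_kbps, crf_min=0.0, crf_max=51.0):
    """Next CRF to try, assuming each +6 CRF roughly halves the bitrate.

    If the trial used twice the target bitrate, log2(2) = 1 and the CRF rises by 6.
    The 0..51 clamp assumes 8-bit x264/x265 encoding.
    """
    crf = trial_crf + 6 * math.log2(measured_kbps / target_kbps)
    return round(min(max(crf, crf_min), crf_max), 1)


if __name__ == "__main__":
    # Trial at CRF 23 averaged 1000 kbit/s; the target is 500 kbit/s.
    print(estimate_crf(23, 1000, 500))
```

For example, a trial at CRF 23 that averages 1000 kbit/s against a 500 kbit/s target suggests trying CRF 29 next.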
Steps
The source content is a QuickTime movie containing a video stream. The video
stream is encoded with libx264 (H.264) using intra-frame compression only
(i.e. all I-pictures). The content is a mixture of different styles of material.
Use ffmpeg and the libx264 codec to encode H.264 video bitstream
representations at two nominal bitrates (0.5 Mbps and 2 Mbps), each
using GOP lengths of 50 and 100.
Use ffmpeg and the libx265 codec to encode H.265 video bitstream
representations at two nominal bitrates (0.5 Mbps and 2 Mbps), each
using GOP lengths of 50 and 100.
You will generate eight encodings:
H.264, 0.5 Mbps, GOP length of 50
H.264, 0.5 Mbps, GOP length of 100
H.264, 2 Mbps, GOP length of 50
H.264, 2 Mbps, GOP length of 100
H.265, 0.5 Mbps, GOP length of 50
H.265, 0.5 Mbps, GOP length of 100
H.265, 2 Mbps, GOP length of 50
H.265, 2 Mbps, GOP length of 100
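One way to script the eight encodings is to generate the ffmpeg command lines from the codec, bitrate and GOP combinations. This sketch assumes capped CRF with a fixed GOP length and scene-cut keyframe insertion disabled, so that keyframes fall at regular intervals (a common choice for segment alignment, not a requirement stated in this assignment). The CRF values in the table, the bufsize of half the maxrate and the output file names are hypothetical placeholders to be replaced by the values you determine.

```python
from itertools import product

CODECS = ("libx264", "libx265")
BITRATES_KBPS = (500, 2000)
GOP_LENGTHS = (50, 100)


def gop_args(codec, gop):
    """Fixed GOP length with scene-cut keyframe insertion disabled."""
    if codec == "libx264":
        return ["-g", str(gop), "-keyint_min", str(gop), "-sc_threshold", "0"]
    if codec == "libx265":
        return ["-x265-params", f"keyint={gop}:min-keyint={gop}:scenecut=0"]
    raise ValueError(f"unsupported codec: {codec}")


def capped_crf_command(src, codec, kbps, gop, crf, buf_ratio=0.5):
    """One capped-CRF encode; buf_ratio (bufsize / maxrate) is an assumption."""
    bufsize = round(kbps * buf_ratio)
    dst = f"{codec}-{kbps}k-gop{gop}.mp4"  # hypothetical naming scheme
    return ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-crf", str(crf),
            "-maxrate", f"{kbps}k", "-bufsize", f"{bufsize}k",
            *gop_args(codec, gop), dst]


def all_encodings(src, crf_table):
    """Commands for all eight codec/bitrate/GOP combinations."""
    return [capped_crf_command(src, c, b, g, crf_table[(c, b, g)])
            for c, b, g in product(CODECS, BITRATES_KBPS, GOP_LENGTHS)]


if __name__ == "__main__":
    # Placeholder CRF values only; determine the real ones with trial encodes.
    crf_table = {key: 23 for key in product(CODECS, BITRATES_KBPS, GOP_LENGTHS)}
    for cmd in all_encodings("source.mov", crf_table):
        print(" ".join(cmd))
```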
You must investigate:
What CRF value and constrained bitrate parameters are needed to
achieve these target nominal bitrates for each combination of codec
and GOP length?
Are these values the same for H.264 and H.265?
Which codec is better able to achieve the nominal bitrate?
The difference in encoding times.
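Encoding times can be compared by timing each ffmpeg run. A minimal sketch, assuming that wall-clock time measured with time.perf_counter is an adequate measure (it includes disk I/O and is affected by other load on the machine, so repeating runs is advisable); timed_run is a hypothetical helper name.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments), raise on a non-zero exit, return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with an ffmpeg
    # argument list such as ["ffmpeg", "-y", "-i", "source.mov", ...].
    print(f"{timed_run([sys.executable, '-c', 'pass']):.3f} s")
```

Because stderr is discarded, rerun a failing command without the redirection to see ffmpeg's error message.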
Use the supplied Jupyter notebook to investigate:
The bitrate of each GOP in each encoding. Does each conform to the
target nominal bitrate? Which encoding is the most consistent? Explain
your findings.
The quality of the encodings. Are there differences in quality between
the encodings and within the encodings? Are these differences
practical (i.e. noticeable)?
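The per-GOP bitrate can be computed from frame-level data exported by ffprobe, for example with ffprobe -v error -select_streams v:0 -show_entries frame=pict_type,pkt_size -of json encoded.mp4. The sketch below assumes that JSON layout (a frames list whose entries carry pict_type and pkt_size), a constant frame rate, and that every GOP starts with an I-picture; it is not the method used by the supplied notebook, and the function name is hypothetical. For the intra-only source, every frame is an I-picture, so each "GOP" would be a single frame.

```python
import json


def gop_bitrates_kbps(ffprobe_json, fps):
    """Bitrate (kbit/s) of each GOP, starting a new GOP at every I-picture."""
    gops = []
    for frame in json.loads(ffprobe_json)["frames"]:
        if frame.get("pict_type") == "I" or not gops:
            gops.append([])
        gops[-1].append(int(frame["pkt_size"]))  # pkt_size may be a string
    # Bits per GOP divided by GOP duration (frame count / fps).
    return [sum(sizes) * 8 * fps / len(sizes) / 1000 for sizes in gops]


if __name__ == "__main__":
    # Synthetic frames for illustration, not measured data.
    sample = json.dumps({"frames": [
        {"pict_type": "I", "pkt_size": "5000"},
        {"pict_type": "P", "pkt_size": "1000"},
        {"pict_type": "I", "pkt_size": "4000"},
        {"pict_type": "B", "pkt_size": "500"},
    ]})
    print(gop_bitrates_kbps(sample, 25))
```

Comparing each value with the nominal bitrate (for example with a tolerance band) then shows which encoding is the most consistent.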
You may choose to augment the Jupyter notebook to generate additional
visualisations or analyses of the results.
Questions
Using these results, write a report (maximum 2,000 words) addressing the
following questions:
What differences are there between H.264 and H.265 encoding?
Is the quality of the content consistent for the different codecs and
encodings?
Does changing the GOP length affect the quality of the coded content?
What GOP length would you recommend to give the best balance
between quality (coding efficiency) and adaptability (shorter GOP
lengths) for each of these codecs?
Which metric (PSNR or SSIM) is a better measure of quality? Do you
notice any visual difference?
Is there a relationship between picture type, size and quality? Does this
relationship hold across different bitrates and codecs?
Be sure to include any other observations that you noted and the results and
rationale for any additional analysis you performed.
Deliverables
An archive (zip file) containing the encoded bitstreams, any data files (e.g. CSV
files), Jupyter notebooks and your report. Do not include the source material
provided to you.
Resources
The following resources are available on Canvas or externally:
A Jupyter notebook that compares videos and produces a CSV file
containing frame-by-frame quality measures (PSNR and SSIM) of the
difference between the reference movie (the original) and an encoded
video
This Jupyter notebook also demonstrates how frame-level information
from ffprobe can be combined with the quality measures
Encoding for MPEG-DASH tutorial: https://blog.streamroot.io/encode-multi-bitrate-videos-mpeg-dash-mse-based-media-players/
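Combining ffmpeg/ffprobe frame information with the quality measures can be sketched as follows, assuming the rows have already been matched frame by frame (same order and count) and that samples are 8-bit. The psnr function shows the standard definition, PSNR = 10 log10(MAX² / MSE); summarize_by_type groups frames by picture type to compare average size and quality. Both names are hypothetical, the example rows are invented for illustration, and this is not the code in the supplied notebook.

```python
import math
from collections import defaultdict


def psnr(ref, dist, max_value=255):
    """PSNR in dB between two equal-length sequences of samples (8-bit by default)."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


def summarize_by_type(frames):
    """Mean size and mean PSNR per picture type from already-joined per-frame rows."""
    groups = defaultdict(list)
    for frame in frames:
        groups[frame["pict_type"]].append(frame)
    return {
        ptype: (sum(f["size"] for f in rows) / len(rows),
                sum(f["psnr"] for f in rows) / len(rows))
        for ptype, rows in groups.items()
    }


if __name__ == "__main__":
    # Invented rows for illustration, not measured results.
    rows = [
        {"pict_type": "I", "size": 5000, "psnr": 40.0},
        {"pict_type": "P", "size": 1000, "psnr": 36.0},
        {"pict_type": "I", "size": 3000, "psnr": 38.0},
    ]
    print(summarize_by_type(rows))
```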