首页 >
> 详细

CMPT 365 Project Spring 2020

Objective: Apply course materal to a more in-depth project. The Project can be in teams

of 2 (but as well can be individual). Each team gets a single score, and both members of the

team receive identical marks.

For this project, the programming language and platform are open; the objective is to

learn about the problem, not to make a lovely GUI (at least, not necessarily — do so, if it

aids you). It’s also open-ended in that there are many approaches to the problem that stem

from the discussion below: e.g., the specification discusses video wipes, and we don’t look at

video dissolves or other types of gradual transitions.

1. Project

1.1. STI by Copying Pixels

We are interested in finding and characterizing video transitions, specifically cuts and wipes.

One approach to this problem is to construct a “spatio-temporal” image [STI

bb] =an image which contains video content for each frame along the ordinate axis, versus time

along the abscissa. A simple version of such a construction consists of copying a column (or

row), or weighted average of a few, directly into the STI.

E.g., suppose that we have a video containing a wipe, as in Fig. 1 below:

Figure 1: Frames from a video that includes a horizontal wipe.

Now suppose we copy the center column from each frame into a new, STI, image: the STI

has the same number of rows r as does the original video frame, but a number of columns

equal to the number of frames f in the video. The result is shown below in Fig. 2(a). If,

instead, we copy over the center row from each frame, turned sideways, then the STI has

the same number of rows as c, the number of columns in the original video, and a number

of columns f equal to the number of frames. Such an STI made up of rows is shown in

Fig. 2(b).

For example, this video happens to be 120 × 160 pixels per frame, and the clip has 100

frames. So the STI made from the center columns is of size 120 × 100. The STI made from

the center rows is 160×100 pixels. Clearly, for the former a transition is evident at the time

when the video wipe reaches the center column of the frame. Since the wipe is upright, the

edge is also upright. For the latter STI, made from rows, the wipe shows up as a diagonal

edge along the time direction.

In general, one can make a taxonomy of such pairs of STIs based on whether the wipe

is a vertical wipe, a horizontal wipe, an iris opening, etc. As well, one can use a diagonal

across the frame, rather than a column or row, to make a third kind of STI.

1

(a) (b)

Figure 2: (a): STI consisting of center column from each frame; (b): And from center row.

1.2. STI by Histogram Differences

One problem with the method outlined above is how to find that nice edge our eye sees so

clearly in Fig. 2. For one thing, it’s often the case that the edge is not so visible, especially

in noisy videos such as from broadcast TV. Also, if we rely on the pixels themselves, small

movements can muddy the STI. For example, Fig. 3(b) shows an abrupt cut allright, and

then the flash of a flash-camera, but the diagonal edge indicating a wipe that follows is hard

to see.

(a) (b)

Figure 3: (a): Frame during a wipe (broadcast video); (b): STI as in Fig. 2(b).

One approach that is taken to characterize images is to use a histogram of colour, rather

than just the raw pixel data. Histograms are fairly insensitive to troublesome problems such

as movement and occlusion (i.e., losing sight of an object over time).

It turns out that we also know that if we replace the colour, RGB, by the chromaticity

(see Chapter 4)

{r, g} = {R, G}/(R + G + B)

then the image is much more characteristic of the surfaces being imaged, rather than of the

light illuminating those surfaces. So, instead of colour, {R, G, B}, let’s use chromaticity,

{r, g}. [But watch out for black, i.e., {R, G, B}={0, 0, 0}, pixels.]

The nice thing about chromaticity is that it’s 2D. So our histogram is a 2D array, with

r along one axis and g along another. The chromaticity is necessarily in the interval [0, 1].

But how many bins along each axis should we use? Applying (a cheap version of) a rule of

thumb called Sturges’s Rule, the number of bins N = 1 + log2(n), where n=size of data, so

a rough idea is to use n = number of rows, so e.g. for frames of size 120 × 160 we would

use N = floor(1 + log2

(120)) = 7 bins along each of the r and g directions. That makes our

2

histogram, H, a small, 7 × 7 array of integer values. If we normalize to make the sum of H

equal unity, then we have an array of floats.

Now, suppose we wish to compare one column in a frame with the same column in the

previous frame. Then we should compare the histogram Ht at time t with the histogram

Ht−1 for the previous frame. Here, we make a 2D histogram Ht

just for the particular column

we’re working on, and the histogram Ht−1 for a histogram for the same column, but for the

previous frame. Let’s get a measure of histogram difference. One measure would be the

Euclidean distance between all the entries in the histogram: the square root of the sum-ofsquares

of the differences between entries. But a better value turns out to be the so-called

“histogram intersection”:

min [Ht(i, j), Ht−1(i, j)]

This formula assumes that one has first divided each histogram by its sum, so that each adds

up to unity, e.g.,

j Ht(i, j) = 1. How I works is that, for each array entry, you add up

the smaller of the values at that array location, over the two histograms being compared.

Since we compared histograms for a single column, and obtained a scalar I, each column

in the whole frame can give us its own scalar, at the current time frame. So an STI made

out of I values could have a number of rows equal to the number of columns in the video

frame, and number of columns equal to the number of frames. E.g., in Fig. 4(a), we compare

histograms for columns, so the STI is 160 × 100, for our example video. If we do the same

for histograms of rows, then our STI is of size 120 × 100, as in Fig. 4(b).

(a) (b) (c)

Figure 4: (a): STI made from histogram intersections of each column in each frame with the

previous time instant; (b): And from rows. (c): A thresholded version of (a).

How the histogram intersection works is that, if one histogram is similar to the other,

then I ≃ 1. But if the histograms are different, then I ≃ 0. So if we are comparing a video

frame column to the same column at a previous instant, then we expect to have I about 1,

and that’s what we see except at a wipe, where the current column is from one video but

the previous-time column is from another video.

The advantage of doing all this is that the output is much cleaner. If we wish to find

a diagonal edge, then Fig. 4(a) provides a fairly clean sets of zero values down a straight

3

diagonal, in a background of 1s. A “thresholded” version is shown in Fig. 4(c), where we

display the boolean value (I > τ ), with τ some high enough value (say, 0.7). Clearly,

Fig. 4(c) provides a nice feature to search for in a video, if we wish to automatically find

wipes.

2. Your job

Implement these ideas and test on videos. If you’re really enthusiastic, try your code on

video dissolves, as well. For faster processing, and also to remove some noise, the size of

input frames can first be reduced to 32 × 32.

If you’re truly over the top, then you’ll likely be saying to yourself that histogram intersection is

not the best one could do, in that to be compared, pixels have to be at roughly the same colour.

But red is fairly like orange, so a reddish histogram bin should not give a zero intersection with

an orange histogram bin.

Therefore, IBM has developed a system that uses a different definition of histogram-difference.

Suppose we reshape each histogram into a long vector (e.g., a 49-element vector). Denote by

z the difference between histograms: z = Ht − Ht−1. Now the Euclidean distance between

histograms is the square root of z

T z , where T means transpose.

But instead, let’s interpose a matrix A , defining a new squared distance D2:D2 = zT A z

where A encapsulates nearness of colours. A form suggested is

Aij = (1 − dij/dmax)

with dij defined as a three-dimensional color difference and dmax = max(dij). E.g., we could

use Euclidean distance between chromaticities {r, g}. Then dmax =√2.

3. What to submit

1. As a zipfile, submit code including documentation, along with a short (3-4 pages)

written report summarizing what you have done for your project: What worked? What

didn’t? How would you like to extend your code? [And, what language did you use?

Problems with libraries?]

Not required, but as a different submission component it would be nice to include a

video URL as well, if possible.

Your project will be marked on effort and usefulness of the resulting product.

2. How hard was this project?

State briefly how would you improve this project?

Any ideas for a different project at the same difficulty level? Would you enjoy more

difficulty? Less?

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp2

- Comp 250 Assignment 3 2020-05-24
- Macm 316 – Computing Assignment 7 2020-05-24
- Sta457 Assignment 2020-05-24
- Homework 10 2020-05-24
- Lab 2 Msc: Time Series Prediction With... 2020-05-24
- Comp2011作业代做、Data Analysis作业代写、C++编程语言 2020-05-24
- 代做compsys201作业、Python，Java，C/C++编程语言作业 2020-05-24
- Program留学生作业代做、Python编程设计作业调试、Data作业代写 2020-05-24
- 代写 Practical 3 Covid-19程序作业，代写... 2020-05-23
- 代写comp3059作业、代做programming作业、Java语言作业代 2020-05-23
- Coit12206作业代写、Program课程作业代做、Java、Pytho 2020-05-23
- Data2001作业代做、Data Science作业代做、Sql语言作业代 2020-05-23
- 代写comp2017作业、代写c/C++语言作业、代写data作业、C/C+ 2020-05-23
- Data留学生作业代做、Python编程设计作业调试、代写program课程 2020-05-22
- Mkan1-Uc 5103作业代写、代做analytics作业、Java，P 2020-05-22
- Pols 512作业代写、R编程设计作业调试、Data留学生作业代做、代写r 2020-05-21
- Econ 6070作业代做、Data课程作业代写、代做java，Python 2020-05-21
- Pstat 170 2020-05-20
- Comp 250 Assignment 3 2020-05-20
- Data留学生作业代做、代写r程序语言作业、代做r实验作业、代写progra 2020-05-20