Module Code: COMP5111M01

Module Title: Big Data Systems

© UNIVERSITY OF LEEDS

School of Computing Semester 2 2018/2019

Calculator instructions:

- You are not allowed to use any calculator in this examination.

Dictionary instructions:

- A basic English dictionary is available to use: raise your hand and ask an invigilator if you need it.

Examination Information

- There are 4 pages to this examination.

- There are 2 hours to complete the examination.

- Answer all 3 questions.

- The number in brackets [ ] indicates the marks available for each question or part question.

- You are reminded of the need for clear presentation in your answers.

- The total number of marks for this examination paper is 60.

- You are allowed to use annotated materials.

Page 1 of 4 Turn the page over

Question 1

(a) Facebook is an example of a massively connected social media platform, generating huge volumes of data. Give an example scenario where Facebook may batch process some data, and an example scenario where Facebook may need to process data in real time.

[2 marks]

(b) There are several big data platforms available with different characteristics, and choosing the right platform requires in-depth knowledge of their capabilities. You need to decide which platform to choose, so you first investigate what your application's needs are. Give two fundamental issues that you would consider before making your decision.

[2 marks]

(c) … 10-fold growth in world data by 2025. Give two reasons, with real-world examples, why this trend is occurring.

[2 marks]

(d) State the similarities and differences between traditional computing clusters and the computing clouds launched in recent years, considering the technical and economic aspects listed below:

• Hardware, software, and technical support.

• Resource allocation and provisioning methods.

• Infrastructure management and protection.

• Support of utility computing services.

[8 marks]

(e) You are designing an application that requires both data acquisition and pre-processing of raw data for event filtering. Moreover, you are free to choose the underlying hardware used to perform the pre-processing. Which hardware architecture would you choose for such an application? Justify your answer.

[3 marks]

(f) How do specialist hardware deployment and the use of a technology like Apache Storm compare to the more traditional MapReduce solution?

[3 marks]

[Question 1 Total: 20 marks]

Question 2

(a) Self-driving vehicles are a technology that is rapidly moving towards mass-market production. Give examples of how a self-driving vehicle relates to the 5 Vs of Big Data (Volume, Velocity, Variety, Veracity, Value).

[5 marks]

(b) The Hadoop Distributed File System (HDFS) is a popular storage mechanism for large quantities of data. Explain how HDFS ensures the fault tolerance of data stored on its data nodes.

[2 marks]

(c) … containers and Virtual Machines using three criteria of your choice.

[3 marks]

(d) Hadoop's original MapReduce is used to process large data sets on a large number of collective servers. However, it often performs poorly when too many servers are involved, e.g. running 40K concurrent tasks over 4K servers. Clearly explain the cause of this poor performance. Outline a possible mitigation strategy.

[5 marks]

(e) Apache Storm is an example of a Continuous Operator Model (COM) system, used to process streaming data. Explain how Apache Storm guarantees that all data emitted by its spouts will be processed.

[3 marks]

(f) Discuss two disadvantages of using Apache Storm to process streamed data.

[2 marks]

[Question 2 Total: 20 marks]

Question 3

(a) Apache Spark is one of the most popular Big Data Systems in today's industry. Discuss two advantages that Spark offers over the more traditional Apache Hadoop framework, and explain why these advantages are significant. Explain why Hadoop is still useful, and give an example of how Hadoop could still be used.

[5 marks]

(b) Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Explain the concepts of both source-based and target-based deduplication. Discuss an advantage and a disadvantage of each approach in the context of Cloud Computing.

[5 marks]

(c) … database management model. Discuss two advantages and two disadvantages of using NoSQL in the context of a big data system. Give an example scenario where use of a NoSQL database would be appropriate.

[5 marks]

(d) Neo4j is an example of a NoSQL graph database. Use an example to explain what type of application a graph database is suitable for. Discuss two advantages and two disadvantages of graph databases.

[5 marks]

[Question 3 Total: 20 marks]

[Grand Total: 60 marks]
