ASSIGNMENT 2 INSTRUCTIONS
CS39A0190 Data analytics basics
”DATASETS”-file has three sheets. First sheet ”DATASET1” is its own task and sheets two
and three, ”DATASET2_1” and ”DATASET2_2” are another one.
The datasets needs to be cleaned before they can be analysed. You can turn the data into
table or keep as a range, whichever you find easier to work with.
DATASET1 basic info
The data is process data where Events go through a process from A to G.
Process step colums A to G have dates as data. The date depicts when the process step began. Process
step G means completion of the whole process.
DATASET2 basic info
The data is sales data for product range. SKU (stock keeping unit) is assigned to each unique product.
Agency depicts a retailer. Volume is the amount sold during the time and average max temp describes
the temperature at the retailer for the sales period.
DATASETS
Cleaning the data
Remove empty rows.
Remove incomplete Event records. Every process step must have a date.
Remove duplicate Event records. Each Event ID is unique.
If Excel does not recognize the process step data as dates, convert the cell contents into dates.
Event IDs and Event types are numbers by format.
Analysing the data
Calculate the duration (in days) between each process step and the total duration for all Events.
Calculate the average duration between each process step and the average total process duration for all
Event types.
Example for a single Event:
DATASET1 – 3 POINTS
First, use Text to Columns to get the comma separated data into separate columns.
Cleaning the data
Both DATASET2_1 and DATASET2_2 sheets have a column that uses dot as a decimal separator. If your
Excel has comma as a decimal separator, convert the dots to commas.
Check for duplicates, missing data, empty rows... and handle accordingly if found.
Analysing the data
Add the Avg_Max_Temp data from DATASET2_2 to DATASET_1. See example below.
Calculate monthly total sales (all SKUs) for each Agency.
Calculate annual total sales for each SKU.
DATASET2 – 3 POINTS
Data is from Eurostat, population data. Actual national population data from 2021 and
projections for 2022-2100.
Projections has multiple scenarios in separate sheets.
Your task is to combine the data for EU countries + Iceland, Norway and Switzerland (30
countries in total).
Open a blank Excel spreadsheet and connect to both of the Excel files to get data.
Clean the data, merge the datasets, and clean the merged data if necessary.
You will return results only for the 30 countries.
As a result, you will have for each country a 2021 actual population (total, males, and
females) and projections for 2022-2100 (total, males, and females).
From projections, use two scenarios. Baseline and a free choice for the second.
POWER QUERY – 6 POINTS
Find two datasets (from anywhere) that you can connect to and combine in Power Query.
Use two different sources for the data.
E.g., one from CSV and one from Web
Clean the data using Power Query and combine them into one dataset.
FIND AND MERGE – 3 POINTS
Return three Excel-files.
DATASETS
POWER QUERY
FIND AND MERGE
All used functions and Queries must remain in the submitted files. Being able to see the
data transformation process in the files is more important than the end result.
SUBMISSION