Assignment 3b: Group Report and final code
Assignment Overview
Assignment 3 consists of two main deliverables: runnable code and the report.
1) Part 3a: Individual Code
weight: 5% (of course total)
due: end of (week 11)
mode: individual
2) Part 3b: Group Report
weight: 25% (of course total)
due: end of (week 12)
mode: group
To successfully accomplish this task, you need to demonstrate good coding and analytical skills as well as professional communication and writing skills.
You will work in groups of three. Equal contribution and engagement of each group member is expected.
Business Scenario
Your work on this task is based on the following scenario:
You are working in a team of developers for a grocery store. The store manager noticed that some items are often bought together. The manager wants to find out exactly what items customers buy most often together in one basket (we call them itemsets). This information will be used to place itemsets close together, so that customers can find them quickly, which in turn may increase sales.
After analysing the problem, your team has discovered that once frequent itemsets are identified, it is also possible to recommend products from these itemsets to customers on the store website.
Your team, being knowledgeable of both frequent itemsets mining and recommendation systems, wants to go even further: you want to test other well-known recommendation methods, such as collaborative filtering, to see which recommendation method works better.
The store collects details about customers’ buying habits through a loyalty programme, and your team is given access to a representative dataset. The system you build, however, should scale to around one million customer transactions.
The project has been approved by the store management, so you are ready to start building the system which can help to significantly increase sales.
Weighting, Report size & Due Dates
This assessment is worth 25% of your overall grade.
The report should be limited to 12 A4 pages including references.
The submissions are due Sunday Night, 23:59 (end of week 12).
Note:
1. The work you did for Assignment 3a (https://myuni.adelaide.edu.au/courses/101178/assignments/424671) is the basis for Assignment 3b. High-quality individual work will help you achieve the best results as a group! This is a group task: each group member gets the same mark for the report.
2. Your final code submitted here is the whole system that you used to produce the results for the report.
The code includes individual parts integrated into one system.
Course Learning Outcomes
CLO 2: Apply suitable algorithms for particular data mining problems.
CLO 3: Design and develop processes and products to solve business problems related to data mining.
CLO 4: Resolve data mining problems in collaboration with others.
CLO 5: Communicate effectively in a variety of forms using appropriate terminology.
Task Description
Purpose:
To practise using association rule mining and recommender system methods, and to apply them to solve a practical problem.
Instructions
As a group you will work together to produce a report of no more than approx. 12 pages including references. Your work on Assignment 3a and the results you obtained are the basis for your report.
Tasks 1, 2, 3
See Assignment 3a: Individual code (https://myuni.adelaide.edu.au/courses/101178/assignments/424670) for detailed instructions of the individual tasks.
Part B: Report
1. The report should be limited to 12 A4 pages including references. Including all necessary content takes priority over the page limit.
The report should contain:
1. Title page: title of the project, names and ids of group members.
2. Executive summary (non-technical, <= 1 page). This section is for company management, who may not be familiar with technical details. It should include a brief problem description, the benefits for the company and the feasibility of scaling the solution. You can include test results, but you need to explain what they mean to a layperson.
3. Introduction (this starts the technical report): a brief explanation of the problem, the aim of the project.
4. Exploratory analysis: analysis of the data that gives insight into how to use it, along with potential solutions and potential problems that you may encounter. Include a diagram of the proposed system.
This section should include the highlights of your exploratory analysis. It is easy to go overboard with figures and details, but remember the intended audience.
5. Frequent pattern mining:
Brief description of the frequent itemset mining method(s) and hyper-parameters used. If you choose a method that is not discussed in the course, you must describe the method, explain why it was selected, and give a reference to a paper or other source.
Selection of patterns, as a guide: for association rules, confidence >= 80%; for patterns, support >= 5% (these numbers are indicative only; you will need to work out your own values for confidence and support which result in good or 'the best' recommendations).
Choose your own thresholds to maximise benefit/profit for the company, and justify your choice. Tip: make the thresholds parameters of your code, as they affect the run time.
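As an illustration only, the sketch below shows how support and confidence could be computed and how the two thresholds could be exposed as parameters. It is a brute-force enumeration written for clarity, not Apriori or FP-Growth, and it would not scale to one million transactions. The function names and the toy baskets are assumptions made for this example and are not part of the assignment dataset or any required interface.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, max_size=2):
    """Return {frozenset: support} for every itemset of up to max_size items."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def association_rules(support, min_support=0.05, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    rules = []
    for itemset, s in support.items():
        if len(itemset) != 2 or s < min_support:
            continue
        for a in itemset:
            antecedent = frozenset([a])
            consequent = itemset - antecedent
            # confidence(A -> B) = support(A and B) / support(A)
            conf = s / support[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, s, conf))
    return rules


# Toy baskets, made up for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "bread", "jam"],
    ["milk"],
]
support = itemset_support(baskets)
rules = association_rules(support, min_support=0.4, min_confidence=0.8)
for antecedent, consequent, s, conf in rules:
    print(set(antecedent), "->", set(consequent), f"support={s:.2f}", f"confidence={conf:.2f}")
```

Because both thresholds are function arguments, the same code can be re-run with different values to compare the number of rules produced and the run time.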
6. Collaborative filtering recommendation method.
Brief description of the method.
Metrics used for evaluation on test dataset.
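The brief does not prescribe a particular collaborative filtering variant or metric. As one possible sketch, the example below assumes item-based collaborative filtering with cosine similarity on binary purchase data, evaluated with precision at k against held-out purchases. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def item_cosine_similarity(user_items):
    """Cosine similarity between items, from binary user -> set-of-items data."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                s = overlap / math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims


def recommend(user, user_items, sims, k=3):
    """Rank items the user has not bought by summed similarity to owned items."""
    owned = user_items[user]
    scores = defaultdict(float)
    for item in owned:
        for other, s in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += s
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def precision_at_k(recommended, held_out):
    """Fraction of recommended items that appear in the held-out purchases."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in held_out) / len(recommended)


# Toy training data, made up for illustration only.
train = {
    "u1": {"bread", "butter"},
    "u2": {"bread", "butter", "jam"},
    "u3": {"bread", "milk"},
    "u4": {"milk", "cereal"},
}
sims = item_cosine_similarity(train)
recs = recommend("u1", train, sims, k=2)
score = precision_at_k(recs, held_out={"jam"})
print(recs, score)
```

Other metrics, such as recall at k or hit rate, could be swapped in for `precision_at_k` without changing the rest of the sketch.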
7. Recommendation method from frequent patterns.
Brief description of the method.
How this method is planned to work with patterns/association rules as an input, and what the output of this method will be.
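One way to picture the input and output of such a method is sketched below. It assumes that each rule is a hypothetical `(antecedent, consequent, confidence)` tuple, that the input is the customer's current basket, and that the output is a list of items ranked by the highest confidence of any rule that fires. This is a minimal illustration under those assumptions, not a required design.

```python
def recommend_from_rules(basket, rules, k=3):
    """Recommend up to k items implied by rules whose antecedent is in the basket."""
    basket = set(basket)
    best = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= basket:  # the rule fires only if the basket contains the antecedent
            for item in consequent - basket:  # never recommend what is already in the basket
                best[item] = max(best.get(item, 0.0), confidence)
    # Sort by descending confidence, then by name so ties are deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical rules with made-up confidence values.
example_rules = [
    (frozenset({"bread"}), frozenset({"butter"}), 0.75),
    (frozenset({"bread", "butter"}), frozenset({"jam"}), 0.60),
    (frozenset({"milk"}), frozenset({"cereal"}), 0.90),
]
print(recommend_from_rules(["bread", "butter"], example_rules))
print(recommend_from_rules(["bread", "milk"], example_rules))
```

Ranking by maximum confidence is only one choice; lift or a combination of support and confidence could be used instead, and the choice should be justified in the report.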
8. Discussion of results.
1. How the results were obtained, what metrics were used for evaluation. How the patterns and recommendations were ranked.
2. Five examples of frequent patterns with their confidence and support on both training and test sets.
3. 10 examples of recommendations from these patterns, two examples from each of the above patterns.
4. Table or chart of metrics with discussion, showing results of testing frequent patterns on the test set.
5. Table or chart of metrics with brief discussion, showing results of recommendations on training and test sets, with and without frequent patterns used.
6. Estimation of timing of the system if the dataset is scaled up to one million transactions.
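For the timing estimate, one possible approach is to time the system on samples of increasing size and extrapolate. The sketch below assumes that run time follows a power law in the number of transactions and fits it by least squares in log-log space. The helper names are hypothetical, and the timings shown are placeholder values chosen for illustration, not measurements of any real system.

```python
import math
import time


def time_run(fn, sample):
    """Return the wall-clock seconds taken by fn(sample)."""
    start = time.perf_counter()
    fn(sample)
    return time.perf_counter() - start


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds = c * size**p in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return c, p


def predict_seconds(c, p, size):
    """Predicted run time for a dataset of the given size."""
    return c * size ** p


# Placeholder timings (NOT real measurements): replace with values from time_run.
sample_sizes = [10_000, 20_000, 40_000]
sample_seconds = [1.0, 2.0, 4.0]
c, p = fit_power_law(sample_sizes, sample_seconds)
estimate = predict_seconds(c, p, 1_000_000)
print(f"exponent p = {p:.2f}, estimated seconds at 1,000,000 transactions = {estimate:.1f}")
```

An extrapolation like this is only as good as its assumptions: memory limits and the number of frequent itemsets at a given support threshold can change the growth rate, so the report should state how the estimate was obtained.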
9. Conclusion and Recommendations: which method you recommend and why (recommendation from frequent patterns or directly from the dataset). Include scaling-up considerations and benefits for the company. Include recommendations for future improvements.
10. Reflection: what is one main thing you have learned through this project and what would you do better next time.
11. References (Harvard)
12. Contribution Appendices
1. This section does not count towards the 12-page limit. It should summarise the work each group member was responsible for, including sections of code, proto-figures etc. There is no limit on the length of this section. It should represent what each member contributed, even if that work did not go into the final report.