Dept. of Materials Science & Engineering NUS
MLE 5217 : Take-Home Assignments
Objectives
Based on the chemical composition of materials, build a classification model to distinguish metals from non-metals
(Model I), and then build a regression model to predict the bandgap of non-metallic compounds (Model II).
Please use a separate Jupyter notebook for each of the models.
Data
The data contains the chemical formula and energy band gaps (in eV) of experimentally measured compounds.
These measurements have been obtained using a number of techniques such as diffuse reflectance, resistivity
measurements, surface photovoltage, photoconduction, and UV-vis measurements. Therefore, a given compound
may have more than one measurement value.
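Because a compound can appear several times, one possible preprocessing step is to collapse repeated measurements into a single value per formula before modelling, so that the same compound does not end up in both the training and the test set. The sketch below is only one option (taking the median); the row layout and the formulas and values in `rows` are hypothetical placeholders, not entries from the assignment data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows of (formula, band gap in eV). The formulas and values are
# made up for illustration and are NOT taken from the assignment data.
rows = [
    ("AB", 1.0),
    ("AB", 1.2),
    ("AB", 1.4),
    ("C2D", 0.0),
    ("EF3", 2.5),
    ("EF3", 2.7),
]


def aggregate_measurements(rows):
    """Collapse repeated measurements into one median value per formula."""
    grouped = defaultdict(list)
    for formula, gap in rows:
        grouped[formula].append(gap)
    return {formula: median(gaps) for formula, gaps in grouped.items()}


aggregated = aggregate_measurements(rows)
for formula, gap in aggregated.items():
    print(formula, gap)
```

Other choices (mean, minimum, or keeping all values and splitting the data by formula) are equally defensible; whichever is used should be justified in the notebook comments.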
Tasks
Model I (30 marks)
Dataset: Classification data.csv
Fit a Support Vector Classification model to separate metals from non-metals in the data. Ensure that you:
• Follow the usual machine learning process.
• Use a suitable composition-based feature vector to vectorize the chemical compounds.
• You may use your judgement on how to differentiate between metals & non-metals. As a guide, two possible
options are given below.
Option 1: for metals Eg = 0, for non-metals Eg > 0
Option 2: for metals Eg ≤ 0.5 eV, for non-metals Eg > 0.5 eV
• Use suitable metrics to quantify the performance of the classifier.
• For added advantage, you may optimize the hyper-parameters of the Support Vector Classifier. Note: Optimization
algorithms can require high processing power and may therefore cause your computer to freeze (ensure
you have saved all your work before you run such code). In such a case, you may either do a manual
optimization or leave the code unexecuted.
• Comment on the overall performance of the model.
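The steps listed above can be sketched roughly as follows, assuming scikit-learn and NumPy are available. The feature vector used here (atomic fractions per element) is just one simple composition-based choice, and `element_fractions` and `vectorize` are hypothetical helper names. The formulas and band gaps are placeholders chosen only so that the example runs (0.0 for the first eight entries, 1.0 for the rest); they are not measurements, and the real data should be read from the supplied CSV file.

```python
import re
from collections import Counter

import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def element_fractions(formula):
    """Parse a flat formula such as 'Fe2O3' into {element: atomic fraction}.

    Parentheses and hydrates are not handled in this sketch.
    """
    counts = Counter()
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


def vectorize(formulas, elements):
    """Build a matrix with one row per formula and one column per element."""
    index = {el: j for j, el in enumerate(elements)}
    X = np.zeros((len(formulas), len(elements)))
    for i, formula in enumerate(formulas):
        for el, frac in element_fractions(formula).items():
            if el in index:
                X[i, index[el]] = frac
    return X


# Placeholder data: NOT measurements, only here to make the sketch runnable.
formulas = ["Cu", "Fe", "Al", "Ni", "CuZn", "FeNi", "Al3Ni", "Cu3Al",
            "NaCl", "MgO", "SiO2", "Al2O3", "ZnO", "ZnS", "Fe2O3", "CuCl"]
gaps = np.array([0.0] * 8 + [1.0] * 8)

# Option 1 labelling: Eg = 0 -> metal (0), Eg > 0 -> non-metal (1).
labels = (gaps > 0).astype(int)

elements = sorted({el for f in formulas for el in element_fractions(f)})
X = vectorize(formulas, elements)

# Hold out a stratified test set so both classes appear in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# Scale the features, then fit an RBF-kernel support vector classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(
    y_test, y_pred, target_names=["metal", "non-metal"], zero_division=0
))
```

On the real data, the class balance depends on which labelling option is chosen, so precision, recall and F1 per class are usually more informative than accuracy alone. For the optional hyper-parameter search, scikit-learn's `GridSearchCV` can wrap the same pipeline; a small grid over `C` and `gamma` keeps the run time manageable.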
Model II (30 marks)
Dataset: Regression data.csv
Fit a regression equation to the non-metals to predict the bandgap energies based on their chemical composition.
• Use a suitable composition-based feature vector to vectorize the chemical compounds. You may try multiple
feature vectors and analyse the outcomes.
• You may experiment with different models for regression analysis if required.
• Comment on the overall performance of the model and suggest any shortcomings or potential improvements.
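One way to compare regression models on an equal footing is k-fold cross-validation with the same folds and metrics for every candidate. The sketch below assumes scikit-learn and NumPy. It uses a fully synthetic feature matrix and a synthetic target (an arbitrary linear rule plus noise), so the printed numbers say nothing about real band gaps; the choice of `Ridge` and `RandomForestRegressor` is also only an example of two candidates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)

# Synthetic stand-in for a composition feature matrix: 200 "compounds" and
# 8 "elements", each row a set of atomic fractions that sums to 1.
X = rng.dirichlet(alpha=np.ones(8), size=200)

# Synthetic target: an arbitrary linear rule plus noise. NOT real band gaps.
true_weights = rng.uniform(0.5, 5.0, size=8)
y = X @ true_weights + rng.normal(scale=0.1, size=200)

# Candidate models to compare; both are illustrative choices.
models = {
    "ridge": Ridge(alpha=1e-3),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Use the same shuffled folds for every model so the comparison is fair.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring=("neg_mean_absolute_error", "r2"),
    )
    results[name] = {
        # scikit-learn reports the negated MAE, so flip the sign back.
        "mae": -scores["test_neg_mean_absolute_error"].mean(),
        "r2": scores["test_r2"].mean(),
    }
    print(f"{name}: MAE = {results[name]['mae']:.3f} eV, "
          f"R2 = {results[name]['r2']:.3f}")
```

With the real data, the same loop can be repeated for each candidate feature vector, which makes it straightforward to tabulate the outcomes and discuss shortcomings such as limited coverage of rare elements.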
September 2024
Important: Comments
• Write clear comments in the code so that a user can follow the logic.
• In instances where you have made decisions, justify them.
• In instances where you may have decided to follow a different analysis path (than what is outlined in the
tasks), explain your thinking in the comments.
• Acknowledge any references used at the bottom of the notebook.
Submission
• Ensure that every code cell in the final Jupyter notebooks has been run and shows its output (except for
the hyper-parameter optimization, if any).
• Ensure that the two models (I and II) are in two separate notebooks.
• Name the files by your name as "YourName 1.ipynb" and "YourName 2.ipynb".
• It is your responsibility to ensure that the correct files are being submitted, and that the file extensions
are in the correct format (.ipynb).
• Submission will be via Canvas, and late submissions will be penalized.
Evaluation
The primary emphasis will be on the depth and thoroughness of your approach to the problem. Key areas of focus
will include:
* Data Exploration: Demonstrating a thorough investigation of the data, exploring different analytical
possibilities, and thoughtfully selecting the best course of action.
* Implementation: Translating your chosen approach into clean and efficient code.
* Machine Learning Process: Executing the machine learning process correctly and methodically, ensuring
proper data handling, model selection, and evaluation.
* Clarity of Explanation: Providing clear explanations of each step, with logical reasoning for the decisions made.
* Critical Analysis: Identifying any limitations of the approach, suggesting potential improvements, and making
relevant statistical inferences based on the results.