Tooling/Setup for Professional Python Development
DS3500: Advanced Programming with Data
Purpose
The purpose of this first assignment is to get you settled into working with professional Python software development tools, including working with environments, using an IDE, and sharing code with Git. We’ll go over the various steps together in class. The goals of the assignment are:
• You are up to date with the Anaconda Python distribution
• You have a professional IDE such as PyCharm that is tied to the Anaconda base environment
• You are comfortable with navigating the command line, editing text files, and issuing basic git commands
• You have set up a private repo for your classwork and cloned the public read-only class repo to a convenient location such as Documents/courses.
• You are configured for the kind of software development environment you are likely to encounter in industry or on a co-op. Professional software developers work in these sorts of development environments every day.
Overview
Step 1. Install the Anaconda distribution. If you installed it for DS2500, you are probably all set, though occasionally deleting your Anaconda installation folder and doing a fresh install can be convenient and save you some downstream headaches. From the terminal (or, on Windows, from the Anaconda Prompt), the command conda info will report which version you are using. You can update the libraries in your base installation with the command:
conda update --all
Step 2. Install PyCharm. Alternatives are ok, but I will be using PyCharm in class. The free community edition should be fine. You can set yourself up with a JetBrains educational account and get free access to the Professional edition if you like.
Step 3. Set up a new Anaconda environment, separate from base, called ds. Activate ds and install matplotlib, pandas and other libraries of your choosing into the ds environment.
Step 4. If you aren’t already familiar with git, read Pro Git, Chapter 1 (https://git-scm.com/book/en/v2/). Install git on your laptop if you don’t already have it. Section 1.5 gives installation instructions for Linux, macOS, and Windows. You should know, at minimum, the commands: init, clone, push, pull, commit, add, and status. It’s possible to use a desktop git client, or even PyCharm, but I think most professional developers prefer using the command line.
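One way to get a feel for the commands listed in Step 4 is to try them in a throwaway folder. The following is only a sketch: the local bare repository standing in for a remote server, the file name notes.txt, and the placeholder name and email are all assumptions made for illustration, not part of the class setup.

```shell
# Work in a throwaway directory so nothing touches your real repos.
scratch="$(mktemp -d)"
cd "$scratch"

# init: create a bare repository that stands in for a remote server
# (hypothetical stand-in; your real remote is the Khoury git server).
git init --bare remote.git

# clone: make a working copy of the "remote"
git clone remote.git work
cd work

# Placeholder identity, stored in this scratch repo only.
git config user.name "Example Student"
git config user.email "student@example.com"

# status: a new file first shows up as untracked
echo "hello" > notes.txt
git status --short

# add + commit: stage the file, then record it in the history
git add notes.txt
git commit -m "Add notes"

# push: send the current branch to the remote
git push origin HEAD

# pull: fetch and merge any new commits from the remote
git pull origin "$(git branch --show-current)"
git status
```

When you are done, the scratch folder can simply be deleted. Note that git branch --show-current needs a reasonably recent git (2.22 or later).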
Step 5. Make sure you have an account on the Khoury Enterprise Git Server (https://github.khoury.northeastern.edu/). I believe Khoury students already have an account. If not, you can sign up for an account here: https://my.khoury.northeastern.edu/account/apply
While you will use your Khoury account to log into the Git Server, Khoury recently changed the Git Server so that you can no longer use these credentials for authentication when executing git commands. Instead, you will set up SSH keys that allow you to authenticate automatically. On the Git Server, go to your profile settings and select SSH and GPG keys on the left-hand panel. You’ll need to set up and register SSH keys for each device that you use to connect to the Git Server. On this page you’ll find a link to a guide to generating SSH keys. YouTube is also an excellent resource for help with this sort of thing!
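As a rough illustration of what generating a key pair looks like, here is a sketch using the standard OpenSSH ssh-keygen tool. It assumes OpenSSH is installed, and it deliberately writes the key to a scratch folder with an empty passphrase so that it runs without prompts; the email label is a placeholder. For a real key you would normally drop -f and -N, accept the default location, and choose a passphrase. Follow the guide linked from the server if it differs from this.

```shell
# Scratch folder for this demo only (assumption: you are just practicing).
keydir="$(mktemp -d)"

# -t ed25519 : key type
# -C         : a comment that labels the key (placeholder email)
# -f         : output file for the private key
# -N ""      : empty passphrase (demo only)
# -q         : quiet, no progress output
ssh-keygen -t ed25519 -C "your_email@example.com" -f "$keydir/id_ed25519" -N "" -q

# Two files are created: the private key (no extension) and the public key.
ls "$keydir"

# The .pub file is the PUBLIC key. Its contents are what you paste into
# the "SSH and GPG keys" page. Never share the private key.
cat "$keydir/id_ed25519.pub"
```

Once a real key is registered, GitHub-style servers can usually be checked with ssh -T git@github.khoury.northeastern.edu. The host name is taken from the server URL above; the git user name is an assumption based on how GitHub servers are typically configured.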
Step 6. Create a PRIVATE personal repo for this class. I suggest calling it ds3500 or ds3500_priv. Add a readme and a .gitignore configured for Python development. (The .gitignore file is a hidden file containing lots of file patterns that will be ignored by git and not synchronized with the repo.) Clone it to a folder of your choice. I recommend Documents/courses/. This will create a ds3500 subfolder. THIS is where you will create your Python projects, homework, etc.
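To see how a .gitignore behaves, the sketch below builds a tiny one in a scratch repo and asks git which files it ignores. The four patterns are an illustrative subset chosen for this example (the Python template offered when you create a repo is much longer), and the file names are hypothetical.

```shell
# Throwaway repo so your real class repo is untouched.
scratch="$(mktemp -d)"
cd "$scratch"
git init demo
cd demo

# A few illustrative patterns (assumption: not the full Python template).
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.idea/
.DS_Store
EOF

# Hypothetical project files: one source file, one compiled cache file.
mkdir __pycache__
touch hw1.py __pycache__/hw1.cpython-312.pyc

# check-ignore prints each given path that matches an ignore rule.
git check-ignore __pycache__/hw1.cpython-312.pyc

# status lists .gitignore and hw1.py as untracked, but not __pycache__/.
git status --short
```

The .idea/ pattern covers the settings folder that PyCharm creates inside a project, which is usually not something you want to synchronize.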
Step 7. Navigate to Documents/courses and clone the class repo: ds3500_fa25. It will live side-by-side with your private repo. I’ll be posting class lecture handouts and code samples in this repo going forward. Treat this public repo as READ ONLY: do not add code or files to it. If I distribute starting code through this repo, copy it to your personal private repo and create your projects and code there instead.
Step 8. Using terminal commands, copy the handouts folder from the public repo to your private repo. Write down these commands. You will use them whenever I give you starting code for a homework assignment.
Step 9. Create a Python project in your private repo configured to use your new ds environment as the active interpreter. Projects are directories where project-specific program files will be stored. (Do NOT store programs in random file system locations such as your desktop.) Remember that the public (shared) repo is just for receiving code from class and other handouts. You do not have commit privileges on the class repo. Only add projects, code, and data to your personal (private) repository. Be Organized!
Step 10. Create a simple plot – it can be anything you like so long as it uses matplotlib. Take a screenshot of your PyCharm window showing both your plot and your status bar so that we can see that ds is indeed your active project environment.
Step 11. Unzip hw/hw1/secret.zip and read an inspiring quote.
Deliverables
On Gradescope, please submit a single PDF (not DOCX) document with the following seven pieces of information:
a) Your Khoury enterprise git server username
b) The URL of your private class repo. Since it is a private repo, we shouldn’t be able to access it! (The TAs will verify that you configured your repo as private.)
c) Your Anaconda version
d) Your installed git version number
e) Your PyCharm screenshot showing your plot and your status bar with the ds environment. This project should be created and saved in your private repo!
f) The text of the inspiring quote from step 11.
g) The terminal commands used to copy the handouts folder from the public class repo to your private class repo. Include the commands to navigate to the public repo clone folder, copy the handouts folder, navigate to your private repo, and verify that the files have been copied. We just want to make sure you are comfortable with moving files from place to place and navigating the directory tree using the command line.