Homework 6 specsheet
-Extra Credit (replaces lowest HW)-

In this homework, we apply an RL framework to environments available in the OpenAI Gym (now maintained as Gymnasium).

Mission command approach: As per §4.5 of the Sittyba, we will tell you what to do, not how to do it.
That is up to you. However, we want you to:
a) Do this homework yourself. Do not copy answers or code from someone else.
b) Restrict your methods (for now) to what was covered in the lecture/lab (in other words, basic
reinforcement learning involving Q-learning, policy gradients, multi-armed bandits, etc.)

Here is what we would like you to do:
1) Go to https://gymnasium.farama.org/index.html
2) Pick one of the available environments – we recommend one of the classic Atari 2600 games:
https://gymnasium.farama.org/environments/atari/ [Make sure to pick one we did not already
cover in lecture or lab. You may also pick an environment that is not an Atari game.]
3) Train an agent to achieve a reasonable level of performance in this environment.
4) Write a brief statement on how you trained the agent and how you managed the explore/exploit
tradeoff, and explain any other choices you made.
5) Also comment on how the training went: what was challenging for the agent, and
what made training feasible? Explanations of what you couldn't do and why are encouraged, with
emphasis on the “why”.
6) Document the performance of the agent by plotting total rewards as a function of training
episodes.
7) Make sure to include your code as a separate file.
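
As an illustration of steps 3, 4, and 6, here is a minimal sketch of tabular Q-learning with a decaying epsilon-greedy policy that records the total reward of every training episode. It is not a reference solution. The Corridor environment, its reward values, and all hyperparameters are hypothetical, chosen only so the example runs with the Python standard library. A real submission would train on a Gymnasium environment instead.

```python
import random


class Corridor:
    """Hypothetical toy environment (not a Gymnasium env): reach the right end."""

    def __init__(self, length=6, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.pos == self.length - 1
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01
        return self.pos, reward, terminated, truncated


def train(episodes=300, alpha=0.5, gamma=0.95,
          eps_start=1.0, eps_end=0.05, eps_decay=0.98, seed=0):
    """Tabular Q-learning; all hyperparameter values are illustrative assumptions."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    epsilon = eps_start
    total_rewards = []
    for _ in range(episodes):
        state = env.reset()
        total = 0.0
        while True:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(2)
            else:
                # Exploit: pick the best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, terminated, truncated = env.step(action)
            # Q-learning update; no bootstrapping from a terminal state.
            bootstrap = 0.0 if terminated else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state = next_state
            total += reward
            if terminated or truncated:
                break
        total_rewards.append(total)
        # Shift gradually from exploration to exploitation.
        epsilon = max(eps_end, epsilon * eps_decay)
    return q, total_rewards


if __name__ == "__main__":
    q_table, rewards = train()
    print("mean reward, first 20 episodes:", sum(rewards[:20]) / 20)
    print("mean reward, last 20 episodes:", sum(rewards[-20:]) / 20)
```

The list returned as total_rewards is the quantity step 6 asks you to plot. The multiplicative epsilon decay is one common way to manage the explore/exploit tradeoff, and other schedules may work better for your environment.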

Suggestions and recommendations:
1. Picking a more complex environment will earn more points. To check complexity, go to
https://github.com/openai/gym/wiki/Table-of-environments and look at Observation Space and
Action Space. We recommend choosing an environment that has a Discrete action space. We
want to keep grading criteria (in terms of points) flexible to see what students can actually do,
but as a broad heuristic, something with the complexity of “LunarLander-v2” would be ok,
something with the complexity of “BipedalWalker-v2” would be good, and something with the
complexity of “AirRaid-ram-v0” would be excellent. But don’t necessarily pick those specific
environments. Pick something that sparks joy, for you personally. It will shine through.
2. Try implementing an algorithm on your own instead of using Stable Baselines 3 (SB3). If you use SB3,
explain what you did to optimize the model. Check how far your model can go by trying
more complex environments and finding the breaking point.
3. You can also use the library NEAT-Python: https://neat-python.readthedocs.io/en/latest/ If you
decide to use NEAT, experiment with how far NEAT can go and note your observations.
4. So either a) implement your own algorithm, b) use SB3 (and note what you did to optimize
the model), or c) use NEAT-Python, find the most complex environment you are able to solve with NEAT,
and note what leads to better NEAT implementations.
5. Whichever environment you pick, make sure your RL bot is learning the environment reasonably
well (as evinced by the plot of total reward over episodes of training).
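
Per-episode rewards are usually noisy, so a smoothed curve can make it easier to judge whether the agent is learning. The following is a sketch of one way to produce such a plot. The function names, the 20-episode window, and the output file name are assumptions for illustration. The plotting function imports matplotlib lazily, so the smoothing helper works even where matplotlib is not installed.

```python
def moving_average(values, window):
    """Return the mean of each full sliding window over `values`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


def plot_rewards(total_rewards, window=20, path="rewards.png"):
    """Save a plot of raw and smoothed total reward per training episode."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependency

    episodes = range(1, len(total_rewards) + 1)
    smooth = moving_average(total_rewards, window)

    fig, ax = plt.subplots()
    ax.plot(episodes, total_rewards, alpha=0.3, label="per episode")
    # The k-th smoothed value covers the window ending at episode window + k.
    ax.plot(range(window, len(total_rewards) + 1), smooth,
            label=f"{window}-episode mean")
    ax.set_xlabel("Training episode")
    ax.set_ylabel("Total reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_rewards(rewards) on your list of per-episode total rewards writes the figure to rewards.png. A smoothed curve that trends upward and then levels off is one reasonable sign of learning, though what counts as good performance depends on the environment.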