
MGT001371 - Scheduling Manufacturing Systems

Homework Assignment III

• To be submitted on 04.07.2024 (Thursday 13:00) via TUM Moodle.

• Late submission will not be accepted. No collaboration!

• You must submit your handwritten or typewritten solutions in a single .pdf file and your code in separate .py files.

• You should zip (compress) all of your files and name the .zip file "NameLastname studentID", e.g., "BaharOkumusoglu 01234567.zip".

• Coding questions will be graded based on the result they generate and not the entered syntax. If a particular code section does not work, zero points are given to that section. We try to give points for the outcome of individual code sections wherever possible, but reserve the right to grade multiple sections together if necessary.

• Be precise. Do not make vague statements or leave any room for interpretation. You must also present your step-by-step solutions.

1. (10 points) Short quiz questions

(a) (2 points) In online scheduling, a production plan is created for the planning horizon considering future uncertainties. As the outcomes of the uncertain parameters are already considered, later rescheduling is not needed. True or False?

(b) (2 points) An undiscounted (γ = 1) sequential decision model acts greedily, i.e., chooses the best action in the current state that maximizes the immediate reward. True or False?

(c) (2 points) In the value iteration algorithm solving an MDP, the value function represents the expected long-term cumulative rewards for each state. True or False?

(d) (2 points) Capacity consumption constraints as we know them from mathematical programming can be considered in an MDP by defining state space-dependent action spaces A(s) if the available machine capacity is part of the state space. True or False?

(e) (2 points) For an MDP problem with 2,000 states, 10 possible actions in every state and 10,000 entries in the reward matrix, what is the dimension of the transition probability matrix?

Background for following questions

Biomanufacturing refers to the production of biological products, such as therapeutic proteins, vaccines, enzymes, and other bio-based materials, using living cells or organisms. It involves utilizing biotechnological processes to manufacture these products on a large scale. Numerous operational challenges make production planning and process control difficult in biomanufacturing. For example, product formation is often non-linear, with lower productivity rates at the beginning and end of a batch fermentation. The use of living organisms and natural raw materials can lead to uncertainties in product quality, yield, or processing times. Also, biomanufacturing is typically carried out in batches, where a specific quantity of cells or organisms is cultivated and processed together. Batch failures can occur due to contamination, process deviations, or unforeseen events, and usually require the whole batch to be discarded.

2. (30 points) Consider a biomanufacturing process that produces biopharmaceuticals in a batch process. A batch can be fermented for a maximum of 4 days. During that time, the product is formed in a non-linear fashion, with the revenues representing the reward achieved if the batch is harvested after the respective batch fermentation duration. The problem is that the batch can also fail, which results in zero revenue. The probability of a batch failure increases with the batch fermentation duration. For example, a batch that has been fermented for 2 days has a 10% probability of failing within the next day of fermentation.

The operating cost to ferment the batch for another day is C = −10 MU. The revenue and batch failure risk per batch duration are given in Table 1. The operations manager wants to determine the optimal harvest day. Note that a batch that has failed or has been fermented for four days needs to be harvested. A harvested batch is not replaced with a new one, hence the problem can be modelled as an indefinite-horizon MDP problem. This means that the dynamic process stops whenever the batch is harvested.

Batch duration [Days]	Revenue R [MU]	Batch failure risk [%]
0	0	2
1	10	5
2	40	10
3	80	20
4	100	n/a
Failure "F"	0	n/a

Table 1: Process parameters for the batch biomanufacturing process with batch failure risk.
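For the coding part it can help to hold Table 1 in plain Python data structures. The following is a minimal sketch under that assumption; the names REVENUE, FAILURE_RISK, DAILY_COST and survival_probability are illustrative and do not come from the provided templates. The helper only multiplies the daily survival chances from the table, i.e., it gives the probability that an unharvested batch is still intact after a given number of days.

```python
# Table 1 as plain Python data (all names are illustrative assumptions).
REVENUE = {0: 0, 1: 10, 2: 40, 3: 80, 4: 100, "F": 0}  # revenue R in MU at harvest
FAILURE_RISK = {0: 0.02, 1: 0.05, 2: 0.10, 3: 0.20}    # P(batch fails during the next day)
DAILY_COST = -10                                       # operating cost C in MU per extra day


def survival_probability(day):
    """Probability that a batch fermented without harvest reaches `day` intact."""
    prob = 1.0
    for d in range(day):
        prob *= 1.0 - FAILURE_RISK[d]
    return prob


for day in range(5):
    print(day, REVENUE[day], round(survival_probability(day), 4))
```

The survival probabilities alone do not determine the optimal harvest day; they only make the trade-off between rising revenue and rising failure risk visible.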

(a) (10 points) Formulate the state space, action space and rewards for each state and action pair for this dynamic decision-making problem.

(b) (10 points) Visualize the transition probabilities in a graph with the nodes being the states and the arcs stating the action and transition probabilities. You can use the template provided in the appendix. Note the additional "Harvested (H)" state, which is a terminal (sink) state with no allowed action and no immediate reward.

(c) (10 points) Write down the Bellman optimality equation for each state explicitly and determine the optimal actions for each state. What is the optimal harvest time point for this batch fermentation?
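As a reminder, the general form of the Bellman optimality equation is sketched below. The notation is an assumption and may differ from the lecture slides: V*(s) is the optimal state value, A(s) the set of actions allowed in state s, r(s, a) the immediate reward, p(s' | s, a) the transition probability, and γ the discount factor. For a terminal state such as H, the value is fixed to zero.

```latex
V^{*}(s) = \max_{a \in A(s)} \Big[ r(s,a) + \gamma \sum_{s' \in S} p(s' \mid s,a)\, V^{*}(s') \Big],
\qquad V^{*}(H) = 0 .
```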

3. (20 points) Biopharmaceuticals are drugs produced in a biomanufacturing process, usually using mammalian cells. The production process of the active pharmaceutical ingredient (API) consists of an upstream stage, in which the cells are fermented in batches to produce the API, and the downstream process which captures and purifies the API after batch harvest.

We focus on the upstream stage, where each batch of a particular product is fermented for approximately 10-14 days and a new batch is started once the previous one has been harvested. Note that a batch can only be harvested in the morning, i.e., the decision to harvest can only be made once a day. The API formation during a batch fermentation follows a sigmoidal curve with slower formation rates at the beginning and at the end of the process. The uncertain product formation rate, i.e., the quantity of product added during the next day of fermentation, follows a Normal distribution N(µ(s),σ(s)) with the mean formation rate µ(s) and the standard deviation σ(s) depending on the current product concentration.
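To make the stochastic dynamics concrete, the sketch below simulates one batch day by day. The functions mu and sigma are purely hypothetical placeholders (a logistic-type mean with a small noise term); the actual µ(s) and σ(s) are given in the course material and template, not here. Clamping the result to the concentration bounds 10 and 2150 is likewise an assumption of this sketch.

```python
import random

X_MIN, X_MAX = 10.0, 2150.0  # concentration bounds from the problem statement


def mu(x):
    """Hypothetical mean daily formation: slow near X_MIN and X_MAX, fastest in between."""
    return 0.6 * x * (1.0 - x / X_MAX)


def sigma(x):
    """Hypothetical standard deviation that grows with the mean formation rate."""
    return 0.1 * mu(x) + 1.0


def next_concentration(x, rng):
    """Sample tomorrow's concentration from N(mu(x), sigma(x)) and clamp it to the bounds."""
    increment = rng.gauss(mu(x), sigma(x))
    return min(max(x + increment, X_MIN), X_MAX)


rng = random.Random(0)       # fixed seed so the run is reproducible
trajectory = [X_MIN]
for _ in range(14):          # ferment for 14 days without harvesting
    trajectory.append(next_concentration(trajectory[-1], rng))
print([round(x, 1) for x in trajectory])
```

With these placeholder functions the trajectory shows the sigmoidal shape described above: slow growth at first, a steep middle phase, and saturation towards the maximum concentration.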

The product concentration is Xmin = 10 mg/L at the beginning, as the batch is transferred from a previous fermentation stage. The maximum product concentration that can be achieved is Xmax = 2150 mg/L. The volume of the bioreactor is assumed to be constant at VB = 15,000 L. We consider a fermentation cost per day of CF = −20$, a fixed batch harvest cost of CH = −350$, and a revenue of CR = 0.0001 $/mg. The operational question is the optimal batch harvest time point, assuming that a harvested batch is immediately replaced with a new batch.
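As a quick unit check on these parameters, the sketch below computes the net payoff of harvesting at a given concentration. It assumes that the concentration is measured in mg per litre and the revenue CR in $ per mg; the function name harvest_payoff is illustrative and is not part of the provided template. It is one possible reading of the cost structure, not the required reward function.

```python
V_B = 15_000.0  # bioreactor volume in L
C_F = -20.0     # fermentation cost per day in $
C_H = -350.0    # fixed batch harvest cost in $
C_R = 0.0001    # revenue per mg of product in $ (assumed unit: $/mg)


def harvest_payoff(concentration_mg_per_l):
    """Net payoff of harvesting now: product revenue plus the (negative) harvest cost."""
    product_mg = concentration_mg_per_l * V_B  # total product in the reactor in mg
    return product_mg * C_R + C_H


print(harvest_payoff(2150.0))  # batch at the maximum concentration
print(harvest_payoff(10.0))    # freshly started batch
```

Under these assumptions a batch at the maximum concentration yields roughly 2875 $ net, while harvesting a freshly started batch loses about 335 $, so the fixed harvest cost dominates at low concentrations.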

(a) (8 points) The state space needs to represent the current product concentration in the bioreactor. Hence, it can be any value in the interval [Xmin,Xmax]. What is the problem with such a state space if we want to solve the problem using dynamic programming, e.g., policy iteration? Briefly explain how this problem could be addressed and which additional modelling parameter needs to be introduced. Is the solution we obtain from the policy iteration algorithm still optimal? Name a class of solution algorithms that would not have a problem with this kind of state space.

(b) (12 points) Define the state space, action space and reward function of an MDP model for this problem using formal mathematical notation and briefly explain each of them.

4. (40 points) Python implementation

(a) (30 points) Implement a general policy iteration algorithm in Python to determine the optimal policy for an MDP problem. For this, write three functions: (1) policy evaluation, which takes the MDP and a policy as input and returns the state values, (2) policy improvement, which takes the MDP, a policy and the state values as input and returns an improved policy, and (3) general policy iteration, which calls functions (1) and (2) iteratively until the convergence criterion is met. In the Python template "Scheduling MDP DP HW4.py" you will find the core structure of these three functions with missing code sections marked as #CODE HERE.
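The three-function structure can be sketched on a toy problem as follows. This is a minimal textbook-style sketch and not the template: the dictionary layout of the MDP (keys "states", "actions", "P", "R", "gamma"), the tolerance theta, and the two-state example are all assumptions made for illustration, and the template's data structures may look different.

```python
def q_value(mdp, values, s, a):
    """Expected return of taking action a in state s and then following `values`."""
    return mdp["R"][s, a] + mdp["gamma"] * sum(
        p * values[s2] for s2, p in mdp["P"][s, a].items()
    )


def policy_evaluation(mdp, policy, theta=1e-8):
    """(1) Iteratively compute the state values of a fixed policy."""
    values = {s: 0.0 for s in mdp["states"]}
    while True:
        delta = 0.0
        for s in mdp["states"]:
            new_v = q_value(mdp, values, s, policy[s])
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < theta:  # convergence criterion of the evaluation sweep
            return values


def policy_improvement(mdp, policy, values):
    """(2) Return a policy that is greedy with respect to `values`."""
    improved = {}
    for s in mdp["states"]:
        best = max(mdp["actions"][s], key=lambda a: q_value(mdp, values, s, a))
        # Keep the current action on (near) ties so the outer loop terminates.
        current_q = q_value(mdp, values, s, policy[s])
        best_q = q_value(mdp, values, s, best)
        improved[s] = policy[s] if current_q >= best_q - 1e-9 else best
    return improved


def policy_iteration(mdp):
    """(3) Alternate evaluation and improvement until the policy is stable."""
    policy = {s: mdp["actions"][s][0] for s in mdp["states"]}
    while True:
        values = policy_evaluation(mdp, policy)
        new_policy = policy_improvement(mdp, policy, values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


# Hypothetical two-state toy MDP, unrelated to the fermentation case.
toy_mdp = {
    "states": ["low", "high"],
    "actions": {"low": ["wait", "work"], "high": ["wait", "work"]},
    "P": {
        ("low", "wait"): {"low": 1.0},
        ("low", "work"): {"high": 1.0},
        ("high", "wait"): {"high": 1.0},
        ("high", "work"): {"low": 1.0},
    },
    "R": {("low", "wait"): 0.0, ("low", "work"): -1.0,
          ("high", "wait"): 2.0, ("high", "work"): 3.0},
    "gamma": 0.9,
}

toy_policy, toy_values = policy_iteration(toy_mdp)
print(toy_policy, toy_values)
```

In this toy example the stable policy is to work in the low state and wait in the high state. Note that the iterative evaluation relies on γ < 1 (or on a terminal state being reached with certainty) to converge.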

(b) (10 points) Solve the biopharmaceutical batch fermentation problem from Question 3 in Python using policy or value iteration. The Python template "Scheduling MDP Biopharma Case provides you with the parameters and some pre-filled code sections for this case. You have to define the state space, action space and reward function.

What is the optimal harvest policy for this problem and how can it be implemented in practice? How does the policy change if the batch harvest cost CH is doubled, from CH = −350 to CH = −700? Why does the harvest policy change like this?