Transforming Enterprises
Introduction to the Beer Distribution Game, a supply chain simulation.
Use computational modelling to understand supply chain dynamics and find better playing strategies.
Train autonomous agents to play the game using a machine learning / reinforcement learning approach.
Download all the code and in-depth computational notebooks from GitHub.
All resources are available via www.transforming-enterprises.com
But let's look at the rules and some potential pitfalls first!
The rules of the game are simple – in every round you perform the following steps:
[Figure: steps 1 to 4 of a game round]
Backorder and inventory costs
Delays
Individual Supply Chain Cost.
Your accumulated cost should remain below $8,300.
Overall Supply Chain Costs.
Total supply chain cost should remain below $29,300.
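The two cost targets above can be checked with a few lines of Python. This is only a sketch: the slides do not state the cost rates, so the values of INVENTORY_COST and BACKORDER_COST below are assumptions (they follow a convention commonly used for the Beer Game), and the function names are hypothetical. Only the two target values come from the slides.

```python
# Assumed rates per unit per round; NOT stated in the slides.
INVENTORY_COST = 0.5
BACKORDER_COST = 1.0

# Targets taken from the slides.
INDIVIDUAL_TARGET = 8_300
SUPPLY_CHAIN_TARGET = 29_300


def round_cost(inventory, backorder):
    """Cost a single player incurs in one round."""
    return INVENTORY_COST * inventory + BACKORDER_COST * backorder


def accumulated_cost(history):
    """Sum the round costs over a list of (inventory, backorder) tuples."""
    return sum(round_cost(inv, back) for inv, back in history)


def targets_met(costs_by_player):
    """True only if every player AND the whole supply chain stay below target."""
    individual_ok = all(c < INDIVIDUAL_TARGET for c in costs_by_player.values())
    total_ok = sum(costs_by_player.values()) < SUPPLY_CHAIN_TARGET
    return individual_ok and total_ok


players = ["retailer", "wholesaler", "distributor", "brewery"]
print(targets_met({p: 7_000 for p in players}))  # every player and the total are below target
print(targets_met({p: 8_000 for p in players}))  # each player is below target, the total is not
```

Note that four players who each only just meet the individual target would still miss the overall target, because 4 x 8,300 is more than 29,300. The overall target is the stricter one.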
Change in order behaviour from 100 units to 400 units
Peak order of over 30,000 units!
The problem: Each player can only control part of the control loop
Even if every player behaves rationally, there will still be a "whiplash" effect – because the orders become successively larger along the supply chain
The solution to dealing with the whiplash effect is to adjust inventory slowly!
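The effect of adjusting inventory slowly can be illustrated with a small simulation. The sketch below is not the model from the notebooks; it is a deliberately simplified stand-in with several assumptions: four stages, a fixed delivery delay of two rounds, orders that are always delivered in full after that delay, inventory that may go negative to represent backorders, and an ordering rule of the form "incoming order plus inventory gap divided by an adjustment time". All names and parameter values are hypothetical. Only the jump in consumer demand from 100 to 400 units is taken from the slides.

```python
def simulate(adjustment_time, rounds=50, stages=4, delay=2,
             target_inventory=400, step_round=5):
    """Return the peak order placed by each stage (retailer first, brewery last)."""
    inventory = [target_inventory] * stages
    # Deliveries in transit for each stage, oldest first (steady state: 100 units).
    pipeline = [[100] * delay for _ in range(stages)]
    peak_orders = [0.0] * stages

    for t in range(rounds):
        # Consumer demand jumps from 100 to 400 units.
        demand = 100 if t < step_round else 400
        for s in range(stages):
            # Order what was demanded plus a fraction of the inventory gap.
            gap = target_inventory - inventory[s]
            order = max(0.0, demand + gap / adjustment_time)
            arriving = pipeline[s].pop(0)   # order placed `delay` rounds ago
            pipeline[s].append(order)
            inventory[s] += arriving - demand
            peak_orders[s] = max(peak_orders[s], order)
            demand = order                  # becomes the next stage's incoming order
    return peak_orders


slow = simulate(adjustment_time=4)  # close a quarter of the gap per round
fast = simulate(adjustment_time=1)  # try to close the whole gap at once
print("slow adjustment:", [round(p) for p in slow])
print("fast adjustment:", [round(p) for p in fast])
```

Under these assumptions, peak orders grow from stage to stage in both runs, but closing the inventory gap all at once produces far larger peaks at the brewery than closing it gradually. The absolute numbers depend on the assumed parameters and are not meant to reproduce the figures on the slides.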
Build a computational model to capture your understanding of the system
Use simulations to test the effect of different policies
Find policies that help you reach your targets
Controlling complex systems is hard and even harder when you only have partial control
We often don't behave rationally when overwhelmed with information and under pressure
Sometimes it pays to "have faith in the system"
Small changes can have large effects
Even if everybody optimises locally, this doesn't necessarily lead to a global optimum
Concrete learnings from the Beer Game
The idea: use autonomous agents to play the game and train them using reinforcement learning, a machine learning technique.
Agents have information about their environment.
They perform actions and receive rewards (or punishments) for them.
They learn through trial and error.
Each agent has a "Q-table", which defines the expected total reward for each state and each possible action.
Agents start with an "empty" Q-table and then fill it by learning through trial and error.
Always choose the action with the highest expected total reward!
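A Q-table and its update can be sketched in a few lines. The slides do not say which update rule or exploration scheme the agents use, so the code below assumes the standard tabular Q-learning update with epsilon-greedy exploration. The class name, the state labels and the parameters alpha, gamma and epsilon are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


class QAgent:
    """Minimal tabular Q-learning agent (a sketch, not the notebook code)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)   # e.g. possible order quantities
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor for future rewards
        self.epsilon = epsilon         # probability of trying a random action
        self.rng = random.Random(seed)
        # "Empty" Q-table: every (state, action) pair starts at 0.0.
        self.q = defaultdict(float)

    def best_action(self, state):
        """Action with the highest expected total reward in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        """Mostly exploit the Q-table, sometimes explore (trial and error)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def learn(self, state, action, reward, next_state, done):
        """Move Q(state, action) towards reward + discounted best future value."""
        future = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


agent = QAgent(actions=[0, 100, 200], alpha=0.5, epsilon=0.0)
agent.learn("low_stock", 100, reward=10.0, next_state="end", done=True)
print(agent.best_action("low_stock"))
```

With epsilon set to 0 the agent always picks the best known action. During training a small positive epsilon keeps it exploring, which is how an initially empty table gets filled.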
A reward for reaching the cost targets (and some milestones along the way)
"Game over" as soon as cost targets are missed (to avoid wasting time and memory)
Ten episodes
37,500 episodes (almost there)
50,000 episodes
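One episode is a single play-through of the game. The reward scheme described earlier (a reward for reaching the cost targets, some milestone rewards along the way, and "game over" as soon as a target is missed) could be sketched as follows. The cost targets are taken from the slides. Everything else is an assumption made for illustration: the number of rounds, the milestone rounds, the reward and penalty values and the function names.

```python
INDIVIDUAL_TARGET = 8_300      # from the slides
SUPPLY_CHAIN_TARGET = 29_300   # from the slides


def reward_and_done(round_no, own_cost, chain_cost,
                    final_round=24, milestones=(8, 16),
                    milestone_reward=10.0, final_reward=100.0, penalty=-100.0):
    """Return (reward, done) for one agent after a round.

    own_cost and chain_cost are accumulated costs so far. Since costs only
    ever accumulate, reaching a target mid-game means it has been missed.
    """
    if own_cost >= INDIVIDUAL_TARGET or chain_cost >= SUPPLY_CHAIN_TARGET:
        return penalty, True            # "game over": stop the episode early
    if round_no == final_round:
        return final_reward, True       # targets met at the end of the game
    if round_no in milestones:
        return milestone_reward, False  # still on track at a milestone
    return 0.0, False


def run_episode(costs_per_round):
    """Replay accumulated (own_cost, chain_cost) pairs; return (total reward, last round)."""
    total = 0.0
    for round_no, (own, chain) in enumerate(costs_per_round, start=1):
        reward, done = reward_and_done(round_no, own, chain)
        total += reward
        if done:
            return total, round_no
    return total, len(costs_per_round)


on_track = [(100 * r, 400 * r) for r in range(1, 25)]
too_costly = [(1_000 * r, 4_000 * r) for r in range(1, 25)]
print(run_episode(on_track))
print(run_episode(too_costly))
```

Ending an episode at the first missed target means that the many early, unsuccessful episodes are short, which is the time and memory saving mentioned above.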
The AI agents can now play the game
But: the agents are optimised for this specific game situation, so they would fail in a more dynamic setting.
The ordering strategy we developed initially is robust in all ordering situations.
Reinforcement learning algorithms are quite easy to implement.
Finding the right reward policies is difficult: it is hard to avoid setting rewards too narrowly (much like in real life).
BUT: If you can define clear objectives and rewards, using reinforcement learning in combination with simulations can be very useful for automating control systems.
What characteristics define an adaptive enterprise?
Which capabilities and building blocks do you need to create an adaptive enterprise?
How can you get there?
An adaptive enterprise is an enterprise that can flexibly adapt to a changing economic environment.
Save The Date!
"Virtual Coffee Lounge"
for post-event discussions
Oliver co-founded transentis in 1997 and has been managing partner ever since.
After reading mathematics and theoretical physics at Cambridge University (MA Cantab) and the University of Innsbruck, he later specialised in business engineering at the University of St. Gallen (Executive MBA, Dr. oec.).
Oliver’s personal mission is to help his clients to explore, re-design and transform their enterprises using his expertise in enterprise architecture and enterprise analytics.
oliver.grasl@transentis.com
+49 173 6546727
+49 30 800937050