Using Simulations and AI to Optimize Supply Chains

Illustrated using the Beer Distribution Game

Berlin, 28.5.2020

Transforming Enterprises

Topics Today

Introduction to the Beer Distribution Game, a supply chain simulation.

Use computational modelling to understand supply chain dynamics and find better playing strategies.

Train autonomous agents to play the game using reinforcement learning, a machine learning approach.

Watch a Live Recording of This Presentation

Download the Code and Computational Notebooks

Download all the code and in-depth computational notebooks from GitHub.

All resources are available via


The Beer Distribution Game

An Introduction

  • The Beer Game was developed in the 1960s at MIT to illustrate how difficult it is to manage dynamic systems – in this case a supply chain that delivers beer from a brewery to the end consumer
  • The game became well known in the 1990s after Peter Senge described it in his worldwide bestseller The Fifth Discipline
  • The game is instructive because, although the system and its rules are very simple, the resulting behaviour is surprisingly complex.

A Brief Walkthrough of the Beer Game

Try it yourself at

The game is usually played with four players, but the single-player version is also fun.

But let's look at the rules and some potential pitfalls first!

The situation

  • You are part of a supply chain that delivers beer from a brewery to the end consumer
  • Try to meet the demand of your respective customer at all times while keeping inventory low.

The Rules

The rules of the game are simple – in every round you perform the following steps:

  1. Check deliveries. Check how many units of beer are being delivered to you by your supplier in the supply chain.
  2. Check orders. Check how many units of beer your client in the supply chain has ordered.
  3. Deliver beer. Deliver as much beer as you can to satisfy demand (the game does this for you).
  4. Make an order decision. Decide how many units of beer you need from your supplier to keep your inventory stocked up.
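The four steps of a round can be sketched in a few lines of Python. This is a minimal illustration, not the game's actual implementation: the `Player` class, its field names and the pluggable `order_policy` function are hypothetical, and we assume that anything the player cannot deliver is carried over as a backorder.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Player:
    """Hypothetical state of one stage in the supply chain (retailer, wholesaler, ...)."""
    inventory: int = 0   # units of beer in stock
    backorder: int = 0   # units ordered by the customer but not yet delivered

    def play_round(self, incoming_delivery: int, incoming_order: int,
                   order_policy: Callable[["Player", int], int]) -> Tuple[int, int]:
        # 1. Check deliveries: beer arriving from the supplier goes into stock.
        self.inventory += incoming_delivery
        # 2. Check orders: the new order adds to anything still owed from earlier rounds.
        due = incoming_order + self.backorder
        # 3. Deliver beer: ship as much as the stock allows; the rest is backordered.
        shipped = min(self.inventory, due)
        self.inventory -= shipped
        self.backorder = due - shipped
        # 4. Make an order decision using whatever ordering policy the player follows.
        outgoing_order = order_policy(self, incoming_order)
        return shipped, outgoing_order


# Usage: a naive policy that simply passes the customer's order on to the supplier.
player = Player(inventory=5)
shipped, order = player.play_round(incoming_delivery=2, incoming_order=10,
                                   order_policy=lambda p, incoming: incoming)
print(shipped, order, player.inventory, player.backorder)  # 7 10 0 3
```

Keeping the order policy as a separate function makes it easy to swap in different ordering strategies without touching the round logic.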





Some Pitfalls to be aware of

Backorder and inventory costs


How Performance is Appraised

Individual Supply Chain Cost.

Your accumulated cost should remain below $8,300.

Overall Supply Chain Costs.

Total supply chain cost should remain below $29,300.
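The appraisal can be sketched as follows, assuming that each round a player pays a fixed rate per unit in stock and a higher rate per unit backordered, and that the supply chain cost is the sum of the four players' costs. The slides do not give the rates, so `INVENTORY_COST_PER_UNIT` and `BACKORDER_COST_PER_UNIT` below are placeholder assumptions; only the two targets ($8,300 and $29,300) come from the slides.

```python
from typing import Iterable, Tuple

# Assumed cost rates: the slides do not state them, these are placeholders.
INVENTORY_COST_PER_UNIT = 0.5   # $ per unit in stock per round (assumption)
BACKORDER_COST_PER_UNIT = 1.0   # $ per unit backordered per round (assumption)


def round_cost(inventory: float, backorder: float) -> float:
    """Cost a single player incurs in one round."""
    return inventory * INVENTORY_COST_PER_UNIT + backorder * BACKORDER_COST_PER_UNIT


def accumulated_cost(history: Iterable[Tuple[float, float]]) -> float:
    """Sum of round costs; `history` holds one (inventory, backorder) pair per round."""
    return sum(round_cost(inv, bo) for inv, bo in history)


def targets_met(individual_cost: float, supply_chain_cost: float,
                individual_target: float = 8300.0,
                supply_chain_target: float = 29300.0) -> bool:
    """Both costs must stay below their targets."""
    return individual_cost < individual_target and supply_chain_cost < supply_chain_target


print(accumulated_cost([(400, 0), (0, 200)]))  # 400.0
```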

Let's Give It a Go

Order Behaviour in a typical Game

What the consumer orders

How the supply chain reacts

Change in order behaviour from 100 units to 400 units

Peak order of over 30,000 units!

Surplus in a typical Game

The Feedback Loop Governing the Supply Chain


The problem: Each player can only control part of the control loop

The Error Most People Make: Including the Back Order

Improvement Strategy 1: Ignore Back Orders

Outgoing Orders = Incoming Orders + Target Inventory - Inventory
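The ordering rule above translates directly into a function. This is a sketch: the function and parameter names are illustrative, "Inventory" is read as the stock on hand (the backorder is deliberately left out), and we assume that orders cannot be negative, so the result is clamped at zero.

```python
def order_ignoring_backorders(incoming_orders: float, inventory: float,
                              target_inventory: float) -> float:
    """Outgoing Orders = Incoming Orders + Target Inventory - Inventory.

    `inventory` is the stock on hand; the backorder is deliberately left out.
    The result is clamped at zero on the assumption that negative orders are not allowed.
    """
    return max(0.0, incoming_orders + target_inventory - inventory)


print(order_ignoring_backorders(100.0, 300.0, 400.0))  # 200.0
```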


Improvement Strategy 2: Remember Open Orders

Remember the orders that are in the supply line.


The Target Supply Line Depends on the Delivery Delay

Target Supply Line = Delivery Delay * Incoming Orders

Outgoing Orders = Incoming Orders + Target Inventory - Inventory + Target Supply Line - Supply Line
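The same two formulas as code, in a minimal sketch with illustrative names. As before, we assume orders are clamped at zero; `supply_line` is the quantity already ordered but not yet received.

```python
def order_with_supply_line(incoming_orders: float, inventory: float, target_inventory: float,
                           supply_line: float, delivery_delay: float) -> float:
    """Orders = Incoming Orders + Target Inventory - Inventory
                + Target Supply Line - Supply Line."""
    # Target Supply Line = Delivery Delay * Incoming Orders
    target_supply_line = delivery_delay * incoming_orders
    orders = (incoming_orders + target_inventory - inventory
              + target_supply_line - supply_line)
    return max(0.0, orders)  # assumption: negative orders are not allowed


# 400 units already on their way cover a 4-round delay at 100 units per round,
# so only the current demand is reordered.
print(order_with_supply_line(100.0, 400.0, 400.0, supply_line=400.0, delivery_delay=4))  # 100.0
```

Without the supply line terms, a player who has just placed large orders keeps ordering until the beer actually arrives, which is exactly the over-ordering this strategy avoids.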

Order Behaviour with the new ordering policy

The Individual Cost Target is now reached

The whiplash effect

Even if every player behaves rationally, there will still be a "whiplash" effect: orders become successively larger as they travel up the supply chain towards the brewery.

Supply Chain Costs Are Still Way off Target

Improvement Strategy 3: Increase Inventory Adjustment Time

The way to deal with the whiplash effect is to adjust inventory slowly!
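The slides do not spell out the formula for this strategy. One plausible form, and an assumption on our part, is the anchoring-and-adjustment rule common in system dynamics: instead of closing the inventory and supply line gaps in a single round, the player closes them over `adjustment_time` rounds.

```python
def order_with_adjustment_time(incoming_orders: float, inventory: float, target_inventory: float,
                               supply_line: float, delivery_delay: float,
                               adjustment_time: float) -> float:
    """Close inventory and supply line gaps over `adjustment_time` rounds instead of at once."""
    target_supply_line = delivery_delay * incoming_orders
    inventory_gap = target_inventory - inventory
    supply_line_gap = target_supply_line - supply_line
    # adjustment_time = 1 reproduces the supply line policy; larger values react more gently.
    orders = incoming_orders + (inventory_gap + supply_line_gap) / adjustment_time
    return max(0.0, orders)  # assumption: negative orders are not allowed


for t in (1, 4):
    print(t, order_with_adjustment_time(400.0, 100.0, 400.0, 200.0, 2, adjustment_time=t))
# 1 1300.0
# 4 625.0
```

In this example demand has jumped to 400 units; with an adjustment time of 4 the player's reaction is roughly halved, so far less amplification is passed on to the next stage.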

Improved Order Behaviour

Target Supply Chain Costs Are Met

Target Retailer Costs Are Also Met

Summary of the Computational Modelling Approach

Build a computational model to capture your understanding of the system

Use simulations to test the effect of different policies

Find policies that help you reach your targets
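To make "build a model, simulate, test policies" concrete, here is a toy discrete-time model of the four-stage chain. It is a sketch and not the model behind the charts above. Assumptions: orders reach the supplier one round later, shipping takes `delay` rounds, the factory has unlimited raw materials, every stage starts in steady state, consumer demand steps from 100 to 400 units, and the ordering rule closes inventory and supply line gaps over an adjustment time (backorders are ignored, as in strategy 1).

```python
from collections import deque
from typing import Callable, List

Policy = Callable[[float, float, float, float, float], float]


def smoothed_policy(adjustment_time: float) -> Policy:
    """Ordering rule with supply line and adjustment time (assumed form)."""
    def policy(incoming, inventory, supply_line, delay, target_inventory):
        gaps = (target_inventory - inventory) + (delay * incoming - supply_line)
        return max(0.0, incoming + gaps / adjustment_time)
    return policy


def simulate(policy: Policy, rounds: int = 30, stages: int = 4, delay: int = 2,
             initial_demand: float = 100.0, step_demand: float = 400.0,
             step_round: int = 2, target_inventory: float = 400.0) -> List[List[float]]:
    """Toy beer game: returns each stage's orders per round (0 = retailer, last = factory)."""
    inventory = [target_inventory] * stages
    backorder = [0.0] * stages
    # Shipments in transit to each stage, `delay` rounds long, pre-filled at steady state.
    pipeline = [deque([initial_demand] * delay) for _ in range(stages)]
    last_orders = [initial_demand] * stages  # orders reach the supplier one round later
    history: List[List[float]] = [[] for _ in range(stages)]

    for t in range(rounds):
        consumer_demand = initial_demand if t < step_round else step_demand
        # 1. Check deliveries.
        for i in range(stages):
            inventory[i] += pipeline[i].popleft()
        # 2./3. Check orders and deliver as much beer as possible.
        incoming = [consumer_demand] + last_orders[:-1]
        for i in range(stages):
            due = incoming[i] + backorder[i]
            shipped = min(inventory[i], due)
            inventory[i] -= shipped
            backorder[i] = due - shipped
            if i > 0:
                pipeline[i - 1].append(shipped)  # ship downstream
        pipeline[-1].append(last_orders[-1])  # factory brews last round's order
        # 4. Order decisions, counting everything ordered but not yet received.
        for i in range(stages):
            owed_by_supplier = backorder[i + 1] if i + 1 < stages else 0.0
            supply_line = sum(pipeline[i]) + owed_by_supplier
            last_orders[i] = policy(incoming[i], inventory[i], supply_line, delay, target_inventory)
            history[i].append(last_orders[i])
    return history


for t_adj in (1, 4):
    orders = simulate(smoothed_policy(t_adj))
    print(t_adj, [round(max(stage)) for stage in orders])  # peak order per stage
```

Because the policy is just a parameter, comparing the peak orders (or costs) of different rules takes one line each, which is the point of testing policies in simulation before trying them in the game.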

Controlling complex systems is hard and even harder when you only have partial control

We often don't behave rationally when overwhelmed with information and under pressure

Sometimes it pays to "have faith in the system"

Small changes can have large effects

Even if everybody optimises locally, this doesn't necessarily lead to a global optimum

Concrete learnings from the Beer Game

Training AI to Play the Beer Game

An approach using reinforcement learning

The idea: use autonomous agents to play the game and train them using reinforcement learning, a machine learning technique.

Agents have information about their environment.


They perform actions and receive rewards (or punishments) for them.


They learn through trial and error.

Agents For the Beer game

The agents for the Beer Game are very simple


A reinforcement learning technique

Each agent has a "Q-table", which defines the expected total reward for each state and each possible action.


Agents start with an "empty" Q-table and then fill it by learning through trial and error.

Always choose the action with the highest expected reward!
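A tabular Q-learning agent fits in a few lines. This is a sketch of standard Q-learning, not necessarily the exact variant used for the results below. The learning rate `alpha`, discount factor `gamma` and exploration rate `epsilon` are assumed values, states can be any hashable description of what the agent observes, and actions are assumed to be a fixed set of order quantities.

```python
import random
from collections import defaultdict


class QTableAgent:
    """Minimal tabular Q-learning agent (hyperparameters are assumptions)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)     # e.g. possible order quantities
        self.alpha = alpha               # learning rate
        self.gamma = gamma               # discount factor for future rewards
        self.epsilon = epsilon           # probability of trying a random action
        self.q = defaultdict(float)      # "empty" Q-table: unseen entries default to 0.0
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Greedy choice: the action with the highest expected total reward.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        # Trial and error: occasionally explore instead of exploiting the Q-table.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def learn(self, state, action, reward, next_state, done):
        # Move Q(state, action) towards reward + discounted best value of the next state.
        future = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


agent = QTableAgent(actions=[0, 100, 200], alpha=0.5, gamma=0.9, epsilon=0.0)
agent.learn(state="s", action=100, reward=10.0, next_state="end", done=True)
print(agent.q[("s", 100)], agent.choose_action("s"))  # 5.0 100
```

During training a small `epsilon` keeps the agent exploring; once trained, setting `epsilon` to zero makes it always pick the action with the highest expected reward.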

The Key: Setting the Right Rewards

As with human beings, we need to set the right rewards.

A reward for reaching the cost targets (and some milestones along the way)

"Game over" as soon as cost targets are missed (to avoid wasting time and memory)
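The reward scheme described above might look like the function below. It is a hypothetical sketch: the number of rounds, the milestone spacing and the reward sizes are placeholders we chose, not values from the slides.

```python
from typing import Tuple


def beer_game_reward(round_no: int, accumulated_cost: float, cost_target: float,
                     final_round: int, milestone_every: int = 5,
                     milestone_reward: float = 1.0,
                     final_reward: float = 10.0) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one round. All numbers are hypothetical."""
    # "Game over" as soon as the cost target is missed: the episode ends with no reward.
    if accumulated_cost >= cost_target:
        return 0.0, True
    # Main reward for finishing the game within the cost target.
    if round_no == final_round:
        return final_reward, True
    # Smaller rewards for milestones along the way (every few rounds within budget).
    if round_no > 0 and round_no % milestone_every == 0:
        return milestone_reward, False
    return 0.0, False


print(beer_game_reward(10, 1000.0, 8300.0, final_round=24))  # (1.0, False)
```

Ending the episode early needs no explicit punishment here: since all rewards are non-negative, an agent that misses the target simply forfeits every later reward, so the actions that led there keep low Q-values. Stopping early also avoids spending training time on hopeless games.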

The Results

Ten episodes

37,500 Episodes (almost there)

50,000 Episodes

The AI agents can now play the game

The Winner Is ...

The agents outperform our initial ordering strategy!

But: the agents are optimised for the specific game situation; they would fail in a more dynamic setting.

The ordering strategy we developed initially is robust in all ordering situations.



Reinforcement learning algorithms are quite easy to implement.

Finding the right reward policies is difficult: it is hard to avoid setting rewards too narrowly (much like in real life).

BUT: If you can define clear objectives and rewards, using reinforcement learning in combination with simulations can be very useful for automating control systems.

Next Meetup After the Summer Break

Building An Adaptive Enterprise on 24.9.2020

What characteristics define an adaptive enterprise?

Which capabilities and building blocks do you need to create an adaptive enterprise?

How can you get there?

An adaptive enterprise is an enterprise that can flexibly adapt to a changing economic environment.

Save The Date!

"Virtual Coffee Lounge"

for post-event discussions


Dr. Oliver Grasl

Oliver co-founded transentis in 1997 and has been managing partner ever since.

After reading mathematics and theoretical physics at Cambridge University (MA Cantab) and the University of Innsbruck he later specialised in business engineering at the University of St. Gallen (Executive MBA, Dr. oec.).

Oliver’s personal mission is to help his clients to explore, re-design and transform their enterprises using his expertise in enterprise architecture and enterprise analytics.

+49 173 6546727

+49 30 800937050