DSC 40A in-class game

Multi-Armed Bandit Tournament

Each round has four shuffled slot machines with different hidden payout rates. Your job is to learn quickly, update your beliefs after every pull, and then exploit the strongest machine before the round ends. The leaderboard ranks students by Skill Score, which rewards choosing strong machines across the full tournament, not just getting lucky in one round.

Bayesian intuition

Every win adds evidence that a machine might be good. Every loss adds evidence that it might not be. Early on, uncertainty is wide, so a single outcome should shift your estimate a little rather than settle the question.

Why the old exploit breaks

Machine labels are reshuffled every round, and your leaderboard score tracks the quality of the hidden probabilities you chose, not which button you pressed. Pressing the same button every round no longer produces a strong score.

How scoring works

  • Round Reward is the noisy part: how many wins you actually saw in one round.
  • Skill Score is the main score: how strong your choices were across the tournament.
  • Each round lasts at most five minutes, so you need to explore early and commit before the timer locks the machines.
Start by treating every machine as a 50/50 guess.
Every win nudges your estimate up. Every loss nudges it down.
Quick update rule: (wins + 1) / (wins + losses + 2)
Example: 3 wins and 1 loss gives 4 / 6 = 66.7%
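The quick update rule is the posterior mean of a win rate under a uniform Beta(1, 1) prior, which is why every machine starts at 50%. A minimal Python sketch follows; the function name `posterior_mean` is chosen here for illustration and is not taken from the game's code.

```python
def posterior_mean(wins: int, losses: int) -> float:
    """Posterior mean win rate under a uniform Beta(1, 1) prior."""
    if wins < 0 or losses < 0:
        raise ValueError("wins and losses must be non-negative")
    return (wins + 1) / (wins + losses + 2)


print(f"{posterior_mean(0, 0):.1%}")  # no pulls yet: the 50/50 starting guess
print(f"{posterior_mean(3, 1):.1%}")  # the example above: 4 / 6
```

With no pulls the sketch prints 50.0%, and with 3 wins and 1 loss it prints 66.7%.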
Skill Score = 100 x (sum of true probabilities you chose) / (sum of best available probabilities)
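The Skill Score formula can be sketched in Python as below. This assumes, without confirmation from the game itself, that both sums run over every pull and that "best available" means the highest true probability on offer at the time of that pull. The function name and the sample numbers are hypothetical.

```python
def skill_score(chosen_probs, best_probs):
    """100 x (sum of chosen true probabilities) / (sum of best available ones)."""
    if len(chosen_probs) != len(best_probs):
        raise ValueError("need one best-available probability per pull")
    total_best = sum(best_probs)
    if total_best == 0:
        return 0.0  # assumption: no pulls yet scores 0.0
    return 100 * sum(chosen_probs) / total_best


# Hypothetical round: the best machine pays 0.8; three pulls hit 0.3, 0.8, 0.8.
chosen = [0.3, 0.8, 0.8]
best = [0.8, 0.8, 0.8]
print(round(skill_score(chosen, best), 1))  # 79.2
```

Because the score uses the true probabilities rather than the observed wins, a lucky streak on a weak machine does not raise it.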
  • Machines Tested: 0. How many different machines you have tried this round.
  • Round Reward: 0. How many wins you got in this round.
  • Tournament Reward: 0. How many wins you have across all rounds so far.
  • Skill Score: 0.0. This is the overall leaderboard score.

What To Do Now

Start by sampling several machines. The wide uncertainty bands show that your prior is still broad.
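One simple way to act on this advice is an explore-then-commit strategy: pull every machine a few times, then stay on the one with the highest posterior mean. The simulation below is only a sketch under stated assumptions. The payout rates, the fixed pull budget (the real game is timed, not pull-limited), and all function names are hypothetical.

```python
import random


def explore_then_commit(true_probs, total_pulls, explore_per_arm, rng):
    """Pull each arm a fixed number of times, then commit to the best estimate."""
    n_arms = len(true_probs)
    if total_pulls < n_arms * explore_per_arm:
        raise ValueError("pull budget is too small for the exploration phase")
    wins = [0] * n_arms
    losses = [0] * n_arms
    choices = []

    def pull(arm):
        choices.append(arm)
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

    # Exploration phase: sample every machine equally.
    for arm in range(n_arms):
        for _ in range(explore_per_arm):
            pull(arm)

    # Commit phase: stick with the highest posterior mean (quick update rule).
    means = [(wins[a] + 1) / (wins[a] + losses[a] + 2) for a in range(n_arms)]
    best_guess = max(range(n_arms), key=lambda a: means[a])
    for _ in range(total_pulls - len(choices)):
        pull(best_guess)

    return choices, best_guess, sum(wins)


rng = random.Random(0)
probs = [0.2, 0.4, 0.6, 0.8]  # hypothetical hidden payout rates
rng.shuffle(probs)            # labels are shuffled each round
choices, best_guess, reward = explore_then_commit(probs, 40, 5, rng)
print("committed to machine", best_guess, "with true rate", probs[best_guess])
print("round reward:", reward)
```

A larger `explore_per_arm` makes the committed choice more reliable but spends more pulls on weak machines, which is the trade-off the round timer forces.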
  • Submission Code: MAB-XXXX-XXXX. This identifies your tournament run. It gets copied together with your final score.
  • Current Best Guess: No evidence yet. Early guesses are weak. Once uncertainty narrows, this becomes more meaningful.

Casino Floor

Click a machine to pull it. The big number is your current posterior mean. The blue band is a rough plausible range. A wide band means you do not know much about that machine yet.
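The page does not say how the blue band is computed. One plausible sketch, assuming a Beta(1, 1) prior and a band of two posterior standard deviations on each side of the mean (both assumptions, as is the function name), shows how the range narrows as pulls accumulate.

```python
import math


def plausible_range(wins, losses, width=2.0):
    """Posterior mean plus or minus `width` posterior standard deviations."""
    a = wins + 1      # Beta(1, 1) prior plus observed wins
    b = losses + 1    # Beta(1, 1) prior plus observed losses
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    low = max(0.0, mean - width * sd)   # clip to valid probabilities
    high = min(1.0, mean + width * sd)
    return low, mean, high


for wins, losses in [(0, 0), (3, 1), (30, 10)]:
    low, mean, high = plausible_range(wins, losses)
    print(f"{wins}W/{losses}L: mean {mean:.2f}, range {low:.2f} to {high:.2f}")
```

With no pulls the range covers the whole interval from 0 to 1, and after 40 pulls at the same win ratio it is much tighter than after 4.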

Press Start Round to begin the 5:00 timer. Submit unlocks only after Round 6.

Round Report (This Round Only)

Finish a round to reveal the true machine probabilities, compare your posterior beliefs to reality, and see whether your exploration paid off.

Bayes Coach

Leaderboard

Students submit only after the tournament is complete. The form should record the final Skill Score, the raw reward, and the submission code from the copied line.

Suggested columns for the reset leaderboard: student name, skill score, raw reward, submission code, and optionally section or timestamp.