DSC 40A in-class game
Multi-Armed Bandit Tournament
Each round has four shuffled slot machines with different hidden payout rates. Your job is to learn quickly, update your beliefs after every pull, and then exploit the strongest machine before the round ends. The leaderboard should rank students by Skill Score, which rewards choosing strong machines across the full tournament, not just getting lucky in one round.
Every win adds evidence that a machine might be good. Every loss adds evidence that it might not be. Early on, uncertainty is wide, so a single outcome should move you a little, not all the way.
Arm labels reshuffle every round, and your leaderboard score tracks the hidden probabilities of the machines you actually chose, so pressing the same button every round no longer produces a strong score.
How scoring works
- Round Reward is the noisy part: how many wins you actually saw in one round.
- Skill Score is the main score: how strong your choices were across the tournament.
- Each round lasts at most five minutes, so you need to explore early and commit before the timer locks the machines.
Every win nudges your estimate up. Every loss nudges it down.
Quick update rule: (wins + 1) / (wins + losses + 2)
Example: 3 wins and 1 loss gives (3 + 1) / (3 + 1 + 2) = 4 / 6 ≈ 66.7%
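The quick update rule is the posterior mean you get when you start from a uniform prior over a machine's win rate. A minimal sketch of it in Python follows; the function names `posterior_mean` and `estimates_after_each_pull` are made up for illustration and are not taken from the game's code.

```python
from fractions import Fraction


def posterior_mean(wins: int, losses: int) -> Fraction:
    """Quick update rule from the text: (wins + 1) / (wins + losses + 2)."""
    if wins < 0 or losses < 0:
        raise ValueError("wins and losses must be non-negative")
    return Fraction(wins + 1, wins + losses + 2)


def estimates_after_each_pull(outcomes):
    """Return the estimate after every pull; True means a win, False a loss."""
    wins = 0
    losses = 0
    history = []
    for won in outcomes:
        if won:
            wins += 1
        else:
            losses += 1
        history.append(posterior_mean(wins, losses))
    return history


if __name__ == "__main__":
    # The worked example from the text: 3 wins and 1 loss.
    print(float(posterior_mean(3, 1)))
    # Watch the estimate move pull by pull (win, win, loss, win).
    for step, estimate in enumerate(estimates_after_each_pull([True, True, False, True]), 1):
        print(step, float(estimate))
```

With no pulls at all the estimate is 1/2, and each extra pull moves it by a smaller amount than the one before, which is why one outcome late in a round matters less than one at the start.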
Skill Score = 100 × (sum of true probabilities you chose) / (sum of best available probabilities)
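As a rough sketch of the Skill Score formula, the snippet below assumes both sums run over individual pulls: for each pull you record the true probability of the machine you chose and the true probability of the best machine available at that moment. That per-pull reading, the function name `skill_score`, and the example probabilities are all assumptions for illustration; the game may aggregate differently.

```python
def skill_score(chosen_probs, best_probs):
    """100 x (sum of chosen true probabilities) / (sum of best available probabilities).

    Assumes one entry per pull in each list (an illustrative reading of the formula).
    """
    if len(chosen_probs) != len(best_probs):
        raise ValueError("need one best-available probability per choice")
    total_best = sum(best_probs)
    if total_best == 0:
        raise ValueError("sum of best available probabilities must be positive")
    return 100.0 * (sum(chosen_probs) / total_best)


if __name__ == "__main__":
    # Hypothetical round: one exploratory pull on a 0.3 machine,
    # then three pulls on the best machine (0.7).
    chosen = [0.3, 0.7, 0.7, 0.7]
    best = [0.7, 0.7, 0.7, 0.7]
    print(round(skill_score(chosen, best), 1))
```

Because the score uses true probabilities rather than observed wins, a lucky streak on a weak machine does not raise it, and always picking the best machine gives exactly 100.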
What To Do Now
Casino Floor
Click a machine to pull it. The big number is your current posterior mean. The blue band is a rough plausible range. Wide band means "you still do not know much yet."
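The page does not say how the blue band is computed. One plausible way to get a "rough plausible range" is to take the posterior mean plus or minus two posterior standard deviations, assuming a uniform prior so that the posterior is Beta(wins + 1, losses + 1). The sketch below does exactly that; `plausible_band` is a hypothetical name and the two-standard-deviation width is an assumption, not the game's method.

```python
import math


def plausible_band(wins: int, losses: int, width: float = 2.0):
    """Mean +/- `width` standard deviations of a Beta(wins + 1, losses + 1) posterior.

    The result is clipped to [0, 1]. This is an assumed, approximate band,
    not necessarily what the game draws.
    """
    a = wins + 1
    b = losses + 1
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    half_width = width * math.sqrt(variance)
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


if __name__ == "__main__":
    for wins, losses in [(0, 0), (3, 1), (30, 10)]:
        low, high = plausible_band(wins, losses)
        print(wins, losses, round(low, 3), round(high, 3))
```

With no pulls the band covers the whole range from 0 to 1, and it narrows as pulls accumulate, which matches the "wide band means you still do not know much" reading.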
Press Start Round to begin the 5:00 timer. Submit unlocks only after Round 6.
Round Report (This Round Only)
Bayes Coach
Leaderboard
Students submit only after the tournament is complete. The form should record the final Skill Score, the raw reward, and the submission code from the copied line.