Understanding reinforcement learning requires a solid grasp of probability and randomness. In this first notebook, we’ll explore basic probability concepts through simple Python simulations. By the end, you should feel comfortable drawing samples from different probability distributions and understanding concepts like expected value and variance in a hands-on way.

### Introduction

In reinforcement learning, an agent makes decisions in the face of uncertainty. Outcomes are often random, so we need to understand probability to predict and evaluate an agent’s performance. We’ll start with simple simulations:

- Coin flips and dice rolls: Drawing samples from discrete distributions.
- Continuous distributions: Sampling from distributions like uniform and normal.
- Estimating probabilities: Using simulations to verify theoretical expectations (Law of Large Numbers).
- Expected value: Understanding the average outcome of a random process.
Let’s begin by simulating some basic random events.
### Simulating a Fair Coin Flip
```python
import random

# Simulate 10 coin flips and count Heads
num_flips = 10
heads_count = 0
for i in range(num_flips):
    flip = random.random()  # uniform [0, 1)
    if flip < 0.5:
        heads_count += 1

print(f"Out of {num_flips} flips, got Heads = {heads_count}")
```
```
Out of 10 flips, got Heads = 6
```
A single run of 10 flips is noisy. If we repeat the experiment many times, the Law of Large Numbers says the average number of Heads should settle near the expected value of 5:

```python
# Repeat the 10-flip experiment many times and average the results
# (reuses num_flips from the previous cell)
trials = 1000
total_heads = 0
for t in range(trials):
    heads = 0
    for i in range(num_flips):
        if random.random() < 0.5:
            heads += 1
    total_heads += heads

avg_heads = total_heads / trials
print(f"Average Heads in {num_flips} flips (over {trials} trials) = {avg_heads:.2f}")
```
```
Average Heads in 10 flips (over 1000 trials) = 4.95
```
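The opening mentioned variance as well as expected value. As a quick illustration, here is a minimal sketch (the variable names are our own) that estimates both the mean and the variance of the Heads count and compares them to the binomial theory values np = 5 and np(1 − p) = 2.5:

```python
# Estimate mean and variance of the Heads count in 10 fair flips.
# Theory: for Binomial(n=10, p=0.5), mean = n*p = 5 and variance = n*p*(1-p) = 2.5.
import random

trials = 10_000
counts = []
for _ in range(trials):
    heads = sum(1 for _ in range(10) if random.random() < 0.5)
    counts.append(heads)

mean = sum(counts) / trials
variance = sum((c - mean) ** 2 for c in counts) / trials
print(f"Empirical mean = {mean:.3f} (theory: 5.0)")
print(f"Empirical variance = {variance:.3f} (theory: 2.5)")
```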
### Simulating a Six-Sided Die Roll
```python
# Roll a fair six-sided die 20 times
outcomes = [random.randint(1, 6) for _ in range(20)]
print("20 die rolls:", outcomes)
print("Average roll value:", sum(outcomes) / len(outcomes))
```
```
20 die rolls: [4, 2, 2, 3, 6, 3, 2, 6, 5, 5, 1, 5, 3, 5, 4, 3, 4, 1, 3, 5]
Average roll value: 3.6
```
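The true expected value of a fair die is (1 + 2 + ... + 6) / 6 = 3.5, so with only 20 rolls the sample mean of 3.6 is reasonably close but not exact. A minimal sketch showing the estimate tighten as the sample size grows:

```python
# The sample mean of die rolls approaches the true expected value 3.5
# as the number of rolls grows (Law of Large Numbers).
import random

for n in [20, 200, 2000, 20000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(f"Average of {n:>5} rolls = {sum(rolls) / n:.3f}")
```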
### Simulating a Biased Coin Flip

Not all coins are fair. With `random.choices` we can assign unequal weights to the outcomes, here 70% Heads and 30% Tails:

```python
# Flip a biased coin (P(H) = 0.7, P(T) = 0.3) ten times
outcomes = random.choices(["H", "T"], weights=[0.7, 0.3], k=10)
print("10 flips of a biased 70/30 coin:", outcomes)
print("Number of Heads:", outcomes.count("H"))
```
```
10 flips of a biased 70/30 coin: ['T', 'T', 'H', 'H', 'T', 'H', 'H', 'T', 'H', 'T']
Number of Heads: 5
```
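Notice that even with P(H) = 0.7 we got only 5 Heads in 10 flips; small samples are noisy. A minimal sketch estimating the Heads frequency over many flips, which should approach the true weight of 0.7:

```python
# Estimate P(Heads) for the biased coin by simulating many flips;
# the empirical frequency should approach the true weight 0.7.
import random

n = 100_000
flips = random.choices(["H", "T"], weights=[0.7, 0.3], k=n)
print(f"Empirical P(H) over {n} flips = {flips.count('H') / n:.3f}")
```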
### Sampling from Continuous Distributions
Beyond discrete outcomes, many processes produce continuous-valued outcomes. For example, an agent’s sensor reading might be corrupted by normally distributed noise, or the time between events might follow an exponential distribution.
Let’s draw samples from a normal distribution with mean 0 and standard deviation 1 (standard normal). We’ll generate 5 samples:
```python
# Draw 5 samples from a standard normal distribution
for _ in range(5):
    sample = random.gauss(0, 1)  # or random.normalvariate(0, 1)
    print(f"Normal sample: {sample:.3f}")
```
```
Normal sample: -1.252
Normal sample: -0.559
Normal sample: 0.812
Normal sample: -1.397
Normal sample: 0.005
```
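We also mentioned exponentially distributed waiting times above; `random.expovariate` covers that case. A minimal sketch (the rate of 1.0 is our own choice) drawing many samples and checking that their mean is near 1/λ:

```python
# Sample waiting times from an exponential distribution with rate 1.0;
# the mean of Exponential(lambda) is 1/lambda, so it should be near 1.0.
import random

samples = [random.expovariate(1.0) for _ in range(10_000)]
print(f"Mean of 10,000 exponential samples = {sum(samples) / len(samples):.3f}")
```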
This hands-on practice with probability will be valuable as we move into reinforcement learning. An RL agent’s rewards and state transitions are often random, so understanding distributions will help in designing and evaluating learning algorithms.
In the next notebook, we’ll create a simple gambling scenario to start applying these concepts to decision-making problems.
Exercise: Generate 10,000 samples from `random.gauss(0, 1)`. Compute the average and standard deviation of these samples to confirm they are close to 0 and 1, respectively. (Hint: use `sum(samples)/N` for the mean, and `(sum((x - mean)**2 for x in samples)/N)**0.5` for the standard deviation.)
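One possible solution, following the hint directly (dividing by N, i.e. treating the draws as the whole population):

```python
# One way to solve the exercise: sample 10,000 standard-normal values
# and check that their mean is near 0 and standard deviation near 1.
import random

N = 10_000
samples = [random.gauss(0, 1) for _ in range(N)]
mean = sum(samples) / N
std = (sum((x - mean) ** 2 for x in samples) / N) ** 0.5
print(f"Mean = {mean:.3f}, Std = {std:.3f}")
```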