Building A March Madness Bracket Simulator
March Madness is upon us. We will soon be inundated with power ratings and probabilities to help us craft a bracket for the NCAA tournament. Whether you are entering a bracket contest with friends or getting into the weeds betting NCAA props and futures, you’ll want to be able to quantify each school’s chances of making it through each round. The best way to do this is by running Monte Carlo simulations of the tournament. While Unabated does not have a March Madness bracket simulator product (yet), this article will take you through the process of creating your very own simulator.
This article alone will not hand you a finished simulator, but it will point you in the right direction and give you a framework for building one, should you so choose. Knowing R, Python, or some other programming language is ideal, but you can even run simulations in Excel.
Power Ratings For NCAA Tournament Teams
The first step in building your own simulator is coming up with a quantitative rating for each team – a power rating – that can be used to compare teams. Since I don’t model basketball, I don’t have my own power ratings. If you’re like me, your best option is to use publicly available power ratings, such as KenPom, ESPN’s BPI, or TeamRankings. There are dozens of quantitative ratings systems out there. I would highly recommend creating a weighted average of publicly available power ratings you respect, weighting more for ratings you feel are sharper.
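A weighted blend of ratings can be sketched in a few lines of Python. The system names, ratings, and weights below are made up for illustration, and the sketch assumes every system has already been converted to the same points scale, which is not true of every public rating out of the box.

```python
def blend_ratings(ratings_by_system, weights):
    """Weighted average of each team's rating across systems.

    ratings_by_system: {system_name: {team: rating}}, all on a points scale.
    weights: {system_name: weight}; weights do not need to sum to 1.
    Only teams rated by every system are returned.
    """
    total_weight = sum(weights[s] for s in ratings_by_system)
    teams = set.intersection(*(set(r) for r in ratings_by_system.values()))
    return {
        team: sum(weights[s] * ratings_by_system[s][team]
                  for s in ratings_by_system) / total_weight
        for team in teams
    }


# Hypothetical systems and numbers, for illustration only.
example_ratings = {
    "system_a": {"Team X": 20.0, "Team Y": 10.0},
    "system_b": {"Team X": 24.0, "Team Y": 8.0},
}
example_weights = {"system_a": 3.0, "system_b": 1.0}  # trust system_a more

blended = blend_ratings(example_ratings, example_weights)
# Neutral-court spread implied by the blend: Team X minus Team Y.
implied_spread = blended["Team X"] - blended["Team Y"]
```

Here the blend rates Team X at 21.0 and Team Y at 9.5, so the implied neutral-court spread is Team X by 11.5.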
Ken Massey’s site (no relation to Cade Massey of Massey-Peabody) has a great page where you can compare the ranking of each team across various systems, and it also links to the sites of 60 different publicly available rating systems. Over the years, there have been sites that have tracked how well different rating systems have performed. But I’ll leave it to the reader to do their research and figure out their own rating blend. The difference in power ratings between two schools should represent the average point differential if they played on a neutral court.
Remember that your simulations will only be as good as the ratings they’re based on. The old adage “garbage in, garbage out” applies here. You can check your ratings against the market by comparing the point spreads they imply with the posted point spreads for first-round games. If the two differ significantly, your simulation will likely show lots of value on futures, but that will just be a carry-over of the value it shows on each of the individual games those teams play.
Creating Your March Madness Bracket Simulator
Now that you have your ratings, you’re ready to get started. First, you need to set up your bracket. I would create a CSV file that looks just like the bracket, and add a column with a sequence of numbers, from 1 to 64, that can be used to sort and keep the bracket in order. This will come in handy in determining the matchups you are simulating in subsequent rounds. To actually build your simulation, you need two key things:
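As a rough sketch of why the ordering column helps: if the file is sorted so that adjacent slots play each other, every round's matchups fall out of pairing neighbors. The column names (`slot`, `team`, `rating`) and the four-team sample are hypothetical; a real file would have 64 rows.

```python
import csv
import io

# Hypothetical four-team slice of a bracket file, deliberately out of order.
BRACKET_CSV = """\
slot,team,rating
3,Team C,10.5
1,Team A,25.0
4,Team D,10.0
2,Team B,-3.0
"""


def load_bracket(csv_text):
    """Read the bracket and return teams sorted by their slot number."""
    rows = csv.DictReader(io.StringIO(csv_text))
    teams = [
        {"slot": int(r["slot"]), "team": r["team"], "rating": float(r["rating"])}
        for r in rows
    ]
    return sorted(teams, key=lambda t: t["slot"])


def pair_matchups(teams):
    """Pair adjacent entries: (1st, 2nd), (3rd, 4th), and so on."""
    return [(teams[i], teams[i + 1]) for i in range(0, len(teams), 2)]


bracket = load_bracket(BRACKET_CSV)
first_round = pair_matchups(bracket)  # Team A vs Team B, Team C vs Team D
```

As long as each round's winners are kept in the same order, calling `pair_matchups` on the list of winners gives the next round's games.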
Translating Power Ratings To Win Probability
You need a way of translating your point spread (the difference between ratings) into a win probability. Because your ratings treat every point as being worth the same, the spread they produce represents the average, or mean, score differential between the two teams. Your power ratings don’t know that a game is much less likely to land on “1” than “3”. Since Unabated’s Alternate Line Calculator takes a market point spread (a median) as its input, it will not be ideal for our purposes.
You can do one of two things:
One option is to sample from a normal distribution centered around your mean score differential. You will need a standard deviation for the distribution, which does vary a bit based on the game total and line. But since the shot clock was changed to 30 seconds, the average standard deviation of the cover differential (the difference between actual score diff and expected score diff) is about 11.2 points. You simulate a score differential based on this distribution, and if it’s greater than 0, you give the team a win; if it’s less than 0, they get a loss.
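A minimal sketch of the normal-distribution option, using the 11.2-point standard deviation quoted above. The function names are my own, and the closed-form probability is included only as a sanity check on the sampling.

```python
import random
from statistics import NormalDist

SCORE_SD = 11.2  # standard deviation of the cover differential, from the text


def simulate_game_normal(mean_diff, rng, sd=SCORE_SD):
    """Sample one score differential; True means the first team wins."""
    return rng.gauss(mean_diff, sd) > 0


def win_prob_normal(mean_diff, sd=SCORE_SD):
    """Closed-form P(score differential > 0) under the same normal model."""
    return 1.0 - NormalDist(mu=mean_diff, sigma=sd).cdf(0.0)


rng = random.Random(2024)  # seeded so a run can be reproduced
n_sims = 20_000
wins = sum(simulate_game_normal(5.0, rng) for _ in range(n_sims))
simulated_win_rate = wins / n_sims  # should land near win_prob_normal(5.0)
```

Under these assumptions a 5-point favorite wins roughly two games in three.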
The other option is to just estimate a moneyline and simulate from that. If you fit a logistic regression of team win/loss on the point spread using historical data (you can add the total here if you want to be really on-point), you get a simple formula for translating your point spread into a win probability.
I fit this logistic regression using historical data from the period since the shot clock changed to 30 seconds, and got the following equation to convert point spread to win probability.
$$\text{Win Prob} = \frac{1}{1 + e^{-(0.163 \times \text{pred score diff})}}$$
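The equation translates directly into code. The sketch below also converts the probability to a no-vig American moneyline, using the standard conversion; the function names are my own.

```python
import math

LOGIT_SLOPE = 0.163  # coefficient from the fitted equation above


def win_prob_logistic(pred_score_diff, slope=LOGIT_SLOPE):
    """Win probability for a team favored by pred_score_diff points."""
    return 1.0 / (1.0 + math.exp(-slope * pred_score_diff))


def fair_moneyline(p):
    """No-vig American moneyline for a win probability p (0 < p < 1)."""
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)
    return 100.0 * (1.0 - p) / p


p_fav = win_prob_logistic(5.0)   # 5-point favorite
p_dog = win_prob_logistic(-5.0)  # the other side of the same game
```

A 5-point favorite comes out at about 69.3%, a little higher than the normal-distribution approach gives with an 11.2-point standard deviation, so the two options are close but not identical.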
Account For Dynamic Uncertainty In Your March Madness Bracket Simulator
The second thing you need to do for your simulation is account for dynamic uncertainty. What I mean by that is that a team’s rating will change as a result of how well they play. When George Mason made the Final 4 as a #11 seed, the market’s assessment of how good they were changed significantly over the course of the tournament, as the Patriots exceeded expectations in each round. Team ratings change over the course of the season; why should the tournament be any different?
This means that in order for your simulations to reflect reality, team power ratings need to change after each round of games. How much a rating changes is a function of how well the team played relative to expectation. If you had a team favored by 2 points in the first round, and they won by 17 points, their rating should improve. On the other hand, a 17-point favorite winning by 2 should see its rating decline, despite advancing to the next round.
But there’s a lot more to whether a team’s rating should improve than just how many points they exceed their projection. Maybe they benefited from their opponent shooting 50% from the free throw line or missing a bunch of wide-open threes. Maybe the opponent’s best player got in foul trouble early and played half the minutes he normally does.
The problem is you don’t know the HOW in your simple simulation; you just know the final score. Let’s revisit this in a second. First, you need a rating updating function that’s based on what you do have in your simulation – a final score difference.
Using Historical Scores And Data
To build a rating update function properly, you’ll need some historical ratings and game scores. The way I’d approach it is to run a regression of the change in a team’s rating on score differential relative to expectation. Include an interaction between score differential relative to expectation and some function of the number of games the team has played. I would surmise that log of games played would fit best, though I haven’t fit this myself.
We would expect a team’s rating to be a lot more reactive to one game early in the season than late in the season. We have a much better idea of how good a team is late in the season. In more mathematical terms, the standard deviation of a team’s true power rating is much higher early in the season than late in the season, meaning an individual game will carry more weight early in the season than late. The regression might look like this:
$$\text{rating diff}_i = \beta_1\,(\text{score diff}_{i-1} - \text{exp score diff}_{i-1}) + \beta_2\,(\text{score diff}_{i-1} - \text{exp score diff}_{i-1}) \cdot \ln(\text{game number})$$
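To show the mechanics of fitting a regression of this shape, here is a sketch that solves the two-coefficient least-squares problem with the standard library. The data is synthetic, generated from made-up coefficients purely to check that the fit recovers them; a real fit would use historical ratings and scores, and the numbers here say nothing about actual basketball.

```python
import math
import random


def fit_rating_update(surprises, game_numbers, rating_changes):
    """Least-squares fit (no intercept) of
        rating_change = b1 * surprise + b2 * surprise * ln(game_number)
    where surprise = score diff - expected score diff.
    Returns (b1, b2, residual_sd).
    """
    f1 = list(surprises)
    f2 = [s * math.log(g) for s, g in zip(surprises, game_numbers)]
    # Normal equations for two predictors, solved with Cramer's rule.
    s11 = sum(a * a for a in f1)
    s12 = sum(a * b for a, b in zip(f1, f2))
    s22 = sum(b * b for b in f2)
    s1y = sum(a * y for a, y in zip(f1, rating_changes))
    s2y = sum(b * y for b, y in zip(f2, rating_changes))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    residuals = [y - (b1 * a + b2 * b) for a, b, y in zip(f1, f2, rating_changes)]
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return b1, b2, resid_sd


# Synthetic data from made-up "true" values (illustration only).
TRUE_B1, TRUE_B2, TRUE_SD = 0.10, -0.02, 0.3
rng = random.Random(0)
games = [rng.randint(1, 35) for _ in range(5000)]
surprises = [rng.gauss(0.0, 11.2) for _ in range(5000)]
changes = [
    TRUE_B1 * s + TRUE_B2 * s * math.log(g) + rng.gauss(0.0, TRUE_SD)
    for s, g in zip(surprises, games)
]

b1, b2, resid_sd = fit_rating_update(surprises, games, changes)
```

A negative interaction coefficient, as assumed in the synthetic data, is what makes a rating less reactive late in the season. The residual standard deviation returned here is the quantity the simulation needs later.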
Just as important as the coefficients is the residual. In the absence of some sort of “shock”, like a star player being hurt, I’d expect the residuals (errors) to be normally distributed. These errors are the difference between the model’s prediction of rating change and actual rating change. What drives them is the HOW I mentioned earlier. The final score doesn’t tell the entire story of how well a team played. A good rating system will use statistics that a simple simulation will not know.
So our way around this is to sample from a distribution of rating changes. The distribution is normal, centered at our estimate (based on the regression), with a standard deviation equal to the standard deviation of the residuals from the regression.
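In code, the sampling step might look like the sketch below. The coefficient and residual values are hypothetical placeholders, not fitted numbers; you would substitute the output of your own regression.

```python
import math
import random

# Hypothetical placeholder values; replace with your own regression output.
B1 = 0.10
B2 = -0.02
RESID_SD = 0.5


def expected_rating_change(score_diff, exp_score_diff, game_number, b1=B1, b2=B2):
    """Regression estimate of the rating change after one game."""
    surprise = score_diff - exp_score_diff
    return b1 * surprise + b2 * surprise * math.log(game_number)


def sample_rating_change(score_diff, exp_score_diff, game_number, rng,
                         b1=B1, b2=B2, resid_sd=RESID_SD):
    """Draw a rating change from a normal centered on the regression estimate."""
    mean_change = expected_rating_change(score_diff, exp_score_diff, game_number, b1, b2)
    return rng.gauss(mean_change, resid_sd)


rng = random.Random(7)
# A 2-point favorite wins by 17 in its 35th game of the season.
draws = [sample_rating_change(17.0, 2.0, 35, rng) for _ in range(10_000)]
average_draw = sum(draws) / len(draws)
```

The average of many draws should sit close to `expected_rating_change(17.0, 2.0, 35)`, while any single draw can land above or below it. That spread stands in for the HOW that the final score hides.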
That’s it. That’s the crux of the simulation. Simulate game. Update power rating. Rinse and repeat. After each round, you determine new matchups, and just continue on through the tournament. You can add bells and whistles – bonuses/penalties based on travel distance, for example – but the structure is quite simple. Trust me when I say that you do not need to be an expert coder to handle Monte Carlo simulations.
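Putting the pieces together, a bare-bones version of the loop could look like the following. It is a sketch under several assumptions: games are simulated with the normal-distribution option (so each game produces a score differential to feed the rating update), the update coefficients and games-played count are hypothetical placeholders, and the four-team bracket is made up.

```python
import math
import random
from collections import Counter

SCORE_SD = 11.2                        # from the text
B1, B2, RESID_SD = 0.10, -0.02, 0.5    # hypothetical update parameters
GAMES_BEFORE_TOURNAMENT = 34           # assumed games played before round one


def simulate_tournament(bracket, rng):
    """bracket: list of (team, rating) in bracket order, length a power of 2.
    Returns one list of winning team names per round."""
    alive = list(bracket)
    game_number = GAMES_BEFORE_TOURNAMENT
    winners_by_round = []
    while len(alive) > 1:
        game_number += 1
        next_round = []
        for i in range(0, len(alive), 2):
            (team_a, rating_a), (team_b, rating_b) = alive[i], alive[i + 1]
            exp_diff = rating_a - rating_b
            diff = rng.gauss(exp_diff, SCORE_SD)      # simulate game
            surprise = diff - exp_diff                # from team_a's side
            if diff > 0:
                winner, rating = team_a, rating_a
            else:
                winner, rating, surprise = team_b, rating_b, -surprise
            # Update the winner's rating: regression mean plus residual noise.
            mean_change = (B1 + B2 * math.log(game_number)) * surprise
            rating += rng.gauss(mean_change, RESID_SD)
            next_round.append((winner, rating))
        winners_by_round.append([team for team, _ in next_round])
        alive = next_round
    return winners_by_round


def advancement_odds(bracket, n_sims, seed=0):
    """Share of simulations in which each team wins its game in each round."""
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1
    counts = [Counter() for _ in range(n_rounds)]
    for _ in range(n_sims):
        for rnd, winners in enumerate(simulate_tournament(bracket, rng)):
            counts[rnd].update(winners)
    return [{team: c[team] / n_sims for team, _ in bracket} for c in counts]


# Made-up four-team bracket, in bracket order.
demo_bracket = [("Team A", 25.0), ("Team B", -3.0), ("Team C", 10.5), ("Team D", 10.0)]
odds = advancement_odds(demo_bracket, n_sims=2000)
```

`odds[0]` holds each team's chance of winning its first game and `odds[-1]` its chance of winning the whole bracket. The same function works on a 64-team list.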
Taking A Shortcut Using Teamrankings.com
Perhaps your head is spinning at everything that’s been laid out. Maybe you realize that this task is longer than the runway between now and the first round. Don’t worry, there is a shortcut available. Our friends at Teamrankings.com have developed a powerful suite of tools that simulate brackets for you. In fact, they even have resources that take into consideration things that we didn’t even touch on in this article.
For instance, if you’re playing in a tournament pool, your goal is to win the pool. Sometimes that requires some contrarian picking depending on the pool size and the structure of the tournament. Teamrankings takes that into consideration with their Bracket Builder Tool.
You can also use Teamrankings to answer a variety of betting-related questions. Want to know the odds a certain team makes it to the Sweet 16? They’ve got answers for you in their Survival Odds tool. Want to analyze a matchup? They’ve got a Matchup Predictor Tool.
They’ve given Unabated users a special discount on their product as well. To learn more about their product and their methods, check out this recent livestream with them: