---
title: I Open-Sourced My UFC Prediction Model, Weights, and Database
description: Five years of building MMA-AI, an open-source UFC prediction model with model weights, a historical odds database, and the ways I learned not to fool myself with machine learning.
canonical_url: https://mcinerney.ai/writings/i-open-sourced-my-ufc-prediction-model-weights-and-database/
date_published: "2026-06-05"
last_updated: "2026-06-05"
generated_html: true
kicker: Open Source
deck: Five years, 15,000+ hours, an 8% ROI since 2024, and a lot of machine-learning mistakes preserved in amber.
tags: Machine Learning, Sports Prediction
card_image: /writings/i-open-sourced-my-ufc-prediction-model-weights-and-database/assets/shap-belal-gabriel.png
card_summary: Five years of building an open-source UFC prediction model, weights, database, and a pile of lessons about leakage, validation, calibration, and odds.
sitemap_summary: Open-source release post for MMA-AI covering the UFC prediction model, model weights, database, betting ROI, data leakage, validation, calibration, odds, feature selection, and LLM-assisted development.
llms_summary: Release post for MMA-AI, an open-source UFC prediction model with model weights and database, covering sports prediction ML, historical odds, expected value, data leakage, validation, calibration, feature selection, and LLM-assisted feature engineering.
markdown_label: MMA-AI open-source release article markdown mirror
---

# I Open-Sourced My UFC Prediction Model, Weights, and Database

Five years, 15,000+ hours, an 8% ROI since 2024, and a lot of machine-learning mistakes preserved in amber.

**Repo:** [DanMcInerney/mma-ai](https://github.com/DanMcInerney/mma-ai).

**Database + model:** [huggingface.co/datasets/DanMcInerney/mma-ai](https://huggingface.co/datasets/DanMcInerney/mma-ai).

![SHAP feature-impact chart for Belal Muhammad versus Gabriel Bonfim.](assets/shap-belal-gabriel.png)

<figure class="article-figure">
  <img src="assets/2026.png" alt="MMA-AI 2026 event ROI list showing 11 of 12 positive ROI events.">
  <figcaption>11/12 positive ROI events in 2026 baaaaaby!!</figcaption>
</figure>

## Intro

Back in 2011 I read *The Singularity Is Near* by Ray Kurzweil. I found the evidence extremely compelling and knew my future lay in AI, but I was just a broke college kid finishing a useless psychology degree. I had been training BJJ and Muay Thai for a few years, took and won my first fight, then immediately locked myself in my room and designed a five-year plan to learn hacking, Python, Linux administration, and networking at the same time. Learn something, automate it with Python, release it open source, and do it again.

Fast forward to 2020. I was a senior security researcher and bored of the job. I had automated the majority of my job away, so I locked myself in my room again for a few years and built the [mma-ai.net](https://mma-ai.net) model. At that point, there was only one other modeler of UFC: [wolftickets.ai](https://wolftickets.ai). I cold emailed him for advice and ended up becoming coworkers with him at an AI security startup that was later acquired by Palo Alto Networks.

And here we are. Five years and 15,000+ hours into MMA-AI's database and model. It is officially time to open source it.

I'm fairly certain this is the first time a Vegas-beating machine-learning model has been completely open sourced. Since 2024 it has posted about 8% ROI, which puts it into the same category as the world's best UFC sharps. Except I never had to do tape study or any of that boring stuff. I just hit enter on my keyboard like a nerd.

I should note this is a complete release: code, model weights, and database. I'm fairly certain it is the largest database of UFC stats in the world, and it includes incredibly granular historical odds. Not just closing odds, but odds scraped by the hour starting at open. Truly a treasure trove of information. The odds probably cover about 60-70% of all fights in the last 10 years.

The codebase and database are an utter terror, by the way. It is five years of stream-of-consciousness programming, and it is so big and bloated that I'm scared to refactor it in case I introduce one of the many tiny bugs that blows the model up, as I have done in the past. Forgive the code. It works, and that is what's important.

I think what I'm most proud of is the thousands of hours of collaboration, teaching, and learning I've done over these five years with other people interested in this incredibly niche hobby of machine learning for sports prediction. Huge shoutout to the OG in the space, wolftickets.ai. He basically taught me everything I know, and I've done my best to pass that knowledge on to dozens and dozens of people who reached out to me for help over the years.

It's funny. We all make the same mistakes early on. Wolftickets would tell me something about machine learning and I'd be like, no, that can't be right, I can do it better. Then two years later I'd be doing exactly what he told me to do.

## MMA-AI

I packaged the model into a Docker container with a local interface. The database and data are on Hugging Face, and the Docker container pulls them down. Go to the Predict tab and hit predict to see the next event, or whatever event you choose.

![MMA-AI event prediction output showing positive expected value picks for an upcoming UFC event.](assets/upcoming-picks.png)

Up until now I have been doing this manually by running Python scripts, so I had Codex whip this container and web app up so nontechnical folks can use it. If something is broken or not working, ask Claude Code or Codex to fix it and submit a PR. I'm not interested in turning this into some grand feature-complete package. It is just a simple way to use the model without any technical knowledge.

I added a small feature that lets you include an LLM API key so you can quickly query the database for data analytics too, but I'd suggest using Claude Code or Codex for more detailed analysis. I also included strong `AGENTS.md` and `README.md` files so they can understand the disorganized mess that is the code and database.

![MMA-AI analytics view showing an LLM-generated SQL analysis and chart of opponent-adjusted significant-striking accuracy.](assets/data-analytics-llm.png)

## Funny Mistakes

### 1. Model Ensembles

I started with a single XGBoost model. Wolftickets told me he uses a library called AutoGluon to create an array of models and that this is generally best practice. So I ignored that, thinking I could tune this thing myself better than some group of ivory tower academics trying to put training wheels on real engineering.

Two years later, I switched to AutoGluon. Turns out I'm not a better machine-learning engineer than a funded team of experts.

### 2. Hyperparameter Optimization

Wolftickets told me hyperparameter optimization was mostly a waste of time. I thought, no! That can't be true. All these tutorials on Medium talk about HPO as a basic skill. All these tutorials are high on Google search for "machine learning tutorials." They can't be dumb and terrible!

Incorrect. They were all dumb and terrible.

At one point, I was running Optuna to optimize my parameters and optimizing based on test accuracy. I ran it overnight, and my test accuracy just kept climbing higher and higher. 70%, 74%, 78%!! I chose the 78% test-accuracy hyperparameters. Then I got bombed on in the next two UFC events.

Then I learned what "overfitting" was.

Overfitting is the bane of my existence. It is what makes modeling a bit of an art as well as a science. You can't know how your model will perform in the future, but if you optimize wholly on the past, then you're just making a model that is really good at predicting past data and really bad at predicting future data.
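A toy simulation, not the original Optuna setup, shows why "pick whatever scored highest on the test set" inflates the number. Every trial below is a coin-flip predictor with zero real skill. The trial count and sample sizes are made-up assumptions.

```python
import random

rng = random.Random(0)

N_TRIALS = 500    # hypothetical number of HPO trials
N_TEST = 100      # fights in the test set that every trial is scored on
N_FUTURE = 2000   # fights the search never touched


def accuracy(predictions, outcomes):
    """Fraction of predictions that match the outcomes."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


test_outcomes = [rng.randint(0, 1) for _ in range(N_TEST)]
future_outcomes = [rng.randint(0, 1) for _ in range(N_FUTURE)]

best_test_acc = 0.0
best_seed = None
for trial in range(N_TRIALS):
    # A "model" with no skill at all: it guesses at random.
    model_rng = random.Random(1000 + trial)
    test_preds = [model_rng.randint(0, 1) for _ in range(N_TEST)]
    acc = accuracy(test_preds, test_outcomes)
    if acc > best_test_acc:
        best_test_acc, best_seed = acc, 1000 + trial

# Score the "winning" trial on fights it was never selected against.
winner_rng = random.Random(best_seed + 10_000)
future_preds = [winner_rng.randint(0, 1) for _ in range(N_FUTURE)]
future_acc = accuracy(future_preds, future_outcomes)

print(f"best test accuracy across trials: {best_test_acc:.2f}")
print(f"same 'model' on unseen fights:    {future_acc:.2f}")
```

The best trial looks well above a coin flip on the test set purely because it was selected for that. On fights it was never selected against, it falls back to roughly 50%.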

Looking back through the old repo, the funniest part is that the code preserved the entire thought process. Some of the comments are basically "ChatGPT suggested this search space," along with a billion `.txt` files named things like `xgboost-age-1000-trial-min-noodds-top100-hpo.txt`, where I was copying and pasting results then flipping between the tabs to eyeball comparisons between features. Just a beautiful historical document of me standing in a burning room repeating, "maybe if I tweak this one last thing" 50 million times.

### 3. Shuffled Cross-Fold Validation

![Cross-validation explainer diagram showing shuffled folds with train and test splits.](assets/kfold.png)

This is in every tutorial. "You should always do shuffled cross-fold validation to ensure robust validity."

No, you shouldn't. At least not for time-ordered data like sports results.

The old code also had a stratified k-fold path that shuffled fights around before validation. UFC fights happen in time. This sounds too obvious to write down, but I apparently needed to learn it by losing money.

If you randomly split fights into folds, your model gets to learn patterns from future fights while validating on older fights. It might not be seeing the literal answer key for that one fight, but it is absolutely getting to know the future distribution of the sport. The fighter population changes. The meta changes. Judging changes. Training changes. Market behavior changes. COVID happened. New regional pipelines appear. Random heavyweight nonsense remains eternal, but almost everything else moves.

The corrected version is boring and obvious, but not actually easy: train on the past, validate on the future, test on the even further future.
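A minimal sketch of "past, future, further future" as a date-ordered three-way split. The `fight_id` and `date` field names and the 70/15/15 fractions are placeholders for illustration, not values from the repo.

```python
from datetime import date


def chronological_split(fights, train_frac=0.70, val_frac=0.15):
    """Split fights into train/validation/test by date, oldest first."""
    ordered = sorted(fights, key=lambda f: f["date"])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Twenty made-up fights, two per year from 2014 through 2023.
fights = [
    {"fight_id": i, "date": date(2014 + i // 2, 1 + (i % 2) * 6, 1)}
    for i in range(20)
]

# Input order does not matter; the function sorts by date itself.
train, val, test = chronological_split(list(reversed(fights)))

print("train ends:", train[-1]["date"])
print("val spans: ", val[0]["date"], "to", val[-1]["date"])
print("test starts:", test[0]["date"])
```

One thing this sketch ignores: fights on the same card share a date, so a real split should probably cut on event boundaries, not row counts.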

## Other Ways To Shoot Yourself In The Fucking Face

### Data Leakage

This is probably the most common mistake people make. There are so many ways to leak training data into holdout test data that it's insane.

The most obvious is training your model on post-fight stats, not pre-fight stats. Many a Medium article has been written by some student who claims 80%+ accuracy in their sports model but trained on post-event stats. There are even a ton of academic papers that fall for this trap. But consider some of the more subtle and insidious ways.

### Train/Test Split

So many tutorials are not sports-specific and tell you to always shuffle your data before cross-fold validation or a train-test split. Sports are more or less a time series of data. The question is always: will this fighter win, given only his data up to this point in time?

I've tried it all: plain 80/20 train-test split, a random-sampled test set over the last few years while the rest is training data, walk-forward fold validation, many different kinds of cross-fold validation. There are so many ways to leak data and mess up the model with the train-test split.

I eventually settled on a super simple plain 85/15 train-test split, ordered by date: the older 85% of fights train the model and the most recent 15% test it. It doesn't overfit, it doesn't involve complex code that can subtly break, and it tests the model closest to the dates of the upcoming predicted fights.

### Filtering Data

A lot of people think more data equals better. That's mostly true, but quality of data is not talked about enough in ML tutorials.

We store something like 10,000+ fights in the database. But not all fights are the same. Women's fighting patterns are extremely different than men's. Fighters without much fight history end up poisoning the well when we try to do stuff like stat smoothing and training, since they're too sparse in data to find patterns in.

Maybe the single highest increase in accuracy I ever got was from removing women, removing fighters with fewer than two fights in the UFC, and setting the data cutoff date to 2014. This removes so much noise from the raw data that the model can really home in on patterns that give it an edge.
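The three filters can be sketched as a single predicate. All field names here are hypothetical, the exact cutoff day is a guess (the post only says 2014), and "fewer than two fights" is interpreted as prior UFC fights for both fighters, which is an assumption.

```python
from datetime import date

CUTOFF = date(2014, 1, 1)   # assumed exact cutoff; the post only says "2014"
MIN_PRIOR_UFC_FIGHTS = 2    # both fighters need at least this much UFC history


def keep_fight(fight):
    """Return True if a fight passes all three filters."""
    if fight["is_womens_bout"]:
        return False
    if fight["date"] < CUTOFF:
        return False
    thinnest_history = min(fight["f1_prior_ufc_fights"], fight["f2_prior_ufc_fights"])
    return thinnest_history >= MIN_PRIOR_UFC_FIGHTS


# Made-up rows, one per reason a fight can be dropped plus one keeper.
fights = [
    {"fight_id": 1, "is_womens_bout": True, "date": date(2019, 3, 2),
     "f1_prior_ufc_fights": 5, "f2_prior_ufc_fights": 5},
    {"fight_id": 2, "is_womens_bout": False, "date": date(2012, 7, 7),
     "f1_prior_ufc_fights": 4, "f2_prior_ufc_fights": 6},
    {"fight_id": 3, "is_womens_bout": False, "date": date(2018, 9, 8),
     "f1_prior_ufc_fights": 1, "f2_prior_ufc_fights": 7},
    {"fight_id": 4, "is_womens_bout": False, "date": date(2021, 4, 24),
     "f1_prior_ufc_fights": 3, "f2_prior_ufc_fights": 2},
]

filtered = [f for f in fights if keep_fight(f)]
print([f["fight_id"] for f in filtered])
```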

Related to this are hidden features and data balance. In the UFC, the red corner wins about 60% of the time. On [ufcstats.com](http://ufcstats.com), the red-corner fighter is always listed first, so that is reflected in the database. `Fighter1` is always red corner, at least from about 2010 onward. This leads to data imbalance. We're predicting 1 or 0. `1 = win`, `0 = lose`.

I did extensive testing around balancing the training data so `Fighter1` wins 50% of the time. After about 100 hours of testing, balancing appeared to hurt the model's backtested ROI and accuracy. I believe this is because the corner assignment is essentially a hidden feature that encodes multiple dimensions. It is gentle guidance into which fighter is generally favored to win, and it encodes geographic advantages, since the hometown fighter is usually red corner, among a complex array of other things.

I do not balance the corner assignments. `Fighter1` in the database is always red corner. I find it to be a valuable hidden feature.

### Calibration

One of the most controversial areas of sports prediction. I cannot tell you how many times people have told me I'm an idiot, and that if I'm not calibrating the model and calculating +EV rather than just letting its win picks ride, then the model is trash, garbage, shit, and I should kill myself.

From my experience, this is what raw model output tends to look like:

![Sports betting odds versus machine-learning probability distributions, showing model probabilities clustering near the middle.](assets/ml-clustering.png)

So the answer is calibration, right? Isotonic and Platt scaling are the two most common ML calibration methods, but whenever I used those, backtest profit fell and accuracy generally stayed the same or fell. I tweaked, modified, and adjusted, but it is just so hard to calibrate these things right. The main reason is the lack of data. You really want 10,000+ rows of data to get these calibration algorithms to work right. We don't have that. We have about 4,000 fights that pass the barriers: at least two fights in the UFC, men only, etc.

I have never been happy or felt confident that there was a good way to employ calibration and test it in a way that didn't leave me wondering if I was just overfitting the data. So I don't calibrate anymore. If I run hundreds of tests and I'm still not confident in the outcome, then it's time to simplify and go back to what I know isn't going to break the model. It is super easy to overfit the model via calibration.
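Before reaching for isotonic or Platt scaling, it can help to measure how far off the raw probabilities actually are. Below is a minimal sketch of a reliability table plus a Brier score. The probabilities and outcomes are made up, and this is a diagnostic, not a calibration method.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and the 0/1 result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns rows of (bin_low, bin_high, count, mean_predicted, observed_win_rate)
    for every non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, len(members), mean_p, win_rate))
    return rows


# Made-up model output that clusters near 50%, as in the chart above.
probs = [0.45, 0.55, 0.52, 0.58, 0.48, 0.62, 0.41, 0.57]
outcomes = [0, 1, 1, 1, 0, 1, 0, 1]

rows = reliability_table(probs, outcomes)
brier = brier_score(probs, outcomes)

for low, high, count, mean_p, win_rate in rows:
    print(f"{low:.1f}-{high:.1f}  n={count}  predicted={mean_p:.2f}  observed={win_rate:.2f}")
print(f"Brier score: {brier:.4f}")
```

With a few thousand fights, most bins away from the middle hold very few rows, which is the same small-data problem that makes fitting a calibrator shaky.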

### Odds

Closing lines are incredibly informative, but they may contain information unavailable when you claim to be making the prediction. If you say "I predict fights a week out" but train on closing odds, you're letting the market digest late injuries, weigh-in misses, camp rumors, and public betting pressure for you.

This gets especially dangerous because closing odds really are predictive. Your metrics will often improve. Your logloss might look prettier. Your calibration might look less embarrassing. And then you have to ask the horrible adult question: did I actually build a better fight model, or did I just build a model that copies the market after the market has already finished updating?

I have gone back and forth a thousand times and run a thousand tests to figure out if I should include the odds in the model. I settled on no. Generally, including the odds lowered the backtested ROI, but I don't think the case is closed.

### Computing Global Statistics Before Splitting

Say you normalize reach, pace, takedown defense, or strike volume using the mean and standard deviation of the whole dataset. Congratulations: your 2014 model now knows the distribution of 2023 fighters. That sounds tiny, but ML models are really good at identifying patterns. What seems like a tiny leak can end up annihilating your real-world accuracy.

The same applies to imputers, scalers, feature selectors, target encoders, anything that computes a global statistic. If it saw the future before the split, assume it leaked.

The corrected version is not complicated. Fit the scaler on training data only. Apply it to validation, test, and prediction data later. The annoying part is remembering that every tiny preprocessing step is part of the model.
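A minimal sketch of "fit on training data only" using a hand-rolled standard scaler. The class name and the reach numbers are invented for illustration. The same fit-then-transform discipline applies to whatever scaler or imputer a pipeline really uses.

```python
from statistics import mean, pstdev


class TrainOnlyScaler:
    """Standardize values using statistics learned from training data only."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Made-up reach values (inches): older fights for training, newer ones held out.
train_reach = [70, 72, 74]
future_reach = [76, 78]

scaler = TrainOnlyScaler().fit(train_reach)   # never sees future_reach
scaled_train = scaler.transform(train_reach)
scaled_future = scaler.transform(future_reach)

# The leaky version: the mean shifts because the future rows were included.
leaky_mean = mean(train_reach + future_reach)

print("train-only mean:", scaler.mean_)
print("leaky mean:     ", leaky_mean)
print("scaled future:  ", [round(v, 3) for v in scaled_future])
```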

### Training Dataset vs. Prediction Dataset

You can have a perfect non-leaking training dataset but completely fuck your model by not preparing the future prediction dataset correctly.

The training path and prediction path have to agree on everything. Same features, same order, same scaler, same missing-value logic, same fighter mapping, same date handling, same definition of "known before the fight."

The worst bugs here do not crash. They just quietly make the model dumber. You still get predictions. They still look official. The CSV still writes. The dashboard still renders. But somewhere inside the pipeline, the future row is not the same species as the training row.

This is why the release stores `feats.txt`, keeps scalers with the model artifacts, and routes manual matchups through the same `predict.py` and `InferenceDataBuilder` path as event predictions. Boring consistency is the entire game here.
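One way to make that kind of mismatch crash loudly instead of quietly degrading predictions is to align every prediction row against the stored feature list. This is a sketch only: it assumes the feature list is one name per line, which is a guess about the `feats.txt` format, and the feature names and helper functions are hypothetical, not taken from `predict.py`.

```python
def parse_feature_list(text):
    """Parse a feature list stored as one name per line (assumed format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def align_row(row, feature_names):
    """Return the row's values in training order, or fail loudly on any mismatch."""
    missing = [name for name in feature_names if name not in row]
    unexpected = sorted(set(row) - set(feature_names))
    if missing or unexpected:
        raise ValueError(
            f"feature mismatch: missing={missing}, unexpected={unexpected}"
        )
    return [row[name] for name in feature_names]


# Hypothetical feature names, in the order the model was trained on.
feature_names = parse_feature_list("age_diff\nreach_diff\n")

# The prediction row arrives in a different key order; alignment fixes that.
prediction_row = {"reach_diff": 2.0, "age_diff": -3.0}
aligned = align_row(prediction_row, feature_names)
print(aligned)
```

A row that is missing a feature, or carries one the model never trained on, raises a `ValueError` instead of producing an official-looking number.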

### Feature Selection

This is probably the hardest part of modeling. There is no good automated way of doing this, and don't just link me to some GitHub library that claims to do it. I've tested them all.

This is where domain knowledge is really valuable. Your feature-importance measurements may say 10 age-related stats are the 10 most predictive, so you should totally just keep all of them, right? Wrong. Understand exactly what is being measured and make sure every feature you include in your model has as little overlap as possible with other features. You want comprehensive fight measurements, not a model with 10 features measuring the same thing.

Feature selection is so hard because if you train your model 10 times, you'll get 10 different feature-importance measurements, sometimes wildly different. Too many features equals overfitting. Too few features equals an impaired model.
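A cheap first pass at the overlap problem is to flag feature pairs that are almost perfectly correlated. This is a sketch, not an automated selector. The feature names, the data, and the 0.9 threshold are all assumptions, and plain correlation only catches linear overlap.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal to overlap with
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """List feature pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up columns: years_pro is just age minus 21, so it adds nothing new.
columns = {
    "age": [25, 30, 35, 28, 33],
    "years_pro": [4, 9, 14, 7, 12],
    "reach": [72, 70, 76, 74, 71],
}

flagged = redundant_pairs(columns)
print(flagged)
```

Deciding which member of a flagged pair to keep is still a domain-knowledge call.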

## Can't You Sell This To A Sportsbook Or Sell The Predictions For Millions?

Dear Bellagio, if you would like to buy an updated and improved version of this model for a milly, you can contact my people.

Honestly, it doesn't appear so. My guess is that sportsbooks don't really care that much about accurate odds. They have their own machine-learning modelers and techniques to set the opening line, but they make money whether the odds are accurate or not because the line moves with the money coming in. If they set a very inaccurate opening line, it quickly self-corrects as sharps place their money on the value side. They're not constantly updating the lines manually.

As for selling the predictions, my model is more accurate than 99.9% of the UFC pickers over the long term. The pickers who make the most money are not the best predictors. They're the best marketers, and marketing is not fun, so I don't do it.

I also just don't think the market for purchasing sports picks is really that big. Average bettors are betting because it's fun to have skin in the game, not to make millions of dollars. People are hard-pressed to spend even $5 for extremely high-quality picks because just copying someone else's picks takes the fun out of testing your own knowledge. At least, that's what I think. I'm probably wrong and am leaving very large amounts of money on the table by releasing this.

Anyway, it seems to me like your best path to riches if you have a successful model is to just bet with it yourself.

The unsexy answer is that prediction quality and business quality are different skills. I like building the model. I like teasing out subtle patterns that predict the future. I like twisting and turning the infinite knobs in an ML model's code. I do not like becoming a TikTok/Discord pick salesman.

## How Much Did You Make?

All told, I wagered about $150,000 over the years, and yes IRS, I reported all of it. My gross profit was about $10,000. If we then factor in taxes, compute costs, and a myriad of other costs involved in running the site and whatnot, I'd estimate my net profit to be like -$2k. This isn't even factoring in the cost of my computers. I murdered a bunch of laptops by training my model day and night over the years. The heavy computations just burn these things to the ground.

![DraftKings lifetime stat sheet showing $51,277 spent, $56,807 won, and $5,530 net total.](assets/draftkings.png)

This was all my betting up to the end of 2024. After 2024 I got off DraftKings and used a bunch of fractured markets, so I don't have a nice clean chart like this for post-2024.

![BetMMA.tips progress chart showing MMA handicapper performance over time.](assets/betmmatips.png)

I used to third-party track its accuracy on [betmma.tips](https://www.betmma.tips/flyingtriangl3), but that site is a PITA and takes forever, so I stopped. You can see all my pre-event timestamped picks on my free [Patreon](https://www.patreon.com/mmaai) if you want to verify, or just ask the community over there in the chat.

You know what really pissed me off though? A couple years ago I signed up for Google Cloud's AutoML solution with Vertex AI, thinking surely if I give Google ML engineers my data, the model Vertex AI spits out will be better than my handcrafted locally trained ones.

Nope. Not only did it perform worse in both metrics and the real world, training one model cost me hundreds and hundreds of dollars and took literally 10x as long. Vertex AI lulls you in with like $300 of free credit, which I assumed would last me months. Bill came in because I'm lazy and didn't set spending caps: $1,500 for just a few model runs and data hosting. Vertex AI, you suck and I fucking hate you.

There is probably a lesson here about expected value including your own time, hardware, cloud bills, and sanity, but I refuse to learn it.

## What About LLMs?

I highly suspect that LLMs will be better predictors of sports than the best traditional ML models in maybe two years. I have done much analysis on this.

Currently, LLMs have a very hard time producing accurate confidence scores for future events. They're actually fairly accurate at predicting who will win, but they are not accurate at giving you the percent chance that a given player, team, or fighter will win, which is how you actually make money in sports betting. You need to know which picks are positive expected value, not just who wins. If you had bet on the odds favorite in every fight for the last 10 years, you'd be sitting at about a -10% return. The key is an accurately calibrated win probability, which you then compare to the odds.
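Comparing a win probability to the odds is just arithmetic. Here is a small sketch using American odds. The function names and the example numbers are made up, and the implied probability here still includes the bookmaker's margin.

```python
def implied_probability(american_odds):
    """Break-even win probability implied by American odds (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def profit_per_unit(american_odds):
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def expected_value(model_prob, american_odds):
    """Expected profit per 1 unit staked, given the model's win probability."""
    return model_prob * profit_per_unit(american_odds) - (1 - model_prob)


# Hypothetical fight: favorite at -200, underdog at +150.
favorite_ev = expected_value(0.62, -200)   # model agrees he is favored
underdog_ev = expected_value(0.45, 150)    # model thinks the dog loses more often than not

print(f"favorite implied: {implied_probability(-200):.3f}, EV: {favorite_ev:+.3f}")
print(f"underdog implied: {implied_probability(150):.3f}, EV: {underdog_ev:+.3f}")
```

In this made-up case the model picks the favorite to win, yet the only positive-EV bet is the underdog, because 45% beats the 40% the price implies.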

That being said, I haven't written a line of code myself for about three years. On my wedding day I was copying and pasting out of ChatGPT 3.5 as I refactored the codebase for the third time. ChatGPT-4 was a real step up. I used it to brainstorm my most important feature: `adjperf`. I had ChatGPT brainstorm creative feature engineering that would most help me, and in a list of 10 ideas, z-score normalization, what I label as `_adjperf`, popped out. Then ChatGPT wrote all the code for me.
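For readers who haven't met z-scores, here is one way an opponent-adjusted z-score could look. This is a guess at the general idea, not the actual `adjperf` code or SQL from the repo. The function name, inputs, and numbers are all hypothetical.

```python
from statistics import mean, pstdev


def adjusted_performance(value, opponent_allowed):
    """Z-score of `value` against what this opponent has previously allowed.

    `opponent_allowed` should only contain fights that happened before the
    one being scored, otherwise the feature leaks future information.
    """
    if len(opponent_allowed) < 2:
        return 0.0  # not enough history to say anything
    spread = pstdev(opponent_allowed)
    if spread == 0:
        return 0.0
    return (value - mean(opponent_allowed)) / spread


# Made-up example: this opponent's past opponents landed 30, 40, and 50
# significant strikes on him. Our fighter landed 50.
z = adjusted_performance(50, [30, 40, 50])
print(round(z, 3))
```

A positive score means the fighter did better than this particular opponent usually allows, which says more than the raw count does.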

It wasn't until ChatGPT-4o came out that it was really able to handle the extremely complex SQL necessary for generating these features accurately. ChatGPT-4o was also the key to unlocking the Bayesian smoothing that has been extremely helpful in improving performance. It took a lot of back and forth and dead ends before me and ol' buckethead settled on Poisson-gamma and beta-binomial smoothing, and it was a lot of debugging, but worth it. It would've taken me years of statistics studying to find those concepts myself without LLM help.
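The two smoothing ideas reduce to short posterior-mean formulas. The sketch below shows the textbook versions, with prior values picked purely for illustration. How the repo chooses its priors and which stats it smooths is not shown here.

```python
def beta_binomial_rate(successes, attempts, prior_alpha, prior_beta):
    """Posterior mean of a success rate under a Beta(prior_alpha, prior_beta) prior."""
    return (successes + prior_alpha) / (attempts + prior_alpha + prior_beta)


def poisson_gamma_rate(events, exposure, prior_shape, prior_rate):
    """Posterior mean of an event rate under a Gamma(prior_shape, prior_rate) prior."""
    return (events + prior_shape) / (exposure + prior_rate)


# Takedown accuracy with an assumed Beta(4, 6) prior (prior mean 40%).
rookie = beta_binomial_rate(successes=2, attempts=2, prior_alpha=4, prior_beta=6)
veteran = beta_binomial_rate(successes=40, attempts=100, prior_alpha=4, prior_beta=6)

# Strikes per minute with an assumed Gamma(8, 2) prior (prior mean 4 per minute).
short_fight = poisson_gamma_rate(events=12, exposure=2, prior_shape=8, prior_rate=2)

print(f"rookie, 2 for 2 raw = 100%:        smoothed {rookie:.2f}")
print(f"veteran, 40 for 100 raw = 40%:     smoothed {veteran:.2f}")
print(f"12 strikes in 2 min raw = 6.0/min: smoothed {short_fight:.1f}/min")
```

The fighter with two attempts gets pulled hard toward the prior, while the fighter with a hundred attempts barely moves. That is the point of smoothing sparse fight histories.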

## Why Are You Doing This And What Now?

Because I love open source and knowledge sharing. Also, after five years of absolutely crazy amounts of time spent learning and building, I think I've mostly reached the point of diminishing returns.

My interest over the last few years has been completely dominated by agents. I love agents. I love agent harnesses and orchestrators. I want to spend 23 hours a day building agent stuff, but what was the point of 15,000+ hours spent on this project if I'm going to let it die on my laptop?

I'm hoping people can use the database and do really cool data analytics on UFC. My dream is that one day one of the UFC announcers will emit the words "Dan McInerney," "MMA-AI," or just cite some analytics that came from my database.

One other thing I always wanted that never happened: for a coach or trainer to use my database of stats and analytics to help inform the training or fight selection of a UFC fighter. There are massive piles of gold in the hills of machine learning and data analytics for UFC fighters to optimize their strategy and training. You can see exactly how your historical skills match up against arbitrary opponents using the SHAP charts:

![Radar chart comparing individual stats for Belal Muhammad and Gabriel Bonfim.](assets/individual-stats-belal-gabriel.png)

So if any of you fighters want a leg up, reach out and I'll see what I can do for you.

Thank you to all my [Patreon supporters](https://www.patreon.com/mmaai). Patreon was mostly free before, but now I'm making it so that no predictions will ever be locked behind a paywall. The community in the chat is incredible. It's probably the densest concentration of UFC modelers anywhere on the internet.

I've learned so much from guys like MMAPICKS, Jordan Cairns, r3n3gad3, Ben, Moira, Ticonderoga, PizzaPizza, Kevin, Neil, Grant, KD, and many more who I'm forgetting. I hope I helped some of you along the way as well.

Go check out [clankerfights.ai](https://clankerfights.ai), a recent project Marcello and I did that pits various unhinged LLMs against each other in an array of games that you can watch, bet on, and influence by paying fake coins to send messages to the room of agents.
