OpenAI Wins Poker Battle of LLMs Over Small Sample


A theoretical poker contest pitting several large language models (LLMs) against each other in a cash game wrapped up last Monday. When the dust settled after 3,799 no-limit hold’em hands, OpenAI’s o3 (from the creators of the popular generative AI ChatGPT) had narrowly taken the PokerBattle AI crown over Claude Sonnet 4.5 and Elon Musk’s Grok.

Llama 4, created by Meta (Facebook/Instagram), produced the worst results by far. It burned through its entire $100,000 bankroll after 3,501 hands, before the contest ended. Blinds for the theoretical poker game were $10-$20, so assuming a standard 100-big-blind ($2,000) buy-in, Llama lost 50 buy-ins across a rather short sample.
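The buy-in figure comes down to simple arithmetic, sketched here in Python. The 100-big-blind buy-in is an assumption (it is the usual cash-game convention and matches the 50 buy-in figure, but PokerBattle AI's organizers have not confirmed it here), and the function name is purely illustrative.

```python
def buy_ins_lost(amount_lost: float, big_blind: float, buy_in_bb: int = 100) -> float:
    """Return how many buy-ins a given loss represents.

    buy_in_bb=100 is an assumed, conventional buy-in size,
    not a figure confirmed for this contest.
    """
    buy_in = big_blind * buy_in_bb  # $20 * 100 = $2,000 per buy-in
    return amount_lost / buy_in


# Llama 4 lost its full $100,000 bankroll at $10-$20 blinds.
print(buy_ins_lost(100_000, 20))  # 50.0
```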

PokerScout reached out to PokerBattle AI creator Max Pavlov with a series of questions regarding the experiment. He had not provided a response as of Thursday afternoon.

PokerBattle AI Full Results

Here’s a full look at how each LLM finished in PokerBattle AI.

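| Place | LLM | Result | Hands Played |
|---|---|---|---|
| 1 | OpenAI o3 | +$36,691 | 3,799 |
| 2 | Claude Sonnet 4.5 | +$33,641 | 3,799 |
| 3 | Grok | +$28,796 | 3,799 |
| 4 | DeepSeek R1 | +$18,416 | 3,799 |
| 5 | Gemini 2.5 Pro | +$14,655 | 3,799 |
| 6 | Mistral Magistral | +$3,281 | 3,799 |
| 7 | Kimi K2 | -$14,370 | 3,799 |
| 8 | Z.AI GLM 4.6 | -$21,510 | 3,799 |
| 9 | Meta Llama 4 | -$100,000 | 3,501 |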

The massive loss by Llama allowed six of the nine bots in the game to profit.

It’s worth noting that the AIs were allowed to adjust to their opponents’ games. They were allowed to take notes and given game stats such as each player’s VPIP (rate of voluntarily putting money in the pot). Thus, with Llama effectively acting as the game’s “whale,” the results may reflect which LLM most efficiently exploited it, rather than which was playing the closest to a theoretically optimal strategy.

It’s also worth keeping in mind that 3,799 hands is nowhere near a statistically meaningful sample in poker. The hand count is low partly because LLMs play poker very slowly, explaining their entire “thought process” in words instead of just spitting out an action.

A human player multi-tabling could get through that many hands in as little as one day. Results over a sample size of just a few thousand hands will exhibit a high degree of variance and may not reflect long-term performance.
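To put rough numbers on that variance, the sketch below converts a result into big blinds per 100 hands (bb/100) and attaches an approximate 95% margin of error. The standard deviation of 100 bb/100 is an assumed rule-of-thumb value for no-limit hold’em, not something measured from this contest, and the function names are illustrative. The calculation also treats each block of 100 hands as independent, which is a simplification.

```python
import math

BIG_BLIND = 20       # dollars, from the $10-$20 blinds
ASSUMED_SD = 100.0   # bb/100; an assumed rule of thumb, not measured here


def win_rate_bb100(profit: float, hands: int, big_blind: float = BIG_BLIND) -> float:
    """Convert a dollar result into big blinds won per 100 hands."""
    return (profit / big_blind) / (hands / 100)


def margin_95(hands: int, sd_bb100: float = ASSUMED_SD) -> float:
    """Approximate 95% margin of error on a win rate, in bb/100."""
    return 1.96 * sd_bb100 / math.sqrt(hands / 100)


hands = 3_799
openai = win_rate_bb100(36_691, hands)
claude = win_rate_bb100(33_641, hands)

print(f"OpenAI o3:         {openai:.1f} bb/100")
print(f"Claude Sonnet 4.5: {claude:.1f} bb/100")
print(f"Gap between them:  {openai - claude:.1f} bb/100")
print(f"Margin on one win rate: +/- {margin_95(hands):.1f} bb/100")
```

Under those assumptions, the gap between first and second place is about 4 bb/100, while the margin of error on either win rate is roughly 32 bb/100, so the order at the top of the leaderboard could easily be noise.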

When PokerScout queried several LLMs to test out how they approached a poker hand, a paid subscription to OpenAI’s ChatGPT did seem to provide the most coherent advice. In that sense, at least, the PokerBattle AI results seem to fit with first-hand, subjective evaluation.

Still, the quality of ChatGPT’s strategic reasoning wasn’t all that strong. Poker players needn’t fear a sudden influx of ChatGPT-powered opponents in their games. These LLMs remain a long way from producing poker strategy that threatens safe poker sites in the same manner as other real-time assistance (RTA) tools.

Deputy Editor

Mo has been reporting on the poker industry since 2013, excepting a foray into the sports betting space from 2021 to 2025. He's a regular in live tournaments and cash games at buy-in levels around $400-$2,000.