Statistics Question: Determining Optimal Sample Size



PCP Poker
01-26-2009, 08:37 PM
I have always datamined a level for a month or two before moving up to that level to determine characteristics of the most successful players at that level. I have recently discovered the power that HEM can add to this process. I love the ranges and median values provided on the players tab and I find this information invaluable.

I have data on around 100k players at my next chosen level. I need to determine what the best sample size would be for seeing the most effective playing styles at this level (i.e. the best VPIP and PFR ranges, cbet %, etc). Obviously the smaller the number of hands played by a player, the less reliable the information will be, and the higher I set the minimum number of hands, the fewer players I will have to sample from. I need a mathematical way to determine the optimal sample size to balance quantity with quality of information. I have data available in the following amounts:

100 hands played ~ 30,000 players
500 hands played ~ 10,000 players
1,000 hands played ~ 5,500 players
5,000 hands played ~ 1,000 players
10,000 hands played ~ 340 players
15,000 hands played ~ 150 players

Any ideas on the optimal sample size to compensate for variance? I'm sure there's a formula on how to exactly determine this, but it's been 10 years since my college stats class that I slept through anyways...

TierTier
01-27-2009, 02:40 PM
What you're asking for isn't answerable with statistics.

There is no single optimal VPIP or any other stat, because each one is influenced by the other statistics. A simple example: we would expect a winning tight player to get to showdown more often than a LAG who likes to splash around and see flops. How would you determine the optimal VPIP and WTSD based on these two players? You can't, because the stats affect one another. Now add in a dozen more stats and an entire continuum of playing styles that are (maybe) adjusting to one another. Not to mention that the best style is going to be a contrarian one, and if you're playing a high enough stake (small number of players), your addition into the mix is going to throw it off anyway.

It's possible that out in math land there's an optimal set of all the available statistics, but, you would likely not have enough data after a lifetime of datamining to make the comparison.

PCP Poker
01-27-2009, 08:58 PM
I'm not looking for the optimal VPIP, Flop C Bet%, Etc. I'm looking for the optimal sample size to provide the most relevant data. Then I can filter data I have based on that sample size and look at the "characteristics" of the most successful players. Believe me, I understand how VPIP etc. can affect all of the other stats. I'm not looking for a mathematically perfect strategy. What I am looking for are the things that make someone successful at a specific level. I analyze each successful player on their own merits and don't believe in grouping or rating players.

All I need to know is what the best sample size would be with the data I have available to achieve the best data.

Orthoguy
01-28-2009, 12:21 AM
Check out this link http://www.surveysystem.com/sscalc.htm

It requires a little bit of study, but doesn't seem too difficult to figure out.

Good luck, Orthoguy

PCP Poker
01-28-2009, 12:42 AM
Thanks.

I actually already discovered that site. Their calculations only work for binomials. They were designed for a survey with simple yes/no answers.

I think I'm actually going to have to write a Perl script to figure this out. I'm working on one that uses uuDevil's method for calculating the confidence level. If you're not familiar with that, check out this website (http://www.castrovalva.com/~la/winlose.htm).

TierTier
01-28-2009, 11:08 PM
I'm not looking for the optimal VPIP, Flop C Bet%, Etc. I'm looking for the optimal sample size to provide the most relevant data. Then I can filter data I have based on that sample size and look at the "characteristics" of the most successful players. Believe me, I understand how VPIP etc. can affect all of the other stats. I'm not looking for a mathematically perfect strategy. What I am looking for are the things that make someone successful at a specific level. I analyze each successful player on their own merits and don't believe in grouping or rating players.

All I need to know is what the best sample size would be with the data I have available to achieve the best data.

Assuming you've done this for at least two levels before, what meaningful results have you come to from the sort of analysis you're describing?

For instance, have you been able to say something like, "Winning players at NL50 cbet more often than winning players at NL25?"

What separates a winning player at NL50 from a winning player at NL25 isn't going to be well represented by statistics, because the difference is in the approach to the game. The stats are the effect, not the cause, and simply attempting to emulate them isn't going to do anything unless you presuppose that what you're emulating is an optimal strategy. That was the point of my first post.

PCP Poker
01-28-2009, 11:50 PM
I don't try to be anywhere near as specific as in your example. And I don't compare play between levels. What I do look at is the most successful players' styles at this level. For example, I might find a good winning LAG player, and since LAG is a style I feel uncomfortable playing, I analyze his play to see what makes it work for him. I'll look at the situations he gets himself into (e.g. 3-betting, playing OOP on the flop, etc.) and see how he handles those situations.

I agree wholeheartedly that emulating someone's stats won't make you play like them. I never set out to accomplish anything like that. The only reason I look at their stats is to try to deduce their holdings. Since the only way we learn a villain's hole cards is if the hand goes to showdown or they show it, we can't rely on "known hole cards" data because it is so skewed. We can, however, make educated guesses based upon their stats. Isn't that the whole reason why HUDs have become so popular?

FWIW I've been a long-term winner (500k+ hands at limits up to 3/6) in Limit Holdem, but I've only recently made the switch to NL. I'm starting out at NL25 and building my bankroll the right way, by learning the game and not getting too far ahead of myself. In limit it was much easier to get a feel for what makes a successful player at a given limit. For example, at 3/6, players with the highest turn and river aggression % were often the highest winners. But there are so many variables in NL that are making this type of analysis tricky. All I can do is hope to learn something about the game along the way....

TierTier
01-29-2009, 12:12 AM
Alright, sounds like what you're doing is helpful, I'm just being an academic nitpick. =0

Along the lines of your original question, I wouldn't put much stock in any given player's data without 20k hands or more. Realistically, winning players can and do have 50k+ hand losing streaks, and the inverse also holds (although I would imagine it is not nearly as common).

So, what you're looking to do is a two step process...

(1) Determine if given player has a true win rate that is positive
(2) Analyze how they handle themselves in situations where you have a statistically significant sample size

Finishing (1) doesn't necessarily qualify the second part of (2) for all possible scenarios. For instance, a bet OOP on turn against missed cbet statistic might very well be meaningful at 2k hands where triple barrelling would not be.

PCP Poker
01-29-2009, 12:40 AM
I've basically committed myself to finding each player's confidence level using the uuDevil tool I referenced earlier. I will only analyze players with at least a 95% confidence level. For some players, this only requires 6-7 K hands, and for others, 20 K. Either way, I'm learning a lot so far about this crazy game.
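
For anyone wondering why the required number of hands varies so much from player to player, here is a rough Python sketch. It assumes the confidence level is modelled with the normal approximation c = Φ(u·√n/σ), where u is the winrate in bb/100, σ is the standard deviation in bb/100 and n is hands/100, and it simply solves that for n. The function name and the example numbers are made up for illustration, and this is not necessarily what the uuDevil tool does internally.

```python
import math
from statistics import NormalDist


def hands_for_confidence(winrate_bb100, sd_bb100, confidence=0.95):
    """Estimate hands needed before Phi(u * sqrt(n) / sd) reaches `confidence`.

    Assumes a normal approximation with n measured in blocks of 100 hands.
    Only meaningful for a positive observed winrate.
    """
    if winrate_bb100 <= 0 or sd_bb100 <= 0:
        raise ValueError("winrate and standard deviation must be positive")
    # z-score whose one-sided normal probability equals the target confidence
    z = NormalDist().inv_cdf(confidence)
    # Solve z = u * sqrt(n) / sd for n (in 100-hand blocks)
    blocks = (z * sd_bb100 / winrate_bb100) ** 2
    return math.ceil(blocks * 100)


# Hypothetical players, not taken from the database discussed here
print(hands_for_confidence(10.0, 80.0))   # moderate winrate, high variance
print(hands_for_confidence(15.0, 60.0))   # higher winrate, lower variance
```

The required sample grows with the square of σ/u, so a player with twice the standard deviation at the same winrate needs roughly four times as many hands.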

PCP Poker
01-29-2009, 11:06 PM
To anyone that cares about this stuff...

I wrote a script to calculate the confidence levels for every player I have in my db of 25NL 6 max. The results were rather astonishing to me, and I suspect to you as well.

I analyzed 7,342,678 hands covering 101,625 players.

Most hands by any player: 104,881
Highest winnings by any player: $1325.25 (9.06 BB/100)
Highest losses: -$745.60 (oddly, the same player who has the most hands)
Number of losing players with over 95% confidence: 526
Number of winning players with over 95% confidence: 8
Min hands for a player with 95% confidence: 6,797
Mean hands for 95% confidence players: 10,227

Also, what really surprised me is that all of the winners play a TAG style. Apparently TAG is still king if played right.
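
A rough idea of what such a script might look like, as a Python sketch rather than the actual script used here. It assumes each player record is a tuple of (name, hands, winrate in bb/100, standard deviation in bb/100) and that confidence is the normal approximation Φ(u·√n/σ) with n = hands/100. The record layout, function name and sample players are all hypothetical.

```python
from statistics import NormalDist


def classify_players(players, level=0.95):
    """Split players into confident winners and confident losers.

    `players` is an iterable of (name, hands, winrate_bb100, sd_bb100).
    A player is a confident winner if Phi(u*sqrt(n)/sd) >= level and a
    confident loser if that same value is <= 1 - level.
    """
    phi = NormalDist().cdf
    winners, losers = [], []
    for name, hands, winrate, sd in players:
        if hands <= 0 or sd <= 0:
            continue  # not enough information to say anything
        confidence = phi(winrate * (hands / 100) ** 0.5 / sd)
        if confidence >= level:
            winners.append(name)
        elif confidence <= 1 - level:
            losers.append(name)
    return winners, losers


# Made-up sample records for illustration only
sample = [
    ("tag_reg", 12000, 9.0, 55.0),
    ("lag_reg", 12000, 12.0, 110.0),
    ("fish", 8000, -30.0, 90.0),
    ("newcomer", 300, 40.0, 80.0),
]
winners, losers = classify_players(sample)
print("winners:", winners)
print("losers:", losers)
```

In this made-up sample the high-variance LAG has the bigger winrate but still falls short of the threshold, which is the same pattern as in the results above.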

Orthoguy
01-29-2009, 11:58 PM
It doesn't sound right to me that out of >101K players only 8 could be tagged as winners with 95% confidence. I am probably misunderstanding what you mean by 95% confidence, or maybe the number of players you are looking at. Still it seems awfully low.

Anyhow, keep us posted on what you discover.

Good luck!

TierTier
01-30-2009, 07:39 AM
Min hands for a player with 95% confidence: 6,797


What standard error and bb/100 are you calculating for this guy?

PCP Poker
01-31-2009, 04:47 PM
@ Orthoguy

Confidence Level tells you how sure you can be whether or not you are a winner. A 95% confidence level means that you can be 95% sure that you will be a winner long-term. It does not however mean that you will maintain your same winrate. It is calculated by the formula c = Φ(u·√n/σ), where c is confidence level, Φ() is the standard cumulative normal distribution function, u is the winrate (bb/100 hands), n is number of hands (divided by 100 to normalize it), and σ is standard deviation (in bb/100). Most players in small stakes NL have an exceedingly high standard deviation, and this is even more pronounced in shorthanded games. This fact, combined with the difficulty of overcoming the rake at these levels, adds up to very few true winners.
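
The formula is easy to try out yourself. Below is a minimal Python sketch of it, with Φ built from the standard library's error function. The function names and the example player are made up for illustration, and it takes the winrate and standard deviation as already being expressed per 100 hands.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def winner_confidence(winrate_bb100, sd_bb100, hands):
    """c = Phi(u * sqrt(n) / sigma), with n counted in blocks of 100 hands."""
    n = hands / 100.0
    return normal_cdf(winrate_bb100 * math.sqrt(n) / sd_bb100)


# Hypothetical player: 5 bb/100 over 10,000 hands with an SD of 80 bb/100
print(round(winner_confidence(5.0, 80.0, 10_000), 3))
```

A breakeven winrate gives exactly 0.5, and the value only moves toward 1 as either the sample grows or the ratio of winrate to standard deviation improves.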

@TierTier

That player is a freak of nature, or he is on a very long heater. His winrate is over 25 bb/100, and his sd is under 70. This winrate is obviously not sustainable, but it's obvious he's doing something right.

SwizzleStack
01-31-2009, 11:13 PM
I am wondering if part of the reason for not having a 95% confidence level on any non-TAG player has to do with the higher variance associated with the style of play itself. I would assume it would take way more hands to reach a 95% confidence level for someone who is playing and splashing around in lots of pots. Just a guess.

PCP Poker
02-01-2009, 01:54 AM
Yea, that's exactly right. In my db there are lots of LAGs that have very high winrates, but their SD is very high as well. There are also many nits with a very low SD, but their winrate is also low. At least at these small stakes, it seems that TAGs are the only ones able to achieve a balance between winrate and variance. I'm sure that at higher levels, as postflop skill increases and the effect of rake decreases, there are many more winning LAGs or other styles.

TierTier
02-07-2009, 01:47 PM
@TierTier

That player is a freak of nature, or he is on a very long heater. His winrate is over 25 bb/100, and his sd is under 70. This winrate is obviously not sustainable, but it's obvious he's doing something right.

It would actually be more likely that he is NOT doing something right and has simply put himself into high volatility, probably -E(x) situations, and ended up with the best of it.

Same reason why many of the biggest stacks come the middle of the tournament are not the best players. Enough of them pursued a high volatility strategy and we simply expect a few to get paid off.

PCP Poker
02-07-2009, 10:41 PM
I've actually examined this guy a lot since then. He is indeed a very tricky player. Might be a pro slumming low stakes making some strategy videos or something of that sort. Or maybe he's just a donkish son of a bitch that happens to mimic the style of a pro-type player. Either way, he's been running almost on par with his all-in EV, so I don't think luck is much of a factor either way. And he's now up past 13k hands with still around 23 bb/100. SD is down to 61 as well.