Statistics Question: Determining Optimal Sample Size
  1. #1
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    Statistics Question: Determining Optimal Sample Size

    I have always datamined a level for a month or two before moving up to that level to determine characteristics of the most successful players at that level. I have recently discovered the power that HEM can add to this process. I love the ranges and median values provided on the players tab and I find this information invaluable.

    I have data on around 100k players at my next chosen level. I need to determine the best sample size for seeing the most effective playing styles at this level (i.e. the best VPIP and PFR ranges, cbet %, etc.). Obviously, the fewer hands a player has played, the less reliable the information will be, and the higher I set the minimum number of hands, the fewer players I will have to sample from. I need a mathematical way to determine the optimal sample size to balance quantity with quality of information. I have data available in the following amounts:

    100 hands played ~ 30,000 players
    500 hands played ~ 10,000 players
    1,000 hands played ~ 5,500 players
    5,000 hands played ~ 1,000 players
    10,000 hands played ~ 340 players
    15,000 hands played ~ 150 players

    Any ideas on the optimal sample size to compensate for variance? I'm sure there's a formula on how to exactly determine this, but it's been 10 years since my college stats class that I slept through anyways...
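
    One rough way to put numbers on the quantity-versus-quality trade-off described above is the textbook margin of error for a proportion. The sketch below is only an illustration: it assumes every hand is an independent yes/no opportunity for the stat, and the 25% VPIP is a hypothetical value, not something taken from the database. The function name stat_margin is made up for this example.

```python
import math

def stat_margin(p, opportunities, z=1.96):
    """Approximate 95% margin of error for a percentage stat such as VPIP.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1.0 - p) / opportunities)

# Hands-played cutoffs and approximate player counts from the list above.
cutoffs = [(100, 30000), (500, 10000), (1000, 5500),
           (5000, 1000), (10000, 340), (15000, 150)]

assumed_vpip = 0.25  # hypothetical true VPIP, for illustration only

for hands, players in cutoffs:
    margin = stat_margin(assumed_vpip, hands)
    print(f"{hands:>6} hands: VPIP +/- {margin * 100:.1f} pts, ~{players} players")
```

    Stats with fewer opportunities per hand (cbet %, for example) would need the opportunity count rather than the hand count, so their margins shrink much more slowly.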

  2. #2
    Member
    Join Date
    Nov 2008
    Posts
    83

    What you're asking for isn't answerable with statistics.

    There is no single optimal VPIP or any other stat, because each one is influenced by the other statistics. A simple example: we would expect a winning tight player to get to showdown more often than a LAG who likes to splash around and see flops. How would you determine the optimal VPIP and WTSD based on these two players? You can't, because the stats affect one another. Now add in a dozen more stats and an entire continuum of playing styles that are (maybe) adjusting to one another. Not to mention that the best style is going to be a contrarian one, and if you're playing a high enough stake (small number of players), your addition into the mix is going to throw it off anyway.

    It's possible that out in math land there's an optimal set of all the available statistics, but, you would likely not have enough data after a lifetime of datamining to make the comparison.

  3. #3
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    I'm not looking for the optimal VPIP, Flop C Bet%, Etc. I'm looking for the optimal sample size to provide the most relevant data. Then I can filter data I have based on that sample size and look at the "characteristics" of the most successful players. Believe me, I understand how VPIP etc. can affect all of the other stats. I'm not looking for a mathematically perfect strategy. What I am looking for are the things that make someone successful at a specific level. I analyze each successful player on their own merits and don't believe in grouping or rating players.

    All I need to know is what the best sample size would be with the data I have available to achieve the best data.

  4. #4
    Junior Member
    Join Date
    Jan 2009
    Location
    West Sacramento, CA
    Posts
    7

    Sample Size Calculator

    Check out this link http://www.surveysystem.com/sscalc.htm

    It requires a little bit of study, but doesn't seem too difficult to figure out.

    Good luck, Orthoguy

  5. #5
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    Thanks.

    I actually already discovered that site. Their calculations only work for binomials. They were designed for a survey with simple yes/no answers.
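
    For anyone wondering why that kind of calculator only fits yes/no data: survey calculators are typically built on the standard sample-size formula for a proportion, with an optional finite-population correction. The sketch below shows that textbook formula only; it is an assumption that the linked site works this way, and the function name and the 100,000 population figure are chosen purely for illustration.

```python
def survey_sample_size(margin, p=0.5, z=1.96, population=None):
    """Sample size needed to estimate a proportion to within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, optionally reduced by the
    finite-population correction n0 / (1 + (n0 - 1) / N).
    """
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    if population is None:
        return n0
    return n0 / (1.0 + (n0 - 1.0) / population)

# 95% confidence, +/- 5 points, worst case p = 0.5
print(round(survey_sample_size(0.05), 2))
# Same, for a hypothetical population of 100,000 players
print(round(survey_sample_size(0.05, population=100000), 2))
```

    The p * (1 - p) term is the variance of a single yes/no answer, which is why the formula does not carry over directly to something like a win rate measured in BB/100.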

    I think I'm actually going to have to write a perl script to figure this out. I'm working on one that uses uuDevil's method for calculating the confidence level. If you're not familiar with that, check out this website

  6. #6
    Member
    Join Date
    Nov 2008
    Posts
    83

    Originally posted by PCP Poker:
    I'm not looking for the optimal VPIP, Flop C Bet%, Etc. I'm looking for the optimal sample size to provide the most relevant data. Then I can filter data I have based on that sample size and look at the "characteristics" of the most successful players. Believe me, I understand how VPIP etc. can affect all of the other stats. I'm not looking for a mathematically perfect strategy. What I am looking for are the things that make someone successful at a specific level. I analyze each successful player on their own merits and don't believe in grouping or rating players.

    All I need to know is what the best sample size would be with the data I have available to achieve the best data.
    Assuming you've done this for at least two levels before, what meaningful results have you come to from the sort of analysis you're describing?

    For instance, have you been able to say something like, "Winning players at NL50 cbet more often than winning players at NL25?"

    What separates a winning player at NL50 from a winning player at NL25 isn't going to be well represented by statistics, because the difference is in the approach to the game. The stats are the effect, not the cause, and simply attempting to emulate them isn't going to do anything unless you presuppose that what you're emulating is an optimal strategy. That was the point of my first post.

  7. #7
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    I don't try to be anywhere near as specific as in your example, and I don't compare play between levels. What I do look at is the most successful players' styles at this level. For example, I might find a good winning LAG player, and since LAG is a style I feel uncomfortable playing, I analyze his play to see what makes it work for him. I'll look at the situations he gets himself into (i.e. 3-betting, playing OOP on the flop, etc.) and see how he handles those situations.
    I agree wholeheartedly that emulating someone's stats won't make you play like them. I never set out to accomplish anything like this. The only reason I look at their stats is to try to deduce their holdings. Since we only know a villain's hole cards if the hand goes to showdown or they show it voluntarily, we can't rely on "known hole cards" data because it is so skewed. We can, however, make educated guesses based upon their stats. Isn't that the whole reason why HUDs have become so popular?
    FWIW I've been a long term winner (500k+ hands at limits up to 3/6) in Limit Holdem, but I've only recently made the switch to NL. I'm starting out at NL25 and building my bankroll the right way, by learning the game and not getting too far ahead of myself. In limit it was much easier to get a feel for what makes a successful player at a given limit. For example, at 3/6, players with the highest turn and river aggression % were often the highest winners. But there are so many variables in NL that are making this type of analysis tricky. All I can do is hope to learn something about the game along the way....

  8. #8
    Member
    Join Date
    Nov 2008
    Posts
    83

    Alright, sounds like what you're doing is helpful, I'm just being an academic nitpick. =0

    Along the lines of your original question, I wouldn't put much stock in any given player's data without 20k hands or more. Realistically, winning players can and do have losing streaks of 50k+ hands, and the inverse also holds (although I would imagine it is not nearly as common).

    So, what you're looking to do is a two step process...

    (1) Determine if given player has a true win rate that is positive
    (2) Analyze how they handle themselves in situations where you have a statistically significant sample size

    Finishing (1) doesn't necessarily qualify the second part of (2) for all possible scenarios. For instance, a bet OOP on turn against missed cbet statistic might very well be meaningful at 2k hands where triple barrelling would not be.
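
    Step (1) can be sketched with a plain normal approximation: treat each block of 100 hands as one independent observation and ask how likely it is that the true win rate is above zero. This is not necessarily how any particular tool does it; the function name is hypothetical, and the standard deviation of 40 BB/100 is an assumed value for illustration, not a figure from this thread.

```python
import math

def prob_winrate_positive(winrate_per_100, stdev_per_100, hands):
    """Approximate confidence that the true win rate is positive.

    Normal approximation: the standard error of the observed win rate
    is stdev_per_100 / sqrt(hands / 100).
    """
    blocks = hands / 100.0
    std_error = stdev_per_100 / math.sqrt(blocks)
    z = winrate_per_100 / std_error
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

assumed_stdev = 40.0  # hypothetical BB/100, for illustration only

for hands in (2000, 10000, 20000, 50000):
    conf = prob_winrate_positive(3.0, assumed_stdev, hands)
    print(f"{hands:>6} hands at +3 BB/100: {conf:.1%} confident of a positive win rate")
```

    A value below 50% simply means the observed win rate is negative; 1 minus the result is then the confidence that the player is a true loser.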

  9. #9
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    I've basically committed myself to finding each player's confidence level using the uuDevil tool I referenced earlier. I will only analyze players with at least a 95% confidence level. For some players, this only requires 6-7 K hands, and for others, 20 K. Either way, I'm learning a lot so far about this crazy game.
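
    A rough way to see why the hand requirement differs so much from player to player is to invert the normal approximation and solve for the number of hands. The sketch below is not the uuDevil tool, just a one-sided z-test under the assumption of independent 100-hand blocks; the function name and the 40 BB/100 standard deviation are hypothetical.

```python
import math
from statistics import NormalDist

def hands_needed(winrate_per_100, stdev_per_100, confidence=0.95):
    """Hands needed before an observed win rate of this size is
    significantly different from zero at the given one-sided confidence.

    Solves z = |winrate| / (stdev / sqrt(hands / 100)) for hands.
    """
    if winrate_per_100 == 0:
        raise ValueError("a win rate of exactly zero never reaches significance")
    z = NormalDist().inv_cdf(confidence)
    blocks = (z * stdev_per_100 / abs(winrate_per_100)) ** 2
    return math.ceil(blocks * 100)

assumed_stdev = 40.0  # hypothetical BB/100, for illustration only

for winrate in (-10.0, -5.0, 3.0, 9.0):
    print(f"{winrate:+.1f} BB/100 -> about {hands_needed(winrate, assumed_stdev)} hands")
```

    Because the required hands scale with 1 / winrate squared, players with extreme results (usually big losers) qualify much sooner than players who are close to break-even.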

  10. #10
    Senior Member
    Join Date
    Jan 2009
    Posts
    103

    To anyone that cares about this stuff...

    I wrote a script to calculate the confidence levels for every player I have in my db of 25NL 6 max. The results were rather astonishing to me, and I suspect to you as well.

    I analyzed 7,342,678 hands covering 101,625 players.

    Most hands by any player: 104,881
    Highest winnings by any player: $1325.25 (9.06 BB/100)
    Highest losses: -$745.60 (oddly, the same player with the most hands)
    Number of losing players with over 95% confidence: 526
    Number of winning players with over 95% confidence: 8
    Min hands for a player with 95% confidence: 6,797
    Mean hands for 95% confidence players: 10,227

    Also, what really surprised me is that all of the winners play a TAG style. Apparently TAG is still king if played right.
