PDA

View Full Version : Tips for Importing Large Databases?



cyberrico
11-22-2009, 02:21 PM
I have a brand new computer with 8GB RAM, a solid state drive and a fast processor on Windows 7 64 bit. I am installing everything from scratch.

I have 5 million hands to import. I have read various threads about changing values in the config file in postgresql, UAC off, firewall off, run as admin, importing small amounts at a time, etc etc etc but I can't get faster speeds than 7 hands per second. I know for a fact that this is very slow. My old computer imported this database in 4 days and on this new computer it will take me a month.

I am willing to start from scratch especially since I just installed it all.

I will be running a tight ship on this computer. I will backup regularly by making regular disk images so that I don't have to do this again but eventually it will be inevitable. In 2-3 years I will have to buy a new system and I will be doing this all over again. I will be obtaining large amounts of data again as I move up so something has to give.

My BB/100 is zero while this is happening. Please help.

fozzy71
11-22-2009, 02:46 PM
You are only getting 7 h/s when using the Import From Folder option? Laptop or desktop? Please do not use the Auto Import for bulk imports. It is significantly slower than the bulk import methods.

My 2 month old computer (Ci7 920, 9gb RAM) gets upwards of a 100 h/s when importing to a new DB.

Under Windows 7 you must set \holdem manager\dbcontrolpanel.exe, holdemmanager.exe and hmhud.exe to be started as administrator. Run the windows explorer and navigate to your Holdemmanager installation directory. Right click the executable's and select "Run as admin" from the compatibility tab.

Which Firewall, and Anti-Virus are you using?

cyberrico
11-22-2009, 03:43 PM
You are only getting 7 h/s when using the Import From Folder option? Laptop or desktop? Please do not use the Auto Import for bulk imports. It is significantly slower than the bulk import methods.

My 2 month old computer (Ci7 920, 9gb RAM) gets upwards of a 100 h/s when importing to a new DB.

Under Windows 7 you must set \holdem manager\dbcontrolpanel.exe, holdemmanager.exe and hmhud.exe to be started as administrator. Run the windows explorer and navigate to your Holdemmanager installation directory. Right click the executable's and select "Run as admin" from the compatibility tab.

Which Firewall, and Anti-Virus are you using?

Admin is set properly on all of the executables. Are there postgres exe's that benefit from this too?

AV was AVG. I uninstalled it and rebooted. No change.

I run import from folder and point it to a folder that is also on the solid state drive as are all of my apps in this environment for now. I've also tried importing files one at a time. Same thing.

I made the alterations to postgres as recommended in 2+2:


1.) Find your postgresql.conf file and open in notepad. Mine was in C:\Program Files\PostgreSQL\8.2\data

2.) Find the 'shared_buffers' value. The default value is comically low, but a good value is 1/4 of your total memory. I set mine to 768MB as I have 4GB of RAM (of which a little more than 3GB actually can be used in non 64-bit systems).

3.) Find the 'effective_cache_size' value. The default for this is also very low, but can be set to upward of 1/2 of your total memory. I set mine to 1536MB.

4.) Find the 'redirect_stderr' value and change it to 'off', unless you have a specific reason for looking at log files. If you're like me, the HEM queries result it tons of log data being written out. I had 10MB being written out every 20 minutes. Not only does this eat up tons of disk space, but this will cause your system to be wasting time writing to disk for logs you'll likely never look at.

5.) Restart either your computer or the PostgreSQL process from Start -> Control Panel -> Administrative Tools -> Services.

Every time I do this I get a "cannot connect" error in HEM and have to reinstall postgres. I made my adjustments according to the percentage of RAM as suggested by the author of that post and as I type this I am using his exact values despite the fact that I have twice his RAM.

Rebooting and trying again.

cyberrico
11-22-2009, 04:00 PM
No change. It started off at 34/s, is down to 23 after 16K hands (16 files) and is steadily declining. It says 2.5 days which sucks but I would suffer through, but I am sure it will degrade to 7/s and will take close to a week.

_Loki_
11-22-2009, 05:09 PM
Hi cyberrico & fozzy - I hope it is OK to jump in...

cyberrico thank you for posting those 2+2 tips here

I've just tried those alterations to postgres as shown in your 2+2 quote for postgresql 8.2 (BTW I'm on 8.4)

BUT...


4.) Find the 'redirect_stderr' value and change it to 'off'

I couldn't see a value called "redirect_stderr" & I wonder if this value doesn't exist in 8.4 ?

INSTEAD...
In the ERROR REPORTING AND LOGGING section I have previously changed the value for "logging_collector" from on to off as suggested to me by Fozzy:



#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------
# - Where to Log -
log_destination = 'stderr' # Valid values are combinations of
# stderr, csvlog, syslog and eventlog,
# depending on platform. csvlog
# requires logging_collector to be on.
# This is used when logging to stderr:
logging_collector = off # Enable capturing of stderr and csvlog

Is this the equivalent step ?




Every time I do this I get a "cannot connect" error in HEM and have to reinstall postgres

I wonder why you have to reinstall postgres ?
I made sure my HM & postgre were not running.
I opened the config file in notepad
I changed the memory values
Saved them
Closed notepad
Rebooted my laptop & all is well - no reinstall of postgres needed

*** NOW THE BELOW POINT IS PURE SPECULATION - I HAVE NO PRACTICAL KNOWLEDGE TO BACK IT UP:

I have been researching SSD's with a view to buying one to make my laptop more rugged. I've decided to not bother because I want to wait until SSD's are proven absolutely...

I posted here recently that I've read that some current SSD's do not like being written to in huge batches - the performance drops very very quickly

The reason (I think) is to do with the firmware on some current SSD's & conventional defragging optimised for normal HDD's can make this situation worse

I'm not sure this helps you

cyberrico
11-22-2009, 05:18 PM
Loki,

I did the same thing for logging. I'm pretty sure it's correct. I followed the same steps you did. For whatever reason my system doesn't seem to like the larger values so I changed them to what the author of that post did which is a big difference from the default and there was no change for me at all.

fozzy71
11-22-2009, 06:42 PM
An inexpensive, possible solution, might be to get a new hard drive to use specifically for your Postgresql\data folder.

_Loki_
11-22-2009, 07:22 PM
I couldn't see a value called "redirect_stderr" & I wonder if this value doesn't exist in 8.4 ?

Cyberrico are you on 8.2 ?

cyberrico
11-22-2009, 07:28 PM
I couldn't see a value called "redirect_stderr" & I wonder if this value doesn't exist in 8.4 ?

Cyberrico are you on 8.2 ?

No I'm 8.4 and you're right it's not there and the only similar entry is already off.

cyberrico
11-22-2009, 07:31 PM
An inexpensive, possible solution, might be to get a new hard drive to use specifically for your Postgresql\data folder.

I have plenty of drives. Should it be on its own physical drive or will a partition work? I have two issues with this though. 1) I wanted to take advantage of the speed of my SSD (not that it's so amazing for the DB so far) and 2) backing up a disk image would require backing two drives up in tandem and I have to make sure that both backups are always grouped together.

fozzy71
11-22-2009, 08:04 PM
I have plenty of drives. Should it be on its own physical drive or will a partition work? I have two issues with this though. 1) I wanted to take advantage of the speed of my SSD (not that it's so amazing for the DB so far) and 2) backing up a disk image would require backing two drives up in tandem and I have to make sure that both backups are always grouped together.

Standard Postgresql performance tips are to have the DB separate from the OS drive, so they don't fight for read/write cycles. My PC is on a single 7.2k HDD and I get quite good speeds. I was actually considering a small SSD to add to my PC for a dedicated DB drive, but your post makes me not as eager. I cant imagine why your speeds are so slow, but something isn't quite right.

cyberrico
11-22-2009, 08:19 PM
Standard Postgresql performance tips are to have the DB separate from the OS drive, so they don't fight for read/write cycles. My PC is on a single 7.2k HDD and I get quite good speeds. I was actually considering a small SSD to add to my PC for a dedicated DB drive, but your post makes me not as eager. I cant imagine why your speeds are so slow, but something isn't quite right.

It's happened to me on all three of my computers on XP, Vista and Windows 7, all newly formatted with a clean OS and no crapware.

Well, here goes the new drive. Fingers are crossed because my import in now down to 7h/s.

For the record, this doesn't happen while 12 tabling in PS. I realize that is still less than 7h/s but even on my really low end PC (1GB RAM, slow processor) that I use as my DVR, I get exceptional performance from actually using HEM.

cyberrico
11-22-2009, 08:29 PM
How do I install postgres on another drive? It defaults to C:

cyberrico
11-22-2009, 08:35 PM
Nevermind. Figured it out. And WOW, if this holds up I will be done in a few hours.

_Loki_
11-22-2009, 09:35 PM
SSD READ performace is superb I'm told, but...

Google this exact expression (including quote marks)

"ssd write performance"

There are plenty of write problems - there are solutions - mostly

Patvs
11-22-2009, 10:24 PM
I have a brand new computer with 8GB RAM, a solid state drive and a fast processor on Windows 7 64 bit. I am installing everything from scratch.

Every time I do this I get a "cannot connect" error in HEM and have to reinstall postgres. I made my adjustments according to the percentage of RAM as suggested by the author of that post and as I type this I am using his exact values despite the fact that I have twice his RAM.



Which SSD do you have? (brand, size, firmware?) Can you test the speeds with CrystalDiskMark (http://www.brothersoft.com/crystaldiskmark-download-144091.html)
It should look like this: (note: set test size to 500 MB, not to the default 100 MB as used in pic)

http://i234.photobucket.com/albums/ee183/Fire-Wizard/CrystalDiskMark-1.png

Tip: Did you tweak Windows 7 for use with the SSD? (http://www.ocztechnologyforum.com/forum/showthread.php?t=63273) (click link)


---


Loki: about SSD write speeds: pic above is the Intel X25-M G2 Postville SSD.
Pic below is a 7200 RPM normal harddisk.

http://crystalmark.info/software/CrystalDiskMark/images/CrystalDiskMark22.png

So yes, the sequential write speeds are higher for the normal harddisk, which will be useful if you're often editing large video files (Avid) / rendering SFX (After Effects) / Unzipping large files.
But the SSD is just so much faster with 4K read (and write!) speeds, it's perfect for the PostgreSQL use. (especiallly when importing thousands of small handhistory .txt files.

cyberrico
11-23-2009, 12:01 AM
I'll do the test in a couple of days when the import is done. I have the Kingston 128GB drive. My database is now on its own non-SSD drive. As with all of my other imports, the speed starts off at X and degrades to Y. Right now Y seems to be holding at 33.1/s but I know it could and should be faster.

Patvs
11-23-2009, 12:33 AM
When importing 5 million hands at once, it's normal it starts at X and degrades to Y. I've seen a HUGE degration if the 5 million hands are stored in hundreds of these big 10+ MB files. It's a lot faster if it just imports tens of thousands of <100 kb files.

5 million hands takes up a lot of space, so I'm guessing you stored the .txt handhistories on a non-SSD drive. But it's a lot faster if the .txt files you're importing are on the SSD, along with the PostgreSQL database.

What do you mean your database is on its own non-SSD drive.
What are you using the SSD for then??

cyberrico
11-23-2009, 01:04 AM
When importing 5 million hands at once, it's normal it starts at X and degrades to Y. I've seen a HUGE degration if the 5 million hands are stored in hundreds of these big 10+ MB files. It's a lot faster if it just imports tens of thousands of <100 kb files.

5 million hands takes up a lot of space, so I'm guessing you stored the .txt handhistories on a non-SSD drive. But it's a lot faster if the .txt files you're importing are on the SSD, along with the PostgreSQL database.

What do you mean your database is on its own non-SSD drive.
What are you using the SSD for then??

Fozzy suggested that I put the database on a different physical drive than my OS. Yes, I could format both drives, put the OS on the regular HDD and the database on the SSD but this was easier especially since I didn't know that it would make a difference. The performance of my apps in general does benefit greatly from my OS and all of my programs being on a SSD so it's 6 of one, half a dozen of another.

Patvs
11-23-2009, 02:08 AM
The "SQL on separate drive" is only relevant if you have two normal harddisks (where if Windows + PostgreSQL both had to share the dismal 4K speeds of one drive it would be a disaster, or when you have two SSDs where you're too AFRAID to put them in a RAID0 array). In your situation EVERY program will benefit from your SSD. I would suggest you install Windows + HoldemManager + PostgreSQL (+ Microsoft Office, Adobe Photoshop, iTunes) ALL on the SSD.

The SSD will be a lot faster in showing monthly/yearly graphs, making backups (!), and VACUUMING (!) your database. And with your database on the SSD, you'll NEVER have to defrag the SSD.




I made the alterations to postgres as recommended in 2+2:

Every time I do this I get a "cannot connect" error in HEM and have to reinstall postgres. I made my adjustments according to the percentage of RAM as suggested by the author of that post and as I type this I am using his exact values despite the fact that I have twice his RAM.

The values you change in the .conf file should not be too big. It's funny, the default settings are way too low..... and if you follow the guides and set some values too high (even though you have PLENTY of RAM) it'll lead to connection problems. (even the EnterpriseDB Tuning Wizard, which changes the settings automatically, will make WRONG changes) Only changing them back to the original values, followed by a REBOOT will fix it. You should experiment however, what the highest values are you can use without getting connection problems.

Did you set the SSD in AHCI or in IDE mode (http://www.ithinkdiff.com/how-to-enable-ahci-in-windows-7-rc-after-installation/) in the BIOS? Which Kingston 128 GB: V, V+, M or E (fastest) series?

_Loki_
11-23-2009, 03:28 AM
Thanks Patvs

I understand better now

Question: why would you never have to defrag the SSD ? Is it done automatically "on the fly" ?

Patvs
11-23-2009, 04:15 AM
Defrag on a SSD reduces the life expectancy of the drive (because it results in unnecessary read/write actions) and it increases performance by 0.

Defrag is only useful on normal drives, where information that belongs together is scattered all over the drive. So defrag puts that information close together again, resulting in fewer work for the moving read-and-write head. (access time 10 ms)

In viewing the yearly graph of a large database in HoldemManager, the information comes from thousands of read instructions (=thousands x 10 ms // with transfer speeds of 0.5 MB/s)

The SSD has no moving head and can read all information (no matter where it's located on the SSD memory) with a 0.1 ms access time. (with transfer speeds of 10-25 MB/s depending on your SSD)
The SSD does have some minor other "problems", some are fixed by Garbage Collecting and TRIM and those are done automatically "on the fly".

Don't defrag your SSD!

cyberrico
11-23-2009, 08:07 PM
I originally used the SSD for my database but the import was 7 hands/second. Putting my database on its own drive (per Fozzy's recommendation) increased the speed significantly but still an unimpressive 33/s. It has degraded to about 15/s and I am only about 25% done. I have tried doing it one file at a time and it goes very quickly but that is also degrading plus there is a long delay on every new import be it one file or 5000.

My SSD would probably benefit from a tweak or two and my benchmark numbers were less impressive than what you posted but I'd really need some confidence that this would make a significant difference from 7/s if I am going to start all over yet again.

Granted, I don't want to spend another week and a half importing at this declining rate but the idea of starting over for the 5th time sounds painful.

My SSD is actually amazingly fast. For example, on a normal drive, extracting an 800MB file usually takes about 30-40 seconds minimum. The SSD does it in 3. I realize that there are all kinds of levels of reading writing and transferring data but that 7/s just doesn't seem like the issue.

Normally, I wouldn't ever ask a support person to sell me on a solution but I am getting two pieces of advice and again, I cringe at the idea of starting over again. I'll do it if you're confident that I will get out of these abysmal speeds.

Patvs
11-23-2009, 09:13 PM
My SSD is actually amazingly fast. For example, on a normal drive, extracting an 800MB file usually takes about 30-40 seconds minimum. The SSD does it in 3.

Extracting a 800MB file uses sequential speed.
PostgreSQL benefits from the 512K and 4K speeds.
(what are your SSD speeds (compared to your normal harddisk), are they slower than the 7200 RPM example pic of my post?)

As stated before, the SSD will be a lot faster in making backups, VACUUMING, etc. But importing your 5 million hands (especially if many files are 10+ MB .txt files) will take a long time.

Best solution to speed it up:
-check your SSD settings (AHCI in BIOS), enable caching, etc. (http://www.ocztechnologyforum.com/forum/showthread.php?t=63273)
-increase memory use values in your postgresql.conf and reboot
-make sure the .txt files are on the SSD during the import
-don't import ALL files at once: you can import ALL <500KB files at once, but devide the >500KB files over 4 different folders, and import one of those folders at a time

Your 5 million hands: is it an exported database you did in HoldemManager (or datamined hands with many large text files), or your original handhistories (small files only)? (how many files?)

cyberrico
11-24-2009, 10:07 AM
Extracting a 800MB file uses sequential speed.
PostgreSQL benefits from the 512K and 4K speeds.
(what are your SSD speeds (compared to your normal harddisk), are they slower than the 7200 RPM example pic of my post?)

As stated before, the SSD will be a lot faster in making backups, VACUUMING, etc. But importing your 5 million hands (especially if many files are 10+ MB .txt files) will take a long time.

Best solution to speed it up:
-check your SSD settings (AHCI in BIOS), enable caching, etc. (http://www.ocztechnologyforum.com/forum/showthread.php?t=63273)
-increase memory use values in your postgresql.conf and reboot
-make sure the .txt files are on the SSD during the import
-don't import ALL files at once: you can import ALL <500KB files at once, but devide the >500KB files over 4 different folders, and import one of those folders at a time

Your 5 million hands: is it an exported database you did in HoldemManager (or datamined hands with many large text files), or your original handhistories (small files only)? (how many files?)

5000 files datamined by a 3rd party.

I did change the values in that file and breaking imports into smaller chunks (I've tried everything) makes no difference.

Patvs
11-24-2009, 10:32 AM
5000 files, so every file is an annoying 1.2 MB... that WILL take a long time to import.

When your import is finished. Do purge all hands, and create a good backup.
So in the future, you can just RESTORE the backup file.
However, I reimport all my handhistories about twice a year. A year ago when HoldemManager added the "luck adjusted winnings" for SNGs/MTTs required a new database,... and next year with the new $EV by street, will surely require a new database and reimport.

cyberrico
11-24-2009, 10:38 AM
5000 files, so every file is an annoying 1.2 MB... that WILL take a long time to import.

When your import is finished. Do purge all hands, and create a good backup.
So in the future, you can just RESTORE the backup file.
However, I reimport all my handhistories about twice a year. A year ago when HoldemManager added the "luck adjusted winnings" for SNGs/MTTs required a new database,... and next year with the new $EV by street, will surely require a new database and reimport.

So you think that even with my incredibly fast system this import should take almost 2 weeks? I did it on my old slower computer in 4 days.

Patvs
11-24-2009, 05:08 PM
It should take less than 16 hours on your computer.
Focus on why your SSD is being outperformed by your normal harddisk.
(tips above in my other posts)

cyberrico
11-24-2009, 06:54 PM
I started over. Here's what I did:

-Everything on C: (Win 7 64, HEM, Postgres, txt files, etc) Kingston SSD V series
-Reconfigured the postgres file using the highest settings I could get away with.
-Tweaked Windows 7 per the link you provided
-Making subfolders for my txt files that only have 100 files in them (instead of the full 5000)

The one thing that I didn't understand was your directive to change the drive in the BIOS. It is set to SCSI. I tried setting it to AHCI and Windows wouldn't even let me boot. I set it back to SCSI.

There were no other settings in the BIOS that were meaningful to drive performance.

I did benchmarks on both dirves. They were about the same:

90/86
35/80
.45/1.0

I am currently importing at a rate of 44/s on file 34 of 100 (again, 5000 total)

I will tell you right now that by the time I get to file #100 it will be less than 20/s. I can reboot and start my next batch and it will be right back to less than 20/s. It will bottom out at about 7/s by the time I get to about file 1500 and that will take me more than 24 hours. After that, forget about it. 7/s on the remaining 3500 files = weeks.

Please note: This is pretty much identical behavior whether I put the DB and everything else all on the SSD, put everyting but the database on the SSD and put the DB on its own drive or put everything on another non-SSD drive which runs Vista 64.

The last time that I did the full import (about 9 months ago) I was on a much slower computer with a 2GB RAM.

I have also tried vaacuming and reindexing the database between big batches. I recall this helping the first time that I did this import but it is not making a difference in any of my attempts in the last few days that I have been grinding at this.

I can say with confidence at this point that my SSD drive is not the issue. The behavior of both drives in multiple attempts has been pretty much identical.

I suppose that the size of each of the TXT files could be an issue and that smaller files would improve import speed but that doesn't explain why as the database gets bigger, the import gets slower.

Patvs
11-24-2009, 07:50 PM
It's normal Windows won't boot if you change the BIOS setting to AHCI, that's why I provided this link (http://www.ithinkdiff.com/how-to-enable-ahci-in-windows-7-rc-after-installation/). I haven't tried that fix, but it claims it can set Windows to AHCI by editing the registry first.

Sorry to be blunt, but if you get 4K: .45/1.0 // 512K: 35/80 on your SSD the Kingston V-series rather sucks. So naturally where you put the DB / Windows / .txt files will make no difference if your SSD is as slow as the 7200 RPM drive. The only advantage is the low access time, which you will notice starting programs. Really can't help you any further.

When I google 4K write speeds for the Kingston V-series I find:
That's what your 512K/4K READ-speeds should be.

http://www.legitreviews.com/images/reviews/1005/crystal_vseries_100.jpg

fozzy71
11-24-2009, 10:06 PM
.......
-Making subfolders for my txt files that only have 100 files in them (instead of the full 5000)........

I am currently importing at a rate of 44/s on file 34 of 100 (again, 5000 total)

I will tell you right now that by the time I get to file #100 it will be less than 20/s. I can reboot and start my next batch and it will be right back to less than 20/s. It will bottom out at about 7/s by the time I get to about file 1500 and that will take me more than 24 hours. After that, forget about it. 7/s on the remaining 3500 files = weeks.
.........

Would you mind sending me 1 or 2 batches of 100 to import? I am curious how my generic HP Elite pc will perform on the same files.

Please www.7zip.org the hands and email them to me at fozzy@holdemmanager.net, with a link to this thread.

cyberrico
11-24-2009, 10:40 PM
Sent. Mine is an HP as well.

And the registry hack didn't work for AHCI. I did pretty extensive research on this and some people are simply forced to reinstall Windows 7 with AHCI enabled from the start.

Patvs
11-25-2009, 02:23 PM
I imported your files:

My performance monitor BEFORE import: CPU 0%, RAM 52%
DURING import: CPU 11%, RAM 53%.
Pic 1 is import into NEW database
Pic 2 is import into my existing database

http://img267.imageshack.us/img267/9897/testxv.jpg
http://img412.imageshack.us/img412/9121/test2p.jpg


fozzy said: "they imported fine/fast". I don't know his actual speeds, but it seems your problem must be caused by some driver/hardware/software conflict.
Email me your postgreSQL.conf file + more hardware info + list of all programs running (especially all the programs that produce an icon on the bottom-right of your screen)

cyberrico
11-26-2009, 10:39 PM
UPDATE:

I decided to reinstall my OS once again. Windows 7 (64 bit), this time with AHCI enabled in the bios prior to the installation. There is a registry hack that will allow you to change it on the fly but it didn't work for me so I started from scratch.

My SSD is not even plugged in. I am doing this from my normal drive.

Here are the steps that I took:

1) Enable AHCI
2) Unplug SSD
3) Format regular drive
4) Install Windows 7 64 bit
5) Install HEM and Postgresql
6) run hm, hud and import executables as admin
7) Import big database

Speed was pretty slow (12/s).

8) Made changes to the postgresql.conf file
9) reboot
10) continue import

Speed improved. I was standing "strong" at about 22/s. This is with the batch I did last night, the one I did this morning, the one I did before I left for the afternoon and it seems to be building up to 22/s with the batch I am importing right now.

Three elements in my current environment have changed that I had never done before in my previous 10 tries:

1) Only one hard drive is installed
2) AHCI is enabled
3) Poker Stars, Full Tilt (and just about everything else but Firefox) is not installed

I am sure the only real viable factor here is the AHCI. There were really no programs running in the background in any of my previous attempts and a second drive not even installed is certainly not a factor. My earlier results were the same on the SSD as they are on my normal drive. Benchmarks showed the SSD to be faster. In any case, I am taking it back and letting the technology mature just a hair more before I dive in again. It's a superior drive, I just expect a lot more for the price.

Here is how things performed poorly in previous attempts:

1) 7 hands per second (typically)
2) Would usually drop to about 4/s after the first 1200 or so files were imported
3) Batch size made no difference (yes, 2 files import quickly but there is a 1-2 minute wait while HEM prepares the import)
4) Import would start extremely fast (120/s) and eventually degrade to terrible speeds. In some environments it never saw 120.

CONCLUSION

I have done a lot of reading both here and on 2+2 about large data imports. There are three types of results that people get when doing so:

1) Super fast 90+/s
2) Mediocre 20-30/s
3) Horrible 7/s or less

Some people with 90/s didn't do anything to change the configuration of postgresql, Windows or AHCI/SCSI/IDE, some had super fast computers, some had low end machines. Same goes for the middle and low end. I got better speeds than I am even getting now importing data into my extremely low end piece of junk eMachines that I use a media streamer/DVR than I do on my very fast system. There is no common denominator that I could come up with.

I can't exactly say that the import functions or database management of Postgresql or HEM are necessarily flawed as obviously some people regardless of their environment have had success with importing 5 million hands in the same format (size, type, everything) mine is in.

Hardware is not the bottleneck. My system is only using a small fraction of my RAM and processor. I'm sure it's not the speed of my hard drive. My junk machine's drive is 5400 RPMs.

However, the day is coming when a million people will need to import millions of hands into their databases on a semi-regular basis.

Fozzy and Pat, you guys have been awesome. Thanks for all of your help. If you need more files for testing, let me know.

cyberrico
01-04-2010, 03:36 PM
Have there been any developments on this? I am going through the import process again and it is even slower than the first time.

Patvs
01-05-2010, 05:27 AM
The big development is the Intel X25-M G2 Postville no longer is the fastest SSD harddisk: it now is the soon to be released "OCZ Vertex 2" (http://www.anandtech.com/storage/showdoc.aspx?i=3702).

cyberrico
01-05-2010, 10:38 AM
The big development is the Intel X25-M G2 Postville no longer is the fastest SSD harddisk: it now is the soon to be released "OCZ Vertex 2" (http://www.anandtech.com/storage/showdoc.aspx?i=3702).

What's the obsession with SSD's in relation to this topic? I'm not using an SSD and I am getting 2.7h/s. I read somewhere in the 2+2 forums that 1.10 was a possible fix to slow import speeds. It didn't but I am absolutely certain that this is a software issue, not a hardware issue.

I know that I would be extremely unhappy with any other tracking software so I am stuck with making HEM work which I need for my livelihood. I just don't know how I can make this all work when I have to spend 2 weeks importing data every single time postgresql decides it needs to be reinstalled.

Is this a hot issue for the devs? Am I on an island here? Am I the only user on the planet who gets less than 10h/s import speed?

Patvs
01-05-2010, 11:04 AM
My obsession with SSDs, is I'm trying to convey you wouldn't have your import problems with a 80+ GB Intel X25-M G2 Postville. (and it's not a software problem)

The top priority for the devs is: multi currency support with a currency convertor. I've seen two recent posts of people with 5+ million hand databases and <10 h/s import speeds: one of them had a ridiculous professional server setup (which made no sense), the other a similar problem as yours, but wouldn't blame it on his hardware.

Actual poker-playing is my livelihood too.... my biggest nightmare, was every 3 months (!) when my database would get "corrupt" for NO REASON. And reimporting my database would take 3 weeks. Using pgAdminIII restore/backup (for a monthly backup) would solve part of my problem.... though importing ONLY last months hands would still take 12 hours.

So in building my poker computer, I eliminated every factor that could potentially become a bottleneck for lag issues / import speeds.

TexDanny
01-22-2010, 09:12 PM
I'd like to just come here and share my experiences with a recently bought high end computer and a large database. I've been using an intel 160GB X-25M g2 postville for the past week and I have to say that I'm impressed with it's overall performance. However, when it came down to importing my database (40gigbs of exported HH's folder) it was actually really really slow. I can't remember how many hands a second because it was usually showing 'importing 14parts out of 38 parts of a big file' or something like that. But I can tell you that it wasn't a small number because I'd have notice something was wrong right away. After 3 days importing I thought to myself: "I'm pretty sure my old but similitar in hardware computer w/o a ssd imported the db in less than 2 days. And since I'm using a SSD something must be wrong".

I went to my old computer and backed up my Postgres SQL database in a .backup 8GB file. I then restored it through Postgres SQL and it imported in less than 6 hours. I was so happy because my db was all back and runing perfectly in my SSD and I was ready to start playing. Unfortunately, I opened my computer and saw that I had only 11gb of free space remaning.

I've just written all of this to ask how the fuck does an 8gb database file then takes 120gb of my SSD!!!!!!!!! What am I supposed to do now? should I try the dump/restore thing in the 10 performance tips thread? I don't wanna break my db and have to go over all this process again. I haven't deleted the sql log files either, where are they again?

Patvs
01-22-2010, 11:40 PM
-If you're importing parts 14 out of 38 (!) these are HUGE .txt files (10+ MB?)
-pgAdminIII can do two backups: the default backup compresses the database, so yes, it's normal a 8 GB .backup becomes a 120 GB database.
-What to do: -1 delete the content of the pg_log folder
-2 PURGE ALL HANDS using the OPTIONS--> Observed Hand History Configuration (you'll lose the ability to REPLAY the hand, but it'll keep all stats/graphs/winnings/etc.
-3 Do a VACUUM/ANALYSE + REINDEX afterwards

Zendel
04-17-2010, 09:09 AM
-If you're importing parts 14 out of 38 (!) these are HUGE .txt files (10+ MB?)

HEM exports hand histories to huge files, some over 100 mb. Why not have it export to smaller files?

cyberrico
04-17-2010, 12:05 PM
HEM exports hand histories to huge files, some over 100 mb. Why not have it export to smaller files?

The file size is a big deal. It's probably why my imports are so slow. Much of the data I have is in large 1MB chunks. They import at ridiculously slow speeds.

However, when students send me their data (hundreds of thousands of hands) it imports almost instantaneously.

Let me recap some of what I have done in HUNDREDS of hours of testing:

5 different computers. All with modern, if not state of the art, components

8 different hard drives. All were good quality including one SSD Intel drive.

20 different versions of HEM.

Postgres 8.3 and 8.4

Windows Vista 32&64, XP, 7-64bit

All of the little tweaks to Windows and Postgres as outlined in the sticky post.


The only consistency is the data. It's the same 1MB files. Whether I import one or all 5000 at once, HEM/Postgres chokes on itself almost immediately. The HEM support staff has been amazing in helping me with this but I have to believe that I have done my due diligence to rule out user error or system problems.

IMO the import function of Postgresql should be put under the microscope to figure this out or a big red bold statement should be made that HEM isn't designed to import data files greater than 100K in size which would change how the 3rd party companies handle their data.

One problem is that there are like 3 of us in the history of HEM subscribers who have had this problem enough to work on it here in the forums. I think that most people that have had this problem have said "Bah screw it, I won't even import. I play 25K hands a week and will have fresh data anyway".

And of course the other glaring problem is that there are thousands of people who have the exact same data file sizes that I do and jam them in their database 1MB files, millions of hands, from the same source, every week/month.

dare_v
04-17-2010, 05:19 PM
Let me bump in here...

I guess I am one of those 3 people that have problem with slow import speed.

I tried everthing mentioned here and in other threads, but no luck. After a good start the import speed just slows down to a very bad number.

But I did some testing with another poker tracking software and difference was noticable. The main difference I noticed was that HEM uses UTF8 and "other software" uses ASCII, I dont know if this is related to import speed, but I doubt. Everynthing else was the same. I created new database in each programs and imported the same hands into both. With HEM it took around 30mins and with "other software" it was done in 10 mins. I did not change aynthing betwin importing, not even a computer restart. This was importing around 70.000 hands.

Quite odd, dont you think?

PS, a lot of people talking about AHCI, is there a way to "SEE" if AHCI is really turned on by the OS or not? Thou I have a SSD and I dont think some AHCI should make a difference since NCQ does not affect SSD.

cyberrico
04-17-2010, 05:43 PM
Let me bump in here...

I guess I am one of those 3 people that have problem with slow import speed.

I tried everthing mentioned here and in other threads, but no luck. After a good start the import speed just slows down to a very bad number.

But I did some testing with another poker tracking software and difference was noticable. The main difference I noticed was that HEM uses UTF8 and "other software" uses ASCII, I dont know if this is related to import speed, but I doubt. Everynthing else was the same. I created new database in each programs and imported the same hands into both. With HEM it took around 30mins and with "other software" it was done in 10 mins. I did not change aynthing betwin importing, not even a computer restart. This was importing around 70.000 hands.

Quite odd, dont you think?

PS, a lot of people talking about AHCI, is there a way to "SEE" if AHCI is really turned on by the OS or not? Thou I have a SSD and I dont think some AHCI should make a difference since NCQ does not affect SSD.

While I would agree that a 300% difference is significant, it's not the insane bottleneck that we are talking about here and I'm not sure that 70,000 hands is a fair assessment of import speed when comparing to another program.

But you do bring up a very interesting point with the UTF8 and ASCII. I understand the difference on a basic level but not enough to make an intelligent statement.

My speeds are less than 3 hands per second. 70k hands would import quickly for me but large files (1MB+ each) are at a snail's pace.

dare_v
04-17-2010, 05:56 PM
I have several directories to import, each around 70.000 hands in it (dont know how much files thou) and in HEM I just gave up when I wanted to import all at once (speed after 1 day was down to 2h/s and I just canceled it). Now I am importing in "other software" and with the last batch of files speed is down, but still solid - 20h/s

I have one more batch of files - 300.000 hands - but I dont dare to test with that yet :)

It just seems to me that the main issues is cache. We all start with a rather normal speed, but then, after a few tousands of hands, it just drops so much, which is shouldnt - and it does not in "other software"

Zendel
04-18-2010, 12:00 PM
It is quite frustrating. I've been waiting 3 days and have only imported 1/3 of my database. In hindsight I should have backed up my database, stayed with postgres 8.3, and been happy.

If HEM isn't going to import hand histories any faster, at least the export should be making files it can later deal with. It couldn't be much effort to at least fix that part of the problem. Now I have 100+ 100mb+ files that will take me until my next computer upgrade to import.

Patvs
04-18-2010, 01:38 PM
don't import the 100+ MB files...
only import original handhistories...

jenkulle
04-18-2010, 02:54 PM
I already suggested that HEM should EXPORT HH in a format or size that is better for reimporting it... but they told me to go suggest it in the forum, thus nothing will happen since it doesnt seem to be a priority.

that pisses me off because a lot of players just quit instead of importing hands... when it should be a basic function.


just seem logical to do so, I don't know how hard it is, but PT3 had an option about this and it worked great

dare_v
04-18-2010, 02:57 PM
So I downgraded to Postgres 8.3 to get the ASCII option back, but no help.

I dont know why PT3 is faster with importing, but maybe they have less data put into database which means less stats?

Exactly same conditions for both programs, and HEM finished the job in 23mins and PT3 in 10mins.

But it seems either postgres 8.3 or ascii database helped with maintaining same import speed while database getting bigger. So I dont get the infamous drop of import speed to 2h/s.

PS, during today's night I will try to import 300K hands into database and that will be the ultimate test, will report back.

Zendel
04-18-2010, 03:33 PM
don't import the 100+ MB files...
only import original handhistories...

While I probably still have the majority of the files, that is not really a satisfactory solution. I shouldn't have to keep the original hand histories. Why even have an export function if the files can't be reimported?

Patvs
04-18-2010, 03:47 PM
While I probably still have the majority of the files, that is not really a satisfactory solution. I shouldn't have to keep the original hand histories. Why even have an export function if the files can't be reimported?


The exported hands can be reimported. The exact same thing happens, if you request hands from PokerStars/Full Tilt, and they sent you the hands in large files.

If the files are large, the import will be slower.

Zendel
04-18-2010, 03:48 PM
So I downgraded to Postgres 8.3 to get the ASCII option back, but no help.

I dont know why PT3 is faster with importing, but maybe they have less data put into database which means less stats?

Exactly same conditions for both programs, and HEM finished the job in 23mins and PT3 in 10mins.

But it seems either postgres 8.3 or ascii database helped with maintaining same import speed while database getting bigger. So I dont get the infamous drop of import speed to 2h/s.

PS, during today's night I will try to import 300K hands into database and that will be the ultimate test, will report back.

I don't have a problem with large numbers of hands contained in smaller files. I just did a batch of 400k @ 62 hands/second. Its the large files that bog...

Zendel
04-18-2010, 03:50 PM
The exported hands can be reimported. The exact same thing happens, if you request hands from PokerStars/Full Tilt, and they sent you the hands in large files.

If the files are large, the import will be slower.

Why can't the export function of HEM be changed to create smaller files? It seems like such a simple solution.

dare_v
04-18-2010, 06:13 PM
I don't have a problem with large numbers of hands contained in smaller files. I just did a batch of 400k @ 62 hands/second. Its the large files that bog...

What is your PC hardware?

Zendel
04-18-2010, 09:51 PM
What is your PC hardware?

Relevant parts:
Q9550 Quad Core 2.83ghz running at 4ghz
4 gigs DDR II ram running at 475mhz
80gigs Intel X25-M SSD boot drive with HEM/Postgres installed
500gigs SATA II hard drive dedicated to postgres data

I have set HEM and HMimport to high cpu priority, but that doesn't make that much of a difference because I am rarely making much use of the cpu.

Patvs
04-18-2010, 11:41 PM
@Zendell you get 62 hands/s right? And the problem is.. LARGE files.. take FOREVER to import?
How many hands do you get if PostgreSQL and the database are on the SSD?

dare_v
04-19-2010, 04:16 AM
Relevant parts:
Q9550 Quad Core 2.83ghz running at 4ghz
4 gigs DDR II ram running at 475mhz
80gigs Intel X25-M SSD boot drive with HEM/Postgres installed
500gigs SATA II hard drive dedicated to postgres data

I have set HEM and HMimport to high cpu priority, but that doesn't make that much of a difference because I am rarely making much use of the cpu.

62h/s is great consider the fact you have postgres data on a normal HDD, not SSD. I have everything on SSD drive and getting 7h/s for large imports.

dare_v
04-19-2010, 05:17 AM
Do you guys think that import speed would be better if I put my OS onto a normal HDD and then use SSD only for my poker stuff? also put postgres on it?

I really dont need a fast OS drive, since I only use my desktop for watching movies and poker, I bought SSD only for the purpose of fast HEM/postgres

Zendel
04-19-2010, 05:43 AM
@Zendell you get 62 hands/s right? And the problem is.. LARGE files.. take FOREVER to import?
How many hands do you get if PostgreSQL and the database are on the SSD?

Not sure. I don't want to pay for another SSD just for my postgres database. I'm happy with 60 hands/s when I can get it. It is primarily the large files that slow down to <20 hands/s.

Zendel
04-19-2010, 05:49 AM
Do you guys think that import speed would be better if I put my OS onto a normal HDD and then use SSD only for my poker stuff? also put postgres on it?

That would probably be faster, but it depends on your HDD and SSD. If you have a newer SATA drive and an older SSD drive perhaps not. Older SSD had pretty slow write speeds.

dare_v
04-19-2010, 08:12 AM
Wow, this is shocking!

So I put my system back to VERY old HDD (IBM 80G 7200 rpm thou) and postgres database (only database!) with HH files on my SSD (another old model OCZ, one of a first SSD drives)

And my start speed is now 86h/s, compare to 30 earlier.
I tested with my laptop SSD here, just to see the speed difference and with it I got the 90h/s, so not much difference, which confirms my theory that with SSD it only matter their very low random access time and not the speed, because we rarely deal with big data transfers.

I did not import my whole database, because difference is noticable!

With my old database which is now around 500K I get the import speed of 46h/s when importing 80K hands from 120 files.

Now I am happy :)

jenkulle
04-19-2010, 01:46 PM
. Why even have an export function if the files can't be reimported?

exaaaaaactly

Zendel
04-19-2010, 06:50 PM
Wow, this is shocking!

So I put my system back to VERY old HDD (IBM 80G 7200 rpm thou) and postgres database (only database!) with HH files on my SSD (another old model OCZ, one of a first SSD drives)

And my start speed is now 86h/s, compare to 30 earlier.
I tested with my laptop SSD here, just to see the speed difference and with it I got the 90h/s, so not much difference, which confirms my theory that with SSD it only matter their very low random access time and not the speed, because we rarely deal with big data transfers.

I did not import my whole database, because difference is noticable!

With my old database which is now around 500K I get the import speed of 46h/s when importing 80K hands from 120 files.

Now I am happy :)

Awesome that was about to be my next experiment! My last post got me thinking the read speed of the SSD would help more with the hand histories side than the database side. I'll post up my results later tonight.

dare_v
04-19-2010, 07:06 PM
I also discovered the following. It is not the size of the database that is making it slower each time we import, but the amount of data HEM uses for certain level.

I imported 800K of hands from NLXXX, last hands were imported with a speed of 18h/s. The I went on and started to import 200K of hands from NLYYY level. And the speed was back to 86h/s. Second batch of 200K hands were already down to 25.

My guess is that HEM is already making some calculations while importing hands which then makes it slower and slower while database getting bigger and I dont honesly see a way to work around this, unless somebody puts database onto a RAM drive, just for the main import, then it can be moved.

Zendel
04-24-2010, 11:39 AM
I've been importing files for about a week now and have come to peace with the fact it is going to take forever. (Thankfully there is a casino where I can play live 15 minutes away.) My import speed has dropped down to 35 hands/s, from 65 hands/s, with smaller files as my database grows.

I am posting again because I have noticed the memory usage of HMImport balloons after it has been running for a few hours to a day. It starts at <150mb and reaches >400mb. Is that normal or could there be a memory leak? A memory leak could explain the import speed gradually slowing to a crawl.

Azalin
04-24-2010, 12:52 PM
I am having the exact same issue. I decided to import only 4 months (since the start of 2010) of hh to my new database after I installed postgresql 8.4.3 in a new dedicated hard drive (WD Black Caviar) in order to be done quickly but to my amazement boy was I wrong:/ After trying to import all at once (and from the archived folder not from an export file mind you) and watching the speed dropping to 25h/sec from 108h/sec I switched to importing specific days of mined hh. I started with the month of April and I have imported 15 days worth of hh in 4 days..

As a side note I have already done almost all the database improvement tips (increase buffer size etc) and my hardware specs are decent (quadcore 6700 overclocked to 3,2GHz) and after reading this thread all I can say is that I am of the opinion that HEM just cannot handle large imports maintaning a normal/acceptable speed. I would appreciate if something was done about it from the developers.

All in all, HEM is a wonderful program and the support was/is great whenever I had any issue but there are some areas that HEM definitely could use an improvement (HUD should be faster when using multiple panels, HEM could have the option to create custom stats for use in the HUD as well as in the report sections, speed increase in importing large amount of hh in a new database and selective export of hh to name the important ones IMHO).

mYgu
08-07-2010, 04:36 PM
I know its a bit old this thread but i ran some few tested with the following configuration
-windows xp sp2 (i did installed it 2 yrs ago)
-intel c2d 6750 @2.66ghz (not OC-ed)
-3.25gb ram @666mhz (i have 4gb but my sistem only sees 3gb cuz of 32bit version)
-7200rpm hdd sata 20gb of 150 free, didn't defrag it for long time ago

So, new db, file that is imported has ~10mb, speed ~180h/s.
New db again, i am importing 7 files of 1mb each, speed ~180h/s.
Old db with 1000k hands purged but the stats remained, and trying to import some 1mb files from the same limit that all hands are from and i am getting ~40h/s. I tryied this before vaccum+reindex and i hand't see any big performance +/-.

The main thing is that the HH that i am trying to import are from the same limit that my majority of my db has stats on those players, maybe if i will import an 100mb file from another limit i should get a speed of 150h/s.

Hope this helps.

Patvs
08-07-2010, 06:34 PM
40 hands/s is still pretty good if you have a large database.

"7200rpm hdd sata 20gb of 150"
You have a 200 GB harddisk with 20 GB free, or a 2 TB harddisk with 150 GB free?

read:
http://forums.holdemmanager.com/manager-general/11194-top-10-hem-performance-increase-tips.html

mYgu
08-08-2010, 02:30 AM
20gb free, total size 150gb, w/ 4k read speed of 0.6mb/s so its kinda sluggish.

Very important is the AV that should be deactivated cuz it keep analysing the files everytime they get processed.

LE: i knew about the top10 perf tips.

chrisadam12
09-07-2010, 01:09 AM
You can use tools such as bigdump to import large databases as phpmyadmin will only accept relatively small databases to import, but I prefer this on a localhost install as you already have easy access already and its quick and easy to do.

Download your database from your live site (if that's the one you want to use). Extract the .gz so you have just a .sql file.

For this example (remember yours will be different) , you need the following...

The root users name and password (for mysql) , its normally just "root" and no password.

The name of the new blank database (if you have not already created one, visit your localhost phpmyadmin and create a new blank database now)

Use windows explorer or a command prompt. Navigate your way to the folder where wamp is installed. You need to go into the folder where mysql.exe is. In my case it is D:\_DEV\bin\mysql\mysql5.1.30\bin . Most cases it will be c:\program files\wamp\bin\mysql\mysql version\bin etc...

Copy the .sql file into here. It must be in the same directory as mysql.exe

Open a command prompt in this directory if you have not already. If you have the "command prompt here" powertoy, this can be quicker than having to navigate via a command prompt to here. You can download this from Microsoft if you want it. Failing that, simply navigate your way to the appropriate directory using cd.

For this example, the database I am importing is called mybackup.sql and the new empty database I created is called test

Assuming you do NOT have a root password, type in the following...

mysql.exe -u root test < mybackup.sql


This may take a few minutes depending on the database size and your CPU power, your firewall may prompt you so you will have to allow temporary access (should only be mysql wanting localhost access)

When its finished, remove the mybackup.sql file you put in that directory.

Done

As an extra note, if you have a root password for mysql, you will need to use this instead (changing pass for the password)

mysql.exe -u root -p pass test < mybackup.sql

Crexis
12-04-2010, 02:32 PM
I just wanted to chime in and say i also get b/w 3 to 7 h/s, when I first started importing it was much faster, b/w 20-30 range but after importing a ton of hands it slowed down and it has stayed there. Also the one friend I had also has it importing at this speed.

We just figured it was our computers but after reading this thread I'm thinking otherwise.

There is definitely way more then '3' people with this problem. More are just lazy about it and figure that's how it is. With importing of large # of datamined hands I'm pretty sure a lot of people are getting these ridiculously slow speeds.

I'll try some of the suggestions and see if it speeds it up.