I tried contacting Wowhead a couple of times about scraping their data, without much luck (I never got a response). In the end I just decided to go ahead and do it, since I wasn't going to use the data for any nefarious purpose.
In my case I need item information, so it's a far bigger scrape than what you're doing. I need both the XML and HTML versions of each page, and there are a lot of items in WoW (almost 100,000), so I end up making about 200,000 requests. I wrote a custom Ruby script to download them all, using a configurable number of simultaneous threads. On my cable connection with the thread count set to 20, the XML half takes only about an hour, but the HTML half takes about 3 hours.
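The core of that downloader is just a thread pool draining a shared queue of item IDs. Here's a minimal sketch of the pattern; the worker count, the `fetch_all` name, and the Wowhead URL in the usage note are my own placeholders, not the actual script:

```ruby
# Fan out work across N threads: each worker pops IDs off a shared
# queue and calls the supplied block for each one. Queue is thread-safe,
# so no extra locking is needed.
def fetch_all(ids, threads: 5)
  queue = Queue.new
  ids.each { |id| queue << id }
  results = Queue.new            # collect results thread-safely too

  workers = Array.new(threads) do
    Thread.new do
      loop do
        id = begin
          queue.pop(true)        # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break                  # queue drained, worker exits
        end
        results << [id, yield(id)]
      end
    end
  end
  workers.each(&:join)

  Hash[Array.new(results.size) { results.pop }]
end
```

Used for real you'd pass a block that does the HTTP request and writes the file, something like `fetch_all(ids, threads: 20) { |id| Net::HTTP.get(URI("https://www.wowhead.com/item=#{id}&xml")) }` (assuming that URL scheme; check the real one).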
Then I parse them all with another Ruby script, using Nokogiri (an XML/HTML parser) plus simple regular expressions. The site uses a lot of AJAX, so there are often JSON objects sitting around on various pages that you can load into memory with a JSON parser. These are sometimes more convenient to work with than scraping the HTML.
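Grabbing one of those embedded JSON blobs can be as simple as a regex plus Ruby's stdlib JSON parser (no Nokogiri needed for this part). A sketch, where the `g_items` variable name and the item data are made up for illustration; the real pages will use their own names:

```ruby
require "json"

# Find "var <name> = {...};" in the page source and parse the object.
# The non-greedy {.*?} only handles flat objects without nested braces,
# which is enough for simple inlined data; returns nil if not found.
def embedded_json(html, var)
  m = html.match(/#{Regexp.escape(var)}\s*=\s*(\{.*?\});/m)
  m && JSON.parse(m[1])
end

page = '<script>var g_items = {"id": 19019, "name": "Thunderfury"};</script>'
embedded_json(page, "g_items")
# => {"id"=>19019, "name"=>"Thunderfury"}
```

For anything with nested objects you'd want a real extraction strategy (or just feed the whole script tag's contents to the JSON parser), but this covers the easy cases.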