04-06-15, 11:33 AM   #1
Dimmulux
A Deviate Faerie Dragon
Join Date: Feb 2014
Posts: 17
How can I obtain an offline CSV of all battle pets, their statistics and abilities?

Note that this question does not relate to an addon. As part of a university project on game-playing algorithms, I need an offline information store listing the battle pets, their statistics and their abilities (including all effects of their abilities in some sensible format). The focus of the project is running algorithms (in Java) on the information, not collecting the information itself. A listing of all pets (rather than a few pets copied across by hand) is desirable so that I can test my algorithms on a large number of different combinations. I have read through the thread http://www.wowinterface.com/forums/s...ad.php?t=49083 and tried the relevant suggestions, as discussed below.

Question: How can I get an offline database or CSV file containing a listing of all battle pets, their statistics and their abilities?

What I've tried so far:


1. Web scraping from Wowhead and WarcraftPets. In both cases, I only attempted to scrape a single page as a test case (http://www.wowhead.com/petspecies and http://www.warcraftpets.com/wow-pets/filter/ respectively). In neither case did I retrieve the information I wanted (a partial listing of pets). I tried Jsoup first, but this was unsuccessful because both pages load their content using AJAX and Jsoup does not execute JavaScript. Online, I found HtmlUnit recommended for working with pages that use AJAX, but it threw the following exceptions:
On Wowhead:
com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Exception invoking close
On WarcraftPets:
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "replace" of undefined (http://j.adlooxtracking.com/ads/js/t...one62v_1.js#14)

I can copy/paste the entire stack traces if they would be helpful, but they are hundreds of lines long.


2. Appending &xml to the Wowhead URLs I plan to scrape. This does not seem to work outside of item pages. Importantly, it does not work with NPC or pet ability pages.


3. Using the official pet API: https://github.com/Blizzard/api-wow-docs#battlepet-api. It seems to be missing some information I need:
  1. What are the valid values of species?
  2. Given a species, what are the valid values of breed?
  3. What effects does an ability have? (something of a form similar to: damage(20 + power)).
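To make the last point concrete, here is a minimal Java sketch of the kind of effect record I have in mind. Everything in it is my own assumption for illustration: the class name AbilityEffect, the three fields and the linear formula base + powerScale * power are hypothetical and do not come from the Blizzard API, Wowhead or WarcraftPets. Real abilities may well need a richer format.

```java
/**
 * Hypothetical record for one ability effect, e.g. damage(20 + power).
 * The field names and the linear formula are illustrative assumptions only.
 */
final class AbilityEffect {
    final String kind;        // illustrative label such as "damage" or "heal"
    final int base;           // constant part of the formula
    final double powerScale;  // multiplier applied to the pet's power stat

    AbilityEffect(String kind, int base, double powerScale) {
        this.kind = kind;
        this.base = base;
        this.powerScale = powerScale;
    }

    /** Evaluates base + powerScale * power, rounded to the nearest integer. */
    int magnitude(int power) {
        return base + (int) Math.round(powerScale * power);
    }

    /** Serialises the effect as one CSV row: kind,base,powerScale. */
    String toCsv() {
        return kind + "," + base + "," + powerScale;
    }

    /** Parses a row produced by toCsv(); assumes the kind contains no comma. */
    static AbilityEffect fromCsv(String row) {
        String[] fields = row.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + row);
        }
        return new AbilityEffect(fields[0],
                Integer.parseInt(fields[1]),
                Double.parseDouble(fields[2]));
    }
}
```

With this sketch, new AbilityEffect("damage", 20, 1.0) stands for damage(20 + power), and the row it writes can be read back with fromCsv, so the algorithms could load effects from the offline file rather than from the network.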

I have not yet done so, but if I find a method that works, I intend to retrieve the information that way for each pet and each ability. This would mean making roughly 1,500 to 10,000 requests (a very rough estimate, but it should give an idea of the order of magnitude) sequentially over a residential network connection. Introducing an artificial delay between requests would be possible. Would this go against the terms of use of any of the sites in question? I cannot find any mention of it in either site's terms: http://www.wowhead.com/tos or www.warcraftpets.com/help/. Would it be advisable to contact the site owners to check that it's okay?
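For reference, this is roughly how I would space out the sequential requests. It is only a sketch under my own assumptions: the class name PoliteFetcher, the delay value and the User-Agent string are made up for illustration, and the httpGet helper assumes Java 11 or later (java.net.http). The fetch function is passed in so that the delay logic can be tried without touching any site.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: fetches URLs one at a time with a fixed pause in between. */
final class PoliteFetcher {
    private final Function<String, String> fetch; // maps a URL to its response body
    private final long delayMillis;               // pause between consecutive requests

    PoliteFetcher(Function<String, String> fetch, long delayMillis) {
        this.fetch = fetch;
        this.delayMillis = delayMillis;
    }

    /** Fetches each URL in order, sleeping delayMillis before every request after the first. */
    List<String> fetchAll(List<String> urls) {
        List<String> bodies = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            bodies.add(fetch.apply(urls.get(i)));
        }
        return bodies;
    }

    /** One possible fetch function: a plain GET using the Java 11+ HTTP client. */
    static String httpGet(String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("User-Agent", "university-project-scraper (placeholder contact)")
                    .GET()
                    .build();
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during request", e);
        }
    }
}
```

Usage would be something like new PoliteFetcher(PoliteFetcher::httpGet, 1000).fetchAll(urls). With a one-second delay, 1,500 requests would spend about 25 minutes waiting and 10,000 requests a little under 3 hours, not counting the time of the requests themselves.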

Any suggestions on how to proceed from here would be very helpful. I also welcome any comments on what I have done (wrong) so far.