02-03-16, 11:36 AM   #1
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
Data preparation

Hello,

I'm working on a library for raid encounter/dungeon related achievements. The idea is to be able to get the achievements for a given encounter by the encounterID (as provided by ENCOUNTER_START) or the mapID (as returned by GetCurrentMapAreaID()).

I'm pulling the data through the encounter journal and the achievements API and currently the raw data has the following format (for now raid data only):
lua Code:
[6] = {
    {
        {
            ["map"] = 1026,
            ["name"] = "Hellfire Assault",
            ["category"] = 15231,
            ["encounter"] = 1426,
            ["achievements"] = {
            },
            ["instance"] = 669,
        }, -- [1]
        {
            ["map"] = 1026,
            ["name"] = "Iron Reaver",
            ["category"] = 15231,
            ["encounter"] = 1425,
            ["achievements"] = {
            },
            ["instance"] = 669,
        }, -- [2]
    },
},

The nesting is by game expansion (6 is for Warlords of Draenor), then instance, then encounter. I keep it structured like that so that the boss and instance order are preserved. I think that makes it easier to keep under version control and to have a human verify the data.

However, this makes look-ups slow, because finding an encounter means iterating over the nested arrays. So the question is how I should prepare the data for the use cases mentioned above while still being able to keep the whole thing meaningfully under version control (the preparation output should always have the same data order)?
02-03-16, 05:43 PM   #2
MunkDev
A Scalebane Royal Guard
 
Join Date: Mar 2015
Posts: 431
Instead of using integers as table keys, use the names or encounter IDs. Looking a value up by its key is far faster than iterating to find it. I'm not sure about the constraints of the data you're working with, but use table keys wherever you can. If you have a function that returns the name of the instance, it will be much faster to get the instance name and use it as a key into the table than to iterate across nested tables to find the encounter you're looking for.

Here's an example:

Lua Code:
[6] = {
    ["Hellfire Citadel"] = {
        ["Info"] = {
            ["map"] = 1026,
            ["instance"] = 669,
            ["category"] = 15231,
        },
        ["Hellfire Assault"] = {
            ["encounter"] = 1426,
            ["achievements"] = {
            },
        },
        ["Iron Reaver"] = {
            ["encounter"] = 1425,
            ["achievements"] = {
            },
        },
    },
},

-- data is the expansion-level table; raidID and encounterID stand for
-- whatever keys the table uses (instance and boss names in the example above)
local raid = data[raidID]
local map = raid and raid.Info.map
local encounter = raid and raid[encounterID]
local achievements = encounter and encounter.achievements

Last edited by MunkDev : 02-03-16 at 05:55 PM.
02-03-16, 06:54 PM   #3
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
This would present a problem with version control, as Lua does not guarantee the key order for associative arrays. A diff would show a lot of changes when someone regenerates the data, even if nothing new was added.

I think it would be possible to parse the file containing the data and write the prepared version to a new file, as this would keep the relative order. But I don't know how to do that, as I suppose it will be regex-heavy, which is not exactly my strong suit.

As for the structure of the prepared data, I share your opinion. For my current use scenario it could be compressed to just:
lua Code:
[1026] = {
    ["instance"] = 669,
    [1426] = {}, -- Hellfire Assault
    [1425] = {}, -- Iron Reaver
}
as I don't need the category once I populate the achievement tables (which I can do on the raw data).

So, if processing it on the file level is the recommended way here, I need something that could produce the above table format while maintaining the same keys order.
02-03-16, 07:15 PM   #4
Banknorris
A Chromatic Dragonspawn
 
Join Date: Oct 2014
Posts: 153
The solution I use for this "problem" (most of the time it comes up because I want to open a saved variables file in a text editor and find the table keys ordered) is to nest those keys inside array indexes. So instead of:

Lua Code:
table = {
    ["map"] = 1026,
    ["name"] = "Hellfire Assault",
    ["category"] = 15231,
    ["encounter"] = 1426,
    ["achievements"] = {},
    ["instance"] = 669,
}

you would have:

Lua Code:
table = {
    [1] = {
        ["map"] = 1026,
    },
    [2] = {
        ["name"] = "Hellfire Assault",
    },
    [3] = {
        ["category"] = 15231,
    },
    [4] = {
        ["encounter"] = 1426,
    },
    [5] = {
        ["achievements"] = {},
    },
    [6] = {
        ["instance"] = 669,
    },
}

The problem is that you need to use things like table[3]["category"] instead of table["category"]. You can use metamethods to hide the key index. I personally don't do this but it is possible.
Lua Code:
--put all possible keys in this list
local key_index = {["map"]=1, ["name"]=2, ["category"]=3, ["encounter"]=4, ["achievements"]=5, ["instance"]=6}

local mt = {
    --find the slot for the key; return nil for unknown or unset keys
    ["__index"] = function(t,k)
        local slot = key_index[k] and rawget(t,key_index[k])
        return slot and slot[k]
    end,
    --only fires for keys that are absent from the raw table
    ["__newindex"] = function(t,k,v) rawset(t,key_index[k],{[k]=v}) end,
}

--note: this local shadows the standard table library inside this snippet
local table = {
    {["map"] = 1026},
    {["name"] = "Hellfire Assault"}
}
--without metamethods
print(table["name"]) --nil, the value lives in table[2]
print(table[2]["name"]) --"Hellfire Assault"

setmetatable(table,mt) --now you can omit the key index when accessing data

print(table["name"]) --"Hellfire Assault", this is actually table[2]["name"]

table["category"] = 15231
print(table["category"]) --15231, this is actually table[3]["category"]

Last edited by Banknorris : 02-04-16 at 08:31 AM.
02-04-16, 09:02 PM   #5
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
This is not a solution to my problem. Firstly you add a further nesting, which means one more table look-up and secondly you add two further function calls through the metatables. I want to go in the opposite direction.

I currently have a data extraction addon that generates the raw data, and a library in planning that will use the prepared data. I need a tool to parse the raw data and generate the prepared data so that it is fast to access and keeps the relative order of the raw set (meaning the instance order and the encounter order should be preserved).

I'd appreciate it if someone shared their experience and pointed me to a proper approach, or even shared their tools for that. I will share the extraction addon once I verify its results (which will be by the end of the week, I hope).
02-05-16, 08:20 AM   #6
semlar
A Pyroguard Emberseer
 
Join Date: Sep 2007
Posts: 1,060
You haven't mentioned how you're exporting this data from the game, but the code from your first post is presumably from a saved variables file because of the way it's commented.

You will not be able to write an associative array to your saved variables in a way that maintains the order of your keys, but you can create a text box in-game that you can copy your database out of and format that however you want.

Since associative arrays inherently have no order, you'll need to index your keys with a second table if you want to be able to arrange them in a specific way for your output.

From your example earlier:
Lua Code:
[1026] = {
    ["instance"] = 669,
    [1426] = {}, -- Hellfire Assault
    [1425] = {}, -- Iron Reaver
}
Would have its order stored in a second table which would be used to sort the keys when you're generating your output for the text box:
Lua Code:
[1026] = {
    [1] = 1426, -- Hellfire Assault
    [2] = 1425, -- Iron Reaver
}

This can also be done in the same table, provided you do it in a way that doesn't cause conflicts:
Lua Code:
[1026] = {
    ['instance'] = 669,
    ['1426'] = {}, -- Hellfire Assault
    ['1425'] = {}, -- Iron Reaver
    [1] = '1426', -- Hellfire Assault
    [2] = '1425', -- Iron Reaver
}
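
A minimal sketch of how the combined table above could be written out in a fixed order for the text box, assuming plain Lua. dumpMap is a hypothetical helper name, and the achievement tables are written as empty {} for brevity.

```lua
-- Hypothetical helper: writes one map entry in the order stored in its
-- array part, so repeated runs produce identical text.
local function dumpMap(mapID, map)
    local lines = {}
    lines[#lines + 1] = ("[%d] = {"):format(mapID)
    lines[#lines + 1] = ("    ['instance'] = %d,"):format(map.instance)
    -- ipairs walks [1], [2], ... in order; each value is an encounter key
    for _, encounterKey in ipairs(map) do
        lines[#lines + 1] = ("    ['%s'] = {},"):format(encounterKey)
    end
    lines[#lines + 1] = "}"
    return table.concat(lines, "\n")
end

local map = {
    ['instance'] = 669,
    ['1426'] = {},
    ['1425'] = {},
    [1] = '1426',
    [2] = '1425',
}
local text = dumpMap(1026, map)
print(text)
```

Because the loop only follows the array part, the hash part can be in any internal order without affecting the output.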
02-05-16, 08:47 AM   #7
MunkDev
A Scalebane Royal Guard
 
Join Date: Mar 2015
Posts: 431
You can get a reproducible order out of an associative array by using a custom pairs function that sorts the table keys before returning each key and value. Note that this gives a sorted order, not the order in which the keys were inserted. You can use this function to move the keys into their respective table as a value and then append the output of the data with tinsert. I'm dumping all my table shenanigans here in case you need the other stuff too:
Lua Code:
---------------------------------------------------------------
-- Table.lua: Extra table functions for various uses
---------------------------------------------------------------
-- These table functions are used to perform special operations
-- that are not natively supported by the Lua standard library.
---------------------------------------------------------------
local _, db = ...
local Table = {}
---------------------------------------------------------------
db.Table = Table
---------------------------------------------------------------
---------------------------------------------------------------
-- Copy: Recursive table duplicator, creates a deep copy
---------------------------------------------------------------
Table.Copy = function(src)
    local srcType = type(src)
    local copy
    if srcType == "table" then
        copy = {}
        for key, value in next, src, nil do
            -- recurse through Table.Copy; a bare Copy would be an undefined global
            copy[Table.Copy(key)] = Table.Copy(value)
        end
        setmetatable(copy, Table.Copy(getmetatable(src)))
    end
    return copy or src
end
---------------------------------------------------------------
---------------------------------------------------------------
-- Flip: Flips the table associations. (only for unique values)
---------------------------------------------------------------
Table.Flip = function(src)
    local srcType = type(src)
    local copy
    if srcType == "table" then
        copy = {}
        for key, value in pairs(src) do
            if not copy[value] then
                copy[value] = key
            else
                -- duplicate value found: return the source unchanged
                return src
            end
        end
    end
    return copy or src
end
---------------------------------------------------------------
---------------------------------------------------------------
-- Compare: Recursive table comparator, checks if identical
---------------------------------------------------------------
Table.Compare = function(t1, t2)
    if t1 == t2 then
        return true
    elseif t1 and not t2 or t2 and not t1 then
        return false
    end
    -- both sides must be tables before they can be indexed below
    if type(t1) ~= "table" or type(t2) ~= "table" then
        return false
    end
    local mt1, mt2 = getmetatable(t1), getmetatable(t2)
    if not Table.Compare(mt1,mt2) then
        return false
    end
    for k1, v1 in pairs(t1) do
        local v2 = t2[k1]
        if not Table.Compare(v1,v2) then
            return false
        end
    end
    for k2, v2 in pairs(t2) do
        local v1 = t1[k2]
        if not Table.Compare(v1,v2) then
            return false
        end
    end
    return true
end
---------------------------------------------------------------
---------------------------------------------------------------
-- PairsByKeys: Sort by non-numeric key, handy for string keys
-- Yields keys in sorted order, which is reproducible between runs
-- (it is not the insertion order). Without a comparator f, all
-- keys must be mutually comparable (all strings or all numbers).
---------------------------------------------------------------
Table.PairsByKeys = function(t, f)
    local a = {}
    for n in pairs(t) do tinsert(a, n) end
    table.sort(a, f)
    local i = 0      -- iterator variable
    local function iter()   -- iterator function
        i = i + 1
        if a[i] == nil then return nil
        else return a[i], t[a[i]]
        end
    end
    return iter
end
---------------------------------------------------------------

Last edited by MunkDev : 02-05-16 at 07:52 PM.
02-05-16, 09:39 AM   #8
Resike
A Pyroguard Emberseer
Join Date: Mar 2010
Posts: 1,290
Here is a slightly better solution for your last function, the string-key table iteration:

Lua Code:
local function spairs(t, order)
    local keys = { }
    for k in pairs(t) do
        keys[#keys + 1] = k
    end
    if order then
        table.sort(keys, function(a, b)
            return order(t, a, b)
        end)
    else
        table.sort(keys)
    end
    local i = 0
    return function()
        i = i + 1
        if keys[i] then
            return keys[i], t[keys[i]]
        end
    end
end

You can use it the same way as pairs or ipairs.
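
For example, here is a sketch of using it on the map table from earlier in the thread. That table mixes number and string keys, and plain table.sort raises an error when it has to compare a number with a string, so a comparator is passed in. byTypeThenValue is a hypothetical name, and spairs is repeated only so the snippet runs on its own.

```lua
local function spairs(t, order)
    local keys = {}
    for k in pairs(t) do keys[#keys + 1] = k end
    if order then
        table.sort(keys, function(a, b) return order(t, a, b) end)
    else
        table.sort(keys)
    end
    local i = 0
    return function()
        i = i + 1
        if keys[i] then return keys[i], t[keys[i]] end
    end
end

-- Hypothetical comparator: string keys first, then numbers ascending.
local function byTypeThenValue(_, a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then return ta == "string" end
    return a < b
end

local map = { instance = 669, [1426] = {}, [1425] = {} }
local seen = {}
for key in spairs(map, byTypeThenValue) do
    seen[#seen + 1] = tostring(key)
end
print(table.concat(seen, ", "))
```

The result is stable between runs, but it is sorted by ID, so it does not preserve the boss order from the encounter journal (1426 before 1425).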
02-06-16, 01:43 AM   #9
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
The extraction addon can be found under https://github.com/Rainrider/ExtractAchievements

There are still some edge cases I have to figure out. It also lacks data validation for the most part, and some of that will have to happen manually. Draenor, Pandaria and Cataclysm are partially verified (some exceptions for Cataclysm are still not handled, but I have the list). By partially I mean that every boss gets at least one achievement, except those that have none. Whether those are the correct ones, and whether all achievements are included, has to be verified manually (to an extent; I'll include some more checks for "lone" achievements).
02-06-16, 09:06 AM   #10
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
I need to write the prepared data to a new file. I think one could read the file containing the raw data sequentially and write its contents in the same order. It should read the lines between two matching curly brackets and "shuffle" their contents as follows:
  1. make the value of the "map" key a key with a new table as its value (if a table for the same map isn't already open)
  2. copy the line `["instance"] = number` as it is (if a table for the same map isn't already open)
  3. make the value of the "encounter" key a key with a new table as its value
  4. copy the "achievements" table as the value of that encounter key in the output file
  5. copy the value of the "name" key as a comment after the achievements table in the output file
  6. close the map table if a new mapID or a new tier is encountered.

Would this be the best way to go for what I would like to do, or am I overcomplicating it?
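
One way the steps above could be sketched without regular expressions is to load the raw file as a Lua table and walk it with ipairs, which keeps the instance and boss order. This is a sketch in plain Lua under assumptions: raw mirrors the format from the first post, prepare is a hypothetical function name, achievements are assumed to be stored as a plain list of IDs, and the IDs shown are invented for illustration.

```lua
-- Raw data in the shape shown in the first post (achievement IDs are
-- placeholders, not real data).
local raw = {
    [6] = {
        {
            { map = 1026, name = "Hellfire Assault", encounter = 1426,
              achievements = { 111, 222 }, instance = 669 },
            { map = 1026, name = "Iron Reaver", encounter = 1425,
              achievements = {}, instance = 669 },
        },
    },
}

-- Hypothetical converter: returns the prepared data as text.
-- tiers is an explicit list so that the tier order is fixed too.
local function prepare(data, tiers)
    local out, openMap = {}, nil
    local function closeMap()
        if openMap then out[#out + 1] = "}," end
        openMap = nil
    end
    for _, tier in ipairs(tiers) do
        for _, instance in ipairs(data[tier]) do
            for _, boss in ipairs(instance) do
                if boss.map ~= openMap then          -- steps 1, 2 and 6
                    closeMap()
                    openMap = boss.map
                    out[#out + 1] = ("[%d] = {"):format(boss.map)
                    out[#out + 1] = ('    ["instance"] = %d,'):format(boss.instance)
                end
                -- steps 3 to 5
                out[#out + 1] = ("    [%d] = { %s }, -- %s"):format(
                    boss.encounter, table.concat(boss.achievements, ", "), boss.name)
            end
        end
        closeMap()                                   -- a new tier closes the map
    end
    return table.concat(out, "\n")
end

local text = prepare(raw, { 6 })
print(text)
```

Since the output is built only from array traversal, regenerating it from unchanged raw data gives byte-identical text, which is what a diff needs.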
Attachment: dataprep.png
02-06-16, 12:30 PM   #11
Nimhfree
A Frostmaul Preserver
Join Date: Aug 2006
Posts: 267
The only real problem you pose is keeping the data in the data file ordered. What I would do is generate the data yourself and create an ordered data file. Once you have generated it, you need not generate it again. If you have an automatic update system that records new information, put that into a different table that you would then process separately (outside of the game) to merge into your master ordered data file. I would keep the master data in as normalized a form as possible. Then, in game, when your addon loads, you can process this data to create the fast-access tables that get to your data by map ID, encounter, etc.
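
A rough sketch of that load-time step, assuming the master data keeps the ordered layout from the first post. buildLookups, byEncounter and byMap are hypothetical names, and the achievement IDs are placeholders.

```lua
local master = {
    [6] = {
        {
            { map = 1026, name = "Hellfire Assault", encounter = 1426,
              achievements = { 111 }, instance = 669 },
            { map = 1026, name = "Iron Reaver", encounter = 1425,
              achievements = { 222, 333 }, instance = 669 },
        },
    },
}

-- Hypothetical load-time pass: one walk over the ordered master data
-- builds hash tables, so later lookups need no iteration.
local function buildLookups(data)
    local byEncounter, byMap = {}, {}
    for _, tier in pairs(data) do            -- tier order is irrelevant here
        for _, instance in ipairs(tier) do
            for _, boss in ipairs(instance) do
                byEncounter[boss.encounter] = boss.achievements
                local list = byMap[boss.map] or {}
                byMap[boss.map] = list
                for _, id in ipairs(boss.achievements) do
                    list[#list + 1] = id
                end
            end
        end
    end
    return byEncounter, byMap
end

local byEncounter, byMap = buildLookups(master)
print(#byEncounter[1425], #byMap[1026])
```

The file on disk stays ordered and diff-friendly, while the lookup tables exist only in memory and never need to be written out.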
02-06-16, 03:01 PM   #12
semlar
A Pyroguard Emberseer
 
Join Date: Sep 2007
Posts: 1,060
Actually, you could probably keep the data consistently formatted in your saved variables file if you stored your table inside of a giant string.

You'd still have to use additional tables in memory to keep track of the order of your indexes, but you could write out the saved variables in a human-readable format and use loadstring to unpack it in-game.

It would be a ridiculously complicated method just to support diffs, especially when it should be fairly simple to make the addon keep track of what's been added or changed.
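
For what it's worth, a minimal sketch of the string approach, assuming the addon builds the text itself in a fixed order. loadstring is the Lua 5.1 function; the fallback to load is only there so the snippet also runs on newer standalone interpreters. SavedText is a hypothetical saved variable name.

```lua
-- Hypothetical saved variable holding the table as human-readable text.
local SavedText = [[
return {
    [1026] = {
        instance = 669,
        [1426] = {}, -- Hellfire Assault
        [1425] = {}, -- Iron Reaver
    },
}
]]

local compile = loadstring or load   -- Lua 5.1 vs. 5.2 and later
local chunk, err = compile(SavedText)
assert(chunk, err)                   -- stop with the syntax error, if any
local data = chunk()
print(data[1026].instance)
```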
02-06-16, 06:26 PM   #13
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
Originally Posted by semlar
[...]especially when it should be fairly simple to make the addon keep track of what's been added or changed.
Could you please give me some hints in that direction?

@Nimhfree
That is what I would like to do. However I would need to regenerate the data on every new tier and every new expansion, because some of the achievements become Feats of Strength, which means changes may occur in the middle, not just at the end. This is also part of the reason I want the data to be diff'able.

I also stumbled on another problem: there are two different versions of the same dungeons (e.g. Deadmines is once a classic dungeon and also one in Mists of Pandaria), however both use the same encounter, instance and map IDs. How would a user be able to differentiate between them so that they send an unambiguous request to the library?
02-06-16, 08:29 PM   #14
semlar
A Pyroguard Emberseer
 
Join Date: Sep 2007
Posts: 1,060
Originally Posted by Rainrider
Could you please give me some hints in that direction?
There are multiple ways to track changes, and how you do it depends on your goals.

You could store a copy of the state of the database when you log in and compare it to the database when you log out or the next time you log in. This would only tell you what has changed since the last time you logged in.

You could create a blank table that you insert changes into and write to the SV file on log out; similar to the first solution it would only be temporary.

You could create a saved variable table which stores a record with the timestamp every time you make a change to the database, which would allow you to generate something like a changelog but it would take up as much space as the database itself. This is probably the one I would go with if I didn't want to write a script to compare the files outside the game for some reason.
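
As a sketch of the first option (comparing two snapshots), assuming each snapshot maps an encounter ID to a list of achievement IDs. changedKeys is a hypothetical helper and the IDs are placeholders.

```lua
-- Hypothetical helper: returns the sorted encounter IDs whose achievement
-- lists were added, removed or altered between two snapshots.
local function changedKeys(old, new)
    local changed = {}
    for key, list in pairs(new) do
        local before = old[key]
        -- comparing the joined lists is enough for flat lists of numbers
        if not before or table.concat(before, ",") ~= table.concat(list, ",") then
            changed[#changed + 1] = key
        end
    end
    for key in pairs(old) do
        if new[key] == nil then changed[#changed + 1] = key end
    end
    table.sort(changed)   -- fixed order, so the report itself is diffable
    return changed
end

local old = { [1426] = { 111 }, [1425] = { 222 } }
local new = { [1426] = { 111 }, [1425] = { 222, 333 }, [1427] = { 444 } }
local report = table.concat(changedKeys(old, new), ", ")
print(report)
```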

Originally Posted by Rainrider
I also stumbled on another problem: there are two different versions of the same dungeons (e.g. Deadmines is once a classic dungeon and also one in Mists of Pandaria), however both use the same encounter, instance and map IDs. How would a user be able to differentiate between them so that they send an unambiguous request to the library?
I believe all you need to differentiate between 2 instances are their instance ID and difficulty.
02-06-16, 09:50 PM   #15
Nimhfree
A Frostmaul Preserver
Join Date: Aug 2006
Posts: 267
Originally Posted by Rainrider
Could you please give me some hints in that direction?

@Nimhfree
That is what I would like to do. However I would need to regenerate the data on every new tier and every new expansion, because some of the achievements become Feats of Strength, which means changes may occur in the middle, not just at the end. This is also part of the reason I want the data to be diff'able.

<SNIP>
I will describe a bit of how my addon Grail works. I have entries for each quest arranged by questId in numeric order. Here is an example. Each of the entries is a set of codes, so in the following the K represents quest level, the L the required level, the A: the NPC that gives the quest, etc.
Code:
G[35685]='K1000 L100 A:75028 T:75028 P:35683 E18912'
G[35686]='K0930 L092 A:75127 T:82610 P:35063+35064 E18546'
This information is loaded and then processed so I create a number of tables that allow me to access information quickly. Grail is VERY large, but it is fast because everything is cached for quick access. For example, the P: codes represent prerequisites. They are analyzed so any change in the status of a prerequisite causes any quest relying on it to have its status invalidated (so it can be recomputed).
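
To illustrate the idea only (this is not Grail's actual parser, and the meaning of each code is up to Grail), here is a sketch that splits such an entry into a table keyed by the leading letter. parseCodes is a hypothetical name.

```lua
local G = {}
G[35685] = 'K1000 L100 A:75028 T:75028 P:35683 E18912'

-- Hypothetical parser: each space-separated token is a letter, an
-- optional colon, and a value; the values are stored by letter.
local function parseCodes(entry)
    local codes = {}
    for letter, value in entry:gmatch("(%a):?(%S+)") do
        codes[letter] = value
    end
    return codes
end

local codes = parseCodes(G[35685])
print(codes.K, codes.A, codes.P)
```

The source line stays compact and easy to diff, while the parsed table is what gets cached for fast access at runtime.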

Now, when the player is accepting or turning in quests, Grail makes sure its database has the name of the quest, and the NPC id (and location in the world) for the quest giver or the turnin NPC. If it does not, Grail records into its saved variables the missing information. If a user sends this saved variables file to me I can process it.

I have a separate process (outside of WoW) that reads saved variable files and compares the new data to my master list of quests. It can then update that master list (keeping all the quests in numeric order, and all the codes in my preferred order (solely for ease of use/diffing)).

And the final bit is that upon startup Grail will clean the saved variables file based on the information in the database. So the thought is that as Grail's database is updated with new releases, the missing or incorrect information Grail has recorded for a user will be removed properly.

So, what you would need to do is determine what information you want to record for each of your achievements, and how to use the Blizzard API to compare your data to what the game has, so that you can record changes. This would allow players in dungeons to have new things automatically recorded, and assuming the changes get sent to you, you can incorporate them into new addon versions. You would also need to write a tool that takes these user-submitted saved variable files and merges them into your master data.
02-07-16, 11:03 AM   #16
Rainrider
A Firelord
Join Date: Nov 2008
Posts: 454
I'll take a look at Grail, thank you very much for sharing.