Coding an efficient serializer
I've been working for a while on an advanced serializer library that aims for the least overhead possible. It achieves this by running a base conversion over the data and storing the result in its natural form. For numbers, the value is treated as a base-256 number in which each byte is one digit, then converted to a base that avoids forbidden byte values such as embedded zeroes. For example, converting to base-254 avoids the zero byte and also leaves room for a terminating byte. Functions and userdata are unsupported, as there is no way to recreate them on the receiving end. Each table is serialized only once; further references to the same table are written as that table's reference ID.
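A minimal sketch of the number conversion described above, not the library's actual code. The byte layout is an assumption made for illustration: byte 0 is forbidden, byte 1 is the terminator, and digit d (0 to 253) is stored as byte d + 2. The names encode_uint and decode_uint are hypothetical, and only non-negative integers below 2^53 are handled.

```lua
-- Assumed layout (hypothetical): byte 0 is never written, byte 1 ends a
-- value, and base-254 digit d is stored as byte d + 2.
local BASE, OFFSET, TERMINATOR = 254, 2, "\1"

-- Encode a non-negative integer as base-254 digits, least significant first.
local function encode_uint(n)
    local out = {}
    repeat
        local digit = n % BASE
        out[#out + 1] = string.char(digit + OFFSET)
        n = (n - digit) / BASE
    until n == 0
    out[#out + 1] = TERMINATOR
    return table.concat(out)
end

-- Decode starting at pos; returns the value and the position after the terminator.
local function decode_uint(s, pos)
    pos = pos or 1
    local n, scale = 0, 1
    while true do
        local byte = s:byte(pos)
        assert(byte, "unterminated number")
        pos = pos + 1
        if byte == 1 then break end
        n = n + (byte - OFFSET) * scale
        scale = scale * BASE
    end
    return n, pos
end
```

Because the terminator marks the end of each value, no length prefix is needed and small numbers stay short: anything below 254 takes one digit byte plus the terminator.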
As for strings, this is where I'd like some assistance. The idea is to apply the same base conversion used for numbers as a byte-by-byte streaming process. This would be easy if the base were a power of 2, but a conversion to base-128 spends a full output byte on every 7 bits of input, so 12.5% of the encoded output is overhead (the data grows by about 14.3%). A base-254 conversion would in theory cost well under 1%, since each output byte carries log2(254) ≈ 7.99 of its 8 bits.
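To illustrate why the power-of-2 case is easy, here is a rough sketch of a streaming base-128 codec. It only needs a small bit accumulator, whereas base-254 digits do not line up with bit boundaries. The names encode128 and decode128 and the choice to store digits 0 to 127 as bytes 128 to 255 are assumptions for the example. It uses plain arithmetic instead of bit operators so that it runs on Lua 5.1.

```lua
local OFFSET = 128  -- hypothetical: digits 0..127 are stored as bytes 128..255

-- Regroup the input bits into 7-bit digits: every 7 input bytes become 8.
local function encode128(s)
    local out, acc, bits = {}, 0, 0
    for i = 1, #s do
        acc = acc * 256 + s:byte(i)          -- shift in 8 new bits
        bits = bits + 8
        while bits >= 7 do
            bits = bits - 7
            local scale = 2 ^ bits
            local digit = math.floor(acc / scale)  -- take the top 7 bits
            acc = acc - digit * scale
            out[#out + 1] = string.char(digit + OFFSET)
        end
    end
    if bits > 0 then
        -- Pad the leftover bits with zeros on the right.
        out[#out + 1] = string.char(acc * 2 ^ (7 - bits) + OFFSET)
    end
    return table.concat(out)
end

-- Reverse the regrouping: collect 7-bit digits and emit whole bytes.
local function decode128(s)
    local out, acc, bits = {}, 0, 0
    for i = 1, #s do
        acc = acc * 128 + (s:byte(i) - OFFSET)
        bits = bits + 7
        if bits >= 8 then
            bits = bits - 8
            local scale = 2 ^ bits
            local byte = math.floor(acc / scale)
            acc = acc - byte * scale
            out[#out + 1] = string.char(byte)
        end
    end
    return table.concat(out)  -- any bits left in acc are padding
end
```

The accumulator never holds more than 14 bits, so the whole string is processed in one pass with constant working state.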
For now, for the sake of speed, strings are encoded and decoded using escape lookup tables and a custom pattern passed to a single call to string.gsub().
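The escape-lookup approach could look roughly like the sketch below. The byte assignments are assumptions, not the library's real format: byte 0 is forbidden, byte 1 is a terminator, byte 2 introduces an escape, and an escaped byte b is written as byte 2 followed by byte b + 3. The function names are hypothetical as well.

```lua
-- Assumed reserved bytes (hypothetical): 0 forbidden, 1 terminator, 2 escape.
local ESCAPE = "\2"
local escapes, unescapes = {}, {}
for byte = 0, 2 do
    local raw, code = string.char(byte), string.char(byte + 3)
    escapes[raw] = ESCAPE .. code   -- e.g. "\0" becomes "\2\3"
    unescapes[code] = raw
end

local function encode_string(s)
    -- "[^\3-\255]" matches exactly the bytes 0, 1 and 2, and keeps a
    -- literal zero byte out of the pattern itself.
    return (s:gsub("[^\3-\255]", escapes))
end

local function decode_string(s)
    -- The captured byte after the escape is looked up in the table.
    return (s:gsub(ESCAPE .. "(.)", unescapes))
end
```

Each direction is a single string.gsub() call with a table as the replacement. Strings without reserved bytes pass through unchanged, while every escaped byte costs one extra byte.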