Coding an efficient serializer
I've been working for a while on an advanced serializer lib designed to add the least overhead possible. It does this by applying a base conversion to all of the data and then storing it in its natural form. For numbers, this means treating the number as base-256, where each byte is one digit, and converting it to a base that avoids forbidden byte values such as embedded zeroes. For example, converting to base-254 leaves out the zero byte and also frees one byte value to serve as a terminator. Functions and userdata are unsupported, as there is no way to recreate them on the receiving end. Each table is serialized only once; later references to the same table are written as that table's reference ID.
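Here is a minimal sketch of the number encoding, written in Python for illustration (the lib itself targets Lua). It assumes non-negative integers only, stores each base-254 digit 0..253 as byte value digit + 1 (so 0x00 never appears), and reserves 0xFF as the terminator. The names `encode_uint`/`decode_uint` and this exact digit mapping are hypothetical, not SerialLib's actual format.

```python
# Hypothetical sketch, not SerialLib's real layout.
BASE = 254          # digits 0..253
OFFSET = 1          # digit d is stored as byte d + 1, so 0x00 never appears
TERMINATOR = 0xFF   # the one byte value left unused by digits; ends a number


def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer as zero-free base-254 digits plus a terminator."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while True:
        n, d = divmod(n, BASE)
        digits.append(d + OFFSET)
        if n == 0:
            break
    digits.reverse()  # most significant digit first
    return bytes(digits) + bytes([TERMINATOR])


def decode_uint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one number starting at pos; return (value, position after the terminator)."""
    n = 0
    while data[pos] != TERMINATOR:
        n = n * BASE + (data[pos] - OFFSET)
        pos += 1
    return n, pos + 1


# Usage: numbers can be concatenated and read back one after another.
blob = encode_uint(300) + encode_uint(7)
first, next_pos = decode_uint(blob)
second, _ = decode_uint(blob, next_pos)
```

Because digit bytes only occupy 1..254, the terminator can never be confused with a digit, which is what lets encoded numbers sit back to back in one stream.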
Strings are where I'd like some assistance. The idea is to use the same base conversion as for numbers, but as a byte-by-byte streaming process. This would be easy if the base were a power of 2, since each output digit would then hold a fixed number of bits. However, a conversion to base-128 wastes one bit of every output byte (12.5% overhead), whereas a base-254 conversion would theoretically waste only about 0.14% (log2(256/254) ≈ 0.011 bits per byte). For now, for the sake of speed, strings are encoded and decoded using escape lookups and a custom pattern passed to a single call to string.gsub().
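A power-of-2 base streams easily because each output digit carries a fixed number of bits, so the encoder only needs a small bit accumulator. The Python sketch below (hypothetical `encode7`/`decode7`, not part of SerialLib) packs 8-bit bytes into 7-bit digits and stores each digit as digit + 1, so output bytes stay in 1..128: no embedded zeroes, and 129..255 remain free for markers such as a terminator. The +1 offset and zero-bit padding of the final digit are assumptions of this sketch.

```python
# Hypothetical streaming base-128 (7-bit) codec for strings.
def encode7(data: bytes) -> bytes:
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of pending (not yet emitted) bits in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append(((acc >> nbits) & 0x7F) + 1)  # digit 0..127 stored as 1..128
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Pad the last partial digit with zero bits.
        out.append(((acc << (7 - nbits)) & 0x7F) + 1)
    return bytes(out)


def decode7(data: bytes) -> bytes:
    out = bytearray()
    acc = 0
    nbits = 0
    for b in data:
        acc = (acc << 7) | (b - 1)  # undo the +1 offset
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    # Fewer than 8 leftover bits are padding and are dropped.
    return bytes(out)
```

Every 7 input bytes become 8 output bytes, the one-wasted-bit-per-byte cost behind the 12.5% figure. A base-254 conversion cannot be streamed this way: 254 is not a power of 2, so digit boundaries never line up with bit boundaries, and the conversion needs big-number arithmetic over a whole block instead.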
Any other comments are welcome too, such as how these ideas compare with existing serializer libs like AceSerializer.
For additional info, here's a copy of the serialization specification I wrote and use for my serializer code:
SerialLib was designed to form serialized strings with the least overhead possible.
This is relevant to my interests.
In my infinite wisdom, I forgot to write in the boolean codes under Simple Identifiers. :p
This is now fixed.
© 2004 - 2022 MMOUI