Thread Tools Display Modes
09-16-11, 11:25 PM   #1
SDPhantom
A Pyroguard Emberseer
 
SDPhantom's Avatar
AddOn Author - Click to view addons
Join Date: Jul 2006
Posts: 2,327
Coding an efficient serializer

I've been working on an advanced serializer lib for a while that offers the least overhead possible. It achieves this by doing a base conversion on the entire data then storing it in a natural form. For numbers, this is done by taking a number as a base-256, where each byte is a theoretical digit, and reconverting it to a proper base that avoids forbidden bits like embedded zeroes. For example, this would convert the number as a base-254, also making room for a terminating byte. Functions and userdata are unsupported as there is no way to recreate these on the receiving end. Tables are only serialized once and further references to the same table are assigned the table's reference ID.

As for strings. This is where I'd like some assistance on. The idea is to use the same base conversion used on numbers as a byte-by-byte streaming process. This would be easy if the base were a power of 2. A conversion to a base-128 would mean an overhead of 12.5% instead of the theoretical 7.8% a base-254 conversion would offer.

For now and for the sake of speed, strings are encoded and decoded through use of escape lookups and a custom pattern passed to a single call to string.gsub().
__________________
WoWInterface AddOns
"All I want is a pretty girl, a decent meal, and the right to shoot lightning at fools."
-Anders (Dragon Age: Origins - Awakening)
  Reply With Quote
09-18-11, 02:17 PM   #2
SDPhantom
A Pyroguard Emberseer
 
SDPhantom's Avatar
AddOn Author - Click to view addons
Join Date: Jul 2006
Posts: 2,327
Any comments as well would be appreciated, such as comparison to the ideas of existing serializer libs like AceSerializer.

For additional info, here's a copy of the serialization specification created and used for my serializer code.
Code:
SerialLib was designed to form serialized strings that had the least overhead possible.

Most simple data types only take up one byte using specific type identifiers.
Simple data is any data that can be defined as a constant.

Numbers are encoded using a number conversion system that converts between theoretical bases.
The original base is looked at as a base-256 system in which each byte is a single digit.
The converter changes this to something like a base-248 system that allows number-shifting to occur to avoid unusable bytes.
The encoder is also designed for variable number length, using the same method systems use to encode strings.
This is so it can encode/decode several hundred bytes as easily as it can a single byte.
This is depending on system limits of course.

Simple data and their identifiers
	+	+Zero
	-	-Zero
	B	Boolean True
	b	Boolean False
	I	+Infinity
	i	-Infinity
	N	NaN
	n	nil
	s	Empty String
	|	Empty Table (Reference Pointers are created for this object type)

Complex data and their identifiers
	C	Binary-Encoded Lua Function (uses string.dump() and processes string result, Reference Pointers are created for this object type)
	D	+Integer
	d	-Integer
	E	+Exponential (Floating-Point)
	e	-Exponential (Floating-Point)
	F	+Fractional (Floating-Point)
	f	-Fractional (Floating-Point)
	r	Reference Pointer (points to previously-encountered object data)
	S	String
Note: On exponential floating-point numbers, the encoder tries to encode as an integer.
It'll encode it as an exponential number if it exceeds the point in which the internal data loses precision on an integer level.
This threshold is equal to ±2^53 and is a limit on the data type Lua uses (64-bit double-precision floating-point), not the serialization format.

Structural data and their start/end identifiers
	{	}	Tables  (Reference Pointers are created for this object type)

Serialized data is created by stringing multiple pieces together.
Deserializers are expected to have a variable number of returns and return exactly the same number of serialized pieces in a string.
Serializers are expected to encode tables as pairs of pieces representing key/value pairs and numerical indices to appear in ascending order before other indices.
Serializers are also expected to write a nil identifier for any unsupported data. nil identifiers are forbidden in table data and any key/value pair containing at least one should be omitted from entry.
Note the specification does list a format for Lua functions, but the WoW API blocks the recreation of functions using binary data. For now, it's recommended to write a nil identifier instead. Also note negative zeroes do exist in Lua, see specification IEEE 754-1985.
__________________
WoWInterface AddOns
"All I want is a pretty girl, a decent meal, and the right to shoot lightning at fools."
-Anders (Dragon Age: Origins - Awakening)

Last edited by SDPhantom : 09-19-11 at 12:37 PM.
  Reply With Quote
09-18-11, 02:27 PM   #3
Vlad
A Molten Giant
 
Vlad's Avatar
AddOn Author - Click to view addons
Join Date: Dec 2005
Posts: 793
This is relevant to my interests.
  Reply With Quote
09-19-11, 12:39 PM   #4
SDPhantom
A Pyroguard Emberseer
 
SDPhantom's Avatar
AddOn Author - Click to view addons
Join Date: Jul 2006
Posts: 2,327
In my infinite wisdom, I had forgotten to write in the boolean codes under Simple Identifiers.
This is now fixed.
__________________
WoWInterface AddOns
"All I want is a pretty girl, a decent meal, and the right to shoot lightning at fools."
-Anders (Dragon Age: Origins - Awakening)
  Reply With Quote

WoWInterface » Developer Discussions » Lua/XML Help » Coding an efficient serializer


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off