05-14-06, 10:55 PM   #1
chuckg
A Fallenroot Satyr
Join Date: Jul 2005
Posts: 26
Garbage Collection and Local Variables

I don't consider myself an expert at Lua, code, or general coding practice, so I lay myself at your mercy to help show me the way. In my experience, garbage collection occurs when local variables (used within a function, class, etc.) are no longer 'needed' by the system. As such, their data is dropped into the garbage bin to be recycled later.

From what I've read, everything in Lua is linked in some way (I think). So a:

Code:
variable = nil;
and
Code:
variable2 = nil;
have, in a sense, a pointer to the same data. In this way, Lua seeks to optimize data collection? Am I wrong? My problem is this: let's say I have the following pseudo-function:
Code:
function x()
    local array = {};
    local arraySize = 20;

    -- Lua arrays are conventionally 1-based; starting at 1 yields exactly arraySize entries.
    for i = 1, arraySize do
        array[i] = "This is some data on line " .. i .. " in array{}.";
    end
end
At the end of this function, the data in the local array is handed to the garbage collector. My first instinct would be to simply set "array = nil;", but what would this create? Garbage in the same fashion? I guess what I am seeking is a way of clearing out local variables as cleanly as possible, so as to have the least impact on GC.
How can I minimize the amount of data that is GC'ed from the array, or from any variable for that matter, so that the data dumped comes at a minimal cost to the user?

Sorry for the scrambled thoughts and thank you for your help.

Last edited by chuckg : 05-14-06 at 11:28 PM.
05-15-06, 03:59 AM   #2
Floodly
A Kobold Labourer
Join Date: May 2006
Posts: 1

Setting 'array = nil;' will not accomplish anything. Lua, like all pure-reference languages, uses reference counting to determine if an object is in use. This means (although it seems you may be struggling with the concept a bit) that variables and data are inherently separated. Every time you set "A = B", what you're really doing is "make A reference the same object that B does, incrementing this object's reference count by one." In languages like Lua and Python, there is no real way to "assign" data to a variable; everything is a reference.

Some languages collect immediately when an object's reference count decrements to 0. I believe Lua's GC occurs at some indeterminate point later; however, the basic philosophy remains the same. When an object (and *everything* is an object) is no longer referenced by any variables, it will (at some point) be collected (excepting interned constants, which just hang out forever because they get used so often).

In the case of local variables, an automatic reference is bound by the call stack frame. When your function exits, as long as you are not returning a reference to a local, the count is decremented to 0 and thus automagically eligible for collection.

Keep in mind that, as a logical requirement of purely referencing languages, all strings are immutable. This means that:

Code:
local foo = {};
for i = 0, 100 do
  foo[i] = "Bar_" .. i;
end
Produces up to 200 unique string objects (auto-conversion of i included -- however, in reality it's more like 100, as most low-order integers are already "stringified") which are separate from the "foo" array and reside in the string pool. When a new string is "created", the string pool (depending on application optimization) may be searched so that a new string need not be created. This pool is, likewise, reference counted but may fragment more easily and/or collect on a different schedule than fixed-sized objects. Unnecessary ad-hoc string creation is very expensive GC-wise.

The other side-effect of immutable strings is that:

Code:
foo = "bar";
foo = foo .. "baz";
Results in foo holding two separate strings in turn ("bar", then "barbaz"), because the first, once created, cannot be changed; a brand new string must be created every time something is appended.

If possible, when designing functions that will return complex/large strings, store data components in arrays until all work is completed and then join the final output (rather than iteratively appending pieces with the ".." operator).
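
A minimal sketch of the collect-then-join approach described above. The function name build_report and the "line N" pieces are made up for illustration; table.insert and table.concat are standard library functions.

```lua
-- Hypothetical helper: gather the pieces in a table, then join them once.
local function build_report(n)
  local parts = {}
  for i = 1, n do
    -- Each piece is still its own string, but no growing intermediate
    -- result is rebuilt on every pass through the loop.
    table.insert(parts, "line " .. i)
  end
  -- A single join produces the final string.
  return table.concat(parts, "\n")
end

print(build_report(3))
```

The alternative, result = result .. piece inside the loop, creates a new and progressively longer string on every iteration, and each of those becomes garbage as soon as the next one replaces it.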
05-15-06, 11:11 AM   #3
Iriel
Super Moderator
Join Date: Jun 2005
Posts: 578
Originally Posted by Floodly
Setting 'array = nil;' will not accomplish anything. Lua, like all pure-reference languages, uses reference counting to determine if an object is in use. This means (although it seems you may be struggling with the concept a bit) that variables and data are inherently separated. Every time you set "A = B", what you're really doing is "make A reference the same object that B does, incrementing this object's reference count by one." In languages like Lua and Python, there is no real way to "assign" data to a variable; everything is a reference.
It's true that the explicit nil is unnecessary when you're about to descope the local variable, though it can be a handy technique for long-lived locals and globals if you want to release an object to the collector.
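
A rough sketch of that technique, assuming Lua 5.1 or later (where collectgarbage("count") reports memory in use, in kilobytes). The name session_log and the sizes are invented for illustration.

```lua
-- A long-lived local at file scope: it stays in scope for the life of the
-- chunk, so its table is never released unless we drop the reference.
local session_log = {}
for i = 1, 10000 do
  session_log[i] = "entry " .. i
end

collectgarbage()                        -- settle the heap first
local before = collectgarbage("count")  -- KB in use while the table is reachable

session_log = nil                       -- explicitly release the table
collectgarbage()                        -- force a full collection
local after = collectgarbage("count")

print(after < before)
```

Inside a short function the assignment to nil would be redundant, since the local goes out of scope on return anyway.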

It's NOT correct, however, to say that Lua is a pure reference language. Depending on your data type you may have a value or a reference, though for a given type something is always a value or always a reference (the 'value' types are: nil, boolean, number, light userdata).

It's also not true to say it uses reference counting. The garbage collector is a mark-and-sweep model, since reference counting fails to compensate for self-referential-but-otherwise-unreferenced objects.
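
To make the self-referential case concrete, here is a small sketch, assuming Lua 5.1 or later. The weak-keyed probe table and the function name make_cycle are illustration only; the probe lets us observe the tables without keeping them alive.

```lua
-- Weak-keyed table used only as a probe: it does not keep its keys alive.
local probe = setmetatable({}, { __mode = "k" })

local function make_cycle()
  local a, b = {}, {}
  a.other = b   -- a references b
  b.other = a   -- b references a, forming a cycle
  probe[a] = true
  probe[b] = true
end

make_cycle()
collectgarbage()  -- force a full collection

-- Count how many of the two tables are still around.
local survivors = 0
for _ in pairs(probe) do
  survivors = survivors + 1
end
print(survivors)
```

Under pure reference counting, a and b would each keep the other's count at one forever. A mark-and-sweep collector starts from the roots, never reaches either table, and so frees both; the count printed here should be 0.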

Originally Posted by Floodly
Some languages collect immediately when an object's reference count decrements to 0. I believe Lua's GC occurs at some indeterminate point later; however, the basic philosophy remains the same. When an object (and *everything* is an object) is no longer referenced by any variables, it will (at some point) be collected (excepting interned constants, which just hang out forever because they get used so often).
While it is true that Lua's collector only runs when the memory use hits the current GC threshold (or a GC run is forced with collectgarbage()), the criterion for collection is actually that the object becomes 'unreachable'; collected objects can still have references, but those references will be from other collected objects. There's a slight extra wrinkle on this with respect to 'weak' tables, which are those where the keys and/or values can be collected if there are no reachable non-weak references.
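
A short sketch of the weak-table wrinkle, assuming Lua 5.1 or later. The names cache, pinned, and add_temporary are hypothetical; the only real API used is the __mode metatable field, where "v" makes the values weak.

```lua
-- A table with weak values: an entry disappears once its value has no
-- other reachable (strong) reference.
local cache = setmetatable({}, { __mode = "v" })

local pinned = { name = "pinned" }
cache.pinned = pinned          -- also held by the local 'pinned'

local function add_temporary(t)
  -- This table is referenced only from the weak table.
  t.temporary = { name = "temporary" }
end
add_temporary(cache)

collectgarbage()               -- force a full collection

print(cache.pinned ~= nil)     -- still reachable through the local
print(cache.temporary == nil)  -- the weak reference alone did not keep it
```

Both lines should print true: the first entry survives because a strong reference exists elsewhere, the second is cleared because the weak table was its only holder.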

Originally Posted by Floodly
In the case of local variables, an automatic reference is bound by the call stack frame. When your function exits, as long as you are not returning a reference to a local, the count is decremented to 0 and thus automagically eligible for collection.
This is true for locals which are not bound as upvalues by any referenced closures, but it's possible to make local values in Lua that stick around for some time. Consider:

Code:
function a()
  local y = 7;
  return function() return y; end, function(x) y=x; end;
end
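
To show the effect, here is a usage sketch of the same pattern. The function is renamed make_cell and the results are called get and set purely for illustration.

```lua
-- Same shape as the function above: a getter and a setter that share
-- the local y as an upvalue.
local function make_cell()
  local y = 7
  return function() return y end, function(x) y = x end
end

local get, set = make_cell()
-- make_cell has returned, yet y lives on because both closures refer to it.
print(get())  --> 7
set(42)
print(get())  --> 42
```

The local y only becomes collectable once both get and set are themselves unreachable.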
Originally Posted by Floodly
This pool is, likewise, reference counted but may fragment more easily and/or collect on a different schedule than fixed-sized objects. Unnecessary ad-hoc string creation is very expensive GC-wise.
The string table is collected at the same time as the rest of the environment, though the table itself has limits on how much it will shrink in a single GC pass (to avoid constant re-growth in a highly volatile environment).

Originally Posted by Floodly
If possible, when designing functions that will return complex/large strings, store data components in arrays until all work is completed and then join the final output (rather than iteratively appending pieces with the ".." operator).
Very true, BUT note that Lua optimizes statements like

y = a .. b .. c .. d .. e .. f;

so that they use no intermediates and instead perform a single concatenation (provided the resulting string is shorter than 8K or so; otherwise it needs to allocate working buffers).

Last edited by Iriel : 05-15-06 at 11:13 AM. Reason: Removed accidental extra quote, fixed negation