Friday, July 24, 2009

Garbage in and Rotting

Remember garbage in, garbage out? With software, it is sometimes memory in and rot. Although memory leaks are prevalent, there is another class of memory problems that will cause us to consume a lot of the heap space and overwhelm the garbage collector mechanism.


One truth of computer science is that innovation usually causes new problems. With a Garbage Collector (GC), we save a lot of time by not releasing memory by hand, but new issues arise that can cause a different type of memory leak: long-lived temporary variables, or large numbers of temporary variables, that either churn the GC or hold memory for much longer than is required.


We generally understand and can spot big GC memory leaks, but the temporary ones and those that overwhelm the GC are harder to find. There is also a class of GC problem that can be almost as bad as a memory leak: Long-life temporary objects (LLTO).

An LLTO is an object that is created but not GC'd for a very long time. The issue usually arises in loops, recursion, or deeply nested method calls. Because an object cannot be collected until every reference to it goes out of scope or is explicitly set to null, it often lives longer than necessary. The problem gets worse when nested data is used deep in calls or in long nested loops, where the virtual machine and compiler really can't figure out exactly when objects are ready for GC.

Loops can be long-lived operations. As developers we don't think about memory management that often because Java does a good job; however, the time before memory is collected can cause a problem. Without an explicit assignment of null, the runtime cannot always assume that the memory is ready for GC.
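To make the idea of "not GC'd while still referenced" concrete, here is a minimal sketch that watches an object through a WeakReference. The class name ReachabilityDemo and the static field held are hypothetical, chosen only for illustration. Note that System.gc() is only a hint, so the second observation may differ between JVMs and runs; only the first (the object survives while a strong reference exists) is guaranteed.

```java
import java.lang.ref.WeakReference;

final class ReachabilityDemo {
    // A static field stands in for any long-lived strong reference.
    private static Object held;

    // Returns { aliveWhileHeld, clearedAfterNull }.
    static boolean[] observe() {
        held = new byte[1 << 20];
        // A weak reference lets us observe the object without keeping it alive.
        WeakReference<Object> probe = new WeakReference<>(held);

        System.gc(); // only a hint to the JVM
        boolean aliveWhileHeld = probe.get() != null; // always true: 'held' pins it

        held = null; // drop the last strong reference
        System.gc(); // again only a hint; collection is likely but not promised
        boolean clearedAfterNull = probe.get() == null;

        return new boolean[] { aliveWhileHeld, clearedAfterNull };
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("alive while held:   " + result[0]);
        System.out.println("cleared after null: " + result[1]);
    }
}
```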


The following loop seems OK, but it has quite a few problems.



void main(Blob[] table){
    Blob xyz = null;
    for(int i = 0; i < table.length; i++){
        xyz = table[i];
        // ... do something with xyz
        // ... do something else (xyz and table[i] both still refer to the Blob)
    }
}

In our rewrite, we simply null the array slot once we have copied its reference, and null xyz once we are done with it. Provided nothing else refers to the object, it becomes eligible for GC as soon as we complete the operation on it, before we continue to do other things in the loop.

void main(Blob[] table){
    Blob xyz = null;
    for(int i = 0; i < table.length; i++){
        xyz = table[i];
        table[i] = null; // the array no longer refers to the Blob
        // ... do something with xyz
        xyz = null;      // neither does the local variable
        // ... do something else
    }
}
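For readers who want something they can compile, the rewrite above might be fleshed out as follows. This is only a sketch: the Blob class with its payload field and the ArrayDrain class are hypothetical stand-ins, and summing the payload length is a placeholder for "do something with xyz".

```java
// Hypothetical stand-in for the article's Blob type.
final class Blob {
    final int[] payload;
    Blob(int size) { payload = new int[size]; }
}

final class ArrayDrain {
    // Processes each element and clears its slot, so the array stops
    // keeping the Blob reachable. Returns the total payload length seen.
    static long processAndRelease(Blob[] table) {
        long total = 0;
        for (int i = 0; i < table.length; i++) {
            Blob xyz = table[i];
            table[i] = null;              // the array no longer pins this Blob
            if (xyz == null) continue;    // tolerate empty slots
            total += xyz.payload.length;  // placeholder for "do something with xyz"
            xyz = null;                   // the local no longer pins it either
            // ... work that does not need the Blob would go here
        }
        return total;
    }

    public static void main(String[] args) {
        Blob[] table = { new Blob(10), new Blob(20), new Blob(30) };
        System.out.println("processed " + processAndRelease(table) + " ints");
    }
}
```

Be aware that this clears the caller's array, so it only suits cases where the caller has agreed to hand the contents over.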


Here is another example of code that is not GC-friendly:



void main(){
    BigZ z = new BigZ(true);
    // ... things to do with z
    boolean status = z.getStatus();
    space.foo(z); // Long operation
    // z may now be GC'd
}
class Space{
    void foo(BigZ z){
        if (z.getStatus()){
            // ... do long operation
        }else{
            // ... do a different long operation
        }
    }
}

The problem is that the BigZ data cannot be garbage collected until the foo method returns, even though foo only needs the status. In addition, if foo() were badly written and the reference to z was put into a container that outlives the call, we could easily have a memory leak.



void main(){
    BigZ z = new BigZ(true);
    // ... things to do with z
    boolean status = z.getStatus();
    z = null; // z is now ready for GC
    space.foo(status); // Long operation
}
class Space{
    void foo(boolean status){
        if (status){
            // ... do long operation
        }else{
            // ... do a different long operation
        }
    }
}

The result of the new code is that z is eligible for GC before the call, rather than some time after the operation. You might be tempted to null it in foo(), but the caller would still hold its own reference to z until the call returns.
This example cannot apply to all cases, but it can apply to many. The hard part is examining how the contents of complex objects are used, especially when dealing with state that may not be related to a need to keep the object in memory for another task.
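A compilable sketch of the same idea is shown below. BigZ and Space here are hypothetical stand-ins for the types named in the text (the 1 MB byte array merely simulates "big"), and the strings returned by foo are placeholders for the two long operations.

```java
// Hypothetical stand-ins for the article's BigZ and Space types.
final class BigZ {
    private final byte[] data = new byte[1 << 20]; // simulates a large object
    private final boolean status;
    BigZ(boolean status) { this.status = status; }
    boolean getStatus() { return status; }
}

final class Space {
    // Takes only the flag, so it can never see or retain the BigZ.
    String foo(boolean status) {
        return status ? "long operation" : "different long operation";
    }
}

final class NarrowArgument {
    static String run(Space space) {
        BigZ z = new BigZ(true);
        // ... things to do with z
        boolean status = z.getStatus(); // copy out the one value still needed
        z = null;                       // this frame no longer refers to the BigZ
        return space.foo(status);       // the long operation runs without it
    }

    public static void main(String[] args) {
        System.out.println(run(new Space()));
    }
}
```

Passing the narrowest value that the callee needs also removes the risk, mentioned earlier, of the callee stashing the large object somewhere.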


Best practices are:
1) Set objects to null when your context no longer needs the value.
2) Set array elements to null when done with them.
3) If possible, use prune-able structures (trees) to further reduce memory as objects are no longer required.
4) Reuse of instance variables is not enough; null them as soon as their values are no longer needed, because loops and calls could delay the GC until the variable is reassigned.
5) Avoid allocating large chunks of memory; to guard against corruption it is often better to lock and iterate than to clone. Balance memory use against multi-tasking.
6) Avoid creating containers just for the sake of making loops easier. If you have a complex structure, create a specific thread-safe iterator.
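As one illustration of practices 2 and 3, a structure that releases each item as it is consumed can stand in for "iterate, then null". The sketch below swaps in a plain java.util.ArrayDeque rather than a tree, purely because it is the simplest structure that prunes itself; the DrainingQueue class and the int[] work items are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DrainingQueue {
    // Removes each item from the queue before processing it, so the queue
    // never keeps already-processed items reachable.
    static int drain(Deque<int[]> pending) {
        int processed = 0;
        int[] item;
        while ((item = pending.poll()) != null) { // poll() removes the head
            processed += item.length;             // placeholder for real work
        }
        return processed;
    }

    public static void main(String[] args) {
        Deque<int[]> pending = new ArrayDeque<>();
        pending.add(new int[3]);
        pending.add(new int[5]);
        System.out.println("processed " + drain(pending) + " ints");
        System.out.println("queue empty: " + pending.isEmpty());
    }
}
```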



Things to think about:

1) GC can be quicker if memory is not released in huge chunks.
2) The sooner memory becomes a candidate, the sooner it can be scavenged in GC.
3) If there is too much memory to recover in one cycle, GC may cause heap to grow.
4) Reducing memory consumption with temporary objects will reduce heap growth.
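If you want to watch these effects on your own code, a rough probe built on java.lang.Runtime is one place to start. This is only a sketch: HeapProbe is a hypothetical helper, the numbers it prints vary from run to run and between JVMs, and a profiler or GC log will give a far more reliable picture.

```java
final class HeapProbe {
    // Approximate bytes currently in use on the heap.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        long checksum = 0;
        // Create many short-lived temporaries to generate some GC churn.
        for (int i = 0; i < 1_000; i++) {
            byte[] temp = new byte[64 * 1024];
            temp[0] = (byte) i;
            checksum += temp[0];
        }
        long after = usedHeapBytes();
        System.out.println("used before: " + before + " bytes");
        System.out.println("used after:  " + after + " bytes");
        System.out.println("heap size:   " + Runtime.getRuntime().totalMemory() + " bytes");
        System.out.println("checksum:    " + checksum); // keeps the loop from being optimized away
    }
}
```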
