Thursday, July 30, 2009

Build software from straws

Think about a building architect and a builder. The customer wants a ten-story building. The architect designs the building on paper with a pen; to the builder, the drawing looks like a lot of lines. You could build those lines out of string, straws, or toothpicks.

Sadly, this is how many developers write software. They should build with steel-like programming, but they substitute fragile code because it appears to do the job.

Building a building out of straws will do the job for a while, but eventually it fails.

Tuesday, July 28, 2009

UML in the Post XP World

I was catching up on UML this week, and I'm a bit amazed by what I read. Most of it sounded short-sighted, with plenty of outright fear-mongering and FUD. The good news is that there is less FUD now, and more people are seeing Agile as it is actually practiced (because ten to one your environment won't allow it by the book).

Let's start with what I found in a Slashdot thread:

The way I use UML is as way to select projects I want to participate in. If it uses UML, I'm out. The correlation of using UML with rigid authoritarian organization and fighting with "productivity enhancers" rather than developing software is too high.

Yikes! What this tells me is that the guy fears one tool because of the failure of other tools. I've used crappy tools myself, but I don't let that paralyze me with fear. Funny that he uses the word 'correlation' as if he actually has definitive data. If only the world were better at critical thinking.

Next was a response from another fellow in the thread, who said this:

...I won't hire any developer who refuses to use UML since I'll assume that s/he is lacking in essential software engineering skills and is a "code first, understand the problem later" sort of person.

You can tell a lot about a developer from the tools and techniques they will or will not use. Some folks are hackers at heart, and that is not necessarily compatible with working alongside those of us who like to think and plan before doing.

But what caused one developer to hate UML and the other to love it? Do a quick search for "UML sucks" on Google and you will see plenty of arguments for and against. My take is that this comes down to a combination of silver bullets and the pressure to succeed at work.

The silver bullet is, of course, the idea that UML contributes to better designs and, we would hope, shortens the development cycle. UML, by capturing a design's state, should also help with long-term support and reuse. It all sounds so good!

The truth is that UML can do all these things, but you have to work for it. Nothing is free. Back to that original comment: the author was probably less disappointed in the tools than in the fact that there is no free lunch. Tools can only help; they cannot eliminate thinking through hard problems.

Silver bullet or not, you still need a good gun to fire it and an excellent marksman to aim it. You also can't kill an elephant with a small-caliber bullet.

UML does not create designs on its own, so you need experienced designers. Not all tools are equal to the task either: Visio, for example, can't help you generate code, and some high-end UML tools are frankly hard to use or let you do stupid things. More importantly, no single diagram will represent a complex design; you have to create multiple diagrams at several layers of abstraction and in several modes (requirements, static structure, time/ordering, etc.).

Another issue raising its ugly head is the movement to UML 2.0 and beyond. I've been to OMG meetings to watch the folks there work on the standard. The work is as much about the base spec as it is about creating profiles that solve specific problems in science and engineering. The trend carries a constant undercurrent of CASE (Computer Aided Software Engineering) and its new buzzword, MDA (Model Driven Architecture).

MDA rests on a lot of base assumptions and some amazing leaps of faith. First, though, let me say that MDA can and does work for very specific applications. On the other hand, MDA is not something the average developer will use; most of us should stick to the world we know of design and code. MDA is modeling in its purest form, and by definition, models are precariously difficult to map onto the real world. The successful MDA tools work with models that are easily transformed.

Why UML?

Why not? Why English? English is not perfect, but we seem to muddle by. The key is that UML has most of what we need to describe the important bits.

What about MDA, creating complex diagrams?

Well, here is a rabbit hole that a lot of folks go down, Model Driven Architecture (MDA) or not. It is as old as modeling itself. The issue often is that designers try to model 'all' of their design. The problem is that complexity does not always aid understanding. The ultimate MDA is to model to the point that you produce 'all' of your code.

'All' of your code? From diagram to code at the push of a button? MDA is not that mature - unless you listen to an MDA vendor :o) However, for a certain subset of your code this is perfectly logical. Some things are mundane enough to generate. The reality, though, is that in many cases code is more appropriate than the complexity of diagrams MDA would require. It is also true that UML is not really a good medium for certain details that read better in code than in diagrams. The key, however, is to mix the two.

So what do you model? The key is that you model the activities that drive the coding. Primarily we want to capture use cases, sometimes activity diagrams, and certainly class diagrams. We want to do much of the static structure in UML and the more dynamic pieces in code. We also want to automate POJO creation and properly document interactions. For complex activities we want multiple views, and for complex systems we should model the dynamic behavior in activity, state, sequence, and other diagrams.
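As a rough, hedged sketch of the kind of code that division produces (the Customer class and its fields are invented here, not taken from any particular tool), a class diagram can generate the static skeleton below, while the interesting dynamic behavior stays in hand-written code and in the behavioral diagrams:

// Hypothetical POJO whose skeleton a UML tool could generate from a class diagram.
public class Customer {
    private String id;      // attributes come straight from the class diagram
    private String name;

    public Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    public String getId() { return id; }
    public String getName() { return name; }

    // Dynamic behavior (pricing rules, workflow, interactions) is better
    // expressed in hand-written code and in sequence/activity diagrams.
}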

But this is about more than design leading to code. That would be fine on its own, but it is really about how you get to code and how you ensure it is good code. Visual diagramming can add real value to your process simply because certain aspects are easier to see in a picture. I have reverse-engineered a lot of code into UML, and you can imagine how many times I saw issues invisible to the on-the-ground coders or the Agile leads drinking coffee by the burn barrel.

Here is one that puzzles me from another thread:

For me it's better to draw a class (sequence) diagram on a sheet of paper and (burn it :) explain the rest in conversations.

Burn? OK, so the guy likes class and sequence diagrams, but he burns them when he is done? Wow! Imagine you are designing a car or a building; do you think those folks burn the design? Crazy, right? So why would this guy burn good, hard work? Again, this comes down to a developer not understanding the long-term value of the work he has done, and not understanding the tools.

Is this to hide bad design? Job security? Some other deep psychological problem? Some kind of narcissism to give the illusion of control? I have heard this statement many times, so it is not just one person. Luckily the attitude is losing traction of late: it may be fine for the developers, but it does not sell uphill to the business. That's good news. More people are buying tools and actually buying the training to use them.

The first part of the case for MDA is that it reduces the resistance to tools. MDA is not a silver bullet, but it does help. The second area is tracing business requirements through to execution: quite simply, businesses are more comfortable with what they can see. It also helps that the process makes it easier for the developers at the end of the line to use the tools to feed information back up the chain.

Please understand that Agile is good in many respects. However, it breeds another type of Agile that is sloppy, inaccurate, and usually requires rewriting code (luckily the designs were already disposed of, or never existed). It is one thing to run one-week iterations and quite another to do so based on a structured plan, with forward-looking designs and tracking against your requirements. Perhaps we need a word for bad Agile? Cowboy-Agility?

Cowboy-Agility - The reduction or elimination of as much development process as possible, so that the programmer need think no farther than writing the next line of code.

According to the Agile philosophy, one is supposed to stop parts of the process if they are not working. The problem is that in most organizations there is a blind spot that causes many processes to fail even when they are good. Agile didn't start with burning designs; the cowboys trying to ride roughshod toward success did that. The problem is that short-term success (or rather code for code's sake) is not good in the long term. Even saying we will 'refactor' the hacking later misses the point.

Software design is still hard work. There are no silver bullets, but there are bullet molds and guns that follow the standards for those bullets.

Immutable Objects

An immutable object is an object that, once created, stays in the same state. Immutable objects improve thread safety, aid security, and prevent unauthorized changes to state. There are many immutable types in Java: String, Integer, Float, etc.

Thread safety is assured for such objects because only one thread can construct the object, and once constructed its state cannot change for the rest of its life (i.e., until it becomes unreachable and is ready for garbage collection).
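As a minimal sketch of what that looks like in Java (the Point class here is my own illustration, not from any library): every field is final, set exactly once in the constructor, and never exposed for modification.

// Hypothetical immutable value class for illustration.
public final class Point {        // final class: no subclass can add mutable state
    private final int x;          // final fields are assigned exactly once
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // No setters: the state visible after construction never changes,
    // so instances can be shared freely between threads.
}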

Please note that some people mistakenly believe a class is immutable when it still holds a reference to a changeable object. For example, the following class is not thread safe because the creator of Foo, or any thread that calls getList(), can add or remove contents of the list. Note also that the synchronized method is both useless here and needlessly reduces liveness.


// NOT thread safe: Foo keeps a reference to a mutable list it does not own.
public class Foo {
    private ArrayList<String> list;

    public Foo(ArrayList<String> list) {
        this.list = list;              // the caller still holds this same reference
    }

    public synchronized ArrayList<String> getList() {
        return list;                   // synchronized adds cost but no real safety
    }
}
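One way to repair Foo, as a hedged sketch (the defensive-copy approach is my suggestion, not something from the original snippet): copy the list on the way in and hand out an unmodifiable view on the way out, so neither the creator nor any caller can change the state after construction.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// An effectively immutable version of Foo.
public final class Foo {
    private final List<String> list;

    public Foo(List<String> list) {
        // Defensive copy: later changes to the caller's list cannot affect us.
        this.list = Collections.unmodifiableList(new ArrayList<String>(list));
    }

    public List<String> getList() {
        return list;   // read-only view; add/remove throws UnsupportedOperationException
    }
}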

Friday, July 24, 2009

Garbage in and Rotting

Remember garbage in, garbage out? With software, it is sometimes memory in, memory rotting. Although memory leaks are prevalent, there is another class of memory problem that consumes a lot of heap space and overwhelms the garbage collection mechanism.


One truth of computer science is that innovation usually creates new problems. With a garbage collector (GC), we save a lot of time by not releasing memory by hand, but new issues appear that can cause a different type of memory leak. Basically, long-lived temporary objects, or large numbers of temporary objects, either churn the GC or hold memory much longer than is required.


We generally understand and can spot big memory leaks, but the temporary problems and those that overwhelm the GC are harder to find. There is also a class of GC problem that can be almost as bad as a memory leak: long-life temporary objects (LLTOs).

An LLTO is an object that is created but not garbage collected for a very long time. The issue usually arises in loops or in recursive or deeply nested method calls. Because a reference is not released until it goes out of scope or is explicitly nulled, the object lives longer than necessary. The problem gets worse when nested data is used deep in call chains, or in long nested loops where the virtual machine and compiler cannot figure out exactly when objects are ready for GC.

Loops can be long-lived operations. As developers we don't think about memory management very often because Java does a good job; however, the time an object waits before the GC can reclaim it can cause a problem. If a reference is not explicitly set to null, the runtime cannot assume the memory is ready for GC.


Our first example seems OK, but it has quite a few problems.



void main(Blob[] table) {
    Blob xyz = null;
    for (int i = 0; i < table.length; i++) {
        xyz = table[i];
        // ... do something with xyz
        // ... do something else
    }
}

In our rewrite, we simply null the array slot. This lets the object become eligible for GC as soon as we complete the operation. Nulling xyz as well ensures the object is eligible before we continue to do other things in the loop.

void main(Blob[] table) {
    Blob xyz = null;
    for (int i = 0; i < table.length; i++) {
        xyz = table[i];
        table[i] = null;   // drop the array's reference to the element
        // ... do something with xyz
        xyz = null;        // drop our reference as well
        // ... do something else
    }
}


Here is another example of code that defeats timely garbage collection:



void main() {
    Space space = new Space();
    BigZ z = new BigZ(true);
    // ... things to do with z
    boolean status = z.getStatus();
    space.foo(z);   // long operation; z stays reachable for its whole duration
    // z may only be GC'd after foo() returns
}

class Space {
    void foo(BigZ z) {
        if (z.getStatus()) {
            // ... do long operation
        } else {
            // ... do a different long operation
        }
    }
}

The problem is that the BigZ data cannot be garbage collected until the long-running foo() call completes. In addition, if foo() were badly written and the reference to z were stashed in a longer-lived container, we could easily have a real memory leak.



void main() {
    Space space = new Space();
    BigZ z = new BigZ(true);
    // ... things to do with z
    boolean status = z.getStatus();
    z = null;             // z is now ready for GC
    space.foo(status);    // long operation no longer holds a reference to z
}

class Space {
    void foo(boolean status) {
        if (status) {
            // ... do long operation
        } else {
            // ... do a different long operation
        }
    }
}

The result of the new code is that z is eligible for GC before the call, rather than some time after the long operation. You might be tempted to null the parameter inside foo() instead, but the caller's local variable would still hold a reference to z until the call returns.
This technique cannot apply to all cases, but it applies to many. The hard part is examining how the contents of complex objects are used, especially when the state you need does not by itself justify keeping the whole object in memory for another task.


Best practices are:
1) Set references to null when your context no longer needs the value.
2) Set array slots to null when you are done with them.
3) If possible, use prune-able structures (such as trees) to release memory as objects are no longer required (see the sketch after this list).
4) Reusing instance variables is not enough; null them as soon as their values are no longer needed, since loops and calls can delay GC until the variable is reassigned.
5) Avoid allocating large chunks of memory; it is better to lock and iterate than to clone, both to limit memory and to avoid corruption. Balance memory use against multi-tasking.
6) Avoid creating containers just to make loops easier. If you have a complex structure, create a specific thread-safe iterator.
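Here is a hedged sketch of practice 3 (the Node class, its fields, and the process() call are invented for illustration): a simple linked structure of work items where each node's payload and link are nulled as soon as they are processed, so the GC can reclaim the front of the structure while the rest is still being worked on.

// Hypothetical work-item node; names are for illustration only.
class Node {
    Blob payload;
    Node next;
}

void drain(Node head) {
    Node current = head;
    while (current != null) {
        process(current.payload);     // do the work for this node
        current.payload = null;       // the payload is now eligible for GC
        Node following = current.next;
        current.next = null;          // prune: the processed node drops its link
        current = following;          // nothing we hold still points at the old data
    }
}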



Things to think about:

1) GC cycles can be quicker if memory does not become reclaimable in huge chunks all at once.
2) The sooner memory becomes a candidate for collection, the sooner it can be scavenged by the GC.
3) If there is too much memory to recover in one cycle, the GC may cause the heap to grow.
4) Reducing the memory held by temporary objects will reduce heap growth.

Wednesday, July 1, 2009

Anthropomorphic Software

Software is seen as living, and thus responsible for its own bad behavior.

People talk to their dogs. Dogs know a few commands, but the average 2-year-old is way smarter.

People talk to their cats... Enough said about that!


But programmers really talk a lot with their software.

"There is something wrong with the code." That is talking about code like a living thing.

What you should say is, "There is something wrong with the code I wrote."

This isn't just semantics or being a language nazi. Think about how you feel when you say each sentence. Think about good programmers and bad programmers. What type of sentence structure did they use?

The more you think of software as 'alive', the less you will look to the real issue: The programmer. The programmer is alive. The programmer is a human and full of faults, assumptions, and general sloppiness.

Don't sit there studying the manic-depressive software bugs. Study the people, because people are why the software looks manic-depressive.

Java Exceptions and Cognitive Dissonance

Cognitive dissonance is the uncomfortable feeling you get when something you hold as a deep belief collides with evidence that flies in its face. You might imagine it as a coping mechanism of the brain, meant to help us stop believing in Santa, the Tooth Fairy, and the End of the World. At some point you see the evidence, your fantasy goes 'pop', and you are back to reality.

Sadly, belief is stronger than cognitive dissonance. It is why cults still exist. It is why deprogrammers are still in demand. It is why there is so much pseudoscience in the world.

As an example, many end-of-the-world cults are so invested that when the world does not actually end, they go a little nutty. They just can't believe they were wrong. Many simply create a new fantasy and set a new date for the end of the world. Sure, a few wake up and leave the cult, but most hang on.

Programmers have the same problem. Take exceptions. Please, take them! They are great... except when caught and ignored. Ninety-nine percent of crappy code comes from developers not properly catching and handling errors. They see an exception and assume the failure is somehow rare, or that because the catch block does not force you to write code, there is no need to write any.
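As a hedged sketch of the anti-pattern (ConfigLoader, ConfigException, and the file handling are invented for illustration): the first method catches and ignores, handing back a null and hiding the failure, while the second actually handles the error by reporting it in a form the caller can act on.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical example of swallowing an exception versus handling it.
public class ConfigLoader {

    // The anti-pattern: catch, ignore, and hope for the best.
    String readFirstLineBadly(String path) {
        try {
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                return in.readLine();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            return null;   // swallowed: the caller gets null and the failure is invisible
        }
    }

    // Handling the error: wrap it with context and let the caller decide.
    String readFirstLine(String path) throws ConfigException {
        try {
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                return in.readLine();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new ConfigException("Could not read config file: " + path, e);
        }
    }
}

class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}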

The cognitive dissonance happens when the application becomes unstable and/or crashes. They may even log the exceptions, but for some reason they can't understand why the code fails. They justify their decisions with fantasies: the exceptions are rare, or impossible, or, better yet, un-handleable. They feel uncomfortable, but they just can't come to terms with the idea that they failed to add error handling.

Like an end of the world cult, they just set another date when their application will work and wait for it to happen. Sadly the day never comes. They still believe in a fantasy, despite the evidence in front of their face.

What if we changed the keyword from 'catch' to 'handle'? Would the world be a better place? Maybe that is the first thing you should say at a code review? Think of it as the first tool in your programmer-deprogrammer toolkit!