Posted on June 22nd, 2009

Recently there has been an explosion of “Open Space” events and sessions at conferences across the world. Surprisingly (or not so surprisingly), the results aren’t always the wonderful, touchy-feely explosion of experience and sharing that open space events are made out to be. Why is that?

First, there are actually two types of events labeled in the “Open Space” realm:

  • A big open space with a sign-up board where anyone walking by can jump into a conversation with whiteboards, markers, etc. Sessions in this type of event are extremely transient, and very much a “right-place, right-time” type of experience.
  • “Open Space Technology” (OST), which is a much more focused event. OST is described as a method for “running meetings of any size”, which should give a clue to its focus.

The problems come when people fail to recognize what type of event they are trying to run, and slap a label on it without understanding the connotations of the name.

Let’s give some scenarios:

  1. At a recent global technology conference, there were specific topics being discussed. There were pockets of groups wanting to collaborate, but struggling to fit into the session schedules provided. In this context, since there was a defined business purpose, an OST type session would be appropriate. A half to full day would be set aside to have the participants understand the purpose, devise sessions, and ebb and flow to explore the topics and come to solutions.
  2. At a recent conference, an “open space” was provided for the several hundred attendees. The open space was run as an OST event, with formalities such as Walking the Circle, Closing Circle, and other ceremonies. However, since the group did not have a defined purpose, the OST style was not considered a success since the collaborative nature of an OST event couldn’t come into play.
  3. At another recent conference, an “open space” area was provided without predefined sessions. Stations were provided with tables, whiteboards, markers, flip charts and other items. A centralized “schedule board” was available for those who wanted to schedule something, but mostly it was used as a collaboration place in between and after sessions, and worked very well.
  4. At KaizenConf in Austin, TX, the entire conference was run as an OST-style event. Pre-sessions were held to give the participants a base level of knowledge, and a key theme was provided throughout the conference. The opening and closing circles helped home in on the topics at hand and unite the threads that were formed.

With just a little bit of context, one can choose the right type of event for the crowd at hand. Sadly, organizers often just provide something without understanding its purpose, or miss out on wonderful collaboration opportunities because of bad experiences with misnamed events.

So the next time you are planning on an “Open Space”, give thought to what your goals are, and really consider if what you are doing is appropriate to your attendees, your conference, and the topics at hand.



Posted on June 16th, 2009

Our application supports both SQL Server and Oracle users. To do this, there are certain things that we need present at runtime, like an Oracle provider. In our case, we also require a certain version or higher because of the API calls we need to make.

During testing for our upcoming release, we found a bug where if someone tried to create a connection to an Oracle database, and didn’t have the Oracle Client tools, the app would crash – at some random time. Sometimes it was right after they tried the test from our app. Sometimes it wasn’t until they closed the app. But all signs pointed to exactly where and why it was happening – sort of.

Firing up my trusty WinDBG and connecting to our app, I saw that when it crashed, we got the following CLR exception:

(9e4.bc0): CLR exception - code e0434f4d (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00c0f524 ebx=e0434f4d ecx=00000000 edx=7c8285ec esi=00c0f5b0 edi=0016dd50
eip=77e4bee7 esp=00c0f520 ebp=00c0f574 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
KERNEL32!RaiseException+0x53:
77e4bee7 5e              pop     esi

0:002> !printexception
Exception object: 0bb1e114
Exception type: System.TypeInitializationException
Message: The type initializer for 'Oracle.DataAccess.Client.OracleConnection' threw an exception.
InnerException: Oracle.DataAccess.Client.OracleException, use !PrintException 0bb169cc to see more
StackTrace (generated):

StackTraceString:

HResult: 80131534

0:002> !clrstack
OS Thread Id: 0xbc0 (2)
ESP       EIP    
00c0f628 77e4bee7 [GCFrame: 00c0f628]
00c0fc10 77e4bee7 [PrestubMethodFrame: 00c0fc10] Oracle.DataAccess.Client.OracleConnection.Dispose(Boolean)
00c0fc20 7a572eb5 System.ComponentModel.Component.Finalize()

The interesting part of the above is that last line. The exception just before the crash is thrown from the Dispose method, which is being called by the finalizer. Recall that in .NET 2.0 and higher, an unhandled exception on any thread, including the finalizer thread, terminates the process by default.

Our app was basically just doing a return new OracleConnection(); so that meant that the exception was in the Oracle API itself. And sure enough, here’s what happens in the constructor of OracleConnection (according to Reflector):

public OracleConnection()
{
    if (!OracleInit.bSetDllDirectoryInvoked)
    {
        OracleInit.Initialize();
    }
    if (!OraTrace.m_RegistryRead)
    {
        OraTrace.GetRegistryTraceInfo();
    }
    if (OraTrace.m_TraceLevel != 0)
    {
        OraTrace.Trace(1, new string[] { " (ENTRY) OracleConnection::OracleConnection(1)\n" });
    }
    this.Initialize();
    if (OraTrace.m_TraceLevel != 0)
    {
        OraTrace.Trace(1, new string[] { " (EXIT)  OracleConnection::OracleConnection(1)\n" });
    }
}

Peeking at the exceptions that get thrown when we try to test the connection, I can see that OpsInit.CheckVersionCompatability throws a TypeInitializationException, which gets caught by the Initialize method, which then turns around and throws an OracleException. That in turn causes the constructor to throw, so the object is never handed back to the caller.

But there’s something subtle there. To my app, the OracleConnection object was never created. In fact, the return value is null. But, what do we see on the Heap?

0:002> !dumpheap -type Oracle
Address       MT     Size    
0bb153e0 084b7e10      112    
0bb169cc 084b8318       76    
0bb16b14 084b87c4       24    
0bb16b2c 084b89a4       32    
total 4 objects
Statistics:
      MT    Count    TotalSize Class Name
084b87c4        1           24 Oracle.DataAccess.Client.OracleErrorCollection
084b89a4        1           32 Oracle.DataAccess.Client.OracleError
084b8318        1           76 Oracle.DataAccess.Client.OracleException
084b7e10        1          112 Oracle.DataAccess.Client.OracleConnection
Total 4 objects

There, on the heap, is an Oracle.DataAccess.Client.OracleConnection. Which means that at some point the garbage collector will clean it up and, in doing so, run its finalizer, which calls Dispose(false) (the Component.Finalize frame in the stack above). But the object hasn’t cleaned itself up properly, so Dispose throws an exception.
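The same behaviour can be reproduced without Oracle installed. What follows is a minimal sketch, not the Oracle code: HalfBuilt and HalfBuiltDemo are hypothetical names, and the finalizer only counts how often it runs instead of throwing. It assumes a runtime where GC.Collect followed by GC.WaitForPendingFinalizers finalizes unreachable objects, which is the usual case.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for a type whose constructor fails part-way through.
public sealed class HalfBuilt
{
    // Incremented by the finalizer so callers can observe that it ran.
    public static int FinalizerRuns;

    public HalfBuilt()
    {
        // The object is already allocated on the heap when this line executes.
        throw new InvalidOperationException("initialization failed");
    }

    ~HalfBuilt()
    {
        // Runs on the finalizer thread even though the constructor never completed.
        // An exception escaping from here would be unhandled and end the process.
        Interlocked.Increment(ref FinalizerRuns);
    }
}

public static class HalfBuiltDemo
{
    // Kept out of line so the caller holds no hidden reference to the object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool TryCreate()
    {
        try
        {
            new HalfBuilt();
            return true;
        }
        catch (InvalidOperationException)
        {
            // The caller never receives a reference to the failed object.
            return false;
        }
    }

    // Attempts the construction, forces a collection, and reports finalizer runs.
    public static int CreateThenCollect()
    {
        TryCreate();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return Volatile.Read(ref FinalizerRuns);
    }
}
```

Calling HalfBuiltDemo.CreateThenCollect() should report at least one finalizer run, even though no caller ever held a HalfBuilt instance. Replace the counter with a throw and you have the crash described above.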

The problem at this point is threefold:

  1. If you are going to throw an exception in your constructor, you must ensure that you’ve cleaned yourself up properly
  2. An unhandled exception on the Finalizer thread takes down the runtime, and because the object isn’t following number 1, this will happen
  3. Because the exception is in the constructor, we never get a handle to the object, and thus can’t do anything

The problem is further compounded by OracleConnection being sealed, so we can’t just subclass it. It seems like at this point we are just stuck. Or are we?

We know from Reflector that the exception happens on the call to OracleInit.Initialize(). So perhaps we could just call that, and if it throws an exception, we know that the Oracle Client isn’t set up properly, and thus we don’t try to construct an OracleConnection object. But OracleInit is an internal class. So how do we get around that?

Reflection.

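using System;
using System.Reflection;

public static bool Oracle102OrHigherIsInstalled()
{
    try
    {
        // LoadWithPartialName is marked obsolete, but it returns null
        // instead of throwing when the assembly cannot be found.
        Assembly oracleAssembly = Assembly.LoadWithPartialName("Oracle.DataAccess");
        if (oracleAssembly == null) { return false; }

        // OracleInit is internal, but Assembly.GetType still finds it.
        Type oracleInit = oracleAssembly.GetType("Oracle.DataAccess.Client.OracleInit");
        if (oracleInit == null) { return false; }

        MethodInfo initialize = oracleInit.GetMethod("Initialize", BindingFlags.Public | BindingFlags.Static);
        if (initialize == null) { return false; }

        // Run the same static initialization the OracleConnection constructor performs.
        initialize.Invoke(null, null);
    }
    catch (Exception)
    {
        // Invoke wraps the original failure in a TargetInvocationException.
        // Any failure here means the Oracle client is not usable.
        return false;
    }
    return true;
}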

Internal or not, we can use reflection to get to that class. And since Initialize is a static method, we don’t even have to deal with getting an instance. We simply follow the same path that the OracleConnection constructor does, and if it fails, we know to not even try creating an instance of it.
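The same idea generalizes to any static, parameterless initializer. The following is a rough sketch under stated assumptions: StaticProbe, HealthyInit and BrokenInit are hypothetical names made up for illustration, and unlike the method above it also searches non-public methods, in case the initializer itself is internal.

```csharp
using System;
using System.Reflection;

public static class StaticProbe
{
    // Returns true when the named static, parameterless method exists
    // and runs to completion without throwing.
    public static bool TryInvokeStatic(Assembly assembly, string typeName, string methodName)
    {
        if (assembly == null) { return false; }

        // Assembly.GetType also finds internal types; 'false' means "return null if missing".
        Type type = assembly.GetType(typeName, false);
        if (type == null) { return false; }

        MethodInfo method = type.GetMethod(
            methodName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static,
            null,
            Type.EmptyTypes,
            null);
        if (method == null) { return false; }

        try
        {
            method.Invoke(null, null);
            return true;
        }
        catch (TargetInvocationException)
        {
            // Invoke wraps whatever the target threw; treat it as "not usable".
            return false;
        }
    }
}

// Hypothetical internal initializers, present only to exercise the probe.
internal static class HealthyInit
{
    internal static void Initialize() { }
}

internal static class BrokenInit
{
    internal static void Initialize()
    {
        throw new InvalidOperationException("client not installed");
    }
}
```

With these types, StaticProbe.TryInvokeStatic(typeof(HealthyInit).Assembly, typeof(HealthyInit).FullName, "Initialize") succeeds, while the same call for BrokenInit reports failure without the exception ever reaching the caller.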

The takeaway really is to make sure you are cleaning yourself up, and to absolutely ensure that Dispose never throws an unhandled exception. But even when you are dealing with third parties who don’t do that, you can still use tools like WinDBG, Reflector and .NET Reflection to find a workaround.
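For code you do control, one way to follow that advice looks roughly like this. It is a sketch only: SafeResource is a hypothetical class, the failDuringInit flag exists purely to simulate a failing constructor, and ReleaseCore stands in for whatever real cleanup a type would do.

```csharp
using System;

// Hypothetical resource wrapper, used only for illustration.
public class SafeResource : IDisposable
{
    private bool _initialized;

    public bool IsDisposed { get; private set; }

    public SafeResource(bool failDuringInit)
    {
        try
        {
            if (failDuringInit)
            {
                throw new InvalidOperationException("required client is missing");
            }
            _initialized = true;
        }
        catch
        {
            // Nothing usable was built, so opt out of finalization before rethrowing.
            GC.SuppressFinalize(this);
            throw;
        }
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (IsDisposed) { return; }
        IsDisposed = true;

        // A half-built object has nothing to release.
        if (!_initialized) { return; }

        try
        {
            ReleaseCore();
        }
        catch (Exception)
        {
            // Swallowed on purpose: this path may run on the finalizer thread,
            // where an escaping exception would take the process down.
        }
    }

    // Placeholder for the real cleanup work.
    protected virtual void ReleaseCore() { }

    ~SafeResource()
    {
        Dispose(false);
    }
}
```

The constructor still throws, so callers learn about the failure, but the object never reaches the finalizer in a half-built state, and Dispose can safely be called more than once.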



Posted on June 10th, 2009

To many people, software projects are a lot like black magic. You are trying to build something that no one knows the contents of until it’s been built, in a way that responds to change, in some semblance of a reasonable timeframe and budget. Everything from the estimation process, to the status calls, to the change control, to the releases themselves varies so wildly that we actually applaud when a team "makes it". "Woo!" we say, holding parties, and sending mass emails, "We did our job!"

Because of this, we for some reason forget that there are very real strategies for dealing with the inherent risks in a software project. And I don’t just mean the "x = Programmer’s Estimate; ProjectTime = x*2.3 + 42" trick.

Frequent Releases of Working Software

Since there are so many variables in getting a customer what they want, a wonderful strategy is to frequently release working software, letting the customer hold it in their hands, push all the buttons and generally see how wrong they were about what they wanted. And I’m not talking about a "beta" here – I’m talking about one week in, shipping something. Of course, there are two reasons why you don’t see this more often:

  • Fear. "If the customers see the product, they will want changes! Nay! Demand them!" Or, "The customers will see it crash, and lose all hope in the product!" This can be overcome by, well, doing it more often. You want your users on your side, and I’d much rather find out after the first week that the main UI screen will never work because we have 80,000 users trained to a green screen, and they really need it to look just like that.
  • Skill Deficiencies – I recently heard someone say that their project was going to take 6 months before anyone could test anything. I don’t even remember the reason why. 6 months of coding? Really? That’s a skill deficiency, and if you are a manager, you shouldn’t stand for it. Yes, there will be rare times when something isn’t building, but that shouldn’t last more than a day, tops. Our current product is 4.5 million lines of code, and we have had nightly releases every night for the past 9 months. Releases that were in the hands of the business the next morning that they could play with.

Frequent, Open and Honest Communication

One of the things that can really trip up a team is the Lack of Trust (as documented in the wonderful book The Five Dysfunctions of a Team). The funny thing is, many teams adopting agile practices start with perhaps the worst practice of all – stand-ups. Effective stand-ups rely on a high level of trust and a willingness to put yourself out there, and to challenge others when they aren’t. A team that doesn’t have that typically finds that its stand-ups never actually expose impediments.

But it’s more than just stand-ups. People need to feel empowered to stop the line, and to report when things aren’t going well. For example, if you are using the Scrum methodology, and have iterations, the team commits at the beginning of the sprint to what they are going to accomplish. If they find they can’t accomplish it, and don’t raise whatever is causing them to not get there until the end, then there is a serious organizational impediment to communication that needs to be resolved.

Of course, communication is a two-way street. In addition to the teams feeling comfortable about talking, the organization – especially management – needs to be willing to listen, and to trust what they are hearing. And when the organization does run into trouble, it needs to stay away from The Blame Game, and focus on feedback loops which actually improve the process.

Team members as well need to be willing to listen, both to what is being said to them and to the little clues that indicate something is wrong. Mitch Lacey has a great article about a team that messed up a demo, and found out that someone on the team knew it was going to happen – but it wasn’t raised due to a lack of trust and communication.

Strategic Directions

In any agile team, or any software team, one key tenet is a common goal. Without a clear strategic direction the team can rally behind, the work they are doing might be lost. On a smaller scale, this is why it is so vital to frame the work you are doing in the context of what you are delivering to a customer, either in the form of User Stories, or Minimal Marketable Features. Someone needs to be keeping their head up to see how it all ties together.

In fact, in larger scale agile adoptions, you typically do find it necessary to have a strategic team that helps organize the work. One must be cautious that they don’t turn into an ivory tower, but teams that do this well help weave that common thread which lets everything come together.

An example of this role done well is the Chief Product Engineer at Toyota. They know both the business and production sides, and are empowered to make decisions – but know how to keep everyone focused on the bigger goal.

Exit Strategies

Even if you are communicating, and releasing, and have a common goal, there still may be times when things are out of your grasp, for example when you are working with another team or a third party. In these cases, having an exit strategy in mind can make the difference between an OK release and a failure. These are some of the hardest things to talk about, because no one likes to admit that failure is possible. Worse, sometimes the exit strategies require actions to be taken before a "final deadline", giving the impression that if we just kept working a little harder, we could make it.

In the article No Exit, Don Gray outlines beautifully the causes of this, and how to prevent it from happening. In general, anything risky that is vital to the release should have an exit strategy. This can be everything from software components, to deployments, to configuration.

For example, at one organization, we deployed daily to our production servers. We had a highly visible web application, used by hundreds of thousands of people daily. No matter how much testing we did, we knew there was always a risk. So we had a strategy that with one button we could switch all traffic from 16 servers to 8. We’d then deploy to the servers which had no load, and test. If it seemed fine, we’d push another button, and everyone would be switched to the 8 servers we just deployed to. We’d let that run for a bit to make sure everything was fine, and if it was, we’d deploy on the other half, and balance back out the traffic. If anything failed, we could back the whole thing out.
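The sequence of switches can be sketched in code. This is an illustrative model only, not the tooling described: Server, HalfAndHalfDeploy and the healthy delegate are hypothetical names, the "buttons" are reduced to flipping an InRotation flag, and it assumes a non-empty farm where every server starts on the same version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one web server in the farm.
public sealed class Server
{
    public string Name { get; private set; }
    public string Version { get; set; }
    public bool InRotation { get; set; }

    public Server(string name, string version)
    {
        Name = name;
        Version = version;
        InRotation = true;
    }
}

public static class HalfAndHalfDeploy
{
    // Returns true when every server ends up on newVersion,
    // false when the deployment was backed out.
    public static bool Run(IList<Server> servers, string newVersion, Func<Server, bool> healthy)
    {
        if (servers == null || servers.Count < 2)
        {
            throw new ArgumentException("need at least two servers", "servers");
        }

        string oldVersion = servers[0].Version;
        List<Server> firstHalf = servers.Take(servers.Count / 2).ToList();
        List<Server> secondHalf = servers.Skip(servers.Count / 2).ToList();

        // Button 1: take the first half out of rotation, then deploy to it and test.
        foreach (Server s in firstHalf) { s.InRotation = false; s.Version = newVersion; }
        if (!firstHalf.All(healthy))
        {
            // Back out before any user has seen the new version.
            foreach (Server s in firstHalf) { s.Version = oldVersion; s.InRotation = true; }
            return false;
        }

        // Button 2: move all traffic onto the freshly deployed half.
        foreach (Server s in firstHalf) { s.InRotation = true; }
        foreach (Server s in secondHalf) { s.InRotation = false; }

        // Let it run for a bit, then check again under live traffic.
        if (!firstHalf.All(healthy))
        {
            // Back out: the untouched half takes traffic again, the new half is rolled back.
            foreach (Server s in secondHalf) { s.InRotation = true; }
            foreach (Server s in firstHalf) { s.Version = oldVersion; }
            return false;
        }

        // Deploy to the other half and balance the traffic back out.
        foreach (Server s in secondHalf) { s.Version = newVersion; s.InRotation = true; }
        return true;
    }
}
```

The point of the shape is that every failure branch leaves all servers in rotation on the old version, so backing out is a normal path through the procedure and not an emergency.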

Another strategy example was a deployment of real-time polling updates for a large county. This was in 2000, and it was vital that it go smoothly. We had tested the software extensively, and felt confident in it, but had a backup plan: two of us had a set of pages we could update by hand, and the polling stations could call their results in to us if the automated system broke down. Luckily we were able to just sit at work until 1am playing cards because everything did go smoothly, but if it hadn’t, we were ready.

Unfortunately, we don’t do the same for systems we work on all the time. What if that component team can’t get their UI finished? What if the vendor can’t get the bug fix in time? It may not be vital early on, but the thought process behind an exit strategy should exist, and simply become more refined the closer we come to the risk event, instead of scrambling to make one appear when it is clear we aren’t going to make it.

Hope Isn’t a Risk Management Strategy

What all of this adds up to are clear ways you can begin to mitigate the risks of your projects. Sure, many other things will come up. And there are lots of great papers on the subject of Risk Management. But, what it comes down to is shipping frequently, communicating often, and having an exit strategy for when things go wrong. Hoping that everything will be fine will only get you a bad delivery – or a new job.

What About Faith in the Team as a Strategy?

When I posed the title of this to Twitter earlier, someone asked about Faith being a Risk Management Strategy. Faith is different from hope, although the two get mixed up. Faith is saying, "Based on what I see or know, I’m confident that what I’m being told is true". It’s about trust. Hope is saying, "Even though it looks like things are horribly wrong, I’m sure we’ll make it".

In other words, faith is based on results. When those results falter, one loses faith in the results, and ultimately in what they are being told. Hope is suspending reality to make things seem better than they are. I hope I win the lottery, but I don’t have faith that I will.

The problem is that we’ve all heard, "I have faith you’ll do it" where "it" is some impossible thing. Teams hear this all the time, "I know you said this will take you 6 months, but I told management 3, and I have faith you all can do it by then." That’s not faith. Nor is it "motivation". It’s mismanagement.

There are many ways to ship software, and many teams who have had successful releases by sheer force of will. But a responsible team will employ good risk management, communication and release strategies to ensure that no matter what change comes at them, they are ready for it.
