
Recently there has been an explosion of “Open Space” events and sessions at conferences across the world. Surprisingly (or not so surprisingly), the results of these aren’t always the wonderful touchy-feely explosion of experience and sharing that open space events are made out to be. Why is that?
First, there are actually two types of events labeled in the “Open Space” realm:
The problems come when people fail to recognize what type of event they are trying to run, and slap a label on it without understanding the connotations of the name.
Let’s give some scenarios:
With just a little bit of context, one can choose the right type of event for the crowd at hand. Sadly, organizers often just provide something without understanding its purpose, or miss out on wonderful collaboration opportunities because of bad experiences with misnamed events.
So the next time you are planning on an “Open Space”, give thought to what your goals are, and really consider if what you are doing is appropriate to your attendees, your conference, and the topics at hand.
Our application supports both SQL and Oracle users. To do this, there are certain things that we need present at runtime – like an Oracle Provider. But, in our case, we require a certain version or higher due to the API calls we need to make.
During testing for our upcoming release, we found a bug where if someone tried to create a connection to an Oracle database, and didn’t have the Oracle Client tools, the app would crash – at some random time. Sometimes it was right after they tried the test from our app. Sometimes it wasn’t until they closed the app. But all points showed exactly where and why it was happening – sort of.
Firing up my trusty WinDBG and connecting to our app, I saw that when it crashed, we got the following CLR exception:
(9e4.bc0): CLR exception - code e0434f4d (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00c0f524 ebx=e0434f4d ecx=00000000 edx=7c8285ec esi=00c0f5b0 edi=0016dd50
eip=77e4bee7 esp=00c0f520 ebp=00c0f574 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
KERNEL32!RaiseException+0x53:
77e4bee7 5e pop esi
0:002> !printexception
Exception object: 0bb1e114
Exception type: System.TypeInitializationException
Message: The type initializer for 'Oracle.DataAccess.Client.OracleConnection' threw an exception.
InnerException: Oracle.DataAccess.Client.OracleException, use !PrintException 0bb169cc to see more
StackTrace (generated):
StackTraceString:
HResult: 80131534
0:002> !clrstack
OS Thread Id: 0xbc0 (2)
ESP EIP
00c0f628 77e4bee7 [GCFrame: 00c0f628]
00c0fc10 77e4bee7 [PrestubMethodFrame: 00c0fc10] Oracle.DataAccess.Client.OracleConnection.Dispose(Boolean)
00c0fc20 7a572eb5 System.ComponentModel.Component.Finalize()
The interesting part of the above is that last line. The exception just before the crash is happening in the Dispose method, which is being called by the finalizer. Recall that in .NET 2.0 and higher, an unhandled exception on the finalizer thread is no longer swallowed: it is treated like any other unhandled exception and takes down the process.
Our app was basically just doing a return new OracleConnection(); so that meant that the exception was in the Oracle API itself. And sure enough, here’s what happens in the constructor of OracleConnection (according to Reflector):
public OracleConnection()
{
    if (!OracleInit.bSetDllDirectoryInvoked)
    {
        OracleInit.Initialize();
    }
    if (!OraTrace.m_RegistryRead)
    {
        OraTrace.GetRegistryTraceInfo();
    }
    if (OraTrace.m_TraceLevel != 0)
    {
        OraTrace.Trace(1, new string[] { " (ENTRY) OracleConnection::OracleConnection(1)\n" });
    }
    this.Initialize();
    if (OraTrace.m_TraceLevel != 0)
    {
        OraTrace.Trace(1, new string[] { " (EXIT) OracleConnection::OracleConnection(1)\n" });
    }
}
Peeking at the exceptions that get thrown when we try to test the connection, I can see that OpsInit.CheckVersionCompatability throws a TypeInitializationException, which gets caught by the Initialize method, which then turns around and throws an OracleException. That in turn causes the constructor to throw, so the caller never gets the object.
But there’s something subtle there. To my app, the OracleConnection object was never created. In fact, the return value is null. But, what do we see on the Heap?
0:002> !dumpheap -type Oracle
Address MT Size
0bb153e0 084b7e10 112
0bb169cc 084b8318 76
0bb16b14 084b87c4 24
0bb16b2c 084b89a4 32
total 4 objects
Statistics:
MT Count TotalSize Class Name
084b87c4 1 24 Oracle.DataAccess.Client.OracleErrorCollection
084b89a4 1 32 Oracle.DataAccess.Client.OracleError
084b8318 1 76 Oracle.DataAccess.Client.OracleException
084b7e10 1 112 Oracle.DataAccess.Client.OracleConnection
Total 4 objects
There, on the heap, is an Oracle.DataAccess.Client.OracleConnection. Which means that at some point, the Garbage Collector will clean it up, and in doing so, call the Dispose() method. But the object hasn’t cleaned itself up properly, so it throws an exception in Dispose.
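The effect can be reproduced without Oracle at all. What follows is a minimal sketch, not the ODP.NET code: HalfBuilt and FinalizerDemo are hypothetical names, and the finalizer here only counts instead of throwing, so the demo itself does not crash. It suggests that an object whose constructor throws is still allocated, still registered for finalization, and still finalized later, even though the caller only ever sees null.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for a type like OracleConnection: it has a finalizer,
// and its constructor always fails.
public sealed class HalfBuilt
{
    public static int FinalizedCount;

    public HalfBuilt()
    {
        throw new InvalidOperationException("initialization failed");
    }

    ~HalfBuilt()
    {
        // Still runs, even though the constructor never completed.
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizerDemo
{
    // Kept out of line so no reference to the object lingers in the caller's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static HalfBuilt TryCreate()
    {
        try { return new HalfBuilt(); }
        catch (InvalidOperationException) { return null; }
    }

    // Returns how many HalfBuilt objects have been finalized so far,
    // or -1 if the caller unexpectedly received an instance.
    public static int CreateAndCollect()
    {
        HalfBuilt connection = TryCreate();   // null: the caller never sees the object
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return connection == null ? HalfBuilt.FinalizedCount : -1;
    }
}
```

Calling FinalizerDemo.CreateAndCollect() from a console app should normally report at least one finalized object. Had the finalizer thrown instead of counting, that is the point where the process would go down.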
The problem at this point is 3-fold
The problem is further compounded by OracleConnection being sealed, so we can’t just subclass it. It seems like at this point we are just stuck. Or are we?
We know from Reflector that the exception happens on the call to OracleInit.Initialize(). So perhaps we could just call that, and if it throws an exception, we know that the Oracle Client isn’t set up properly, and thus we don’t try to construct an OracleConnection object. But OracleInit is an internal class. So how do we get around that?
Reflection.
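using System;
using System.Reflection;

public static bool Oracle102OrHigherIsInstalled()
{
    try
    {
        // LoadWithPartialName is marked obsolete, but it finds the provider without a full assembly name.
        Assembly oracleAssembly = Assembly.LoadWithPartialName("Oracle.DataAccess");
        if (oracleAssembly == null) { return false; }

        // OracleInit is internal, so look it up by name instead of referencing it directly.
        Type oracleInit = oracleAssembly.GetType("Oracle.DataAccess.Client.OracleInit");
        if (oracleInit == null) { return false; }

        MethodInfo initialize = oracleInit.GetMethod("Initialize", BindingFlags.Public | BindingFlags.Static);
        if (initialize == null) { return false; }

        // Static method: no instance and no arguments.
        initialize.Invoke(null, null);
    }
    catch (Exception)
    {
        // Any failure, including the TargetInvocationException that wraps
        // whatever Initialize throws, means the client is not usable.
        return false;
    }
    return true;
}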
Internal or not, we can use reflection to get to that class. And since Initialize is a static method, we don’t even have to deal with getting an instance. We simply follow the same path that the OracleConnection constructor does, and if it fails, we know to not even try creating an instance of it.
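One detail worth knowing when calling through reflection is that MethodInfo.Invoke wraps anything the target throws in a TargetInvocationException. The sketch below shows this with a made-up internal class. HiddenInit and ReflectionProbe are hypothetical names, not part of ODP.NET, and the binding flags include NonPublic only so the lookup works whatever the method's visibility.

```csharp
using System;
using System.Reflection;

// Hypothetical internal type standing in for something like OracleInit.
internal static class HiddenInit
{
    public static bool ShouldFail;

    public static void Initialize()
    {
        if (ShouldFail) { throw new InvalidOperationException("client too old"); }
    }
}

public static class ReflectionProbe
{
    // Returns "ok" on success, otherwise a short description of what went wrong.
    public static string Probe(string typeName)
    {
        // Look the type up by name, as you would for a type you cannot reference directly.
        Type type = typeof(ReflectionProbe).Assembly.GetType(typeName);
        if (type == null) { return "type not found"; }

        MethodInfo initialize = type.GetMethod(
            "Initialize",
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);
        if (initialize == null) { return "method not found"; }

        try
        {
            initialize.Invoke(null, null);   // static: no instance, no arguments
            return "ok";
        }
        catch (TargetInvocationException ex)
        {
            // Invoke wraps whatever the target threw; the original is in InnerException.
            return ex.InnerException != null ? ex.InnerException.Message : ex.Message;
        }
    }
}
```

With HiddenInit.ShouldFail set to true, ReflectionProbe.Probe("HiddenInit") returns the inner exception's message. The same unwrapping is useful if you want to log why the real provider check failed and not only that it did.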
The takeaway really is to make sure you are cleaning yourself up, and to absolutely ensure that Dispose never throws an unhandled exception. But even when you are dealing with third parties who don’t have that, you can still use tools like WinDBG, Reflector and .NET Reflection to find a workaround.
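For code you do own, a defensive version of the dispose pattern might look something like the following. This is a sketch under assumptions and not a universal recipe: SafeResource is a hypothetical class, the actual resource handling is left as a comment, and whether to swallow exceptions on the explicit Dispose() path as well is a judgment call.

```csharp
using System;

// Hypothetical resource wrapper; the names are illustrative only.
public class SafeResource : IDisposable
{
    private bool _initialized;
    private bool _disposed;

    public SafeResource(bool failDuringInit)
    {
        try
        {
            if (failDuringInit) { throw new InvalidOperationException("initialization failed"); }
            _initialized = true;
        }
        catch
        {
            // Nothing was acquired, so tell the GC not to finalize the half-built object.
            GC.SuppressFinalize(this);
            throw;
        }
    }

    public bool IsDisposed { get { return _disposed; } }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        // Safe to call on a partially built object, and safe to call twice.
        if (_disposed || !_initialized) { return; }
        try
        {
            // Release resources here.
        }
        catch (Exception)
        {
            // Swallow (and ideally log): an exception escaping the finalizer ends the process.
        }
        _disposed = true;
    }

    ~SafeResource()
    {
        Dispose(false);
    }
}
```

The two guards do different jobs. SuppressFinalize in the constructor keeps a failed instance away from the finalizer thread altogether, while the checks inside Dispose(bool) cover the case where it gets called anyway.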
To many people, software projects are a lot like black magic. You are trying to build something that no one knows the contents of until it’s been built, in a way that responds to change, in some semblance of a reasonable timeframe and budget. Everything from the estimation process, to the status calls, to the change control, to the releases themselves varies so wildly that we actually applaud when a team "makes it". "Woo!" we say, holding parties, and sending mass emails, "We did our job!"
Because of this, we for some reason forget that there are very real strategies for dealing with the inherent risks in a software project. And I don’t just mean the "x = Programmer’s Estimate; ProjectTime = x*2.3 + 42" trick.
Since there are so many variables in getting a customer what they want, a wonderful strategy is to frequently release working software, letting the customer hold it in their hands, push all the buttons and generally see how wrong they were about what they wanted. And I’m not talking about a "beta" here – I’m talking about one week in, shipping something. Of course, there are two reasons why you don’t see this more often:
One of the things that can really trip up a team is a lack of trust (as documented in the wonderful book The Five Dysfunctions of a Team). The funny thing is, many teams adopting agile practices start with perhaps the worst practice of all – stand-ups. Effective stand-ups rely on a high level of trust and a willingness to put yourself out there, and to challenge others when they aren’t doing the same. A team that doesn’t have that typically ends up with stand-ups that never actually expose impediments.
But it’s more than just stand-ups. People need to feel empowered to stop the line, and to report when things aren’t going well. For example, if you are using the Scrum methodology, and have iterations, the team commits at the beginning of the sprint to what they are going to accomplish. If they find they can’t accomplish it, and don’t raise whatever is holding them back until the end, then there is a serious organizational impediment to communication that needs to be resolved.
Of course, communication is a two-way street. In addition to the teams feeling comfortable about talking, the organization – especially management – needs to be willing to listen, and to trust what they are hearing. And when the organization does run into trouble, to stay away from The Blame Game, and focus on feedback loops which actually improve the process.
Team members as well need to be willing to listen, both to what is being said to them and to the little clues that indicate something is wrong. Mitch Lacey has a great article about a team that messed up a demo, and found out that someone on the team knew it was going to happen – but it wasn’t raised due to a lack of trust and communication.
In any agile team, or any software team, one key tenet is a common goal. Without a clear strategic direction the team can rally behind, the work they are doing might be lost. On a smaller scale, this is why it is so vital to frame the work you are doing in the context of what you are delivering to a customer, either in the form of User Stories or Minimal Marketable Features. Someone needs to keep their head up and watch how it all ties together.
In fact, in larger scale agile adoptions, you typically do find it necessary to have a strategic team that helps organize the work. One must be cautious that they don’t turn into an ivory tower, but teams that do this well help weave that common thread which lets everything come together.
An example of this role done well is the Chief Product Engineer at Toyota. They know both the business and production sides, and are empowered to make decisions – but know how to keep everyone focused on the bigger goal.
Even if you are communicating, and releasing, and have a common goal, there still may be times when things are out of your grasp, for example when working with another team or a third party. In these cases, having an exit strategy in mind can make the difference between an OK release and a failure. These are some of the hardest things to talk about, because no one likes to admit that failure is possible. Worse, sometimes the exit strategies require actions to be taken before a "final deadline", giving the impression that if we just kept working a little harder, we could make it.
In the article No Exit, Don Gray outlines beautifully the causes of this, and how to prevent it from happening. In general, anything that has a risk factor, which is vital to the release, should have an exit strategy. This can be everything from software components, to deployments, to configuration.
For example, at one organization, we deployed daily to our production servers. We had a highly visible web application, used by hundreds of thousands of people daily. No matter how much testing we did, we knew there was always a risk. So we had a strategy that with one button we could switch all traffic from 16 servers to 8. We’d then deploy to the servers which had no load, and test. If it seemed fine, we’d push another button, and everyone would be switched to the 8 servers we just deployed to. We’d let that run for a bit to make sure everything was fine, and if it was, we’d deploy on the other half, and balance back out the traffic. If anything failed, we could back the whole thing out.
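As a rough illustration of that two-button rollout, here is a small model of the steps. It is only a sketch: the Server class, the version strings and the healthy check are all hypothetical, and a real setup would drive a load balancer and a deployment tool where this code merely flips flags.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a server behind a load balancer.
public class Server
{
    public string Name = "";
    public string Version = "";
    public bool InRotation = true;
}

public static class HalfAndHalfDeploy
{
    // Returns true if the new version ends up on every server, false if it was backed out.
    public static bool Deploy(List<Server> servers, string newVersion, Func<Server, bool> healthy)
    {
        string oldVersion = servers[0].Version;
        List<Server> firstHalf = servers.Take(servers.Count / 2).ToList();
        List<Server> secondHalf = servers.Skip(servers.Count / 2).ToList();

        // Button one: drain the first half, deploy there, and test with no load.
        foreach (Server s in firstHalf) { s.InRotation = false; s.Version = newVersion; }
        if (!firstHalf.All(healthy))
        {
            // Back out before any user sees the new build.
            foreach (Server s in firstHalf) { s.Version = oldVersion; s.InRotation = true; }
            return false;
        }

        // Button two: swap all traffic onto the freshly deployed half and watch it.
        foreach (Server s in firstHalf) { s.InRotation = true; }
        foreach (Server s in secondHalf) { s.InRotation = false; }
        if (!firstHalf.All(healthy))
        {
            // Swap back to the untouched half, then restore the first half.
            foreach (Server s in secondHalf) { s.InRotation = true; }
            foreach (Server s in firstHalf) { s.Version = oldVersion; }
            return false;
        }

        // Deploy to the other half and balance the traffic back out.
        foreach (Server s in secondHalf) { s.Version = newVersion; s.InRotation = true; }
        return true;
    }
}
```

The point of the shape is that every step has a way back: at no stage are all servers running untested code, and at least half of them are always serving traffic.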
Another strategy example was a deployment of real-time polling updates for a large county. This was in 2000, and it was vital that it go smoothly. We had tested the software extensively, and felt confident in it, but had a backup plan: two of us had a set of pages we could update by hand, and the polling stations could call their results in to us if the automated system broke down. Luckily we were able to just sit at work until 1am playing cards because everything did go smoothly, but if it hadn’t, we were ready.
Unfortunately, we don’t do the same for systems we work on all the time. What if that component team can’t get their UI finished? What if the vendor can’t get the bug fix in time? It may not be vital early on, but the thought process behind an exit strategy should exist, and simply become more refined the closer we come to the risk event, instead of scrambling to make one appear when it is clear we aren’t going to make it.
What all of this adds up to are clear ways you can begin to mitigate the risks of your projects. Sure, many other things will come up. And there are lots of great papers on the subject of Risk Management. But, what it comes down to is shipping frequently, communicating often, and having an exit strategy for when things go wrong. Hoping that everything will be fine will only get you a bad delivery – or a new job.
When I posed the title of this to Twitter earlier, someone asked about Faith being a Risk Management Strategy. Faith is different than hope, although they get mixed in. Faith is saying, "Based on what I see or know, I’m confident that what I’m being told is true". It’s about trust. Hope is saying, "Even though it looks like things are horribly wrong, I’m sure we’ll make it".
In other words, faith is based on results. When those results falter, one loses faith in the results, and ultimately in what they are being told. Hope is suspending reality to make things seem better than they are. I hope I win the lottery, but I don’t have faith that I will.
The problem is that we’ve all heard, "I have faith you’ll do it" where "it" is some impossible thing. Teams hear this all the time, "I know you said this will take you 6 months, but I told management 3, and I have faith you all can do it by then." That’s not faith. Nor is it "motivation". It’s mismanagement.
There are many ways to ship software, and many teams who have had successful releases by sheer force of will. But a responsible team will employ good risk management, communication and release strategies to ensure that no matter what change comes at them, they are ready for it.