Monday, June 16, 2008

Quick Decisions, in a Software Release and on the Staten Island Expressway

Caution: If you have a weak stomach and don’t like reading about baby puke and quick project decisions, you might not want to continue reading.

So, I guess my 18-month-old daughter doesn’t like my driving. Since we don’t own a car and only rent one a few times a year, her body probably isn’t prepared for the shock of it all. Regardless of how and why it happened, she had an unfortunate up-chuck just as we were slowing down in a huge mass of weekend traffic on our way back through Staten Island this past Sunday night. It was quite the event, too… anyone who’s seen Monty Python’s “The Meaning of Life” can picture a similar scene. So, we’re stuck in the car with a wailing, puke-stained baby and a 3-year-old who keeps yelling, “stop crying, you make me sad!”

What to do? Do we try to pull over, even though there’s barely a shoulder lane, and change her into clean clothes? Do we pull off at an exit and see if we can find somewhere to stop and clean her up? Or do we just keep driving and see how quickly we can get back to Brooklyn? (Granted, this is New York traffic on a Sunday night in June; it seems like everyone left the city for the weekend and is coming back now.)

As I was weighing my options, I was reminded of similar situations I face when managing software releases for my company. (OK, so maybe I didn’t really drift off to such thoughts in such a stressful predicament, but it sounds funny, so I’ll tell the story this way.)

Our last large feature launch was at the beginning of May, and in between we do regular bug-fix releases and small feature launches. Inevitably, things go wrong: no matter how much testing we do on our QA and staging sites, there’s always something we find when we put the new work live. This can cause varying levels of panic, especially when it’s late at night and the project team wants to go home, or when it’s early morning and we’re trying to finish before our clients come into the office and want to start using their websites. When these things happen, we have a few options:

1. Put a programmer on it ASAP and see if this new issue can be resolved and tested quickly and hope that no new related problems are created by the changes.

2. Roll back the entire release and try again the next day, after spending time working on the new issue and resolving it on the test sites.

3. Leave the issue on the live site, make the client aware of the issue and work on getting the fix thoroughly tested and live within the next couple of days.

It all depends on the severity of the issue, but what I can always count on is a project team looking to me to make the decision. I don’t always like making quick decisions, but I know it’s my job in these situations and I know my team is looking to me to make the call. It’s almost funny how much they count on me: on different occasions, the programmers, the QA team, and even the more senior members of the staff have shied away from making crucial decisions about software releases and their potential impact on our clients. It’s as if nobody wants to be responsible if the decision sends us into the deep dark sea; they’d prefer that I, as the manager, take responsibility and go down with my sinking ship. So far, so good… we haven’t sunk yet. I wouldn’t call myself captain of the year, but we’ve been floating along somehow. This part of the job keeps things exciting and terrifying at the same time.

So, what did we do with the sad, sick baby in the car? We kept on driving, opened the windows all the way to keep the smell from making me sick, too, and gave her a bottle of water. The bottle actually made her very happy, and within maybe 20 minutes she was calm and may even have forgotten that she was covered in slop. We figured that since it was already late and she was tired, it was better to just get her home than to risk getting stuck in an even worse crowd of traffic. We eventually got out of Staten Island and made it home to Brooklyn to get everyone hosed down. I was glad my husband and I (yes, he helped in the decision making, too) had decided not to stop, because once I started cleaning the crime scene, I found that it was a lot worse than just her clothes; even in clean clothes, she would have been sitting in the stink and gotten dirty all over again. Today she’s clean and happy and has hopefully forgotten the whole thing.

And can I even remember the show-stopping bug that plagued my project team the last time we did a release? Barely…

Thursday, June 5, 2008

The limits of quality, in a crane

I walked past a crane yesterday. I have to say, being a New Yorker and hearing a little too regularly about the construction accidents that have been happening makes me not want to walk so close to those cranes.

There were a few construction accidents earlier this spring; one of them involved a crane that was improperly secured to a building and fell onto a neighboring building. Very recently, a crane on 1st Ave and 91st Street collapsed onto a 23-story building and killed the crane operator.

Many critics say the problem is that there is too much construction going on and too much rushing to get the projects done. That seems to make sense to me, especially after reading about the repeated complaints and stop-work orders on the crane that collapsed most recently. If there weren't such urgency to get these new high-rises up, maybe the project could have been stopped and the crane properly evaluated and replaced.

I've been especially sensitive to these types of issues lately because I've been reading a very interesting book, "The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA" by Diane Vaughan. It's a slow read, and from what I've read so far, it's about the 'production' pressures faced in the organization and the 'amoral' judgment and behavior of the managers there. In both of these cases, the teams just weren't cautious enough, or chose to ignore the warning signs.

What stunned me was the statement New York Mayor Bloomberg made after the accident, something like (this is not an exact quote) "there are only so many inspections that can be made and so many inspectors; sometimes problems aren't caught." When the failure of a system can mean the loss of human life, the warnings have to be taken extremely seriously. I'm not exactly an insider in the construction world, but it looks like the warning signs were not taken seriously enough.

In my software projects we take quality assurance and control seriously and are trying to find ways to improve what we're doing.

Do we miss things?
Of course!

Do we put new bugs into the system when we launch a new project? I think I've got a perfect record on that one: there's always something we have to fix post-launch. I'm happy to be in a field where a bug in my system will not cause the loss of human life. I guess I expect a little more from the more dangerous fields of work out there.

Tuesday, June 3, 2008

The Good, the Bad, and the Ugly: The Reality of Risks in Your Projects

I just finished a great book, “Waltzing with Bears: Managing Risk on Software Projects” by Tom DeMarco and Timothy Lister. Risk is something we all deal with in our day-to-day lives, and probably don’t deal with enough in our professional work. The authors open with the idea that if a project doesn’t have any risk to it, it’s just not worth taking on. This notion is especially applicable to the interactive design field that I work in at Flightpath.

The authors share the analogy presented by risk management expert Bob Charette, who proposes imagining your company and its competitors as a set of down escalators.

“You are obliged to climb up the escalator, which is moving against you. And your competitors are doing the same thing on theirs. The faster their stairs move, the faster everyone has to climb to stay even. If you pause, even for a moment, you begin to fall behind. And, of course, if you pause for too long, you will drop off the bottom, no longer able to compete…

Competitors get to enter their escalators halfway up. Falling behind, then, guarantees that the new competition will enter above you.

At the top of each escalator is a lever that will allow you to control the speed of not just your escalator, but of everyone else’s as well. If you’re the first to reach the lever, that shows that you’re a better climber than your competitors. So, you can speed up all the stairs so that you can stay even but your competitors can not.

…This is an era in which risk-taking is rewarded, leaving companies that run away from risk as a plunder to be divided up by the others.”

But taking on the most innovative, and thus riskiest, projects is not all about being the sexiest agency out there; risks have to be managed. To take on a risky project without a formal plan in place to manage those risks is to take a major gamble with your project, your stakeholders, and maybe even your career.

The book states that risk management is project management for adults. Something adults have that children don’t is a willingness to confront the unpleasantness in life, from the little annoying things to the cataclysmic (for example, putting a Band-Aid on a cut to keep it from getting infected, or taking out a life or homeowner’s insurance policy as protection against the bigger challenges). Taking note of the bad things that can happen and planning for them accordingly is a mark of maturity.

What can you do to make your project sexy (and, of course, risky) at the same time? You might try out a new technology, work with a third-party application that does some incredibly cool stuff, or do something that nobody else out there has done before. Then there are the core risks that any project (no matter how un-sexy) can have; the book lists these five core project risks:

- schedule flaw, interruption

- scope creep

- turnover

- specification breakdown

- underperformance

To ignore any of these is a disservice to your project team, your stakeholders, and yourself.
But risk discovery and planning can be an unpopular task and can make the person identifying the risks look like a ‘can’t-do’ person or a whiner. Risk management makes a limited amount of can’t-do thinking okay. It’s better to raise the issues early on and be prepared for them if and when they come than to ignore them entirely.

So, I’m ready to become more of a can’t-do person, and ready to be more prepared for anything good or bad that will come my way. I hope that you’ll join me, and I’ll see you on the escalator!

Including Uncertainty in Estimates of Software Projects, Fort Building, and Anything Involving a Toddler

Early in a project, many of the specific details (the nature of the software being built, the requirements, the project plan, and the staffing) are still very unclear. Because there are so many variables early on, it is crucial to include a large degree of uncertainty or variability in the project estimate. This is not about being purposely misleading or avoiding commitment to an exact number with your stakeholders; it is about accepting the reality of software projects, which leave so much to be defined early on. Committing to an exact number at the very beginning would mislead both you and your stakeholders, presenting a false sense of confidence in something that still has so much left to define.

Steve McConnell, CEO and Chief Software Engineer at Construx Software, presents the idea of a “Cone of Uncertainty” in his book “Software Estimation: Demystifying the Black Art” (2006).

In McConnell’s Cone of Uncertainty diagram, the horizontal axis shows significant project milestones, and the vertical axis shows the degree of error that has been found in estimates created by skilled estimators at various points in the project. The diagram makes it obvious that estimates created early in a project are subject to a high degree of error (anywhere from 0.25x to 4x the actual result). As the details of the project become defined and understood, the cone narrows. The most accurate estimate is, of course, made at the very end of development, but the challenge in the software world is to find a point in between where we know enough about the project to make the best estimate possible while still allowing major stakeholders to plan financially. More about the Cone of Uncertainty and other estimation resources can be found on the Construx website.
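
To make the cone a little more concrete, here is a minimal sketch in Python, assuming you turn a single-point estimate into a range by multiplying it by the cone’s low and high factors for the milestone you’ve reached. Only the initial-concept range (0.25x to 4x) comes from this post; the other multipliers are the values commonly cited from McConnell’s book and should be treated as assumptions to check against it. The names `CONE_MULTIPLIERS` and `estimate_range`, and the 12-week example, are hypothetical.

```python
# (low, high) multipliers applied to a single-point estimate at each milestone.
# Only "initial concept" (0.25x to 4x) is from the post; the other values are the
# commonly cited ones from McConnell's cone and are assumptions here.
CONE_MULTIPLIERS = {
    "initial concept": (0.25, 4.0),
    "approved product definition": (0.5, 2.0),
    "requirements complete": (0.67, 1.5),
    "user interface design complete": (0.8, 1.25),
    "detailed design complete": (0.9, 1.1),
}


def estimate_range(point_estimate: float, milestone: str) -> tuple[float, float]:
    """Return a (low, high) range for a single-point estimate made at the given milestone."""
    low, high = CONE_MULTIPLIERS[milestone]
    return point_estimate * low, point_estimate * high


if __name__ == "__main__":
    # Hypothetical project first estimated at 12 weeks.
    for milestone in CONE_MULTIPLIERS:
        low, high = estimate_range(12, milestone)
        print(f"{milestone:32} {low:5.1f} to {high:5.1f} weeks")
```

For a 12-week estimate, the range starts out enormous (3 to 48 weeks at the initial concept) and narrows with each milestone, which is exactly why an early single number deserves so much hedging.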

In his book, Steve McConnell explains several techniques for making software estimates. He also recently wrote a very interesting and entertaining blog post comparing building a fort in his backyard with the problems people run into with software estimates. The general idea, explained very humorously, is that at the beginning of a project it’s easy to assume that everything will go as planned and the project will proceed smoothly and on time, but it’s very common for things to take longer than expected. In his case, it was the little construction project in his backyard.

I haven’t built any forts lately, but I’ve managed many projects, including some that have taken longer than the original estimate for one reason or another. I also see this concept clearly illustrated in my day-to-day life outside of work. I find that it’s almost impossible to make any kind of time or schedule commitment when a toddler is involved. I’m fortunate to be the mother (or project manager) of two little girls, and I have the pleasure of bringing the older one to preschool every morning. What should take only 15 minutes can sometimes take up to 40, and this is why:

1. The 2nd and 3rd bowls of cereal (6 mins)
2. Trip to the potty before leaving home, which can sometimes include the mandatory reading of a Dr. Seuss book while we wait for the potty business to finish. (8 mins)
3. Sneakers that get taken off and put back on again, only to get taken off one more time (and of course put back on again) before the final trip out the door. (2 mins)
4. Unexpected meltdown about which jacket to wear, and wanting to wear rain boots and bring an umbrella on a perfectly sunny day. (6 mins)

You catch my drift…

So, I’m learning more about how to properly include uncertainty in my estimates at the appropriate points in a project, both in the projects I manage at work and in the mini-projects I manage at home every morning. It’s a good thing our preschool allows a 30-minute window for morning drop-off!
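
The morning list is really a tiny estimation exercise. As a playful sketch in Python, assuming every delay happens on the same morning, adding the four delays to the 15-minute baseline gives the worst case. The names `DELAYS` and `worst_case_minutes` are hypothetical, and treating the best-to-worst spread as what has to fit inside the drop-off window is my own assumption.

```python
# A playful worst-case estimate for the morning preschool run.
# The 15-minute baseline, the four delays, and the 30-minute window come from the post;
# the variable and function names are hypothetical.
BASELINE_MINUTES = 15
DROP_OFF_WINDOW_MINUTES = 30

DELAYS = {
    "2nd and 3rd bowls of cereal": 6,
    "potty trip with mandatory Dr. Seuss reading": 8,
    "sneakers off and on again": 2,
    "jacket and rain-boot meltdown": 6,
}


def worst_case_minutes(baseline: int, delays: dict[str, int]) -> int:
    """Assume every delay happens on the same morning and add them to the baseline."""
    return baseline + sum(delays.values())


if __name__ == "__main__":
    worst = worst_case_minutes(BASELINE_MINUTES, DELAYS)
    spread = worst - BASELINE_MINUTES  # how far the worst case overruns the plan
    print(f"Best case: {BASELINE_MINUTES} min, worst case: {worst} min")
    # Assumption: leaving for the start of the window means the spread must fit inside it.
    print(f"Spread of {spread} min fits the drop-off window: {spread <= DROP_OFF_WINDOW_MINUTES}")
```

The delays add up to 22 minutes, for a worst case of 37 minutes, close to the “up to 40” above, and a 22-minute overrun still fits inside a 30-minute window.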

(This blog post was originally published in May 2008 on Flightpath's Digital Insight Blog)