22 March 2018

Phoenix  - A Challenge to the Public Sector

(This post was originally published on Medium)

The irony would be sublime — “Rising from the ashes of Phoenix”. Irony, however, isn’t my goal. Instead I intend to provide a concrete way forward to our elected & appointed federal government officials to replace the Phoenix pay system. It frustrates me to no end that in 2018 we have to pay for failed IT systems that use approaches firmly rooted in the 1960s, when I and many other software people know that we could ship most if not all of the system, with substantially better quality, at a cost that’s at least one if not two orders of magnitude lower.

In the 2018 budget tabled by the Liberal government, they acknowledged that the Phoenix Pay system had severe issues and would have to be replaced. Phoenix was intended to replace the antiquated mainframe systems that paid Canada’s public servants. It has been live for 2 years, and has been fraught with issues resulting in under and overpayments of those employees, and in some cases no payments at all.

I will present an alternative approach to delivering such a system. My approach isn’t based on private sector naïveté — I spent over 15 years building systems large and small for 9 departments and Crown Corporations and have no illusions about the difficulties faced in that environment. I’ve also seen how different approaches within government can be remarkably successful and wish to leverage that experience.

My recommendations, therefore, are based on a background of 30 years of combined public and private sector software system delivery.

The Crash of the Phoenix

Based on what you read, Phoenix was supposed to cost $50 million Canadian dollars to deliver but has, as of March 2018, cost over $450M. The government has budgeted another $431M to deal with the continuing issues until a replacement system can be delivered. To be fair, and based on my previous history working on large government projects, I’m going to assume that the $881M total is for the whole program to build the system, consolidate the compensation handling in Miramichi, New Brunswick, provide training, etc. The original $50M was probably for just the system itself. News reports have a tendency to show the numbers in the least complimentary light. The truth about the cost of the actual computer system likely lies somewhere in the middle, but that really isn’t the point.

This is yet another example of our government taking the same old broken approach to defining, procuring, and delivering these systems when they know that it just doesn’t work. If you believe that statement is unfair, then you can simply refer to the Auditor General’s reports from practically every year going back to the 1990s.

Phoenix was yet another case where the approach taken would:
  • Mandate the use of a commercial off-the-shelf (COTS) software package (Peoplesoft in the case of Phoenix) in order to leverage existing functionality;
  • Dream up every possible requirement that a compensation management system would need, requiring substantial customization of the COTS package;
  • Consolidate all that into a Request for Information (RFI) consisting of a metre-thick binder;
  • Review the RFI from the vendors and revise the binder such that it becomes 2 metres thick;
  • Use the binder to obtain approval from Treasury Board for some low-ball amount for the project;
  • Issue a massive Request for Proposals (RFP) to only those vendors who are large enough to handle such a massive system (remember the 2-metre-thick binder);
  • Take months if not years to evaluate the responses;
  • Select the winner, IBM in the case of Phoenix, who has also low-balled their bid;
  • Begin the massive waterfall project, with the hope that everything will just fall into place like it never has in any project before;
  • When misunderstood or previously unknown requirements surface during development, watch the integrator issue a constant stream of change requests to cover the new work.

After multiple years (2011–2016), the system was rolled out after being delayed in 2015 at IBM’s request due to critical issues. Despite constant problems that were overwhelming the compensation people who worked with the system, the government kept bulldozing forward to meet their deployment objectives. One of those objectives was to lay off 2,700 compensation advisors across the government who had worked in offices across the country (they were replaced by the single, consolidated pay centre in Miramichi, one of the cost savings “wins” the system was intended to provide). On March 9, 2016 the first full payday was processed. Ish.

By July 2016, some 80,000 employees had issues with their pay ranging from complete non-payment to over and underpayments. More money was thrown at the system, akin to tossing a can of gasoline on a fire.

In the February 2018 budget, the government essentially admitted defeat. Phoenix is going to chew up further hundreds of millions of dollars until its replacement can be delivered. Meanwhile, the Treasury Board quietly earmarked $16M over two years to study a replacement. Study. This is where my blood pressure begins to rise.

A Definition of Insanity…

The budget also indicated that, after the 2-year study had taken place, a new procurement process would start, followed by another development attempt, with the system delivered probably in 2025.

All of that indicates to me that the same people who already created the Phoenix problem will harbour the same assumptions, using the same processes and expect to have a different outcome. It’s true that doing the same thing repeatedly and expecting different results each time is not actually a definition of insanity. It is, however, a tremendously effective way to waste enormous amounts of time, money and, dare I say it, the sanity of the people involved.

I don’t just fear that this will happen, I know it will.

It will probably cost as much as or more than Phoenix. It will use one or more of the “usual suspects” among the big system integrators: IBM (again), CGI, HP, Accenture, etc. It will be late and have significant problems, but those will still be spun as a success story because, for political reasons, it can’t fail.

The Case for Change

My private sector experience suggests that, given $16M and two years, I could assemble a damned good team and ship a not insignificant increment of the replacement system. Yes, I know — also from experience — that the public sector is different. But this is where I want to start teasing apart the massive, gnarly knot of assumptions that comprise the existing approach to delivering these systems.

Several years of my public sector development experience was in building systems for Human Resources. I also did a 6-month stint in Compensation systems where I wrote code that interfaced directly with one of the predecessors to Phoenix, Online Pay. So, I’m coming at this problem with an understanding of what’s involved for at least one government department.

Let’s examine two of the assumptions regarding government pay — that it is extremely complex and requires a massive team.

Compensation in the government is complicated, with some 80,000 business rules according to one source, but it isn’t complex. For a given input, you can accurately predict the output. That’s what delineates complicated from complex, in which the output can only be observed afterwards and not predicted in advance.
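To make that distinction concrete, consider a deliberately trivial sketch of a “complicated but deterministic” pay rule. The class, the rule, and the rates below are all my own inventions for illustration; none of this reflects the actual rule set:

```java
// Hypothetical illustration: even with tens of thousands of rules, each
// rule is deterministic. The same inputs always yield the same output.
public class ActingPayRule {

    // Invented rule for illustration: acting assignments of 4 or more
    // days earn a 4% premium on base pay, prorated over 260 working days.
    public static double actingPay(double basePay, int actingDays) {
        if (actingDays >= 4) {
            return basePay * 0.04 * actingDays / 260.0;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // Complicated, perhaps, once you have 80,000 of these, but never
        // complex: the output is predictable in advance.
        System.out.println(actingPay(60000.0, 10));
    }
}
```

Scale that up to 80,000 rules and the system is certainly complicated, but the output for a given employee’s inputs remains predictable, which is exactly what makes it tractable.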

There are some 300,000 people who need to be paid by the system, which means the data volumes are quite large. However, neither of these facts necessitates a massive project, with a massive team fielded from a massive company and an associated massive budget.

Those assumptions are symptomatic of the broken process. Surely something this large and complicated can’t be handled by a small team! Well, that’s exactly the approach used to fix the disastrous initial version of the healthcare.gov web site and health insurance application process in the U.S.

I know what you’re thinking — this isn’t a web site, it’s a serious payroll management system. First, healthcare.gov was more than just a web site and had to process the enrolment of literally millions of applicants. The usage volumes were considerably larger than what Phoenix handles. Secondly, it had to handle U.S. government security and privacy requirements that are at least as stringent as those in Canada. Finally, from the perspective of the end users it was at least as important as being paid. So we can dispense with any notion that healthcare.gov isn’t in the same league as Phoenix.

On the day that healthcare.gov launched in October 2013, it was a disaster. Only 6 people were able to successfully sign up for health care. Major aspects of the system such as direct enrolment simply didn’t work. This was known before the launch, of course, but the date was the date was the date.

Sound familiar? A large team from large integrators building a large system with a large budget, and it didn’t work. What the U.S. government did next, though, is a hint at what I have in mind to fix Phoenix.

First, a small group of people were recruited from Silicon Valley as part of what was called the Tech Surge to perform emergency work to get the system into a working state. Second, after that initial surge, a second group took over and spent months replacing the applications that had been written by the massive original project group.

Not only did both of these teams successfully ship the required software, they did it at tiny fractions of the cost. One example cited is the login process, which had cost the original massive team $250M USD (yes, that’s a quarter of a billion U.S. dollars!) to build, with $70M USD annually to maintain. The small group replaced that with one that cost $4M USD and under $1M USD annually for maintenance. With the original version, it took from 2 to 10 seconds to log into the system. The replacement took 30 milliseconds. The old system took a user, on average, 20 minutes to complete the enrolment forms after working through up to 76 pages that “helped” direct the person through the process. The new system took an average of 9 minutes, with at most 16 pages.

Comparison of the login process for healthcare.gov

How was this possible? Simple: the U.S. government decided to challenge the assumption that everything had to be large. A small team with the required skills, given a clear mandate and with the standard bureaucracy kept out of the way, is able to move much, much faster than a large one. When the team discovered something new, they could respond quickly. They had the autonomy to decide to focus on quality rather than simply shipping features. They still encountered challenges within the government ecosystem, but those were eventually removed or overcome.

A small team was able to replace in the order of months what a large team could barely deliver over years, and they did so at a fraction of the cost.

The Proposal

An interesting note about the group that fixed healthcare.gov is that they didn’t completely disband. One set of people created a public benefit corporation called Nava and another group formed Ad Hoc, both of which are now using their approach for systems in other U.S. government departments.

My proposal is that rather than ending with a similar outcome, why not start there? Is there a similar structure that could be used in Canada? Could a Crown Corporation, Special Operating Agency or some form of not-for-profit organization be created that would be given the mission to replace Phoenix?

That organization would have attributes such as:
  • The ability to directly hire the people with the required skills to build the system (existing government employees would be considered, but would have to be seconded to this organization if hired rather than remaining on force with their department);
  • It would be provided funding by Treasury Board, or some combination of TBS and the departments who are the current stakeholders in Phoenix;
  • When that funding was secured, the organization would have total autonomy on how it was spent;
  • The people in the organization would be paid salaries, but wouldn’t be shareholders in the organization in any way, i.e. there would be no financial incentive to let the delivery of the replacement system drag on over time.

The team comprising the organization would:
  • Have direct access to the people who would be the consumers of the system in order to ensure that the system would work effectively for them;
  • Have direct access to the subject matter experts in order to ensure that the system is properly handling the business rules;
  • Have direct access to the people responsible for any systems with which the new system would have to integrate;
  • Have complete autonomy regarding the process used to deliver the system;
  • Have complete autonomy regarding the system architecture and technologies used;
  • Be ridiculously transparent in their operation with respect to their progress and what work is being done;
  • And perhaps most importantly, have the constant, unwavering support from management at the Deputy Minister and Ministerial levels of the public service.

Having this organization outside of any one government department ensures that it isn’t unduly influenced by any one viewpoint on the system. Having the organization still remain within the orbit of the government with respect to funding and operation means that it wouldn’t be unduly influenced by external vendors.

Most important, having the organization in the first place means that large integrators won’t be spending enormous sums of tax dollars to feed their own services machine.

That alone should make politicians happy, but consider as well the increased likelihood of successfully delivering a replacement system. Rather than facing awkward queries during Question Period, ministers and MPs could show off the success achieved during their mandate.

So Now What?

Nothing is a foregone conclusion, of course. There are small teams who have failed and large ones who have succeeded. My experience, though, and that of many others, suggests that the smaller team approach is what’s needed to ensure that the replacement for Phoenix isn’t just another Phoenix.

The approach and thought processes currently used in the Canadian government are what led to the Phoenix débâcle in the first place, so I see no reason why the outcome should be any different if nothing changes. There’s an old saying in the software business: garbage in, garbage out.

My proposal is to start with a small, handpicked team of perhaps a dozen people from a number of disciplines and grow only when there’s enough pain to warrant growth. Let that team work outside the normal government bureaucracy, with the backing of the highest levels of elected officials and members of the federal public service. Stay out of that team’s way and let them deliver a high quality system that is extremely well-tailored to the needs of the users and stakeholders, and does what it’s supposed to do… pay people in the public service.

If you believe in what I’ve outlined here, if you believe that there really is a better way to deliver systems of any size in the public sector, please share this. Send it to your MP. Send it to our Prime Minister! Send it to people you know in the government. Send it to journalists who can amplify the message.

After all, why do we want to change the way systems are built in government?

Because it’s 2018.

18 May 2015

Is Predictability Really What We Want?

Predictability in software system delivery is as close to a Holy Grail as it comes in the IT industry. I’ve heard many people stress being able to have a predictable delivery cadence as something valuable to them. As recently as today I saw a reference to “predictability over commitment” on Twitter! But why is predictability so important to so many people?

The people who pay for the software we create certainly want to get away from systems that cost more than was originally expected and take longer to deliver than thought. They also want fewer surprises from defects that show up only when a system is in production use, which require more time and money to fix.

The people who manage software delivery groups certainly want to be able to know how much their teams can deliver over a certain time period so they can work proactively to deal with issues such as training and career development.

7 May 2015

Getting Started with Test-Driven Development - Where Do I Start?

If you have ever built software using a true Test-Driven Development (TDD) approach, do you remember the first problem you had to overcome? Was it perhaps:

Where do I start? What’s the first test?

That’s a very common issue - simply not knowing where to start.

Quite a few years ago, I was coaching at a client in the St. Louis, Missouri area. I attended a meetup of the local XP group and it was a session about exactly this topic. It was being led by Brian Button, and he started by speaking about the blank look that some developers have when faced with figuring out where to start. His exercise was to have us pair up and start building a simple game of Blackjack using only a test-driven approach.

I paired with Brian Nehl, who just happened to work for the Missouri State Gaming Commission. I figured that would give us the upper hand over other pairs. Our platform of choice was C#, although the examples I will show here are in Java.

After a very brief discussion about the rules of Blackjack, we talked a bit about where we wanted to start. Well, we need a Game of some sort, so should we start there? The game will have to be able to compare Hands of cards, so maybe that’s the place to start. But what about players? We’ll need a Dealer and a Player and each will have a Hand.

After perhaps 2–3 minutes, we decided to focus on something more fundamental - the Card. A card would need to have a value, and I believe we even started with the concept of a suit. So our first test was something like:
   @Test
   public void eightOfHeartsIsGreaterThanThreeOfSpades() {
      Card card1 = new Card(8);
      Card card2 = new Card(3);

      assertTrue(card1.getValue() > card2.getValue());
   }
This test forced a few things. First, the compiler griped that there was no such thing as a Card class. So we had to create that:
public class Card {
}
That cleaned up the first errors, but now we were told there wasn’t a getValue method in Card. That was our next step:
   public int getValue() {
      return 0;
   }
The compiler errors were gone and we could run the test. Naturally, the test failed since getValue only returned a zero. That’s great… we now had our first failing test!

I know what you’re thinking… WHOA!! WAIT A MINUTE! You just hard-coded a return value just to please the compiler!

Yes. Yes I did. And that’s OK! TDD is all about expressing the desired behaviour using tests and making the tiniest change possible to allow the tests to pass. In a statically typed language like Java, it’s OK to think of compiler errors as a form of test.

But now we needed to make the test actually pass. So, we changed the code in the Card class to be something like this:
   private int value;

   public Card(int value) {
      this.value = value;
   }

   public int getValue() {
      return this.value;
   }
Green bar!! Our test passed!

We added a couple more tests like this, then realized that we hadn’t actually done anything with the concept of a Suit (Hearts, Diamonds, Clubs, Spades). We decided to refactor our tests to pull out any mention of a Suit because we simply didn’t need it at that point in time.
   @Test
   public void eightIsGreaterThanThree() {
      Card card1 = new Card(8);
      Card card2 = new Card(3);

      assertTrue(card1.getValue() > card2.getValue());
   }
We proceeded to write tests that introduced face cards and the Ace. We eventually created the concept of a Hand, and could compare the value of two hands by aggregating the values of the Cards within a Hand.
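
I no longer have the session code, so here is a minimal reconstruction of roughly where we ended up. The names and details are from memory (and translated from our C# into Java), and it sidesteps the Ace’s dual value:

```java
// A minimal reconstruction of the Hand concept: a hand's value is the
// sum of its cards' values. Card is nested so the sketch is
// self-contained.
public class Hand {

    public static class Card {
        private final int value;

        public Card(int value) {
            this.value = value;
        }

        public int getValue() {
            return value;
        }
    }

    private final java.util.List<Card> cards = new java.util.ArrayList<>();

    public void add(Card card) {
        cards.add(card);
    }

    public int getValue() {
        int total = 0;
        for (Card card : cards) {
            total += card.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Hand hand = new Hand();
        hand.add(new Card(8));
        hand.add(new Card(3));
        System.out.println(hand.getValue()); // prints 11
    }
}
```

From there, comparing two hands is just comparing their getValue() results, the same shape as our very first Card test.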

I believe that was when we ran out of time, as there was only an hour for the coding session. The goal wasn’t to complete the Blackjack game, but rather to get comfortable with starting somewhere.

During the debrief, it was interesting to note that out of 6 pairs, only two started at the same place. Equally interesting was that there was no incorrect place to start!

That was the point of the exercise. If you’re stuck on where to start, just start somewhere... anywhere!

A few years later, I attended Agile Games 2011 in Boston. Adam Sroka led a session about using TDD to build a Texas Hold ‘Em game. When we started, I recall several people tried to design the whole “system”, and built a long list of things that had to be done. There was more flailing over where to start.

I sat down with Moss Collum, another developer, and just asked, “Where do you want to start?” He handed me the keyboard and I typed:
   @Test
   public void eightOfHeartsIsGreaterThanThreeOfSpades() {
   }

Conclusion

Since then I’ve used TDD a couple of times for card games. Today I’d probably start with either the Game or the Hand rather than Card. Again, that isn’t because starting with Card was wrong, but I could see getting further in the same amount of time by using a different starting point. Or maybe not - it probably depends on the individual developers.

The main conclusion, though, is that you don’t need to agonize over what the first test should be. You don’t need to agonize over which concept to introduce first. Just start somewhere and adjust from there. The code will show you which way it wants to go.

27 April 2015

Getting Started with Test-Driven Development - My Left Hand

If you’ve done any amount of programming that wasn’t test-driven, the entire notion of writing a test before writing any code may sound odd. I can tell you from my own experience in late 2000 and early 2001 that it certainly seemed like a strange way to build software. When I first tried writing code in a test-driven manner, it felt more than odd or strange - it felt as if I was using my "wrong" hand to write my signature.

While some people are ambidextrous, most of us aren’t and have a decided preference to one hand or the other. I’m right-handed, and my first attempts at TDD felt like I was trying to write something with my left hand. I noted, though, that it wasn’t the first time I had encountered that feeling.

So, About My Left Hand

I was born and raised in the same town that James Naismith, the inventor of basketball, was raised. I even lived on Naismith Drive for over 15 years! Needless to say, basketball was a big thing in our town. I played during high school and even into my late 30’s in recreational leagues. We had a park with a basketball court right outside our back yard, and a bunch of us played constantly from when the snow melted in the spring until almost when it fell again in late fall.

My Dad also loved basketball and on a few occasions he came over to the park with me to shoot some hoops. One of those times, we played a little one on one. Dad noticed that I dribbled mostly with my right hand and all my shot attempts - even layups - were right handed as well.

For those of you who’ve met me in person, I’m not what you would call the prototypical basketball player. I’m a whopping 5 feet, 6 inches (165cm) tall and was blessed with endurance more than speed. Dad pointed this out and said that I needed to find other advantages when playing, since most (OK, all) of the players defending me would be taller and probably faster. That advantage would be the ability to use my left hand in addition to my right when dribbling, passing and shooting. Other players wouldn’t expect someone right-handed to use their left hand as much.

Dad was never one to give a lot of advice, so when he did I usually listened. I believe it was June when he suggested that, and I proceeded to spend the next two months of the summer trying to use my left hand as much as possible. And I sucked at it. I could hardly dribble, passing with my left hand was a joke, and shooting simply wasn’t happening. I was able to start making layups pretty quickly, though, which gave me some hope.

By the end of the summer, not only was I making layups but I was comfortable dribbling with either hand, I could make passes using my left hand that I had never been able to do before, and I could make shots with my left from almost anywhere within 10 feet of the hoop.

Let’s jump ahead now to the basketball season the following winter, when I was the starting point guard. I was much more effective because I was no longer limited to dribbling with my right hand. I could dribble away from defensive pressure and my new capability to pass with my left hand helped get me out of trouble when I couldn’t dribble. It also allowed me to make passes that the players on defence weren’t expecting.

To that point, I clearly remember one playoff game where I drove to the basket with a defender right behind me. When I went up for the shot his hand and arm were over my right shoulder, expecting a right-handed shot. I simply used my left hand instead and sank the layup. It was exactly what my Dad had said would happen the previous summer.

I actually had progressed to the point that I was more comfortable using my left hand on fast break layups than my right. That was in the early 80s and it stayed with me right up until I hung up my Converses in the mid-2000s.

Plasticity

Our brains are incredible - they have a capability called brain plasticity. You will hear about people who’ve suffered a brain injury, a stroke, or have had tumours removed and lost control of some part of their body. Through physical therapy they are able to regain some, if not all of their previous ability because their brain re-learned how to do whatever function had been lost.

Learning to play basketball with my left hand used the same concept, although I hadn’t lost anything. Our brains are capable of rewiring themselves to allow us to do things we hadn’t been able to do before. This isn’t a fast process, though. It takes time and repetition. It took me over two months of practice and repetition to have my brain create new pathways to provide the coordination to use my left hand.

Learning Test-Driven Development is very much the same. It takes time, practice and repetition for your brain to transform it from feeling like trying to write with your opposite hand into the natural way to do things. Just like my layups, it can go from being almost impossible to being the way you prefer.

The question now is, how do you get started?

Don’t Be Embarrassed

The first time you try TDD, it will feel awkward. You will probably be quite self-conscious if others are present because you’ve been used to writing code and shipping systems using a traditional approach. Don’t worry about that - any new technique will feel odd, and this has no bearing on your ability to program. If someone asks why you seem to be going more slowly, tell them you’re using your “wrong hand”!

Use the Buddy System

Probably the most important thing for me when I learned TDD was that I had a “buddy” doing it with me. We shared our discomfort, laughed at each other, and learned together. We also pushed each other. I wasn’t going to give up if he wasn’t!

This concept is actually pretty standard for any behaviour-changing activity. Weight loss programs like Weight Watchers use the buddy system. Alcoholics Anonymous has sponsors who work with a person. Those people have “been there” and can relate their experiences.

Do the same with TDD. Find someone who wants to learn as well. Pair with them. Make mistakes together and laugh! Learn together, knowing that you both will get better at it.

Practice Makes Perfect

Over the summer that I learned to use my left hand in basketball, I would be on the court almost every day, sometimes twice a day if I had the time. There were a bunch of us who just showed up each night when the weather cooperated. We played a ton of basketball that summer (and many others as well)! You can do the same while learning TDD.

There are code practice sessions called Katas that are extremely helpful when you’re starting out. You can spend 15–30 minutes working on one of the kata programming exercises such as Prime Factors, Fizz Buzz and a String Parser among others. That practice will take you from that awkward, uncoordinated, “using my wrong hand” feeling to a natural flow.
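If you want a feel for where one of those katas ends up, here is a sketch of the Fizz Buzz kata’s destination in Java. In a real session you would arrive here one small failing test at a time; this is just the shape of the finished code:

```java
// Fizz Buzz: say "Fizz" for multiples of 3, "Buzz" for multiples of 5,
// "FizzBuzz" for multiples of both, and the number itself otherwise.
public class FizzBuzz {

    public static String say(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return Integer.toString(n);
    }

    public static void main(String[] args) {
        // Print the first fifteen values, one per line.
        for (int i = 1; i <= 15; i++) {
            System.out.println(say(i));
        }
    }
}
```

The value isn’t the finished code, though; it’s the dozens of tiny test-code-refactor repetitions it takes to get there.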

Conclusion

Test-Driven Development is a development technique that comes with what I call a “learning cliff”… it’s simply too steep to call a curve! However, using these steps can help you scale that cliff relatively easily.

Find someone to scale the TDD cliff with you. Try it out, practice and compare your experiences. Most of all, though, be patient! It will take some time and you’ll likely become frustrated at some point. Remember that the outcome of clean, well-factored, well-designed code that happens to have a comprehensive set of tests describing its behaviour is more than worth the effort.

More on that in later posts!

16 April 2015

It's Just a Feature for Feature Port

Ah, the 80's!

A long time ago, in a galaxy far, far away known as my teenage years, I used to watch the popular CBS show Magnum, P.I. The show's protagonist, Thomas Magnum (portrayed by Tom Selleck), had a running self-narration throughout the show which helped fill out the plot and characters.

One line of narration that Magnum often spoke was, "I know what you're thinking...". He did this when it was clear to the audience that he was about to do something that wouldn't work out all that well, and used it to explain why he was doing it anyway. Usually there would be unforeseen complications that led Magnum into danger and from which he would have to extract himself and others in dramatic fashion.

Gotta love Hollywood!

Regardless of how "80's" the show may seem today, that line - "I know what you're thinking" - has stuck with me. I've even used it myself several times during my software career, each time when someone is porting some functionality to a new language, framework or platform.

It's just a simple feature for feature port. We don't need to figure out the requirements. We don't need the business people until the end. We aren't going to add anything new or change the functionality at all!

6 April 2015

Experiments, Not Failures!

In the Agile world we often hear statements like, "Fail fast, fail often", and "Failure is how you learn". Equally often we hear about organizations whose culture disapproves of failure. We say that such organizations will be a difficult fit for any Agile process and that there will be much pain and suffering during the transition.

But what do we actually mean by the term failure?

When performing technical coaching, I've adopted Mike Hill's terminology of microtest to replace unit test. Why? Because unit test has accumulated all sorts of "baggage" over the many years it has been used. At one client, a unit test was something that could take developers two weeks to write. At another client, a unit test was performed manually by QA people. This overloading of the term makes it quite difficult to achieve a shared understanding of testing, for example.

Does that same problem exist with the word failure?

26 March 2015

Changing the Tires

You're driving down the highway trying to reach a distant destination. You've had delays such as traffic along the way, and you know that you're going to have to "push it" in order to have any hope at all of arriving on time. You start to feel something strange in the steering wheel. The car seems to be pulling to one side and the steering is rather "mushy". It dawns on you that you're losing air in one of the tires.

Ugh. You've already been delayed, and you really need to get to the destination on time.  Perhaps there's a store that will be closed. Perhaps an important meeting that requires you to be there. Regardless, a tire that's becoming flat will certainly delay you more than you can afford.

So, what do you do?


16 March 2015

Mary Had a Little Lamb - The Power of Shared Understanding

I spend quite a bit of my coaching time helping teams improve how they determine what needs to be built in order to satisfy a business need. I warn them right from the start that they're going to hear me repeat the same term many, many times before they get rid of me. That term is "Shared Understanding".

There's a thinking tool that I use to help people understand what shared understanding means. I go to a whiteboard or flipchart and write the following:

Mary had a little lamb.

I then ask the group what that sentence means.