In the Lean manufacturing world there’s a measurement called First-Time-Through (FTT), which monitors whether a cell is making products right the first time. It’s a measurement of the effectiveness of the cell’s standardized work and shows the percentage of product made without any need for rework or scrap.
FTT = ( Total units processed - Rejects or Reworks ) / Total units processed
If the standardized work is adhered to, the product will be made right first time and FTT will be 100%. However, flawed materials, faulty components and operator error all contribute to rework and scrap.
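As a rough sketch, the FTT formula above can be written as a small Python function. The function name and the sample numbers are made up for illustration and do not come from any real cell.

```python
def first_time_through(total_units: int, rejects_or_reworks: int) -> float:
    """Return FTT as a fraction between 0 and 1.

    FTT = (total units processed - rejects or reworks) / total units processed
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= rejects_or_reworks <= total_units:
        raise ValueError("rejects_or_reworks must be between 0 and total_units")
    return (total_units - rejects_or_reworks) / total_units


if __name__ == "__main__":
    # Hypothetical cell: 200 units processed, 14 of them reworked or scrapped.
    ftt = first_time_through(total_units=200, rejects_or_reworks=14)
    print(f"FTT = {ftt:.0%}")  # FTT = 93%
```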
Who cares about parallels between manufacturing and software development? I was just interested to read about FTT because I’ve been thinking for a while now about the effectiveness of software teams … at an operational level, let’s say. I’ve long considered an effective team as one that is able to sustain throughput (i.e. the number of cards released to production that deliver value) while fixing defects immediately and repaying technical debt to keep the amount of rework small.
I consider technical debt and defects to be rework, and technical debt to be a natural byproduct of software development. It stems from earlier decisions, based on what we knew at the time, and requires attention later when the system has outgrown the outcomes of those decisions. It is necessary rework that keeps the emerging design relevant and the software healthy and habitable, reducing risks and medium to long-term costs. Defects are basically mistakes. They happen. How we create software determines whether we have a small and manageable amount of rework or a crippling amount of rework. If we’re responsible, skilled and bake quality into code we can minimize rework to technical debt and occasional defects. If we’re irresponsible and cut corners, or we’re rubbish and write crap code, then rework can become so large that the only viable option is to cancel or start again.
Technical debt requires careful management and continuous investment while defects should be fixed as soon as they are found. A proportion of a team’s capacity is therefore always expended doing an amount of rework. That’s a good thing providing:
- the completed rework is small compared to the throughput so that capacity mostly focuses on value demand, and
- the completed rework is enough to keep the remaining rework small compared to the throughput, thus minimizing further failure demand.
(Throughput excludes repaid technical debt and fixed defects that went live).
On a weekly basis then, the throughput in relation to the remaining technical debt and defects might be a useful measure of a team’s effectiveness.
Effectiveness = ( Throughput - Rework ) / Throughput
where
Throughput = Number of cards released to production that deliver value
and
Rework = Number of technical debt and defect cards in inventory and work-in-process
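A minimal Python sketch of the equation above, for anyone who wants to push their own numbers through it. The function name is hypothetical, and returning `None` for a week with no throughput is an assumption made here, since the ratio is undefined when nothing is released.

```python
from typing import Optional


def effectiveness(throughput: int, rework: int) -> Optional[float]:
    """Effectiveness = (Throughput - Rework) / Throughput.

    throughput: cards released to production that deliver value
    rework: technical debt and defect cards in inventory and work-in-process
    """
    if throughput < 0 or rework < 0:
        raise ValueError("throughput and rework must be non-negative")
    if throughput == 0:
        # Assumption: with nothing released the ratio is undefined,
        # so report no value rather than divide by zero.
        return None
    return (throughput - rework) / throughput


if __name__ == "__main__":
    # 3 cards released; 1 technical debt card in process plus 3 fixed
    # defects waiting in inventory gives 4 rework cards.
    print(f"{effectiveness(throughput=3, rework=4):.0%}")  # -33%
    # No rework cards at all means 100%, whatever the throughput
    # (the 5 here is a made-up figure).
    print(f"{effectiveness(throughput=5, rework=0):.0%}")  # 100%
```

Note that the result goes negative as soon as the outstanding rework outnumbers the cards released that week.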
I’ve pushed various teams’ data through and the charts seem to correlate with the events described in my historical notes. Here’s a chart based on a small, experienced team working on a small project for 3 months.

You can see there wasn’t any throughput in the first 4 weeks as completed cards queued up in inventory. In week 5 that inventory was flushed to become throughput as the first cut was released. Effectiveness then varied with the weekly releases until week 10, which saw the team 100% effective with no rework cards in inventory or work-in-process. In week 12, however, effectiveness dropped to -33% because 1 technical debt card was work-in-process and 3 fixed defects were queued in inventory while only 3 cards were released, i.e. (3 - 4) / 3.
Although it’s perhaps a simplistic indicator, do you think it’s useful as a measure of effectiveness (i.e. a team’s ability to deliver value and stay healthy)? Or is it utter tosh? Can it be refined (without complicating it)?
10 Comments
Very interesting, it's a nice way of visualizing that fixing defects does not add to your effectiveness. One thing I notice is the delay between releasing a feature and the returning defect. I wonder if you can minimize this further.
A defect card and a feature card are of equal size in your equation, doesn't that lead to very low dips in your Effectiveness? Shouldn't you take into account the amount of work for a defect?
Actually fixing defects does contribute indirectly to effectiveness. If there are defects and they are being fixed then that works to keep the remaining rework (technical debt and defects) small.
I should have mentioned that all our cards are sized to be less than 2 days of effort to complete, on average. There may be a defect that takes longer but they don't happen often enough to affect the overall balance.
Here's a couple of earlier posts that talk about how we use stories and do estimation:
http://blog.energizedwork.com/2009/05/pomodoro-galore.html
http://blog.energizedwork.com/2009/11/how-we-use-stories.html
Simon,
I like this idea. Here's a question: Not all defects are created equal, even if the work items are approximately the same size. A defect may be reported, but fixing it may be deemed to be of low value or not an immediate priority. This item would then remain in inventory for an extended time. In reality, the rework will only impact the team during the time it's in process, and not continuously while it's in inventory. Wouldn't this skew the metric?
Cheers,
Dave
Hi Dave
We define inventory to be cards that have been completed but not released. We use this definition to place emphasis on the need to release regularly (it's a good metric when used in £ terms to convince the business). It's also closer to the notion of stock or inventory, which is finished goods sitting on shelves. And from an accounting perspective it's not an asset but a cost (or sleeping money, since we would want it to realize value upon release).
Typically, for us, defects are automatically prioritized to the top of the pile and are fixed immediately. However, I admit that every once in a while there's a defect that just isn't a priority. One of two things usually happens: either the team deals with it in the slack (because they simply prefer not to tolerate defects hanging around) or it goes on the backlog, in which case it's neither inventory nor work-in-process.
Our board has these columns:
Backlog > Prepare > Waiting > Started > Ready > Released
For us lead time starts when a card enters the backlog and ends when it is released. Cycle time starts when a card leaves the backlog and enters the Prepare column and ends when it is released. So work-in-process includes any cards in the Prepare, Waiting and Started columns. Inventory is the cards in the Ready column. And throughput is the cards in the Released column.
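To make the column definitions above concrete, here is a hedged Python sketch that derives the two inputs to the effectiveness equation from a board snapshot. The `Card` type, the `kind` labels and the function name are all invented for illustration. It also assumes the list holds only the cards currently on the board plus those released in the week being measured.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Column groupings as described above.
WIP_COLUMNS = {"Prepare", "Waiting", "Started"}
INVENTORY_COLUMNS = {"Ready"}
THROUGHPUT_COLUMNS = {"Released"}
# Hypothetical labels for the cards that count as rework.
REWORK_KINDS = {"technical_debt", "defect"}


@dataclass(frozen=True)
class Card:
    kind: str    # "feature", "technical_debt" or "defect" (made-up labels)
    column: str  # one of the board columns


def weekly_counts(cards: Iterable[Card]) -> Tuple[int, int]:
    """Return (throughput, rework) for one week's snapshot."""
    cards = list(cards)
    # Throughput: released cards that deliver value, so released
    # technical debt and defect cards are left out.
    throughput = sum(
        1 for c in cards
        if c.column in THROUGHPUT_COLUMNS and c.kind not in REWORK_KINDS
    )
    # Rework: technical debt and defect cards in inventory or
    # work-in-process. Backlog cards count as neither.
    rework = sum(
        1 for c in cards
        if c.kind in REWORK_KINDS
        and c.column in (WIP_COLUMNS | INVENTORY_COLUMNS)
    )
    return throughput, rework


if __name__ == "__main__":
    snapshot = (
        [Card("feature", "Released")] * 3
        + [Card("technical_debt", "Started")]
        + [Card("defect", "Ready")] * 3
        + [Card("defect", "Backlog")]  # low priority, not counted
    )
    print(weekly_counts(snapshot))  # (3, 4)
```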
I see. I was thinking of 'inventory' as including the Backlog column.
Interesting thoughts, and a great description of how a team can manage technical debt responsibly.
My worry would be that people would be inclined to game this metric by choosing not to acknowledge technical debt, and not fix it. If you talk about 'effectiveness' I think you need to take into account the idea that technical debt rework is inevitable, and somehow reward working on that stuff as well as building new features.
Good points Matt. I share your concerns and thought long and hard about gaming. I came to the conclusion that anything can be gamed. And preventing gaming really requires a change in behavior and not a 'hardening' of the measure. We use the concept of a product hub to help keep people true and honest (I'll blog about that soon).
I described technical debt as "necessary rework" (you refer to it as inevitable). The reward as I see it for managing rework is the ability to continue to move at speed, to be effective. This is reflected indirectly in the equation - by continually investing in the necessary rework you are keeping the remaining rework small, which works to keep effectiveness high. Of course this assumes that code is being created responsibly.
on a regular basis that should mean your remaining rework is small
Hey Matt,
A valid concern. I see development teams gaming their metrics frequently and it disturbs me. We need to analyze real measurements in order to gain insight into how things are currently working - a basis for the next experimental change - the 'C' in 'PDCA'.
When people game the metrics they distort the voice of the process and we can't see what's really going on. It's worth noting that you just can't be 100% effective - and people shouldn't be aiming for a percentage or rewarded on that basis: they should be focused on delighting their customers with great stuff instead. :)
Technical debt is a natural byproduct of software development and experienced teams know that they need to keep it in check to stay predictable and keep quality high.
Gus.
Interesting metric - have you tried using the data in a capability chart, looking for stability between the upper and lower control limits, and using data points outside the limits to investigate improvement opportunities?
Hi Karl
I did some time ago but the data I had produced strange natural process limits. I can't remember what was strange about them. I'll try it again with more recent data I've collected. With respect to SPC, the interesting one I use is for cost per story because it's a great indicator for inventory and costs.