Understanding Variation: The Key to Managing Chaos by Dr. Donald J. Wheeler
A review and summary of Understanding Variation by Dr. Donald J. Wheeler.
“The book completely changed the way I operate. Understanding Variation ostensibly teaches you how to separate signal from noise1. In practice, it teaches you a method for becoming an EXTREMELY effective operator.” — Cedric Chin, author of Commoncog
If you work on a computer, you likely spend a good chunk of your week looking at charts, numbers, graphs, and spreadsheets. But if you’re anything like me, a lot of this time can feel wasted. We all want to be more data driven, but few of us are ever taught how to do it well.
Understanding Variation by Dr. Donald J. Wheeler is the best book I’ve read on how to use data to make more effective decisions. It’s no surprise: Dr. Wheeler has been teaching people how to use data for more than 40 years, and was a student and colleague of Dr. W. Edwards Deming2 for over 20 years.
Or to put it another way, the work of Dr. Wheeler and Dr. Deming gives you a path toward Process Power, like the Toyota Production System.
Traditional status reports are bad
The way data is typically communicated in status reports compares the current period with the previous period or the same period in the prior quarter or year. In general, status reports tend to include rows that look like this:
While the numbers shown above are an improvement on last year, they’re worse than they “should” be according to the plan. If you were the metric owner, you’d have to explain why and outline steps to get things back on track.
This sounds like the right thing to do. The number is worse than you want, so you need to make it better. The issue is that you can’t learn from this data without context, and the context provided here isn’t enough.
Data must be presented in context
Comparing the current value to a prior value has two issues:
The amount of data is small
Both numbers are subject to variation
These two issues make it hard to know how much of the difference is due to variation, and how much (if any) is due to real change. You should only compare two values as part of the broader context.
If two data points aren’t enough, should we present every value?
Tables of data are better than direct comparisons, but they provide too much detail. Humans struggle to process them. We need to present data in a human readable form.
Time series provide context in a human readable form
Time series graphs (or running records) typically have days, weeks, months, quarters, or years on the horizontal axis and values on the vertical. Scanning from left to right shows the passage of time.
Time series communicate data faster and more completely than tables. They make it easy for humans to compare the current value to previous values, and to see if the current value is unusual.
In general, graphs and other visualizations are the way to provide context without information overload. They include all values in a visual—rather than digital—way.
The table above is graphed in a time series below:
Histograms compress data and make comparisons easier
A histogram (or tally plot) is an accumulation of the different values as they occur without the time component. A mark is placed on a value each time it’s observed. Tally plots compress data, which makes them useful for side-by-side comparisons.
If we plot possible values of our data set on the horizontal axis, the vertical axis represents frequency. The two histograms below were constructed from the table above, grouping values into groups of ten thousand:
Stacking time series can make comparison easier
Another way to compare two time series is to stack them on top of each other:
Average and range: Two numerical summaries of data
In addition to time series and histograms, numerical summaries of data can be useful:
Average is the sum of the values divided by the number of values, typically denoted x̄ (X-bar). Think of the average as a measure of location of the set of values. Averages are typically used to compare two data sets. The set with the greater average “exceeds” the other set. Be careful with averages: in most data sets, the majority of values are not equal to the average.
Range is a measure of dispersion of a set of values defined as the maximum value minus the minimum value, typically known as R.
Numerical summaries supplement graphs, but never replace them. Other measures of location and dispersion exist but average and range are sufficient for most uses.
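To make these two summaries concrete, here’s a minimal Python sketch (the values are made up, purely for illustration) that computes the average and the range of a small data set:

```python
# Minimal sketch: computing the average (x-bar) and range (R) of a data set.
# The values are hypothetical, purely for illustration.
values = [52_000, 61_000, 47_000, 58_000, 55_000]

average = sum(values) / len(values)       # x-bar: a measure of location
data_range = max(values) - min(values)    # R: a measure of dispersion

print(f"Average (x-bar): {average:,.0f}")  # 54,600
print(f"Range (R): {data_range:,.0f}")     # 14,000
```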
The first principle for understanding data
Dr. Walter Shewhart’s3 two rules for presenting data form the basis for honest statistics. You should follow them:
Present data in a way that preserves the evidence for any predictions that might be made from it. Provide a table of data, a graph, and context. Data divorced from context is in danger of distortion.
Never mislead people into taking actions they wouldn’t take if data were presented in a different form. Averages, ranges, and histograms summarize data, but they also obscure the time-order. If the time-order shows a definite pattern, don’t obscure it. Always include time series graphs when summarizing data.
These two rules can be summarized into one principle: No data have meaning without context. Don’t trust anyone who cannot, or will not, provide context for their metrics.
The most common methods of analysis are flawed
Before you can interpret data, you must have a method of analysis. Sadly, the two most common methods—comparison to specification and comparison to average—are flawed.
These methods of analysis encourage a binary worldview and don’t consider the impact of variation. They simply ignore variation and treat every change as signal.
Comparison to specification
Plans, goals, budgets, and targets are specifications. If you’ve read Andy Grove’s High Output Management, the specification approach to management will sound familiar. Andy was famous for applying the principles and discipline of manufacturing to management. And the tradition continues with objectives and key results (OKRs).
If you haven’t read Andy’s book, the idea of a goal being a specification may sound odd. OKRs come from manufacturing. Manufacturers compare product measurements to specification limits to ensure production quality. Managers compare outcomes to OKRs in an attempt to do the same.
Comparison to specification compares the current value to a target value, and deems it as acceptable or unacceptable (in or out of spec), resulting in a binary outcome. Those who are in spec are given a pat on the back. Those who aren’t are reprimanded or fired.
In other words, as long as you hit your target, you’re okay. OKRs have their place, but they should only be used for things that can be measured, and ideally, things that are activities rather than outcomes. You want people to feel like they have control over whether they can hit their target(s).
Another common mistake is setting unrealistic targets. Specifications should be based upon careful analysis of past data, current actions, and likely future conditions. They should not be based on what you want to be possible. Arbitrary targets aren’t helpful, and are often detrimental.
If you pressure people to meet an unrealistic target, they can:
Improve the system
Distort the system
Distort the data
Improving a system is hard. People need time to understand which inputs affect which outputs, and the power to change the inputs to achieve the desired result. If they don’t have the power to make changes, it’s always easier to make data look more favorable.
The specification approach can tell you where you are, not how you got there, or how you’ll get somewhere better.
Comparison to average
There are times when you must hit a specification. You want no fraud, downtime, or data breaches. In these situations, it’s common to compare the current value to the average. Like the specification approach, comparing to the average has a binary outcome: above or below average.
Whenever the current value varies too far from the average, someone will ask for an explanation. The issue is the average is generally near the midpoint of the data, so you should expect to be above the average half the time and below the other half.
If you (or your manager) fail to appreciate this variance and apply too much pressure to the metric owner to fix it, they will distort the system or the data to make it look better than it is.
While comparison to the average is similar to comparison to specification, the average at least comes from the process rather than a (potentially) arbitrary target. Unfortunately, the average is only part of the process and doesn’t fully convey context.
Control charts: An approach that accounts for variation
The goal of data analysis is to gain knowledge so our predictions become more accurate over time. We look to data to detect changes, so we can uncover when something we’ve done is impactful. This sounds simple, but anyone who has tried knows how hard it is to do in reality.
A large part of this is due to variation. According to Shewhart, there are two types of variation:
Routine variation is expected even when the process hasn’t changed.
Exceptional variation is outside the bounds of routine and is interpreted as a signal of change.
The noise introduced by routine variation is what confuses and clouds comparisons between two values. Until you know what is noise, you can’t truly understand what may be indicated by a value.
To separate routine and exceptional variation, Shewhart created the control chart (or process behavior chart). A control chart begins with the data plotted in a time series with an average line and limits computed from the data to detect changes. Control charts provide a better approach to data analysis. They overcome the shortcomings of the two methods above because they explicitly factor in variation.
The control chart shown below includes 122 consecutive points that measure the daily conversion rate of a signup process. The fact all values fall within the computed limits with no obvious trend suggests the process is in control. This means the daily conversion rate should continue to fall between 0% and 9.78% with a long term average of 4.89%, unless the process changes in some fundamental way.
That’s a mouthful, so let’s break it down. The power of control charts lies in their limits. The limits are placed symmetrically on either side of the average at a distance which filters out virtually all routine variation4. By visualizing the extent of routine variation, the limits allow you to quickly see any exceptional points. If all points on a control chart fall between its limits, the process is said to be in control or predictable.
You can expect an in control process to stay in control going forward unless something changes. This means you can define how much movement in a metric you should expect to see in the future (e.g. how far the next point may differ from the average).
When a point (or points) fall outside the limits of a control chart, you know it’s worth investigating the cause so you can better understand how to improve your process. The presence of exceptional variation signals that a change you made has had an impact or that there are inputs which affect your process that you don’t understand.
Note how much value control charts provide. By characterizing variation as routine or exceptional, control charts shift the focus from the result and toward the behavior of the system. When a system is predictable (like the signup process above), it’s already performing as well as it can under its current design. Looking for improvement without making changes is a waste of time. We must focus on making changes.
If the system were unpredictable, it would be futile to try to improve it. We would need to understand what is causing the unpredictability, stabilize it, then try to improve.
The second principle for understanding data
All data contains noise; some data contains signal. But before you can detect signal, you must filter out noise. This is why control charts are a more powerful form of data analysis than the traditional approaches.
Control charts show you the signal and stop you from attaching meaning to every value. They give you a way to know when it’s safe to extrapolate, and if it is, define the range of values you should expect to see.
By filtering out routine variation (noise) and highlighting exceptional variation (signal), control charts distinguish between the two, which is the foundation of meaningful data analysis. If you don’t have a good way to separate signal from noise, you will:
Interpret noise as signal: At best this leads to actions which are inappropriate and at worst, completely contrary to what you should do.
Fail to detect signal when it’s present: This often appears hand-in-hand with the specification approach to analysis. For example, the underlying process changes but the values are within spec, so no one digs into why.
You can avoid the first mistake by never reacting to any value, but doing so guarantees many mistakes of the second kind. And you can avoid the second mistake by treating everything as signal but this guarantees the first mistake.
You will make one of these mistakes if you can’t differentiate between signal and noise, and you’ll never properly analyze data. Control charts strike a balance that minimizes both types of mistake.
If you don’t use control charts to analyze data, you’ll always be at a disadvantage. Effective data analysis begins with separating signal from noise and control charts are the simplest way to do it.
What you choose to analyze is important
We collect data to drive decisions, and as we’ve learned, proper use of data requires separating signal from noise. If we can’t, our actions may be inconsistent with the data. But how do we choose what to analyze?
For the purpose of comparison, let’s use the traditional approaches to data analysis—comparison to specification and averages—to pick a metric to analyze. If you’re anything like me, when faced with a table of data like the monthly marketing report shown below, you’ll home in on the percentage difference columns. The goal is to find what has changed most dramatically and then investigate why.
There are three problems with this approach:
The size of a change partially depends upon the base number. A one unit change from 10 to 11 is a 10% change. A one unit change from 100 to 101 is a 1% change. Percentages show the relative size of a change, not the actual amount. Comparing one change to another is not a reliable way of finding interesting data.
Comparing rows in a report by comparing the size of the percentage differences assumes all lines should show similar variation month-to-month. In reality, each row has its own inherent variation. Some lines show larger percentage differences, others smaller. Choosing what to focus on based on which has the biggest difference guarantees some lines get more attention than they deserve while others don’t get enough.
Period-over-period changes–such as the 8% decrease in cost per lead in February compared to the previous year–don’t account for variation. The difference may be due to an unusual value in the past rather than an unusual value in the present. If last February’s cost per lead was unusually high, the reduction may not indicate a trend of decreasing cost but rather a return to baseline.
Comparing percentage differences is an unreliable way to find signal. Large percentage changes don’t necessarily indicate signal, and small percentage differences aren’t necessarily noise.
With that said, it’s common practice so let’s use it to select a metric to analyze.
Using control charts for analysis
The first metric that popped out to me was average session duration, which is 33.3% lower than the plan. First, we’ll consider the usual approach to interpreting data and then we’ll construct a control chart.
If you were responsible for increasing average session duration to meet the plan’s three minutes for the month (or five minutes for the year), what would you do? I’m guessing such a large deviation from the plan would require you to create a plan to get back on track, or at least set up a meeting to talk about what you’re doing.
This sounds great in theory, but many of these reports become works of fiction. Their only purpose is to show that something is being done about the perceived problem.
The whole idea behind creating a plan to get back on track assumes the change in average session duration is exceptional. But is it? To get to an answer grounded in data, we need to separate the signal from the noise. And to do that, we need to go beyond direct comparison and start looking at historical data:
I don’t know about you, but I don’t get a lot from that table beyond the fact that it might be decreasing. Let’s construct a control chart to get a better view.
The average of these 26 values is 3.54. We can use this as the average line in our time series graph of average session duration:
As we guessed, February is below average and there does seem to be a small trend toward lower average session duration from 2022 to 2024, but this isn’t enough to answer whether February is exceptional.
We need limits to filter out the routine variation from exceptional variation, which means we need a way to measure variation.
Moving ranges: The measure of variation over time
To measure month-to-month variation we take the absolute value5 of the difference between successive monthly values. These values are called moving ranges:
The absolute difference between January 2022’s 4 and February’s 3 is 1, as is the difference between February’s 3 and March’s 4. Repeating this process, we get all the moving ranges from our previous graph:
The time series graph of these moving ranges is shown below:
These moving ranges directly measure the month-to-month variation. Their average is called the Average Moving Range and is included above.
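As a quick illustration, here’s a small Python sketch of the moving-range calculation. It uses the January to March 2022 values mentioned above; the full 26-month series gives the Average Moving Range of 0.88.

```python
# Moving ranges are the absolute differences between successive values.
# These three values are the January-March 2022 figures from the text.
values = [4, 3, 4]  # average session duration, in minutes

moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

print(moving_ranges)          # [1, 1]
print(average_moving_range)   # 1.0 here; 0.88 over the full 26-month series
```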
The two graphs in an XmR chart: X-chart and range chart
To construct a control chart for individual values and a moving range (an XmR chart) we begin with the average session duration and moving range time series charts. The first graph is the X-chart and the second is the range chart or the moving range chart.
How to compute the limits of an XmR chart
To obtain the Upper Range Limit for the moving range chart, we multiply the Average Moving Range by 3.27. The value of 3.27 is a constant required to convert the Average Moving Range into an appropriate upper bound for ranges.
Upper Range Limit = URL = 3.27 x Average Moving Range
This Upper Range Limit is plotted as a horizontal line on the moving range portion of the combined graph. In the example below, we multiply our Average Moving Range of 0.88 by 3.27 to get 2.88:
This completes our moving range chart. There is no Lower Range Limit because absolute value is never negative.
Now we need to construct the limits for our X-chart. The Upper Control Limit is found by multiplying the Average Moving Range by 2.66 then adding the average line of the X-chart:
Upper Control Limit = Average + (2.66 x Average Moving Range)
The Lower Control Limit is found by subtracting the Average Moving Range multiplied by 2.66 from the average:
Lower Control Limit = Average - (2.66 x Average Moving Range)
In our average session duration example, we have an Upper Control Limit of 5.88 (3.54 + 2.66 x 0.88) and a Lower Control Limit of 1.2 (3.54 - 2.66 x 0.88). If the Lower Control Limit calculation results in a negative number, and your process cannot produce negative results, use 0 instead.
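If you prefer to see the arithmetic spelled out, here’s a short Python sketch of the limit calculations using the numbers above:

```python
# The XmR limit calculations from the text: average = 3.54, Average Moving Range = 0.88.
average = 3.54
average_moving_range = 0.88

upper_range_limit = 3.27 * average_moving_range               # ~2.88
upper_control_limit = average + 2.66 * average_moving_range   # ~5.88
lower_control_limit = average - 2.66 * average_moving_range   # ~1.20

# If the process can't produce negative values, clamp the lower limit at zero.
lower_control_limit = max(lower_control_limit, 0)

print(round(upper_range_limit, 2),
      round(upper_control_limit, 2),
      round(lower_control_limit, 2))  # 2.88 5.88 1.2
```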
The complete XmR chart is shown below:
How to interpret an XmR chart
Let’s run through how to interpret our newly made XmR chart. Month-to-month variation is on the moving range portion of the chart:
The Upper Range Limit of 2.88 means the Average Session Duration needs to move by more than 2.88 minutes in a month to be exceptional. A change of that size from one month to the next is excessive and likely has an assignable cause.
The actual monthly values are shown on the individual values portion of the X-chart:
The limits on the X-chart define how large or how small a single monthly value must be before it represents a definite departure from the average. Here, a value over 5.88 minutes would signal that average session duration has materially improved. Likewise, a value below 1.2 minutes would be a signal of a downward shift.
In either case, you would be justified in asking for an assignable cause for the change.
This means February’s 2 minutes is not, by itself, a signal. There is no evidence of any real change in average session duration. It is likely routine variation, which means looking for a cause is a waste of time.
When you show management a control chart, they may not be happy with the amount of variation shown between the upper and lower limits. This example shows that average session duration could differ by over four and a half minutes month-to-month. “Surely our average session duration should be more steady than that.”
It’s not, and it won’t be until you make a fundamental change to the input metrics that result in this average session duration. The limits define what average session duration will fall between as long as things stay the same.
If we bring our attention back to the plan for the year, we see it calls for an average session duration of five minutes:
Our XmR chart shows this hasn’t been done in the past and is unlikely to happen in the future unless there is a fundamental change. The goal of five minutes does nothing to change or improve the outcome.
Dissatisfaction with the limits cannot be cured by finding some alternative method of computing them. Calculating them differently6 defeats their purpose.
If management is not happy with the variation shown by the limits, they need to give you resources to improve the site to generate the desired result, rather than setting an arbitrary goal that isn’t based on past performance.
There is a second problem here: average session duration is an output metric, not an input metric. Average session duration can only be improved by improving pages per session or time on page. Both of these metrics also impact other metrics. If we aren’t careful, pressure to increase average session duration could hurt other parts of the process, such as new leads or conversion rate.
Small changes can contain signal
To reinforce why using percentage differences to select data to analyze is wrong, let’s look at on-time payments. On-time payments didn’t change in February and only moved by -0.6% year-over-year:
Most people wouldn’t bother analyzing on-time payments because the percentage changes are small to nonexistent. The XmR chart shown below shows this is wrong:
As you can see, four of the individual values fall outside the limits. There is too much variation in this time series to be due to chance alone. The four values outside the limits are signal, even if the changes in raw percentages are small.
We should have been looking for an explanation as to why the percentage of on-time payments materially decreased in March 2022 and increased in March, April, and May of 2023. But if we relied on comparison to specification and averages, we would have missed it.
Percentage changes are not a reliable way to find signal
In the average session duration example, we assumed a large percentage change represented signal. Analysis revealed no signal. The change was within the limits and routine. Interpreting it as signal and directing resources to fix it would be a waste of time. In the on-time payments example, we saw how signals can be missed if we rely on the traditional approach to data analysis.
The traditional approach results in wasted effort and missed opportunity. It guarantees an excess of both types of mistakes people make when interpreting data.
The control chart filters out the probable noise and detects signal in data. And when noise is filtered out, we minimize the number of times we interpret noise as signal and we limit the number of times we miss signal.
Focusing on percentage changes alone is not enough to identify signal.
Two other patterns to look for
There are two other patterns you need to look for. The first is eight or more successive points on the same side of the average line. The X-chart shown below shows on-time payments became detectably higher than the average in March 2023. However, we can see the run7 may have begun in January 2023:
So, while there was a detectable signal in March 2023, the improvement could have begun as early as January 2023. Why January? Whenever eight or more successive values fall on the same side of the average line, it’s safe to assume the time series has shifted away from the average.
You can apply this rule even when none of the individual points fall outside the computed limits. Eight or more successive values on one side of the average line is roughly the same as getting eight or more heads (or tails) in a row when flipping a coin.
Given the odds of that happening are roughly 1 out of 128, we can interpret eight or more successive points on the same side of the average as signal.
With such strong evidence of change, you can begin to ask the right questions and hopefully identify an assignable cause for the shift in a timely manner. When you look at the graph (even without the limits), the fact there was a change is apparent. The limits just help to clarify what the graph reveals.
The second pattern is a sequence of three out of three, or three out of four, consecutive points that are closer to a limit than to the average line8. In an alternate world our on-time payments could have looked like this:
In the chart above, no points break the limits, but we have two periods of exceptional variation. The red section has three consecutive points closer to the lower limit than to the average, and the green section has four consecutive points closer to the upper control limit. In both cases, we would want to investigate the cause of these changes, as they are too large to have occurred by chance alone.
You don’t need a ton of data to make control charts work. Uncertainty in the limits does decrease as the amount of data used to compute them increases, but useful limits can be constructed from as few as five or six values.
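If you want to automate these checks, here’s a rough Python sketch (my own, not from the book) that flags points outside the limits, runs of eight or more on one side of the average, and three-out-of-four consecutive points closer to a limit than to the average. The function names and structure are assumptions.

```python
def outside_limits(values, lcl, ucl):
    """Indices of points beyond the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

def runs_of_eight(values, average):
    """Indices at which a point completes a run of eight or more on one side of the average."""
    hits, streak, prev_side = [], 0, None
    for i, v in enumerate(values):
        side = 1 if v > average else -1 if v < average else 0
        streak = streak + 1 if (side != 0 and side == prev_side) else (1 if side != 0 else 0)
        prev_side = side
        if streak >= 8:
            hits.append(i)
    return hits

def three_of_four_near_limit(values, average, lcl, ucl):
    """Start indices of 4-point windows where 3+ points sit closer to one limit than to the average."""
    hits = []
    for i in range(len(values) - 3):
        window = values[i:i + 4]
        near_upper = sum(abs(v - ucl) < abs(v - average) for v in window)
        near_lower = sum(abs(v - lcl) < abs(v - average) for v in window)
        if near_upper >= 3 or near_lower >= 3:
            hits.append(i)
    return hits
```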
The best analysis is the simplest that gets you to the right answer
The most popular method of analysis—comparing a current value to a prior value—doesn’t filter out noise. Large differences can be all noise, and small differences can be all signal. But that doesn’t mean we should overcomplicate things. The best analysis is the simplest analysis that gets the job done.
Software can’t solve all your problems
As data analytics tools have become more popular, more managers now have access to real-time data for any routine measurements of their business. It’s never been easier to generate reports on a weekly, daily, or even real-time basis.
No more waiting for the end of the month for a surprise. More access to data is good, but software can’t solve all your problems. You still need to invest time to interpret the graphs.
Imagine you’re a Growth Product Manager and your job is to increase signups for your product. You measure signups in an analytics tool like Posthog:
Can you see anything interesting in the data?
Remember, we traditionally compared the current value to last month’s value to understand how we were doing. The table above is much more comprehensive than that limited comparison.
But does it help you see anything of interest? Eyeballing data is analysis by osmosis—you first soak up the data then you try to understand it.
Unfortunately, this is not reliable. Different people detect different things from the same data. If your reports only show tables of numbers, you are communicating in an impoverished way. It’s always better to graph your data, even if you don’t use an XmR chart:
Look how much easier it is to detect patterns with a graph versus a table of data.
Each spike corresponds to the last week of the month. You post this graph in your company’s Slack and ask if any other teams are doing anything at the end of each month. Your Head of Marketing replies and says they’ve been sending a newsletter in the last week of the month. Given how well this newsletter is driving additional signups, you might want to test increasing its cadence.
The point of this is to show how graphs present interesting information in higher fidelity than tables of data. Tables of data tend to overwhelm people with extraneous detail. Graphs sweep away complexity and reveal structure present in the data.
In some situations, such as the example above, the signals are so obvious and easy to understand that you don’t need to compute limits. You can effectively eyeball the graph to naturally filter out the noise. For most situations, changes aren’t as easily recognized and control charts are a major quality of life improvement.
If you’re not careful, data can lead you astray
Imagine you joined a fast-growing startup as the Head of Engineering in December 2023. One of the metrics you’re responsible for is the number of voluntary leavers in your team. Attracting engineering talent is hard—it takes time for them to get up to speed—so the metric is included in the executive management report, and is one of the first topics in every monthly meeting:
February, with five leavers, was far above average. More than three times as many voluntary leavers as usual, and five more than last February. In fact, the company has never had five engineers voluntarily leave in a month. You’ve had seven people voluntarily leave since you started two months ago, 3.5x ahead of the previous year.
This isn’t great and requires some sort of action, but what? In the meeting, the Head of People Operations suggests you should aim to cut the number of voluntary leavers by 50% next year, and there is widespread agreement.
But voluntary leavers are a result, not a cause. You cannot directly manage it by goal setting. The data can be distorted by pressure to meet the goal—such as by changing the definition of a voluntary leaver to only those who have been with the company past probation—but what is causing people to leave will not be impacted by choosing a target out of thin air.
You need to study the system that results in people leaving, and the best way to do that is to use an XmR chart:
The Average Moving Range for this data is 1.68 and the Upper Control Limit is 6.08. Even though five voluntary leavers in a month isn’t ideal, it is inevitable as long as the engineering team averages 1.62 voluntary leavers per month.
There is no evidence in the XmR chart that the environment which results in voluntary leavers is any different in February than it was before you started.
The number of voluntary leavers in the engineering team cannot be said to have worsened. The X-chart shows a predictable system.
As long as the team averages 1.62 leavers per month, the monthly values will range from zero to six. You would need seven people to voluntarily leave in a month for the value to be exceptional.
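As a quick sanity check, here’s the same limit arithmetic applied to the rounded figures in this example:

```python
# Voluntary-leaver limits using the XmR formulas from earlier.
average = 1.62
average_moving_range = 1.68

ucl = average + 2.66 * average_moving_range          # ~6.09 with these rounded inputs (6.08 in the text)
lcl = max(average - 2.66 * average_moving_range, 0)  # negative, so clamp to zero

print(round(ucl, 2), lcl)
# Monthly counts of zero through six are routine; seven or more would be a signal.
```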
With all that in mind, how could you reduce the number of voluntary leavers? Since the system is stable, you have to study all the reasons people have left in every month. It would be a mistake to single out February.
What you choose to measure is also important
Good analysis depends on good data, and how you interpret data is based on where (and how) it was gathered. Unfortunately, a lot of data is aggregated over so many different processes that the only context remaining is that it was collected over the same time period.
Aggregated data loses its usefulness
Imagine you’re the Head of Sales at a B2B SaaS company, and each month you record the Average Time to Close9. Your OKR for the year is to get 95% of AEs closing deals in less than 30 days. This metric appears in the monthly management report as Average Time to Close Efficiency:
Once again, this doesn’t provide enough context for anyone to actually understand how the team is performing. Values for the past 26 months are shown below:
Notice how it’s the percentage of Account Executives closing deals in time, not the count. This is because the number of AEs is not constant over time. If it were, counts and percentages would tell the same story. Given the number of AEs does change, you must use percentages.
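Here’s a tiny hypothetical illustration of why the percentage matters when headcount changes: the same count of on-time closers looks very different as the team grows.

```python
# Hypothetical numbers: 20 AEs close in under 30 days every month,
# but the total number of AEs keeps growing.
closing_in_time = [20, 20, 20]
total_aes = [22, 25, 28]

percentages = [100 * c / t for c, t in zip(closing_in_time, total_aes)]
print([round(p, 1) for p in percentages])  # [90.9, 80.0, 71.4]
```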
February’s value of 71.3% is the lowest on record. Has something changed or is it simply a low, but routine value? The simplest way to answer this is an XmR chart:
I set the Upper Control Limit of the X-chart at 100% as the computed value exceeds 100%, which is obviously not possible.
The last point on the X-Chart is below the lower limit and is therefore signal. Detectably fewer reps were closing deals in less than 30 days. The chart can’t tell you why this happened, but it does tell you something has happened. You should investigate. You may realize you hired a lot of new reps in February and they haven’t yet fully ramped up. In that situation, it may make sense to adjust your metric to the number of ramped-up reps closing deals in less than 30 days.
There is something else we can learn from this control chart.
With the exception of February, the percentage of AEs closing deals in less than 30 days is predictable and the average is 89%. Six percentage points lower than your OKR.
While you may have the occasional month with more than 95% of reps closing deals in time, your team cannot average 95%. Your OKR is beyond the capabilities of the current system. If your management team wants you to meet (or exceed) the goal of 95%, something has to change.
You will not be able to bring about the desired changes by only setting a goal, you must improve the system. The issue is summary statistics—like the one shown above—are not focused enough to bring about change.
Measure individual inputs
The key thing to realize is every rep is different. Some reps may take longer to close deals than others. You need to consider each rep as their own system, with their own stream of data. These data streams then need to be analyzed separately to discover if they behave predictably.
If a given rep displays predictable variation, you will do no good setting a goal that is outside their limits. You will only get what they can deliver and setting a goal outside of that will only cause them to try to distort the system or the data.
And if a given rep doesn’t display predictable variation, there is no point setting a goal. They have no well-defined capability, so there is no way of knowing what they will produce month-over-month.
Instead of setting arbitrary goals at a high level, you need to analyze the data for each rep to uncover problem areas, opportunities for improvement, and most importantly what your best reps are doing differently.
This is extremely difficult to do at an aggregate level. Aggregated data tends to be useless when you’re trying to identify problems and solutions. Process improvement requires specific measures and contextual knowledge. These are more available when you are measuring individual activity.
In order to improve your output metric, you need to develop appropriate input metrics. You need to look back at the process of your best reps and find out what they are doing. Once you know, you can start measuring the input metrics that result in the outcome you want.
Good data is not an accident.
Finally, keep in mind closing 95% of deals in less than 30 days may not even be a good goal. If closing deals quicker means a lower Average Sales Price and increased churn, you may want reps to take more time even if it raises your sales costs.
Problems like this are common when related metrics are owned by different teams, such as when Net Revenue Retention is owned by the Customer Success team and Sales is not incentivised to sell to customers who stick around.
Continuous improvement means looking at the whole picture, not a narrow slice. Optimizing the system often requires some parts operate suboptimally.
How to measure rare events
An XmR chart is inappropriate when working with data where the average falls below 1.0. Counts for rare events have to be placed on a different control chart called a c-chart, which uses slightly different formulas than the XmR chart to account for rare events which occur independently of each other.
Imagine you’ve been running a SaaS startup for just over two years and occasionally have unexplained downtime. It annoys customers, leads to churn, and you do everything possible to prevent it. Of course, when the app does go down unexpectedly, it has to be fixed and the root cause has to be properly documented to avoid making the same mistake again.
In the past, you’ve averaged one hour of unexpected downtime every 5ish months. The most recent downtime was in February:
As you can see from above, the single hour of downtime is more than five times the average. When dealing with very small numbers, such as counts of rare events, a one unit change can result in a huge percentage difference. The rest of the “information” shown is virtually useless.
A more complete perspective is shown by a time series graph:
To create a c-chart for this data, we need to compute an average count. During this time you’ve had five hours of downtime. Five hours of downtime in 26 months gives an average of:
x̄ = 5 hours / 26 months = 0.192 hours per month
This average is used as the central line for the c-chart, and the upper limit is computed according to this formula:
UCL = x̄ + 3 x √x̄ = 0.192 + 3 x √0.192 = 1.506
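Spelled out as a short Python sketch, using the figures above:

```python
import math

# c-chart limits for 5 hours of downtime over 26 months.
total_events = 5
periods = 26

average = total_events / periods         # ~0.192
ucl = average + 3 * math.sqrt(average)   # ~1.51 (the text rounds the average first and gets 1.506)

print(round(average, 3), round(ucl, 3))
```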
The complete c-chart is shown below:
Despite the single hour of downtime being more than five times the average, the c-chart shows it’s not outside the limits. This is not a problem with the chart; it’s a problem with the data. Counts of rare events are inherently insensitive and weak. No matter how you analyze these counts, there is nothing to discover.
Instead of counting the number of hours of downtime each month, measure the number of days between downtime events and then use an XmR chart. This analysis would let you see whether unexplained downtime is becoming more or less frequent over time. In general, counts are weaker than measurements. When possible, measure the activity rather than merely counting events.
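If you track the dates of downtime events, converting them into days between events is straightforward. The dates below are made up for illustration; the resulting gaps are what you would plot on an XmR chart.

```python
from datetime import date

# Hypothetical downtime dates.
downtime_dates = [date(2022, 3, 4), date(2022, 9, 1), date(2023, 2, 10),
                  date(2023, 8, 22), date(2024, 2, 14)]

days_between = [(b - a).days for a, b in zip(downtime_dates, downtime_dates[1:])]
print(days_between)  # shrinking gaps would suggest downtime is becoming more frequent
```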
You need to use control charts
We collect data to make better decisions. Before data can be used, you must interpret it. Proper interpretation of data requires that it is presented in context, and that noise is filtered out.
The traditional approach of using percentage differences to provide context is like using a time series graph with two points. It’s not enough to know if things have improved, worsened, or stayed the same. Limited comparison can be useful, but it generally doesn’t provide enough context to point toward the right decision. Nor does it filter out noise.
Likewise, comparisons to specifications, plans, goals, targets, budgets, or averages make no attempt to filter out noise. These comparisons show if you are or aren’t “in spec”. If you aren’t, they provide no suggestion as to how to get back into spec and if you are, they don’t predict whether you will continue to be going forward.
Graphs present data in context. Among the many graphs possible, the time series and histograms are the two most useful. As you should know by now, time series form the base of our control charts.
Control charts are the simplest way of filtering out the probable noise in any data. These charts help you concentrate on what is important and ignore what is not. Not using control charts to analyze time series is one of the best ways to increase costs, waste effort, and lower morale.
Statistics isn’t a spectator sport, and control charts are no exception. Find a process you want to measure, gather the data, compute the limits, and construct an XmR chart (or c-chart).
It’s only after you use control charts that you begin to understand how they work and why they work. And only then can you start to understand the inputs to your process.
Apply what you’ve learned
Write out a list of metrics you use and report on.
Ask yourself if you are collecting the right data. Both the data you collect and the data you report need to be useful, correct, and undistorted. Data which describes activity is more useful for process improvement than data that describes outcomes. Focus on input metrics you can control rather than output metrics.
Pick two or three metrics and plot them on a control chart.
Insist upon interpreting data within their context. This will require a transformation of the traditional reporting structure.
Filter out noise before interpreting anything as a potential signal.
Stop asking for explanations of noise. If there is no signal, the current value cannot be said to differ from preceding values. No explanation of noise, no matter how well worded and reasoned, can be supported by the data.
Understand that no matter how results compare against specifications, a process which is predictable is performing as well as it can in its current design.
Always distinguish between what you want and what the process can give.
Help others take action on newly discovered input metrics. Knowing when your output metrics change is the first step. Once you know, you can discover new detrimental input metrics that need to be measured and eliminated, and beneficial ones that need to be made part of the process.
Additional resources
Articles
Commoncog’s series on Becoming Data Driven in Business
Books
High Output Management, by Andy Grove
Working Backwards, by Colin Bryar and Bill Carr
Lessons from the Titans, by Scott Davis, Carter Copeland, and Rob Wertheimer
Kochland, by Christopher Leonard
The Goal, by Eliyahu M. Goldratt and Jeff Cox
Toyota Production System, by Taiichi Ohno and Norman Bodek
Thanks to Jack Walsh for reading drafts and providing feedback that made this post much better than it would have been.
Signal is the meaningful change that you're trying to detect. Noise is the random, unwanted variation or fluctuation that interferes with signal.
Deming was the father of the quality movement and hugely influential in post-WWII Japan. His work is credited with revolutionizing Japanese industry and helping Japan become one of the world’s most successful economies.
Shewhart, pronounced like “shoe-heart”, is known as the father of statistical process control (SPC). SPC is the application of statistical methods to monitor and control the quality of a production process. Key tools used in SPC include run charts, control charts, a focus on continuous improvement, and the design of experiments.
The limits are computed by scaling a measure of routine variation (such as the Average Moving Range) so that they sit roughly three standard deviations from the average, giving “three-sigma limits.” Nearly all values from a predictable process fall within three standard deviations of the average, so we can treat points outside the limits as a near certainty of change.
All this means is to remove any negative sign in front of a number and treat every number as positive (or zero).
There is a valid alternative for computing limits for the XmR chart. You can use the Median Moving Range instead of the Average Moving Range. When this is done, the value of 2.66 is changed to 3.14, and the value of 3.27 is changed to 3.87. The limits above will be different, but the story they tell remains similar.
A run is a sequence of points on the same side of the average line.
This is a sign of exceptional variation because predictable processes have about 85% to 90% of their data sitting closer to the average line than to either limit.
Average Time to Close is a metric used to measure the length of time it takes for a sales team to close a deal or convert a lead into a customer. It's calculated by taking the total number of days it takes to close a deal and dividing it by the total number of deals closed within a specific timeframe.