In asset-heavy industries, failures are inevitable. Things fall out of alignment, seize, break. But by converting failures into hard, actionable numbers, you can keep them smaller, less frequent, and more manageable.
Before looking at how to measure failure and convert it into metrics, we need to make sure we really understand what failure is. Failure has nuance.
Partial vs complete failures
Generally, failures can be divided into two types, partial and complete. With a partial failure, the asset might still work, but it's not going to be working well. If it's a giant press for pots, for example, the pots might no longer be exactly the right shape. Or they might be coming out the right size and shape, but at the wrong speed, throwing everything off down the line. In the case of a complete failure, however, the asset stops working altogether. The press just stops pressing.
Are partial failures better than complete failures? It depends on the situation, but it can easily be argued that pressing a bunch of slightly-wrong pots is worse than pressing none at all. At least with a complete failure, with the line coming to a screeching halt, you know there's a problem and can fix it.
There's one more important difference between partial and complete. Complete failures are like being asleep: you are or you aren't. But partial failures exist along a spectrum. They're the same as being tired. You can be everything from a bit sleepy to dead on your feet.
Partial vs complete, a simple example
Consider a bicycle. We can say a complete failure is when the bike's chain slips the gears and comes off all the way. No matter how hard and fast you pedal, you're not going anywhere. But what if just the chain guard comes off? In that case, the bike still works and you might not even realize there's a problem. Moving along the spectrum, we can see failures that are more obvious but still only partial. Imagine someone's gone and stolen the bicycle's seat. With a bit of determination and balance, you can still ride the bike by standing up on the pedals. It's not a complete failure; it's still only partial.
Now that we understand failure, let's look at two important metrics, MTBF (mean time between failures) and MTTF (mean time to failure). Just before we do that, though, let's remember that there's actually a third metric, MTTR (mean time to repair), which is equally important. We already looked at it in great detail in What Rocky Balboa Can Teach Us about Failure and MTTR. I've included some of the highlights below, but it's worth your time to go and read the earlier post and then come back.
MTTR: Mean Time To Repair
With Mean Time To Repair, we're measuring how efficiently the maintenance department gets assets back up and running.
How to calculate MTTR
The first thing you need to know is how much time was spent repairing an asset over a set period. Say you have a press with a tricky motor. Over a week, you spend a total of four hours working on it. The first time you work on it for an hour and a half. Then the second time you need another two and a half hours. Something to remember: In this specific case, the lengths of time to repair the asset are fairly similar. This does not have to be the case. You can still use MTTR with very different repair times. So, on another asset, the first time you fixed it, you needed thirty minutes. The second time, three hours. Third time, two days.
It's fine if the lengths of time are very different from one another. But, the people doing the repairs need to be roughly the same in terms of ability and preparation. What you want to know is how long a properly trained professional using a clear set of instructions takes to complete the repairs. If some of the data you're collecting is from a new hire working on an asset without an O&M manual, you're not going to end up with a useful result.
Next, take the total amount of time (which we already said was four hours) and divide it by the number of times you worked on the asset (which we said was two). Your MTTR is 2 hours.
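The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_time_to_repair` and the list of repair durations are made up for this example and are not part of any particular CMMS.

```python
def mean_time_to_repair(repair_hours):
    """Return total repair time divided by the number of repairs.

    repair_hours is a list with one entry (in hours) per repair job.
    """
    if not repair_hours:
        raise ValueError("need at least one repair to calculate MTTR")
    return sum(repair_hours) / len(repair_hours)


# The press with the tricky motor: one repair of 1.5 hours, one of 2.5 hours.
press_repairs = [1.5, 2.5]
mttr = mean_time_to_repair(press_repairs)
print(f"MTTR: {mttr} hours")  # MTTR: 2.0 hours
```

The same function works when the repair times are very different from one another, as long as they are all recorded in the same unit.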
MTBF: Mean Time Between Failures
This metric is used to determine reliability. Basically, it tells you how long, on average, an asset runs before it needs to be repaired. The phrase "be repaired" is key here: you only calculate MTBF for assets that can be fixed. For things that can only ever be replaced, for example light bulbs, you use a different metric.
How to calculate MTBF
You need three things: the length of the period you're measuring, the number of times the asset failed during it, and the amount of time it took to repair after each failure. Subtract the repair time from the period to get the number of hours the asset was actually in operation, then divide that by the total number of failures.
One thing you don't need: the amount of time the asset was offline because of preventive maintenance. Calculating MTBF does not include the time you spent trying to avoid problems.
Let's look at a simple example. Say you have a press that was supposed to run for 24 hours. During that time it failed twice, and each time it took an hour to get it back up and running. So, it was in operation for a total of 22 hours (24 hours minus the two hours it took for repairs). Twenty-two divided by two, the total number of failures, gives an MTBF of 11 hours. Not a great asset, really, because on average it's going to fail every 11 hours.
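Here is the same press example as a small Python sketch. The function name `mean_time_between_failures` and its parameters are hypothetical, chosen only for illustration. It assumes one repair entry per failure and that time lost to preventive maintenance has already been left out of the period.

```python
def mean_time_between_failures(period_hours, repair_hours):
    """Return operating hours divided by the number of failures.

    period_hours: length of the measured period, in hours.
    repair_hours: list with one repair duration (in hours) per failure.
    """
    failures = len(repair_hours)
    if failures == 0:
        raise ValueError("no failures recorded, MTBF is undefined")
    operating_hours = period_hours - sum(repair_hours)
    return operating_hours / failures


# A 24-hour period with two failures, each taking one hour to repair.
mtbf = mean_time_between_failures(24, [1, 1])
print(f"MTBF: {mtbf} hours")  # MTBF: 11.0 hours
```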
Value of MTBF
But don't throw that press out just yet. Generally, when you have a low MTBF, it can be traced back to either operator error or issues with how the asset is being repaired. You can likely improve MTBF with additional training and closer oversight.
Not only does MTBF expose issues with past use and repair, but it also helps set up your preventive maintenance schedule for the future. If you know an asset, on average, fails every 100 hours, you can set PMs at every 90 hours. That way you're getting the most bang for your PM buck.
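As a rough sketch of that scheduling idea, the snippet below sets the PM interval at a fixed fraction of the MTBF. The function name `pm_interval` and the default `safety_fraction` of 0.9 are assumptions made for this example, picked to match the 100-hour and 90-hour figures above. They are not a rule, and the right margin depends on the asset.

```python
def pm_interval(mtbf_hours, safety_fraction=0.9):
    """Return a PM interval set at a fraction of the MTBF.

    safety_fraction is an illustrative assumption: 0.9 means PMs are
    scheduled at 90% of the average time between failures.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be between 0 and 1")
    return mtbf_hours * safety_fraction


# An asset that fails, on average, every 100 hours.
interval = pm_interval(100)
print(f"Schedule PMs every {interval:.0f} hours")  # Schedule PMs every 90 hours
```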
MTTF: Mean Time To Failure
Here again, we're looking at reliability, but now it's for things that can't be repaired. They can only be replaced. The easiest example is light bulbs.
How to calculate MTTF
When we looked at MTBF, all the numbers were from one asset. But for MTTF, we need a group of identical failed items. Going back to our basic example, light bulbs, we might have four burned-out bulbs, and they ran for 20, 22, 26, and 18 hours respectively. We add up those numbers and get 86. When we divide that by the number of bulbs, which was four, we get an MTTF of 21.5 hours.
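The light bulb calculation can be sketched the same way. The function name `mean_time_to_failure` is hypothetical, and the list holds the hours each bulb ran before burning out.

```python
def mean_time_to_failure(lifetimes_hours):
    """Return the average lifetime of a group of identical failed items."""
    if not lifetimes_hours:
        raise ValueError("need at least one failed item to calculate MTTF")
    return sum(lifetimes_hours) / len(lifetimes_hours)


# Four burned-out bulbs and the hours each one ran.
bulb_lifetimes = [20, 22, 26, 18]
mttf = mean_time_to_failure(bulb_lifetimes)
print(f"MTTF: {mttf} hours")  # MTTF: 21.5 hours
```

Note that the inputs here come from several identical items, not from the repair history of a single asset.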
Value of MTTF
Looking at our MTTF for the light bulbs, we can see right away you're going to need to switch brands, which is really all you can ever do when you have a low MTTF. You can only improve your results by buying better quality products. MTTF is the "you get what you pay for" metric.
MTTF also helps you better manage inventory. If you decide to stay with these awful light bulbs, at least you'll know to keep a lot of them in onsite inventory. Later, if you decide to switch to a better bulb, you know you can reduce carrying costs by keeping fewer of them around.
But the real power of MTTF is what it can tell you about the reliability of bigger, more complex assets. In fact, the MTTF for a small part inside a large asset can have a huge effect on that asset's reliability. Think about your car. What happens when one of the interior lights burns out? Aside from some minor inconvenience, nothing. But what about the fan belt? Like the light, it falls under the MTTF metric because it can't be fixed, only replaced. But because the car can't run without the fan belt, the fan belt's MTTF can be more important than the car's MTBF when determining the car's overall reliability.
You can only really start to use failure metrics once you have a rock-solid data-collection system in place. Luckily, the easiest way to do that is with a CMMS. If you don't have a CMMS yet, now's the perfect time to look into getting one. Older versions required huge upfront investments in IT infrastructure and licensing contracts. Not only that, the software tended to be hard to learn and temperamental. But good CMMS software today is easy to learn and easy to use, offering a clean, intuitive interface and go-anywhere accessibility. Providers use cloud-based computing to make sure your data stays secure. And it's always your data. Good providers are just babysitting it for you, and you can have it back whenever you ask for it.