Welcome


Welcome to my blog for all things related to business quality (processes, systems and ways of working), products and product quality, manufacturing and operations management.

This blog is a mixture of real-world experience, ideas, comments and observations that I hope you'll find interesting.


The real meaning of MTBF

Ignore some of the more disparaging descriptions of what ‘M.T.B.F.’ means; it actually stands for Mean Time Between Failures (or, for products that can’t be repaired, the term Mean Time To Failure is often used instead). Expressed in years, it’s the inverse of the annual failure rate, provided the failure rate is constant.

And it isn’t quite what you might think.

What is the MTBF of a 25-year-old human being? 70 years? 80? No, it’s actually over 800 years, which highlights the difference between lifetime and MTBF. Take a large population of, say, 500,000 and count how many ‘failed’ (died) over a year, e.g. 600. The failure rate is then 600 per 500,000 ‘people-years’, i.e. 0.12% per year, and the MTBF is the inverse of that, which is about 830 years. An individual won’t last that long; they will wear out long before then (unless they are Doctor Who). But for the population as a whole, in that ‘high reliability’ portion of their lifespan, it holds true: in a typical year you will only have to ‘replace’ 600 of them.
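As a quick check on the arithmetic above, here is a minimal Python sketch. It uses the illustrative figures from the paragraph (500,000 people, 600 deaths in a year) and assumes a constant failure rate; the function name is just a label chosen for this example.

```python
def mtbf_years(population: int, failures_per_year: int) -> float:
    """MTBF in years for a population observed over one year.

    Assumes a constant failure rate across the population.
    """
    if failures_per_year <= 0:
        raise ValueError("need at least one failure to estimate an MTBF")
    annual_failure_rate = failures_per_year / population
    return 1.0 / annual_failure_rate


# Illustrative figures from the text: 600 'failures' among 500,000 people.
rate = 600 / 500_000
print(f"Annual failure rate: {rate:.2%}")              # 0.12%
print(f"MTBF: {mtbf_years(500_000, 600):.0f} years")   # 833
```

The exact result is 833 years; the 830 quoted in the text is the same figure rounded.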

So why measure MTBF? “If you can’t measure it you can’t manage it” – knowing your MTBF allows you to benchmark yourself against competitors and can be a marketing asset; many customers expect you to know and disclose your figures. It also allows you to improve the weak spots in your product range, and is useful feedback for the design process.

There are two main methods for calculating MTBF:

MTBF Prediction is a mathematical model of reliability. It takes the individual MTBFs of the product’s constituent parts and subassemblies, gleaned from manufacturers’ data or libraries of standard figures, and combines them mathematically into an overall figure. MIL-HDBK-217 (MIL-STD-217) was one of the first methods and is still very well known, although other schemes such as Telcordia’s SR-332, BT’s HRD5 and others have since come into common usage. There are software tools available, from free to megabucks, that help you make the calculations.
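To give a feel for how such a combination works, here is a rough Python sketch of the simplest ‘series’ model: every part is assumed to have a constant failure rate, and any single part failure counts as a product failure, so the part failure rates add up and the product MTBF is the inverse of the total. The part names and MTBF values are invented purely for illustration and are not taken from MIL-HDBK-217 or any other handbook; the real schemes also apply stress and environment factors that are not shown here.

```python
from typing import Mapping


def predicted_mtbf_hours(part_mtbfs_hours: Mapping[str, float]) -> float:
    """Combine part MTBFs (in hours) into a product MTBF using a series model.

    Assumes constant failure rates and that any part failure fails the product.
    """
    # Failure rate of each part is 1 / MTBF; in a series model the rates add.
    total_failure_rate = sum(1.0 / mtbf for mtbf in part_mtbfs_hours.values())
    return 1.0 / total_failure_rate


# Hypothetical parts list, values made up for this example.
parts = {
    "power supply": 200_000.0,
    "fan": 50_000.0,
    "main board": 400_000.0,
}
print(f"Predicted product MTBF: {predicted_mtbf_hours(parts):,.0f} hours")
```

In this model the product figure is always lower than that of the weakest part, and adding more parts can only lower it further.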

These theoretical methods are supposedly based on empirical evidence but have a number of flaws, primarily that (a) the individual parts never actually have the MTBFs you expect of them, and (b) combining them mathematically ignores many of the real-world effects that dominate the MTBF of the whole product. I once designed a large audio mixing desk whose predicted MTBF according to MIL-STD-217 was less than 8 minutes; I’m glad to say that, in practice, it was a great deal longer than that!

MTBF Measurement sounds simple in principle; count how many failures you have in a given period of product usage and some easy maths gives you the MTBF. The Devil is in the detail, though – doing statistically meaningful averages over large volumes and long periods is easy, but what about small populations, and what if you need answers quickly rather than waiting for several years?

In practice you have to make some assumptions, the main one being that your failure rate is constant. Now this may not be true; if we take the classic bathtub reliability curve you may have a long drawn-out leading edge with a high level of infant mortality, or you may have a long trailing edge where products start to fail prematurely after relatively little life in the field, but both of these are problems that you would need to do something about urgently. The norm is to have a fairly long period of constant reliability – bumping along the bottom of the bathtub – and in this zone the failure rate over a short period can be extrapolated to the rate that would be achieved over a much longer period… as long as it is within the published lifetime of the product (the MTBF of an 80 year old human is not 830 years!).
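The extrapolation can be sketched in Python as follows. This relies on the standard result that, with a constant failure rate, the fraction of units failed after time t is 1 - exp(-t / MTBF). That formula is general reliability maths rather than something specific to this post, and it only applies on the flat bottom of the bathtub. The figures reuse the 500,000 and 600 example from earlier.

```python
import math


def fraction_failed(period_years: float, mtbf_years: float) -> float:
    """Expected fraction of units failed after a period, assuming a constant failure rate."""
    return 1.0 - math.exp(-period_years / mtbf_years)


mtbf = 500_000 / 600  # about 833 years, from the earlier example

for years in (0.5, 1, 10):
    exact = fraction_failed(years, mtbf)
    linear = years / mtbf  # simple scaling of the annual rate
    print(f"{years:>4} years: {exact:.4%} failed (simple scaling gives {linear:.4%})")
```

For periods that are short compared with the MTBF, simple scaling of the annual rate is very close to the exponential figure. The same formula also shows why MTBF is not a lifetime: at t equal to the MTBF, only about 37% of units would still be working, even if nothing ever wore out.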

So take the date that you shipped a unit to a customer, add a little time for the customer to put it into service, then open up a ‘sampling window’ in time of, say, 6 months to look for any failures. If the failure rate is constant then the number of failures per year is twice the number of failures in the 6 month window. If the units are used 24/7, the MTBF in years equals the number of units in service divided by the number of failures per year (back to 500,000 25-year-old humans, divided by 600 failures a year, equals 830 years MTBF). Periodic use, say 8 hours a day, would require the MTBF to be scaled down accordingly (because the units have clocked up fewer operating hours per failure, hence a lower MTBF).
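The sampling-window calculation can be put into a few lines of Python (or the equivalent spreadsheet formulas). This is a minimal sketch: it assumes a constant failure rate and that every unit was in service for the whole window. The function and parameter names are my own labels for this example.

```python
def field_mtbf_years(units_in_service: int,
                     failures_in_window: int,
                     window_months: float = 6.0,
                     hours_per_day: float = 24.0) -> float:
    """Estimate MTBF, in years of operating time, from a sampling window.

    Assumes a constant failure rate and that all units ran for the full window.
    """
    if failures_in_window <= 0:
        raise ValueError("need at least one failure to estimate an MTBF")
    # Scale the failures seen in the window up to a full year.
    failures_per_year = failures_in_window * (12.0 / window_months)
    calendar_mtbf_years = units_in_service / failures_per_year
    # Units used part-time clock up fewer operating hours per failure.
    duty_cycle = hours_per_day / 24.0
    return calendar_mtbf_years * duty_cycle


# 300 failures in a 6 month window is 600 a year, as in the human example.
print(f"24/7 use:      {field_mtbf_years(500_000, 300):.0f} years")
print(f"8 hours a day: {field_mtbf_years(500_000, 300, hours_per_day=8):.0f} years")
```

With 24/7 use this reproduces the 830-ish years from the human example; at 8 hours a day the same failure count gives an MTBF one third of that in operating time.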

Don’t be too harsh on yourself, by the way; you wouldn’t normally expect to count units returned as faulty but that turned out to be No Fault Found, or units damaged by the customer or in transit, or units that were prototypes and not expected to have the performance and longevity of production units, or units that had not been properly serviced or maintained or had reached their published end of life, so you can normally exclude these from the calculations.
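One way to apply these exclusions consistently is to tag each return with a category and filter before counting. The Python sketch below shows the idea; the category labels and the sample list of returns are hypothetical, made up for this example rather than taken from any real returns system.

```python
# Return categories that would not normally count against the MTBF (labels are hypothetical).
EXCLUDED_CATEGORIES = {
    "no_fault_found",
    "customer_damage",
    "transit_damage",
    "prototype",
    "poor_maintenance",
    "end_of_life",
}


def countable_failures(return_categories: list) -> list:
    """Keep only the returns that should count as failures in the MTBF calculation."""
    return [c for c in return_categories if c not in EXCLUDED_CATEGORIES]


# Made-up sample of returns seen in a sampling window.
returns = [
    "component_failure",
    "no_fault_found",
    "transit_damage",
    "component_failure",
    "prototype",
    "solder_joint",
]
print(f"{len(countable_failures(returns))} of {len(returns)} returns count as failures")
```

Keeping the excluded returns in the records, rather than deleting them, means the exclusions can be reviewed and defended later.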

And how do you define a failure – does the malfunction of a single dashboard bulb in a car mean the whole vehicle has failed? You will want to have a sensible, defensible criterion for “fail”.

Now, I plead guilty to dramatically simplifying the subject; what about Mean Time To Repair, what about non-linear failure rates, what about the difference between constant failure rate and constant failure density, what about adding normalising or scaling factors to match different environments? All valid questions and, I’m sorry to say, beyond the scope of this short blog.

However, the key message is that you can calculate MTBF quite easily with a little patience and a simple spreadsheet, and it’s a very useful figure to have.


1 comment to The real meaning of MTBF

  • dear sir,

    I have a FIT rate of 1, which I converted to an annual failure rate of 8.76 ppm. If I need the 10 year failure rate in ppm, can I multiply this by 10 to get 87.6 ppm over 10 years? Is this a correct way to make the calculation, or is there another formula available somewhere?
