Capitoline carries out failure analysis survey
Capitoline LLP is developing data centre management and operations methods based largely on its work with the Amsterdam Internet Exchange, AMS-IX.
Capitoline thought a good place to start would be to try to analyse why data centres go wrong.
Capitoline has carried out a failure analysis survey which has revealed at least one major data centre outage per month.
Information on this has been published before, but usually by manufacturers with a specific interest in justifying demand for their own products, or sometimes by users such as Google, who are in no hurry to give much away about their own shortcomings.
As a result, the available information tends to be inconsistent, with no common reporting terminology.
Capitoline's approach was to round up all press articles published over the last thirty months in online trade journals around the world.
These articles were in turn very often based on the status dashboards that data centre operators publish for their own customers.
Over this 30-month period, 32 major failures were identified.
A major failure is defined here as an incident that took down the entire data centre, or at least left its main services unusable for one or more major customers.
This is just over one failure a month, and it counts only those incidents that were publicly announced.
Assuming that announced incidents represent only about half of the total, there is a major operational incident at a data centre roughly every two weeks.
Taking an estimated 1,000 major data centres worldwide, a rate of roughly 24 failures per year gives any one data centre an annual chance of a major failure of about 2.4 per cent, or roughly one in forty.
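The estimate above chains three simple steps: the reported monthly rate, scaling up for unreported incidents, and spreading the annual total across the data centre population. The short Python sketch below reproduces that arithmetic under the article's own assumptions (half of incidents reported, about 1,000 major data centres); the function and constant names are purely illustrative. Doubling 32 failures over 30 months gives about 25.6 per year, which the article rounds to roughly 24; at 24 per year the annual probability is 2.4 per cent, strictly closer to one in 42 than one in forty.

```python
# Illustrative sketch of the article's failure-rate arithmetic.
# The reporting share and data-centre population are the article's
# stated assumptions, not measured values; names are hypothetical.

PUBLIC_FAILURES = 32   # publicly reported major failures found in the survey
PERIOD_MONTHS = 30     # length of the survey window in months
REPORTED_SHARE = 0.5   # assumption: only about half of incidents are made public
DATA_CENTRES = 1_000   # assumption: approximate number of major data centres


def monthly_rate(failures: int, months: int) -> float:
    """Publicly reported failures per month."""
    return failures / months


def estimated_annual_total(failures: int, months: int, reported_share: float) -> float:
    """Scale reported failures up to an estimated true total per year."""
    return failures / reported_share * 12 / months


def annual_failure_probability(failures_per_year: float, population: int) -> float:
    """Approximate yearly chance of a major failure at one data centre,
    treating failures as spread evenly across the population."""
    return failures_per_year / population


if __name__ == "__main__":
    print(f"reported per month: {monthly_rate(PUBLIC_FAILURES, PERIOD_MONTHS):.2f}")
    total = estimated_annual_total(PUBLIC_FAILURES, PERIOD_MONTHS, REPORTED_SHARE)
    print(f"estimated failures per year: {total:.1f}")
    p = annual_failure_probability(24, DATA_CENTRES)  # article's rounded yearly figure
    print(f"annual probability: {p:.1%} (about 1 in {1 / p:.0f})")
```

Changing `REPORTED_SHARE` shows how sensitive the headline probability is to the assumption about unreported incidents.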
Of the 32 failures investigated, 24 reported actual outage times.
These ranged from 30 minutes to 48 hours, with an average downtime of 14.7 hours per major incident.
Not surprisingly, major incidents such as fire, flooding and total power loss take some time to put right.
The final area Capitoline looked at was the cause of the outage.
Its analysis showed that power problems were the main cause of failures, at 31 per cent.
This figure includes power failures that started with the utility supply but where the data centre's internal back-up systems failed to respond correctly, for example generators failing to start for various reasons or multiple generators failing to synchronise.
Storm and flood damage came second at 22 per cent, which should make data centre operators think carefully about location, building design and lightning protection.
Fire was the next most common cause, including fires within data centres and centres taken out by fires in nearby buildings.
At least three quarters of the problems could have been avoided through better design and operational practices, not least proper testing under load of all back-up power supply equipment.
Capitoline introduced one of the first data centre design training programmes (DCD) four years ago and has now trained over 300 companies worldwide.
In response to the demand for better operational and management practices, Capitoline has launched its Data Centre Operational Management Course (DCOM) to help users avoid common pitfalls in data centre management.
DCOM courses are available throughout Western Europe and the Middle East.
Capitoline is an independent UK-based IT infrastructure consultancy offering data centre training, audit and design.