Why DCiRN is important to me
by Don Carless
UK Expert on BSI TCT 7/3 and Industry End User Representative on TCT 7/1 & 7/2
Key people from the DCiRN Executive and DCiRN Technical Authority are being asked one simple question. Don Carless, "Why is DCiRN important to you?"
Clean Hands and Best Practice
I first encountered the suggestion of a Data Centre Incident Reporting Network (DCiRN), a global method of anonymously posting DC incidents for open analysis and best-practice advice, when I saw a presentation by Ed Ansett (i3 Solutions) a few years ago at Data Centre Dynamics (DCD) in London.
Ed had been given dispensation to disclose his findings from investigating an incident at the Singapore Stock Exchange (SSE), sharing the information as a simple but totally altruistic gesture to help the industry. Until that event the Data Centre industry was secretive: incidents and failures were communicated in whispers, causes were mostly speculative and lessons had to be relearned the hard way. I was struck by how valuable sharing root cause information can be and what a magnanimous and mature gesture had been made by SSE. It was an example to us all, and one that I recognised needed to be emulated throughout the industry. DCiRN will be that vehicle.
At the opposite end of the spectrum (in terms of responsible behaviour), I recently notified a manufacturer that I’d discovered a fault. Their response was to send me a firmware patch, which solved the problem, but the event had been avoidable. I asked why I hadn’t received the patch by default, and whether there were any other patches available for issues I hadn’t yet discovered. The manufacturer’s response was that they only send patches to customers who have experienced the issue. This wholly reflects our industry. The motivation, it seems, was to maintain their reputation and market perception, which is, in my opinion, a short-term and irresponsible attitude and not necessarily in the best interest of our industry.
Our industry was founded on the business requirement to store and process data, most of which was not necessarily time or life sensitive; end-of-day or end-of-period reconciliations could resolve outage problems. Our world now houses systems as diverse as hospital data and autonomous systems running on digital infrastructure. Smart Cities have integrated digital nervous systems to manage everything from traffic lights and trains to emergency communication systems, and incidents could result in human fatalities.
Ed drew the parallel with the airline industry and its anonymised reporting system, CHIRP. Early signs of bad practice, malfunction or poor design are shared in the airline industry. This makes sense to me. The end game is that you can’t hide an airplane crash: there will always be a follow-up investigation involving government bodies, and often lawsuits and compensation. As an industry, if a DC failure takes a life, how can we demonstrate that we have clean hands and used best practice? I believe the answer is DCiRN. The alternative will be the dead hand of Government regulation, which will be painful and expensive. We have to grow up and share best practices. DCiRN is important because we need to be able to tell Governments “We’re on it!” and show that this is how we’re ensuring transparency and best practice.
Another benefit of DCiRN is that we can demonstrate to the consumer that our offerings are more reliable, designed and operated using the latest learnings and best practice that go beyond standards. The assurance of the anonymisation process protects our employers’ and customers’ reputations in the market. Within my organisation I’ve drummed into all my staff that if they are aware of an incident, then it’s “hands up” and don’t hide anything. Now, however, I can also seek root cause analysis and publish our lessons learned anonymously to the industry. My industry is a mechanical and electrical world, all designed and managed by humans. Currently a team’s reputation is all about how they recover from an incident; I would rather avoid the incident.
By passing our learnings forward and agreeing to share information via DCiRN, we can all sleep better at night.
What's the DCIRN mission?
The role of the DCIRN is to manage an independent, voluntary, confidential reporting programme for data center operators and personnel working in the data center industry, in order to improve the safety and reliability of data centers and the services they provide to the public, and the safety of individuals employed within or associated with data center operations.
We have developed an independent, confidential and anonymous reporting programme of data center failures and significant incidents with a mission to improve the reliability of data centers by collating and analysing information that can be shared within the industry.
We will achieve our mission by:
1) Managing an independent confidential reporting programme for the reception and handling of human factors, equipment reliability and safety-related issues associated with the data center community;
2) Analysing data and identifying factors as a contribution to improving safety and reliability in data center operations;
3) Informing the data center community of safety and reliability related reports and trends that we consider will be of public benefit.
How does DCIRN ensure incident submissions are confidential?
DCIRN has established a process to ensure that all incident submissions are made anonymous before publication. Incident submissions are encrypted and password protected. They remain within the Secretariat and are made anonymous before being sent to the Technical Advisory Committee for review, comment and subsequent publication on the website or bulletin.
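To give a feel for what "made anonymous before being sent for review" could mean in practice, here is a minimal Python sketch. It is not DCIRN's actual process or software: the `Submission` fields, the `anonymise` function and the `[REDACTED]` marker are all hypothetical names chosen for illustration, and the encryption and password protection mentioned above are not covered.

```python
import re
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical incident submission as first received (field names are assumptions)."""
    reporter_name: str
    reporter_email: str
    organisation: str
    site: str
    description: str


def anonymise(sub: Submission) -> dict:
    """Return only the incident description, with identifying terms redacted."""
    text = sub.description
    # Redact the reporter, organisation and site wherever they appear in the free text.
    for term in (sub.reporter_name, sub.organisation, sub.site):
        if term:
            text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    # Redact anything that looks like an email address.
    text = re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[REDACTED]", text)
    # Identifying fields are deliberately left out of the record passed to reviewers.
    return {"description": text}
```

In this sketch the full submission never leaves the receiving side; only the dictionary returned by `anonymise` would be passed on for review. Real anonymisation would also need a human check, since free text can identify an organisation in ways simple pattern matching will miss.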