Incident Investigation and Root Cause Analysis


$31.50 USD


The thorough investigation and analysis of incidents (both actual events and near misses), along with appropriate follow-up, is one of the most effective means of improving the safety and reliability of process facilities. Other risk management programs, such as hazards analysis and management of change, are directed toward anticipating problems so that corrective actions can be taken before an event occurs. Yet, in spite of their undoubted value, these predictive techniques have the following limitations:

  • The analyses are, of necessity, theoretical and speculative; there can be no assurance that all plausible events have actually been identified. Indeed, it is likely that important failure mechanisms will be overlooked.
  • It is difficult to predict the true level of risk associated with each identified event because estimated values of both consequence and likelihood are usually very approximate. In particular, predictions as to what might happen are invariably colored by the personal experiences of those carrying out the analysis.
  • Most serious events have multiple causes, some of which appear to be totally implausible or even weird ahead of time (which is why such accidents so often seem to come out of the blue). Even the best qualified hazards analysis team will have trouble identifying such multiple-contingency events.
  • It is very difficult to predict and quantify human error — yet most major events involve such error.

Actual incidents, on the other hand, provide hard information as to how things can go wrong, thus helping to cut through wishful thinking, prejudice, ignorance and misunderstandings. The root cause analysis that follows an incident investigation will help identify weaknesses and limitations in a facility’s management system, thereby reducing the chance of recurrence of similar incidents.

Another reason for emphasizing the importance of incident investigation in the process industries is that process safety management (PSM) systems, of which Incident Investigation and Analysis constitutes one element, have been in place at many facilities for more than twenty years. Many of these facilities have made good progress in meeting regulatory requirements. However, the fact that such systems can ‘survive an audit’ and are working well on paper does not mean that they are as effective at actually improving safety as they might be. Incident investigations help identify how the elements of PSM are really functioning, and can provide management with insights as to how the systems can be improved.

Table of Contents

Management Level 
   Line Supervision 
   Facility Management 
   Executive Management 
   Industry Regulations and Standards 
Incident Investigation and Analysis Philosophy 
   Trust and Candor 
   Listen to the Facts
   Technical Expertise 
   Root Cause Analysis 
      Difficulties with “Root Cause” 
      Ockham’s Razor 
   Project Management 
   Attorney-Client Privilege 
Blame and Fault-Finding 
   Management Trust
   Early Reporting of Bad News 
   Management Pressure 
   Safety as a Cause of Incidents 
   Mid-Level Managers 
   Senior Managers 
   Near Miss / Hit 
   Potential Incident 
   High Potential Incident 
Incident Investigation Steps 
   Step 1 — Initial Investigation
   Step 2 — Evaluation and Team Formation 
   Step 3 — Information Gathering 
   Step 4 — Timeline Development 
   Step 5 — Root Cause Analysis 
   Step 6 — Report and Recommendations 
Step 1. Initial Investigation 
   The ‘Go Team’ 
      Immediate Actions 
      Team Preparation 
   Drug and Alcohol Testing 
   Incident Report Form 
      Incident Number 
      Location, Date and Time of Event 
      Duration of Event 
      Date and Time of Report 
      How Observed 
      Person(s) Reporting 
      Preliminary Ranking 
      Incident Type 
      Incident Flags 
      First Description of Event 
      Immediate Corrective Actions Taken 
      Contractor Involvement 
      Detailed Location 
      Emergency Response 
      Security Issues 
      System Alert 
      Incident Owner / Department 
      Notes and Attachments 
   First Management Report
Step 2. Evaluation and Team Formation
   Team Formation
   Outside Investigators
   Corporate Support
   Team Members
      Incident Owner
      Facility Manager
      Lead Investigator
      Area Supervisor
      HSE Representative
      Process Safety Management Coordinator
      Employee Representative
      Process / Facilities Engineer
      Maintenance Technicians
      Subject Matter Experts
      Contractors / Vendors
      Emergency Response Specialists
   Charter / Terms of Reference
   Team Member Qualifications
      Common Sense
      Jumping to Conclusions 
      Haughtiness and Empathy 
      Understand Incident Investigation Methodology 
      You Do Know What You Don’t Know
      Understand Process Systems
      Logical Thinking / Painstaking
Step 3. Information Gathering
   Interview Guidelines
   Regulatory / Legal Interviews
   Witness Interviews
   Interviewer Attributes
      Rapport and Trust
      Technical Skills
      Critical Factors Recognition
      Effective Note Taking
      Management Interviews
   Engineering Information
   Operating Information
      Instrument Records
      Log Books, Maintenance Records and JSAs 
      Hazards Analysis Reports
      Management of Change Records 
      Operating Manuals / Procedures
      Incident Investigations and Audits
   Vendor Data 
   Field Information 
   Damage Assessment
   Photographs and DVDs
   Closed Circuit Television 
   Instrument Records 
   Testing / Lab Analysis 
Step 4. Timeline Development 
   Timeline Steps
      Section 1 — Events Prior to the Incident
      Section 2 — The Incident
      Section 3 — Post-Incident Response
   Timeline Construction
   Multiple Timelines
   Timeline Table
   Background Information
Step 5. Root Cause Analysis
   Management Action
   Levels of Root Cause
      Single Incidents
      Multiple Incidents
   Types of Root Cause Analysis
   Argument by Analogy: Story Telling 
      False Extrapolation
      World Views
   Barrier Analysis
      Equipment Failure
      Human Error as a Root Cause
      Process Systems Failure
   System Analysis
   Why Trees
      Single Chain of Events
      Wrong Chain
   Fault Tree Analysis
   Linkage of Fault Trees to the Timeline
   Common Cause Events
Step 6. Report and Recommendations
   Levels of Recommendation
      Short Term Recommendations
      Intermediate Recommendations
      Long Term Recommendations
      Industry Guidance
   Report Structure
      Executive Summary
      What Happened? 
      What Could Have Happened?
      What Was the Cause?
      What Actions Should Be Taken?
      Terms of Reference
      Reason for Selection 
      Sequence of Events 
      Root Causes 
      Other Hazards 
      Attachment A — Regulations and Standards 
      Attachment B — Root Cause Analysis 
      Attachment C — Organization Chart 
      Attachment D — Review of Similar Events 
      Attachment E — Investigation Team 
      Attachment F — Review of Modern Designs 
      Attachment G — Index to Pictures and Documents 
      Attachment H — Detailed Timeline 
   Issuing the Report 
      Writing the Report 
      Presenting the Report 
      Follow Up and Recommendations Tracking 
      Legal Issues 
Information Security and Chain of Custody 
   Record Retention
   Removing Evidence 
   File Systems 
   Incident / Risk Register
Lessons Learned 
Incident Data Bases 
   National Response Center (NRC) 
   Accidental Release Information Program (ARIP) Database 
   CFOI (Census of Fatal Occupational Injuries) 
   Major Accident Reporting System (MARS) 
   Marsh & McLennan Reviews 
   Annual Loss Prevention Symposia 
   Process Safety Beacon 
   Government Agencies 

Copyright © Ian Sutton. 2018. All Rights Reserved.