Incident Investigation and Root Cause Analysis
Overview
The thorough investigation and analysis of incidents (both actual events and near misses), along with appropriate follow-up, is one of the most effective means of improving the safety and reliability of process facilities. Other risk management programs, such as hazards analysis and management of change, aim to anticipate problems so that corrective actions can be taken before an event occurs. Yet, in spite of their undoubted value, these predictive techniques have the following limitations:
- The analyses are, of necessity, theoretical and speculative; there can be no assurance that all plausible events have actually been identified. Indeed, it is likely that important failure mechanisms will be overlooked.
- It is difficult to predict the true level of risk associated with each identified event because the estimated values of both consequence and likelihood are usually only rough approximations. In particular, predictions as to what might happen are invariably colored by the personal experience of the people carrying out the analysis.
- Most serious events have multiple causes, some of which appear totally implausible or even weird ahead of time (which is why such accidents so often seem to come out of the blue). Even the best-qualified hazards analysis team will have trouble identifying such multiple-contingency events.
- It is very difficult to predict and quantify human error — yet most major events involve such error.
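The second and third limitations can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes a hypothetical facility with 50 individual failures or abnormal conditions that a hazards analysis team might consider. It also assumes that likelihood and consequence estimates are each uncertain by up to a factor of 10, and it uses the common simplification risk = likelihood × consequence. The function names `scenario_counts` and `risk_range` are hypothetical, and none of the numbers come from real data.

```python
from math import comb


def scenario_counts(n_failures: int, max_order: int) -> dict[int, int]:
    """Count the distinct k-way combinations of n individual failures/conditions.

    Illustrates why multiple-contingency events are hard to foresee: the number
    of combinations a team would have to consider grows combinatorially with k.
    """
    return {k: comb(n_failures, k) for k in range(1, max_order + 1)}


def risk_range(likelihood: float, consequence: float, factor: float) -> tuple[float, float]:
    """Bounds on risk = likelihood * consequence when each estimate may be off by `factor`.

    If both inputs can each be too high or too low by `factor`, the product can be
    off by factor**2 in either direction (assumption for illustration only).
    """
    nominal = likelihood * consequence
    return nominal / factor**2, nominal * factor**2


if __name__ == "__main__":
    # Hypothetical: 50 candidate failures, combinations of up to three at once.
    print(scenario_counts(50, 3))
    # Hypothetical: 1e-3 events/year, 1e6 (arbitrary loss units) per event,
    # each estimate uncertain by a factor of 10.
    print(risk_range(1e-3, 1e6, 10))
```

With these assumed inputs, 50 single failures yield 1,225 two-way and 19,600 three-way combinations. A risk estimate built from two order-of-magnitude guesses also spans four orders of magnitude (roughly 10 to 100,000 loss units per year). Both results are consistent with the point that predictive analyses are necessarily incomplete and approximate.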
Actual incidents, on the other hand, provide hard information about how things can go wrong, helping to cut through wishful thinking, prejudice, ignorance and misunderstanding. The root cause analysis that follows an incident investigation helps identify weaknesses and limitations in a facility’s management system; correcting those weaknesses reduces the chance that similar incidents will recur.
Another reason for emphasizing incident investigation in the process industries is that process safety management (PSM) systems, of which Incident Investigation and Analysis is one element, have in many cases been in place for more than twenty years. Many of these facilities have made good progress in meeting regulatory requirements. However, the fact that such systems can ‘survive an audit’ and work well on paper does not mean that they are as effective at actually improving safety as they could be. Incident investigations show how the elements of PSM are really functioning and can give management insight into how those systems can be improved.
Table of Contents
Introduction
Management Level
Line Supervision
Facility Management
Executive Management
Industry Regulations and Standards
Incident Investigation and Analysis Philosophy
Trust and Candor
Listen to the Facts
Technical Expertise
Root Cause Analysis
Difficulties with “Root Cause”
Ockham’s Razor
Project Management
Attorney-Client Privilege
Blame and Fault-Finding
Management Trust
Early Reporting of Bad News
Management Pressure
Safety as a Cause of Incidents
Communications
Technicians
Mid-Level Managers
Senior Managers
Definitions
Incident
Accident
Near Miss / Hit
Potential Incident
High Potential Incident
Incident Investigation Steps
Step 1 — Initial Investigation
Step 2 — Evaluation and Team Formation
Step 3 — Information Gathering
Step 4 — Timeline Development
Step 5 — Root Cause Analysis
Step 6 — Report and Recommendations
Step 1. Initial Investigation
The ‘Go Team’
Immediate Actions
Team Preparation
Drug and Alcohol Testing
Incident Report Form
Incident Number
Title
Location, Date and Time of Event
Duration of Event
Date and Time of Report
How Observed
Person(s) Reporting
Preliminary Ranking
Incident Type
Incident Flags
First Description of Event
Immediate Corrective Actions Taken
Witnesses
Contractor Involvement
Detailed Location
Consequences
Emergency Response
Security Issues
System Alert
Incident Owner / Department
Notes and Attachments
First Management Report
Step 2. Evaluation and Team Formation
Evaluation
Team Formation
Outside Investigators
Corporate Support
Team Members
Sponsor
Incident Owner
Facility Manager
Lead Investigator
Administrator
Area Supervisor
HSE Representative
Process Safety Management Coordinator
Employee Representative
Process / Facilities Engineer
Maintenance Technicians
Subject Matter Experts
Contractors / Vendors
Emergency Response Specialists
Attorneys
Charter / Terms of Reference
Team Member Qualifications
Objectivity
Common Sense
Jumping to Conclusions
Haughtiness and Empathy
Understand Incident Investigation Methodology
You Do Know What You Don’t Know
Understand Process Systems
Logical Thinking / Painstaking
Step 3. Information Gathering
Interviews
Interview Guidelines
Regulatory / Legal Interviews
Witness Interviews
Interviewer Attributes
Rapport and Trust
Technical Skills
Critical Factors Recognition
Objective
Effective Note Taking
Management Interviews
Documentation
Engineering Information
Operating Information
Instrument Records
Log Books, Maintenance Records and JSAs
Hazards Analysis Reports
Management of Change Records
Operating Manuals / Procedures
Incident Investigations and Audits
Vendor Data
Field Information
Damage Assessment
Photographs and DVDs
Closed Circuit Television
Instrument Records
Testing / Lab Analysis
Step 4. Timeline Development
Timeline Steps
Section 1 — Events Prior to the Incident
Section 2 — The Incident
Section 3 — Post-Incident Response
Timeline Construction
Conditions
Multiple Timelines
Timeline Table
Background Information
Step 5. Root Cause Analysis
Management Action
Levels of Root Cause
Single Incidents
Multiple Incidents
Types of Root Cause Analysis
Argument by Analogy: Story Telling
False Extrapolation
Linearity
World Views
Barrier Analysis
Categorization
Equipment Failure
Human Error as a Root Cause
Process Systems Failure
System Analysis
Why Trees
Single Chain of Events
Wrong Chain
Fault Tree Analysis
Linkage of Fault Trees to the Timeline
Common Cause Events
Step 6. Report and Recommendations
Levels of Recommendation
Short Term Recommendations
Intermediate Recommendations
Long Term Recommendations
Industry Guidance
Report Structure
Executive Summary
What Happened?
What Could Have Happened?
What Was the Cause?
What Actions Should Be Taken?
Recognition
Terms of Reference
Reason for Selection
Sequence of Events
Consequences
Root Causes
Other Hazards
Recommendations
Attachments
Attachment A — Regulations and Standards
Attachment B — Root Cause Analysis
Attachment C — Organization Chart
Attachment D — Review of Similar Events
Attachment E — Investigation Team
Attachment F — Review of Modern Designs
Attachment G — Index to Pictures and Documents
Attachment H — Detailed Timeline
Issuing the Report
Writing the Report
Presenting the Report
Follow Up and Recommendations Tracking
Legal Issues
Information Security and Chain of Custody
Record Retention
Removing Evidence
File Systems
Incident / Risk Register
Feedback
Lessons Learned
Incident Databases
National Response Center (NRC)
Accidental Release Information Program (ARIP) Database
Census of Fatal Occupational Injuries (CFOI)
Major Accident Reporting System (MARS)
Marsh & McLennan Reviews
Annual Loss Prevention Symposia
Process Safety Beacon
Government Agencies
Copyright © Ian Sutton. 2018. All Rights Reserved.