The Fundamentals of Risk

Fundamentals of risk management in the process industries

Risk

This article discusses risk fundamentals in the process and energy industries. The word 'risk' has a wide range of meanings. In the context of process plant management, it can be categorized in one of three ways.

The first type of risk is to do with catastrophic events and serious safety violations. Such events can lead to totally unacceptable loss of life, major environmental problems, huge economic shortfalls, very bad public relations, civil litigation and even criminal prosecution. They can also have a major effect on a company's financial performance, to the point where the organization can be driven into bankruptcy. Although such events are rare, they tend to have a major impact on the development of management systems and regulations.

The second type of process risk is to do with troubleshooting in which equipment items are not operating as they should. Examples of "trouble" are:

  • A critical pump breaks down on numerous occasions.
  • Steam consumption is up 10%, and no one seems to know why.
  • The quality of the final product is erratic.

The risk associated with events such as these is primarily economic. However, if such situations are not taken care of properly, unsafe conditions could develop, not least because workers are required to break down equipment or work on electrical systems, thus exposing them to potential safety problems.

The third type of risk is to do with equipment reliability and system availability. This is related to troubleshooting but implies a higher degree of predictability. Items are predicted to fail with a certain frequency, and they are repaired and put back into service before they actually fail.

The management systems used to control these different types of risk have much in common with one another but they are not identical, particularly when it comes to the prevention of catastrophic events.

Components of Risk

Risk, which always implies some type of negative outcome, is made up of three components:

  1. Hazards;
  2. The consequences of the hazards; and
  3. The predicted frequency (likelihood) of occurrence of the hazards.

These three terms can be combined as shown in Equation (1).

     $$\text{Risk}_{\text{Hazard}} = \text{Consequence} \times \text{Predicted Frequency} \tag{1}$$

Equation (1) shows that risk can never be zero - a truth not always grasped by members of the general public or the news media. Hazards are always present within all industrial facilities. Those hazards always have undesirable consequences, and their likelihood of occurrence is always finite. The consequence and likelihood terms can be reduced in size, but they can never be eliminated. The only way to achieve a truly risk-free operation is to remove the hazards altogether (or, with respect to safety, to remove personnel from the site).
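The arithmetic behind Equation (1) can be illustrated with a minimal Python sketch. The `Hazard` class, the consequence values (in arbitrary cost units) and the frequencies are all hypothetical, chosen only to show how the two terms combine; they are not taken from the standard example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """One hazard with hypothetical consequence and frequency values."""
    name: str
    consequence: float  # loss per event, in arbitrary cost units (assumed)
    frequency: float    # predicted events per year (assumed)

    @property
    def risk(self) -> float:
        # Equation (1): risk = consequence * predicted frequency
        return self.consequence * self.frequency


hazards = [
    Hazard("Tank T-100 overflows", consequence=50_000.0, frequency=0.1),
    Hazard("P-101A seal fails", consequence=200_000.0, frequency=0.02),
]

for h in hazards:
    print(f"{h.name}: {h.risk:,.0f} cost units per year")

# Cutting the predicted frequency tenfold cuts the risk tenfold,
# but the product stays greater than zero while the hazard exists.
improved = Hazard("Tank T-100 overflows", consequence=50_000.0, frequency=0.01)
print(f"After mitigation: {improved.risk:,.0f} cost units per year")
```

The sketch makes the point of the paragraph above concrete: reducing either term shrinks the product, but only removing the hazard itself brings the risk to zero.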

Hazards

The first term in Equation (1) is the hazard. A hazard is a condition or practice that has the potential to cause harm, including human injury, damage to property, damage to the environment or some combination of these. The key word in the definition is "potential". Hazards exist in all human activities but rarely result in an incident. For example, walking down a staircase creates the hazard of "falling down stairs", with the consequence of an injury, ranging from minor first-aid to a broken limb or even death. However, most people, most of the time, manage to negotiate a flight of stairs without falling.

Some of the hazards associated with the second standard example that we use in our books and ebooks are:

  1. Tank T-100 is pumped dry.
  2. Tank T-100 overflows.
  3. P-101A seal fails.
  4. V-101 is over-pressured.
  5. Liquid flows backward from V-101 into T-100.
  6. Other.

One of the greatest challenges in practical risk analysis is defining the scope of the hazard term. For example, with respect to the second hazard, the overflow of T-100, simply to say that RM-12 overflows from T-100 is not enough. Clearly there is an enormous disparity between having a few drops spill into a closed drain system, and having thousands of liters of the chemical pour on to the ground and then flow into the local waterways.

Similarly, with regard to the fifth hazard - "Liquid flows backward from V-101 into T-100" - there is a world of difference between a reverse flow of a few milliliters of RM-12 lasting for a few seconds and a reverse flow of thousands of kilograms of material lasting for an hour or more.

The final hazard listed is "Other". This term is included as a reality check. No risk management team, no matter how well qualified the members may be or how much time they put into the analysis, can ever claim to have identified all hazards. Throughout this book the 'other' term is used in all types of analysis in order to keep everyone on their toes and thinking creatively as to 'what might be'.

Consequence

Once the hazards associated with a process have been defined, the corresponding consequence and likelihood values can be determined. The consequence of an event usually falls into one of three categories:

  • Safety;
  • Environmental; and
  • Economic.

For many companies, safety tends to be the driver; they reason that, if they can avoid people being hurt, then the environmental and economic performance will follow along.

Predicted Frequency

Each event has a predicted frequency of occurrence, such as once in a hundred years.

The word "predicted" is used to point out that the future frequency of an event is not necessarily the same as its historical frequency, particularly if the risk management program is effectively reducing the chance of a failure from occurring.

Nor is frequency the same as probability. An item has a frequency of failure measured in inverse time units. The consequences of that event may be mitigated by a safeguard, which has a (dimensionless) probability of failing when called upon. For example, high level in a tank may occur once every year. However, the tank has level control instruments that detect high level and stop the flow of liquid into the tank. These instruments may have a probability of failure of 0.01, or 1%. Therefore the likelihood of a system failure is 1 yr⁻¹ × 0.01 = 0.01 yr⁻¹, i.e., once in 100 years.
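The tank example can be written as a short calculation. The following is a sketch only: the function name `mitigated_frequency` and its parameter names are illustrative assumptions, and it treats the safeguard as a single independent layer, which is a simplification.

```python
def mitigated_frequency(demand_per_year: float, p_fail_on_demand: float) -> float:
    """Frequency (per year) of the unmitigated outcome.

    demand_per_year:  how often the initiating event occurs (inverse time units)
    p_fail_on_demand: dimensionless probability that the safeguard fails
    """
    if demand_per_year < 0:
        raise ValueError("demand frequency must be non-negative")
    if not 0.0 <= p_fail_on_demand <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return demand_per_year * p_fail_on_demand


# High level occurs once a year; the level instruments fail 1% of the time.
overflow_frequency = mitigated_frequency(demand_per_year=1.0, p_fail_on_demand=0.01)
print(f"{overflow_frequency:.2f} per year, or once in {1 / overflow_frequency:.0f} years")
```

Keeping the two inputs separate preserves the distinction drawn in the text: the first has units of inverse time, the second has none, and the result carries the units of the first.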

Overall Risk

Figure 1 shows that an inverse relationship generally exists between consequence and frequency. For example, in a typical process facility, a serious event such as the failure of a pressure vessel may occur only once every ten years, whereas trips and falls may occur weekly.

Figure 1 Likelihood vs Consequence

FN Curves

The total risk associated with a facility is obtained by calculating the risk value for each hazard, and then adding all the individual risk values together. The result of this exercise is sometimes plotted in the form of an FN curve, as shown in Figure 2, in which the ordinate represents the cumulative frequency (F) of fatalities or other serious events, and the abscissa represents the consequence term (usually expressed as N fatalities). In Figure 2 it is projected that the organization will have a fatality about once every fifty years, whereas a major event (say more than 10 fatalities) will occur every thousand years or so. Saraf (2009) provides an example of an FN curve for the frequency of fatalities in the process industries during the years 1911-1995.

Because the values of F and N typically extend across several orders of magnitude, both axes on an FN curve are logarithmic. (More sophisticated analyses will actually have multiple curves with roughly the same shape as one another. The distribution of the curves represents the uncertainty associated with predicting the frequency of events.) The shape of the curve itself will vary according to the system being studied; frequently a straight line can be used.

Figure 2 Representative FN Curve
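The points on an FN curve can be computed from a list of scenarios by summing, for each value of N, the frequencies of all scenarios that cause N or more fatalities. The sketch below assumes three hypothetical scenarios whose frequencies were picked to give roughly the magnitudes quoted for Figure 2; the function name `fn_curve` and the data are illustrative, not taken from any real study.

```python
def fn_curve(scenarios):
    """Return [(N, F)], where F is the summed frequency (per year)
    of all scenarios with N or more fatalities."""
    fatality_levels = sorted({n for _, n in scenarios if n > 0})
    return [
        (n, sum(freq for freq, fatalities in scenarios if fatalities >= n))
        for n in fatality_levels
    ]


# (frequency per year, fatalities) for each scenario; hypothetical values
scenarios = [
    (0.015, 1),
    (0.004, 3),
    (0.001, 12),
]

for n, f in fn_curve(scenarios):
    print(f"N >= {n:>2}: F = {f:.3f} per year (about once in {1 / f:.0f} years)")
```

Because F is cumulative, it can only stay level or fall as N increases. Plotting these points on logarithmic axes gives the kind of curve described above.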


FN curves are generally used when making industry-wide decisions; they would not generally be calculated for individual process facilities. However, if two types of technology are being considered, their respective FN curves can be compared, as illustrated in Figure 3, which compares technologies A and B.

Figure 3 Comparative FN Curves

Safeguards

The term "safeguards" can cause confusion in risk analysis. In particular, when determining the likelihood of an event, it is important to know whether credit is to be given for safeguards.

Presence of Persons

One factor that radically affects the safety risk associated with a hazard is the presence of persons in the area of the event. For example, the consequence of a seal failure from P-101 A/B could be a fire. If no one is present the safety impact is zero. The economic loss may be great but no one will be hurt. However, if someone is present they could be killed. Yet it can be very difficult for a risk analyst or for a risk management team to forecast whether or not someone will be present at the time of the event.

In some cases, the probability of someone being present is higher than would normally be anticipated because those people are there to work on what appears to be a relatively minor problem. For example, at a chemical facility in Texas a tank exploded killing seventeen workers. The area in which the explosion occurred was normally deserted; but the workers were there to correct the conditions that led to the explosion. On another occasion a refinery in west Texas experienced a major explosion and fire. Many major equipment items were destroyed, and the smoke from the fire was so great that an adjacent freeway had to be closed down. If a risk management team had modeled the event ahead of time they would surely have postulated multiple fatalities and serious injuries. In fact, no one was hurt.

Management must be particularly cautious when sending operators and maintenance workers into a hazardous situation in order to correct problems. It is likely that additional safeguards and precautions will be needed.
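One way to see why the presence of persons matters so much is to treat it as a conditional probability that multiplies the event frequency. The sketch below is an illustration under stated assumptions, not a recommended method: the function name, the seal-fire frequency, the occupancy probabilities and the fatality probability are all hypothetical numbers.

```python
def fatality_frequency(event_per_year: float,
                       p_present: float,
                       p_fatal_if_present: float) -> float:
    """Frequency (per year) of a fatal outcome from one event type."""
    for p in (p_present, p_fatal_if_present):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return event_per_year * p_present * p_fatal_if_present


seal_fire_per_year = 0.01  # assumed frequency of a seal failure leading to fire

# Area normally deserted: someone is nearby about 2% of the time (assumed).
routine = fatality_frequency(seal_fire_per_year, p_present=0.02, p_fatal_if_present=0.5)

# Crew sent in to work on the pump: presence is close to certain (assumed).
troubleshooting = fatality_frequency(seal_fire_per_year, p_present=0.9, p_fatal_if_present=0.5)

print(f"routine:         {routine:.6f} per year")
print(f"troubleshooting: {troubleshooting:.6f} per year")
print(f"ratio:           {troubleshooting / routine:.0f}x")
```

With these assumed inputs, the same physical event carries a far higher safety risk when people are sent into the area to fix the problem. That is the reason for the caution urged above.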

In other situations it may be found that the presence of a hazardous situation actually reduces the number of people at risk. For example, in the Gulf of Mexico during the period 2003 to 2005 some 164 offshore platforms were either lost or seriously damaged due to hurricanes. The economic loss was high, yet the number of fatalities and serious injuries associated with the storms was zero. The reason is that, whenever a hurricane is brewing in the general location of a platform, the crew is evacuated. In this case, the probability of workers being present was much less than would have been anticipated by a risk management team.

In other cases people may be located close to a hazard for reasons that no risk analyst could reasonably foresee. Probably the best known example of this is the Texas City refinery explosion of 2005. Adjacent to the site of the release were temporary trailers used for project workers. All of those killed were people working in those trailers (CSB 2007).

Another example of the unexpected presence of people at an incident site occurred at a facility in the southern United States. A high pressure, high capacity pump that had just been brought on line for the first time was exhibiting serious mechanical problems. About half a dozen people, including some senior managers, were gathered in the area to find out what was going on. Suddenly the pump's seal failed, shooting out a large jet of a high temperature caustic liquid. Fortunately the direction of the failure was away from the personnel. The liquid caused some environmental damage but none of the people in the area were hurt. They were lucky - they could have been seriously burned had the jet been directed toward them. Once more, no risk analysis could have reasonably anticipated that a large number of people would be present and that the jet would point the way it did.

In the long term, one of the best means of improving safety is to develop systems that are so automated that very few people need to be in the vicinity of operating equipment, and so are not exposed to its hazards.

Single Contingency Events

When a facility is being designed, the single risk concept is typically applied. It specifies that only one emergency (or group of interrelated emergencies) will occur at one time. The probability that multiple unrelated incidents would occur simultaneously is so low that it is not a credible consideration. Therefore, when designing a facility it is normal to design a safety device to handle the largest single risk. For example, pressure vessels can be subject to over-pressure from a number of separate causes such as external fire, pump pressure and internal chemical reactions. The safety relief valve will be designed for the worst of the identified cases. Multiple unrelated incidents are examined through the use of common cause effect analysis and with techniques such as fault tree analysis. When it is plausible that a relief device could be called upon to handle multiple releases, such as may occur during a cooling water failure, capacity should be provided for this emergency.
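The selection logic described above can be sketched in a few lines: unrelated cases are compared and the largest governs, while releases that share a common cause are added together before the comparison. The function name `governing_relief_case` and all relief loads (in kg/h) are hypothetical, and the sketch is not a substitute for a relief system design carried out to the applicable standards.

```python
def governing_relief_case(independent_cases, common_cause_groups=None):
    """Pick the relief case with the largest required load.

    independent_cases:   {case name: load} for unrelated single contingencies
    common_cause_groups: {group name: {release name: load}} for releases
                         that can occur together from one shared cause
    """
    candidates = dict(independent_cases)
    for group_name, member_loads in (common_cause_groups or {}).items():
        # Interrelated releases happen together, so their loads are summed.
        candidates[group_name] = sum(member_loads.values())
    governing = max(candidates, key=candidates.get)
    return governing, candidates[governing]


# Hypothetical relief loads in kg/h
independent_cases = {
    "external fire": 12_000.0,
    "pump pressure": 8_000.0,
    "internal chemical reaction": 15_000.0,
}
common_cause_groups = {
    "cooling water failure": {
        "loss of condenser duty": 9_000.0,
        "loss of reactor cooling": 7_000.0,
    },
}

print(governing_relief_case(independent_cases))
print(governing_relief_case(independent_cases, common_cause_groups))
```

With these assumed numbers, the internal reaction governs when only single contingencies are considered. Once the two releases caused by a cooling water failure are combined, that case becomes the larger one and sets the required capacity.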