A true systems approach to asset failure

Scottish Water, with BMA’s decision-making tool, is now leading the way in developing a truly holistic approach to quantifying water quality risk. The combination of BMA’s underlying model and a series of user-friendly dashboards will allow Scottish Water to rapidly identify asset criticality and failure pathways in its water treatment works, leading to faster and better decision-making and enabling a true systems approach to asset failure.

In 2019, the Drinking Water Inspectorate Risk Management Assessment Scheme (DWIRMAS) defined a new standard for Drinking Water Safety Plan (DWSP) processes. Scottish Water was at the forefront of UK water utilities’ efforts to achieve compliance with the new standard.

The standard spans entire water systems, mandating safe practices from source to tap, including:

  • allowing for all relevant hazards and hazardous events
  • continually reviewing DWSPs and updating them with new data
  • driving and prioritising actions that inform investment processes
  • conducting verification exercises to assure quality and compliance.

Scottish Water must assess and record risk for each of its 230 water treatment works, which include many different legacy designs. Without a single platform or tool, it is challenging to maintain consistent methodologies and data quality across the different sites, and to comply with the system-spanning, source-to-tap standard.

Now, Scottish Water is leading the sector in developing a truly holistic approach to quantifying water quality risk that will improve the consistency of risk assessment and enable future dynamic risk assessment.

Here we focus specifically on an approach for detailed technical assessment of disinfection system risk and resilience. Disinfection is an important sub-process within water treatment works that deactivates or kills pathogenic microorganisms. Disinfection systems involve multiple assets and sensors, typically controlled by a Programmable Logic Controller (PLC). In asset systems like these, with high degrees of interdependency, it can be challenging to gain a holistic understanding of system resilience.

Traditional approaches do not fully capture the functional interdependencies between assets. They tend to overestimate risk where there is significant redundancy between system components, and to underestimate it where there is significant dependency.
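The effect of redundancy and dependency on a risk estimate can be illustrated with a small calculation. The sketch below is not BMA's model: the failure probabilities, the two-pump layout, and the shared power supply are all hypothetical assumptions chosen only to show the direction of the error.

```python
def p_and(*probs):
    """Probability that all independent events occur (AND gate)."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def p_or(*probs):
    """Probability that at least one independent event occurs (OR gate)."""
    none_occur = 1.0
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


# Hypothetical failure probabilities, for illustration only.
P_PUMP = 0.05
P_POWER = 0.02

# Redundancy: two dosing pumps, either of which can do the job.
# Asset-by-asset view: any pump failure is counted as a system risk.
naive_redundant = p_or(P_PUMP, P_PUMP)
# System view: dosing is lost only if both pumps fail.
system_redundant = p_and(P_PUMP, P_PUMP)

# Dependency: both pumps rely on the same power supply.
# Naive view: each pump is assumed to have its own independent supply.
naive_dependent = p_and(p_or(P_PUMP, P_POWER), p_or(P_PUMP, P_POWER))
# System view: the shared supply is a single point of failure.
system_dependent = p_or(P_POWER, p_and(P_PUMP, P_PUMP))

print(f"redundancy: naive={naive_redundant:.4f} system={system_redundant:.4f}")
print(f"dependency: naive={naive_dependent:.4f} system={system_dependent:.4f}")
```

With these assumed numbers the asset-by-asset view overstates the risk of the redundant pair and understates the risk once the shared supply is taken into account.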

To better understand the risk and resilience of its disinfection systems, Scottish Water sought the development of a proof-of-concept model. The model needed to analyse and quantify:

  • the resilience of specific treatment works to hazardous events (including individual events and combinations of two or more events)
  • the resilience of specific sites to component failures
  • the criticality of assets and controls.

Crucially, the model needed to be able to rapidly analyse the impact of any hazardous event, including novel events, on the whole population of 230 disinfection systems. If a previously unidentified hazardous event caused a failure at a particular site, Scottish Water would then be able to rapidly assess which other sites would require additional controls to defend against that hazard.
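As a minimal sketch of this kind of population-wide screening, assume each site is described simply by the set of controls it has in place, and a new hazardous event by the set of controls that would defend against it. The site names, control names, and the `sites_needing_controls` helper are hypothetical; the real model works from each site's full system logic rather than a flat list of controls.

```python
from typing import Dict, List, Set


def sites_needing_controls(
    sites: Dict[str, Set[str]], defending_controls: Set[str]
) -> List[str]:
    """Return the sites that have none of the controls defending against a hazard."""
    return sorted(
        name
        for name, controls in sites.items()
        if not (controls & defending_controls)
    )


# Hypothetical site inventory (illustrative names only).
SITES = {
    "site_A": {"standby_dosing_pump", "backup_generator"},
    "site_B": {"standby_dosing_pump"},
    "site_C": {"auto_shutdown_valve"},
}

# Hypothetical new hazard: loss of mains power, defended against by
# either a backup generator or an automatic shutdown valve.
exposed = sites_needing_controls(SITES, {"backup_generator", "auto_shutdown_valve"})
print(exposed)
```

Running the same query over every site in the population is what makes it possible to move quickly from a failure at one works to a list of other works that may need additional controls.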

Business Modelling Associates (BMA) developed a conceptual methodology linking hazardous events to overall disinfection system function. The methodology accommodates both individual and combined hazardous events, and is not restricted to known events.

Another key feature of the methodology is that it is not limited to the physical assets on site; it also encompasses the PLC, the control room, and the actions of the operator. All these elements proved essential to accurately predict the impact of hazardous events on the system.

Effective and user-friendly dashboards and visualisations were also an important part of the project. As part of the proof-of-concept model, BMA created dashboards to explore the effects of single and multiple component failures, individual and combined hazardous events, and asset criticality.

Not all water treatment works are the same, or have the same component failure pathways. To create the component failure dashboard, BMA assessed the fault trees for a standard-specification disinfection system and for three specific Scottish Water treatment works. Over 700,000 permutations were run for the three sites, comprehensively testing system resilience against a range of hazardous events and individual component failures.
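To give a sense of what running permutations against a fault tree involves, the sketch below evaluates a small tree of AND/OR gates for every combination of failed components up to a chosen size. The tree, its component names, and the helper functions are hypothetical and far simpler than a real disinfection system; the sketch illustrates the general technique, not BMA's implementation.

```python
from itertools import combinations

# Hypothetical fault tree. The top event is "loss of disinfection".
# A node is either a component name or a tuple ("AND" | "OR", child, child, ...).
TREE = (
    "OR",
    ("AND", "dosing_pump_A", "dosing_pump_B"),
    "plc",
    ("AND", "chlorine_analyser", "operator_response"),
)


def fails(node, failed):
    """Return True if this node of the fault tree is in a failed state."""
    if isinstance(node, str):
        return node in failed
    gate, *children = node
    results = [fails(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)


def components(node):
    """Collect every component name that appears in the tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(components(child) for child in node[1:]))


def failure_sets(tree, max_size):
    """List every combination of up to max_size failed components that trips the top event."""
    names = sorted(components(tree))
    return [
        set(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(names, size)
        if fails(tree, set(combo))
    ]


single = failure_sets(TREE, 1)
up_to_double = failure_sets(TREE, 2)
print(len(single), len(up_to_double))
```

In this toy tree only the PLC is a single point of failure, while several pairs of failures trip the top event. The number of combinations grows quickly with the number of components and the combination size, which is why exhaustive runs across real sites reach very large counts.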

To calibrate and test the hazardous events dashboard, Scottish Water provided details of 25 historical system failures, which together involved 46 unique hazardous events. The tests highlighted that assessing such events individually, rather than assessing combinations of two or more hazardous events happening simultaneously, significantly underestimates risk. All 25 historical system failures involved at least two simultaneous hazardous events, and 80 per cent resulted in water entering the supply without being disinfected (in every case the failure was brief and customers were protected by operational mitigations and post-disinfection blending). But when the 46 hazardous events were assessed individually, only two resulted in undisinfected water entering the supply.
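The gap between individual and combined assessment can be reproduced in a toy example. In the sketch below each hazardous event disables a set of components, and the system logic is reduced to a single rule. The event names, the components they affect, and the `undisinfected` rule are hypothetical assumptions, not data from the 25 historical failures.

```python
from itertools import combinations

# Hypothetical mapping from hazardous event to the components it disables.
EVENTS = {
    "power_dip": {"dosing_pump_A"},
    "blocked_suction_line": {"dosing_pump_B"},
    "comms_loss": {"alarm_telemetry"},
}


def undisinfected(failed):
    """Hypothetical system rule: dosing is lost only when both pumps are down."""
    return {"dosing_pump_A", "dosing_pump_B"} <= failed


def breaching(events, size):
    """Return the combinations of `size` simultaneous events that defeat disinfection."""
    result = []
    for combo in combinations(sorted(events), size):
        failed = set().union(*(events[name] for name in combo))
        if undisinfected(failed):
            result.append(combo)
    return result


individually = breaching(EVENTS, 1)
in_pairs = breaching(EVENTS, 2)
print(individually)
print(in_pairs)
```

Assessed one at a time, none of the three events defeats the system; assessed in pairs, the power dip together with the blocked suction line does. An event-by-event assessment would report no risk here.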

Two asset criticality dashboards allowed BMA and Scottish Water to explore the layers of controls in more depth. Figure 1 is an example of a visualisation, showing that a single failure of a single dosing pump on a single site will not result in under-disinfected or undisinfected water entering the supply; because of redundancy within the system, two more component failures would have to occur for undisinfected water to enter the supply.

Analyses like this can be used to rapidly quantify the risk of any changes to the design of the disinfection system or to its component assets, and to inform the design of new disinfection systems.
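One way to express the "how many more failures" question behind this kind of visualisation is through minimal cut sets, the smallest groups of components whose joint failure defeats the system. The sketch below assumes the cut sets are already known; the two sets shown, the component names, and the `extra_failures_needed` helper are hypothetical, and the source does not state that BMA's model uses this representation.

```python
# Hypothetical minimal cut sets for one site: each set is a smallest group of
# components whose simultaneous failure lets undisinfected water into supply.
CUT_SETS = [
    {"dosing_pump_A", "dosing_pump_B", "low_chlorine_shutdown"},
    {"plc", "operator_response"},
]


def extra_failures_needed(cut_sets, already_failed):
    """Return the fewest additional component failures needed to defeat the system."""
    return min(len(cut_set - already_failed) for cut_set in cut_sets)


# With one dosing pump already failed, how many layers of protection remain?
after_pump = extra_failures_needed(CUT_SETS, {"dosing_pump_A"})
after_plc = extra_failures_needed(CUT_SETS, {"plc"})
print(after_pump, after_plc)
```

In this toy site, losing one dosing pump leaves two further failures between the system and a breach, whereas losing the PLC leaves only one. Ranking components by that remaining margin gives a simple criticality measure, and re-running it with a cut set changed shows how a design change shifts the result.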

The overall result is a model of system reliability, not of system capability. The combination of this underlying model and a series of user-friendly dashboards will allow Scottish Water to rapidly identify asset criticality and failure pathways in its water treatment works, leading to faster and better decision-making. It enables a true systems approach to asset failure, in line with the new DWIRMAS standard.

And because the model was developed with scalability in mind, Scottish Water will be able to roll out this solution across its 230 water treatment works to:

  • identify the most critical assets and controls
  • quantify how a site’s resilience changes when an asset or control is added or removed
  • compare specific disinfection systems to the standard-specification system and identify which additional assets or controls would increase resilience the most
  • extend the model to other water treatment processes, such as pH correction and coagulation.
