A multinational energy company in Canada was motivated to shift to a proactive, risk-based maintenance strategy to improve the operational efficiency of its offshore production platforms.
Historically, the machinery and maintenance teams used a time- and routine-based approach to maintain their offshore platforms. While the existing approach was effective, it was time-intensive, inefficient, and prone to errors due to siloed systems.
The machinery teams responsible for the asset health of platform equipment used multiple systems to monitor asset health, navigating between data dashboards, sensor analysis, maintenance logs, and case management systems. Because each system stored a different set of data, a machinery engineer who received an alert from the central surveillance team had to spend significant time and effort piecing together data from various systems to create a maintenance plan.
Moreover, the existing systems flooded the central surveillance and platform machinery teams with false alerts. The central surveillance team received a staggering 3,600 distinct alerts per year (about 10 per day on average), 360 of which were passed on to the machinery engineer for further investigation. The machinery engineer would then spend upwards of 10 hours on the investigation process for each alert. The alerts severely limited the capacity of both teams to work on higher-value activities.
Furthermore, the manual process of gathering and piecing together data from multiple systems lacked a feedback loop, which made continuous improvement a challenge. The system was unable to learn from its past alerts and field feedback and adjust to improve future alerts. These limitations led to potential over-maintenance of equipment, which is not only costly but also high-risk.
Recognizing the need for a more holistic and risk-based approach, the energy company partnered with C3 AI to deploy C3 AI Reliability, a market-leading AI-based predictive maintenance application. The initial phase focused on two turbine-driven gas compression trains on one offshore production platform. The company wanted a solution that could provide an accurate and comprehensive view of system risks and enable optimized allocation of precious engineering resources to the highest-risk systems.
Data Science Approach
First, the C3 AI team unified data across the relevant existing systems to remove data silos and create a single data image. More than two years of time-series data (100M+ rows), 800 tags, P&IDs, and equipment diagrams were automatically ingested onto the C3 AI Platform and mapped to the C3 AI Reliability asset data model, where users can visualize the asset relationships and hierarchy on C3 AI Reliability’s comprehensive dashboard. Then the C3 AI team normalized, treated, and processed all the time-series data in preparation for ML modeling.
Figure 1. Live Data Integration Across Enterprise Systems
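As a rough illustration of the preparation step described above, the following Python sketch fills gaps in two tag series and rescales each one to zero mean and unit variance. The tag names, the sample values, and the choice of forward-filling and z-score normalization are assumptions made for illustration only; the source does not specify which treatments the C3 AI team applied.

```python
import statistics

def forward_fill(values):
    """Replace missing readings (None) with the most recent valid reading.

    Leading gaps are filled with the first valid reading in the series.
    """
    last = next(v for v in values if v is not None)
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def z_score(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

# Hypothetical raw tag data; None marks a missing sensor reading.
raw_tags = {
    "suction_pressure": [10.0, None, 12.0, 14.0],
    "discharge_temperature": [80.0, 82.0, None, 86.0],
}

prepared_tags = {
    tag: z_score(forward_fill(series)) for tag, series in raw_tags.items()
}
```

Putting every tag on a comparable scale matters for reconstruction-based models, because otherwise tags with large numeric ranges would dominate the error.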
Anomaly Detection ML Model
Leveraging the unified data, the next step was to create a machine-learning model to detect asset anomalies. C3 AI Reliability provided state-of-the-art machine learning techniques for anomaly detection, including semi-supervised autoencoders. The C3 AI team configured 20 autoencoder-based models to detect equipment failure and anomalies on the most critical equipment on these compression trains. These autoencoder models were trained to learn normal sensor signals using time-series data from periods when the equipment was under normal operation. Then the trained models were used to generate the expected normal sensor signals, which were compared against actual sensor measurements, and the difference was noted as a risk score. The more the sensor measurements deviate from expected sensor values, the higher the risk score and risk of failure for the equipment.
Figure 2. Example of Autoencoder Architecture
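The reconstruction-based risk score described above can be sketched in a few lines of Python. This is not the C3 AI Reliability model: it swaps in a deliberately tiny linear autoencoder (tied weights, one-unit bottleneck, plain gradient descent) so the example stays self-contained. The two normalized tags, the synthetic "normal operation" data, and all function names are assumptions.

```python
import math

def train_autoencoder(samples, lr=0.05, epochs=500):
    """Fit a tied-weight linear autoencoder with a 1-unit bottleneck.

    Encoder: z = w . x   Decoder: x_hat = z * w
    Trained by full-batch gradient descent on mean squared reconstruction error.
    """
    dim = len(samples[0])
    w = [0.5] + [0.1] * (dim - 1)  # arbitrary non-zero starting weights
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))           # encode
            residual = [xi - z * wi for xi, wi in zip(x, w)]   # x - decode(z)
            rw = sum(ri * wi for ri, wi in zip(residual, w))
            for j in range(dim):
                grad[j] -= 2.0 * (rw * x[j] + z * residual[j]) / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def risk_score(w, x):
    """Distance between the actual reading and its reconstruction."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return math.sqrt(sum((xi - z * wi) ** 2 for xi, wi in zip(x, w)))

# Synthetic normal operation: two normalized tags that move together,
# with a small deterministic wiggle standing in for sensor noise.
normal_data = []
for i in range(200):
    a = -1.0 + 2.0 * i / 199
    normal_data.append((a, a + 0.05 * math.sin(i)))

weights = train_autoencoder(normal_data)

normal_risk = risk_score(weights, (0.5, 0.5))      # tags agree, as in training
anomalous_risk = risk_score(weights, (0.5, -0.5))  # tags diverge
```

Because the model only ever saw the two tags moving together, it reconstructs the first reading almost exactly and the second one poorly, so the second reading gets the higher risk score.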
Alerting Logic Configuration
After creating the initial set of autoencoder models, the C3 AI team worked with industry experts from C3 AI and the energy company to incorporate subject matter expertise into the machine-learning models. The company configured the alerting logic based on its experience and objectives, setting parameters such as the minimum risk score threshold and the maximum alert frequency. The model also automatically retrieved the feature tag contributions to the risk score so that users could easily identify the top contributing tags that led to the alert or failure. The tag contributions can be further mapped to the C3 AI Reliability Failure Mode Library to help users understand failures and accelerate root cause analysis.
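A minimal Python sketch of this kind of alerting logic follows. The threshold value, the minimum gap between alerts, the tag names, and the squared-error attribution are all assumptions; the source does not describe how C3 AI Reliability computes tag contributions or rate-limits alerts.

```python
def generate_alerts(risk_scores, threshold, min_gap):
    """Return the indices at which an alert fires.

    An alert fires when the risk score reaches the threshold and at least
    min_gap time steps have passed since the previous alert.
    """
    alerts = []
    last_alert = None
    for i, score in enumerate(risk_scores):
        if score >= threshold and (last_alert is None or i - last_alert >= min_gap):
            alerts.append(i)
            last_alert = i
    return alerts

def top_contributing_tags(actual, expected, k=2):
    """Rank tags by their share of the total squared reconstruction error."""
    errors = {tag: (actual[tag] - expected[tag]) ** 2 for tag in actual}
    total = sum(errors.values()) or 1.0
    ranked = sorted(errors.items(), key=lambda item: item[1], reverse=True)
    return [(tag, err / total) for tag, err in ranked[:k]]

# Hypothetical risk scores over nine consecutive time steps.
scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.85, 0.9]
alert_steps = generate_alerts(scores, threshold=0.7, min_gap=4)

# Hypothetical actual readings and model reconstructions at an alert.
actual = {"pressure_ratio": 3.4, "temperature_ratio": 1.9, "vibration": 0.51}
expected = {"pressure_ratio": 3.0, "temperature_ratio": 1.8, "vibration": 0.5}
contributors = top_contributing_tags(actual, expected)
```

In this example the sustained excursion at steps 2 to 4 produces a single alert rather than three, and the pressure ratio is reported as the dominant contributor.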
To enable a comprehensive monitoring approach that also encompasses auxiliary equipment such as dry gas seals, lube oil systems, and air inlet systems, the C3 AI team extended the Sensor Health module (a native data quality and alerting framework) on top of C3 AI Reliability to include 131 additional threshold-based alarms for monitoring anomalies on auxiliary equipment.
Early Proven Result
Within the initial phase, the anomaly detection models successfully identified 24 major events in the 2-year historical dataset, all of which were validated by the company’s subject matter expert team as a comprehensive set of events that were worthy of investigation. In fact, the application identified three major events that were not detected by the company’s existing monitoring systems, demonstrating the tangible value of the C3 AI Reliability application. Moreover, the models achieved a precision of 81% and a recall of 100%: they captured all major failure events while keeping the number of false positives low. An example event successfully detected in advance by the model is discussed below:
Figure 3. Example of Early Warning before Eventual Shutdown Event
The top two plots show how the reconstructed pressure and temperature ratios (red) deviated significantly from the actual measurements (blue) in early September 2021, which increased the risk score and raised the alert shown in the third chart. The red vertical bar marks the shutdown event that did occur a few weeks after the alert was raised. By raising an alert before an impending shutdown, the application allows the company to take proactive maintenance actions and avoid costly unplanned downtime and production loss.
In this initial phase of the C3 AI Reliability implementation, the application successfully:
• Identified 24 major events on historical data via 34 rich and interpretable alerts – three of which were not captured by existing monitoring approaches.
• Achieved a 99% reduction in alert noise from an estimated 3,600 alerts that central surveillance engineers contend with annually.
• Achieved a 95% reduction from the 360 alerts that are distilled down to the on-site machinery engineering teams for evaluation.
Engineers can easily understand alerts via C3 AI’s user-friendly interface with additional information, such as feature contributions, failure modes, prescriptive recommendations, and comprehensive tag and metric trending capabilities. These interpretable and prioritized AI-driven insights drive operating efficiency and prioritization for the energy company.
In addition to major event identification and significant reductions in alerts, the C3 AI application unified data across several systems currently used for event identification and alert triaging. By unifying monitoring systems and data in one place, the company estimates a reduction in alert triaging time from ten hours to one hour, saving nine hours per alert.
Figure 4. Major Reductions in Alerts and Time on Evaluation and Triaging
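The reported reductions can be sanity-checked with simple arithmetic. The Python sketch below assumes that the 34 alerts are spread evenly over the two-year historical dataset, giving about 17 alerts per year; the source does not state how the percentages were derived, so treat this as one plausible reading rather than the company's calculation.

```python
# Figures taken from the text.
baseline_surveillance_alerts_per_year = 3600
baseline_engineer_alerts_per_year = 360
ai_alerts_in_dataset = 34
dataset_years = 2
triage_hours_before = 10
triage_hours_after = 1

# Assumption: alerts are spread evenly across the two-year dataset.
ai_alerts_per_year = ai_alerts_in_dataset / dataset_years  # 17.0

surveillance_reduction = 1 - ai_alerts_per_year / baseline_surveillance_alerts_per_year
engineer_reduction = 1 - ai_alerts_per_year / baseline_engineer_alerts_per_year
hours_saved_per_alert = triage_hours_before - triage_hours_after

print(f"Surveillance alert reduction: {surveillance_reduction:.1%}")
print(f"Machinery engineer alert reduction: {engineer_reduction:.1%}")
print(f"Triage hours saved per alert: {hours_saved_per_alert}")
```

Under that assumption the reductions come out at roughly 99.5% and 95.3%, in line with the reported 99% and 95%.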
Key Success Factors
The underlying infrastructure of the C3 AI Reliability application and the C3 AI Platform provided the foundation for the success of this pilot.
Fast Live Model Deployment
One key success factor was the speed with which developed models went into live production. The C3 AI Reliability application streamlines the model go-live process by leveraging the built-in Model Deployment Framework to integrate streaming data, trained model artifacts, and the UI. Once the model is ready and the streaming data is in place, deploying the model for live inference is just one click away. Thanks to these capabilities, the first model went live in week five of the initial phase. This speed to live production gave the company an end-to-end application experience early on and gave the C3 AI team the opportunity to integrate the company’s feedback in a timely manner.
Another advantage of the C3 AI Reliability application is the scalability of the solution. The application provides pre-configured asset templates and pre-trained model templates to speed up the onboarding of new assets. Asset templates embed the knowledge of subject matter experts on expected sensors and data models. For instance, the C3 AI team leveraged pre-configured templates to model the pressure and temperature ratios of a compressor. The application also provides bulk training APIs that allow users to model a large number of assets at scale by leveraging distributed computing in the back end.
With this new proactive monitoring approach, the company can avoid unnecessary maintenance and free up engineers who previously spent multiple hours per week on surveillance to focus on higher-value production optimization opportunities. Given the continued success of the initial phase, the company extended the program and plans to scale out C3 AI Reliability to additional asset classes and use cases, including process optimization, water injection, and virtual metering.
About the Authors
Chuan Tian (author) is a Senior Data Scientist at C3 AI where he is primarily focused on projects related to C3 AI Reliability. He is passionate about utilizing cutting-edge machine-learning techniques to solve enterprise problems at scale. Chuan has a Ph.D. in Energy Resources Engineering from Stanford University.
Siddharth Ramesh (author) is a Solutions Leader at C3 AI, focused on serving the energy segment. In this role, Sid works with high-performing technical teams to identify high-impact business opportunities, scope them to tractable AI/ML problems, and drive the delivery of solutions with compelling business value. Sid holds a BSE in Mechanical Engineering from the University of Michigan, Ann Arbor.
Further Reading & Acknowledgements
 Chris Kuo. “Convolutional Autoencoders for Image Noise Reduction.” https://towardsdatascience.com/convolutional-autoencoders-for-image-noise-reduction-32fce9fc1763