Introduction
For safety-critical systems, predicting when components will fail is a million-dollar question. A key
piece of information in predicting failures is the topological structure of the components and the relations
of different components with each other. While the information on these connections is available on
engineering diagrams, parsing the engineering diagrams and extracting the metadata from the diagrams has
long been tedious work that is difficult to automate. In this blog post, we will go through the steps for
building a simple diagram parsing model and demonstrate how this model can be made production-ready using
the C3 AI Application
Platform.
Along the way we will highlight some of the key features and benefits of the platform, namely:
A unified data model with simple APIs to access and manage all data.
Simple APIs for converting pre-existing machine learning/deep learning models into the C3 AI
MLPipe.
Seamlessly persisting models in a database and keeping track of all modeling iterations.
Using the resources available in the C3 AI cluster for model training, so that training is not limited by the size
of the Jupyter container.
Using the model as an individual step of complex ML pipelines without worrying about managing runtime
for each step and passing data between these steps.
Overview of the Problem
Below is an example of a piping & instrumentation diagram (P&ID). P&ID engineering diagrams
contain valuable information about the sensor and equipment locations as well as relations among these
sensors and equipment. Manual extraction of sensor and equipment locations and relationships from P&ID
diagrams is a time-consuming task that relies on domain experts. The goal of diagram parsing is to create an
automated pipeline that can identify each component on an engineering diagram, recognize the “id” and “text”
related to each symbol, and also identify connections between different components. Finally, using the
identified component ids, we can link each component with other associated data sources (like time series
data) for additional modeling tasks.
Figure 1: Raw P&ID engineering diagram (left) vs parsed diagram (right)
Figure 2: Example of using a parsed diagram for finding sensor time series related to an asset
In this blog post, we will build a simple diagram parsing pipeline in a data science notebook from
scratch using native Python and the C3 AI Python SDK. Specifically, we will:
Explore the training data set using the C3 AI Python SDK
Build and train an object detection model using Keras
Convert the object detection model to a C3 AI MLPipe and persist the model
Build a C3 AI MLPipeLine that combines the object detection model with an OCR pipe for text
detection
Each diagram in our data set contains at most one symbol. We rely on a data model from an existing C3 AI
Application, C3 AI Reliability, and assume raw data required for this demo are
loaded into the application. The data contains a set of diagrams with annotated symbol bounding boxes and
another set of diagrams without any annotations.
Prototype an Object Detection Model in Python
Data Exploration
For visualizing the diagrams and exploring the available data, we first import matplotlib and a helper
function for converting instances of C3 AI Types into pandas DataFrame.
import matplotlib.pyplot as plt
c3_grid = c3.DiagramParsingTypeUtils.fetchGrid
To build an object detection model, we will use a training set of labeled diagrams along with the
coordinates of the bounding boxes for each diagram. As a first step, we load our training images and the
bounding box labels. Our diagrams are stored in the c3.PNGDiagram type.
C3 AI Type System provides simple APIs to fetch data from its distributed data stores that are backed by
various database technologies like Cassandra and Postgres. The detailed implementation and the query details
for managing data are abstracted away by simple APIs like fetch or
remove and optimized by the platform. This enables data scientists and application
developers to spend less time on building and debugging their queries and focus on the application at
hand.
First let’s count all the diagrams that are available in our environment:
print(f'There are {c3.PNGDiagram.fetchCount()} diagrams persisted')
There are 4999 diagrams persisted
Next, let’s get the ids and the creation timestamp of 5 sample diagrams:
print(f'There are {c3.PNGDiagram.fetchCount()} diagrams persisted')
c3_grid(c3.PNGDiagram, ['id', 'EXT', 'meta.created'], limit=5)
There are 4999 diagrams persisted
                                     id   EXT               meta.created
0  001a7cf3-8db8-4d2e-9468-288a25638171  .png  2022-01-18 18:25:30+00:00
1  00205a8e-4a71-403d-9893-d2850e04369c  .png  2022-01-18 18:26:32+00:00
2  002d0225-1074-4605-88b5-dc8f1f5db9f9  .png  2022-01-18 18:24:11+00:00
3  00301408-b2a0-47b2-911f-2988a69c7146  .png  2022-01-18 18:24:55+00:00
4  004197ab-a070-4cf1-99a2-43554a687e1f  .png  2022-01-18 18:23:13+00:00
C3 AI’s Python SDK allows accessing the data in a Pythonic way without writing queries for a specific
type of database. As an example, we can fetch the diagrams matching a specific filter, in this case, the
ones that have a value for the bounding box field:
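diagrams = c3.PNGDiagram.fetch({
    'filter': 'exists(bounding_box)',
    'limit': 5
}).objs
We can get a diagram with a specific id:
specific_id = diagrams[0].id
specific_diagram = c3.PNGDiagram.get(specific_id)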
We can also remove a persisted diagram, or remove all the persisted diagrams, from the backend data store:
# you can also remove it by
specific_diagram.remove();
# or just remove all of them
# c3.PNGDiagram.removeAll()
The fetched data are directly converted into Python objects by C3 AI’s Python SDK, and they can be used in
our notebook just like any other object in Python.
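plt.imshow(diagrams[1].toImage())
print('Bounding Box', diagrams[0].bounding_box)
Let’s visualize some additional training samples: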
n_examples = 8
sample_diagrams = c3.PNGDiagram.fetch({'limit': n_examples}).objs
plt.figure(figsize=(20, 20*n_examples))
for i in range(n_examples):
    d = sample_diagrams[i]
    plt.subplot(f"1{n_examples}{i+1}")
    plt.imshow(d.toImage())
plt.show()
Model Architecture
We will use an anchor box regression approach in this implementation, which is a simplified version of the
Region Proposal Network used in the Faster R-CNN object detection architecture.
At a high level, this method assumes there is a box with predefined height and width located in the
center of an image, which we refer to as the anchor box. The model then tries to answer the following 3
questions:
Is there any symbol significantly overlapping with the anchor box?
How should the anchor box be moved so that the anchor box center aligns with the target symbol
center?
How should the anchor box be rescaled so that its height and width match the dimension of the
symbol?
For simplicity, we choose the anchor box size to be the same as the image size (128, 128).
Figure 3: Three Questions Answered by the Object Detection Model
These questions can be formulated into a classification problem with 1 output (probability, p) and a
regression problem with 4 outputs (translations dx, dy, and scaling factors rx, ry).
Figure 4: Object Detection Model Architecture
We will use an architecture with a few stacked convolutional layers to implement the anchor box regression
model. Binary cross-entropy loss will be used for optimizing the classification objective, and mean squared
error will be used for optimizing the regression objective. We will implement the architecture in
Keras.
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Lambda
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.losses import MeanSquaredError
First, a few stacked convolutional layers are used to extract high-level features from the
diagrams:
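size = 128
images = Input(shape=(size, size, 3))
cls_target = Input(shape=(1,))
reg_target = Input(shape=(4,))
# just a few convolution layers
lvl_0 = Conv2D(filters=128, kernel_size=(2,2), strides=(2,2), activation='relu', use_bias=True)(images)
lvl_0 = BatchNormalization()(lvl_0)
lvl_1 = Conv2D(filters=64, kernel_size=(2,2), strides=(2,2), activation='relu', use_bias=True)(lvl_0)
lvl_1 = BatchNormalization()(lvl_1)
lvl_2 = Conv2D(filters=32, kernel_size=(2,2), strides=(2,2), activation='relu', use_bias=True)(lvl_1)
lvl_2 = BatchNormalization()(lvl_2)
lvl_3 = Conv2D(filters=8, kernel_size=(2,2), strides=(2,2), activation='relu', use_bias=True)(lvl_2)
lvl_3 = BatchNormalization()(lvl_3)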
Then we flatten the features from 3D tensors into 1D arrays and use stacked dense layers to reduce the
dimensions. Finally, we use a dense layer with an output size of 4 to predict the translation in the
horizontal direction, the translation in the vertical direction, the scaling factor for the height, and
the scaling factor for the width of the bounding box. A dense layer with an output size of 1, activated by
the sigmoid function, generates a probability indicating whether the image actually contains a target
symbol.
# flatten for prediction
flat = Flatten()(lvl_3)
reduced = Dense(units=64, use_bias=True, activation='relu')(flat)
reduced = Dense(units=32, use_bias=True, activation='relu')(reduced)
# Is there any symbol inside the image? (probability)
cls_output = Dense(units=1, use_bias=True, activation='sigmoid')(reduced)
# What are the translation values and scaling factors?
reg_output = Dense(units=4, use_bias=True, activation='linear')(reduced)
# jointly optimize the regression and classification losses
cls_loss = Lambda(lambda x: tf.keras.losses.BinaryCrossentropy()(*x))([cls_target, cls_output])
reg_loss = Lambda(lambda x: tf.keras.losses.MeanSquaredError()(*x))([reg_target, reg_output])
all_loss = reg_loss + cls_loss
mdl = Model(inputs=[images, cls_target, reg_target], outputs=[cls_output, reg_output])
mdl.add_loss(all_loss)
mdl.add_metric(cls_loss, aggregation='mean', name='cls loss')
mdl.add_metric(reg_loss, aggregation='mean', name='reg loss')
mdl.compile(optimizer=Adam(0.0025))
mdl.summary()
WARNING:tensorflow:Output dense_6 missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to dense_6.
WARNING:tensorflow:Output dense_7 missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to dense_7.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_4 (InputLayer) [(None, 128, 128, 3) 0
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 64, 64, 128) 1664 input_4[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 64, 64, 128) 512 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 32, 32, 64) 32832 batch_normalization_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 32, 32, 64) 256 conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 16, 16, 32) 8224 batch_normalization_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 16, 16, 32) 128 conv2d_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 8, 8, 8) 1032 batch_normalization_6[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 8, 8, 8) 32 conv2d_7[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 512) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 64) 32832 flatten_1[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 32) 2080 dense_4[0][0]
__________________________________________________________________________________________________
input_5 (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
input_6 (InputLayer) [(None, 4)] 0
__________________________________________________________________________________________________
dense_6 (Dense) (None, 1) 33 dense_5[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 4) 132 dense_5[0][0]
__________________________________________________________________________________________________
lambda_3 (Lambda) () 0 input_6[0][0]
dense_7[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda) () 0 input_5[0][0]
dense_6[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp [()] 0 lambda_3[0][0]
lambda_2[0][0]
__________________________________________________________________________________________________
add_loss_1 (AddLoss) () 0 tf_op_layer_add_1[0][0]
__________________________________________________________________________________________________
add_metric_2 (AddMetric) () 0 lambda_2[0][0]
__________________________________________________________________________________________________
add_metric_3 (AddMetric) () 0 lambda_3[0][0]
==================================================================================================
Total params: 79,757
Trainable params: 79,293
Non-trainable params: 464
__________________________________________________________________________________________________
Model Training
We can now create a data generator to convert the bounding box coordinates into the regression targets
that are normalized with the anchor sizes for easier model convergence.
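# relative translations
def generate_translation_label(anchor_box_shape, box):
    if not box:
        return [0, 0]
    h, w = anchor_box_shape
    x1, y1, x2, y2 = box
    center_x = (x1 + x2)/2
    center_y = (y1 + y2)/2
    dx = (w/2 - center_x)/w
    dy = (h/2 - center_y)/h
    return dx, dy

# relative scaling factors
def generate_scaling_label(anchor_box_shape, box):
    if not box:
        return [0, 0]
    h, w = anchor_box_shape
    x1, y1, x2, y2 = box
    box_h = y2 - y1
    box_w = x2 - x1
    rx = np.log(box_h/h)
    ry = np.log(box_w/w)
    return rx, ry
Then we can use the above two functions for generating labels and use the generator for training the model.
import random  # used to sample a random diagram for each training example

anchor_box_shape = (128, 128)

def sample_generator(diagrams, size, batch_size):
    while True:
        images, cls_labels, reg_labels = [], [], []
        for _ in range(batch_size):
            d = random.choice(diagrams)
            img = d.toImage(cache=True)
            box = d.bounding_box
            has_symbol = bool(box)
            dx, dy = generate_translation_label(anchor_box_shape, box)
            rx, ry = generate_scaling_label(anchor_box_shape, box)
            images.append(img)
            cls_labels.append(has_symbol)
            reg_labels.append((dx, dy, rx, ry))
        images = np.array(images)
        cls_labels = np.array(cls_labels)
        reg_labels = np.array(reg_labels)
        yield (images, cls_labels, reg_labels), None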
Using the generator defined above, we can train our model using the labeled diagrams. We use the last 100
diagrams for validation.
all_diagrams = c3.PNGDiagram.fetch({'filter': 'exists(bounding_box)'}).objs
train_diagrams = all_diagrams[:-100]
valid_diagrams = all_diagrams[-100:]
train_g = sample_generator(train_diagrams, size, 32)
valid_g = sample_generator(valid_diagrams, size, 64)
valid_data = next(valid_g)
# you can also directly load the model here from the h5
# from tensorflow.keras.models import load_model
# mdl = load_model('rpn.h5')
mdl.fit(train_g, epochs=64, steps_per_epoch=32, validation_data=valid_data, verbose=0)
Model Inference
To show that the model that we just trained works as expected, we will test it using a diagram from a
holdout test set that does not have its bounding box or text attributes populated. Then we will use our
trained model to generate the bounding boxes and the texts within the symbol. As shown in the visualization
below, both the bounding box and the text field are empty in the beginning.
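unlabeled = c3.PNGDiagram.fetch({'filter': '!exists(bounding_box)', 'limit': 5}).objs
unlabeled[0].show()
Symbol Detection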
# run the model
imgs = np.array([d.toImage() for d in unlabeled])
mdl_input = [imgs, np.empty(len(imgs)), np.empty((len(imgs),4))]
cls_outputs, reg_outputs = mdl.predict(mdl_input)
Since we used relative translations and relative scaling factors as the regression target of our model,
we need to transform the model output to recover the coordinates of the bounding box.
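def decode_result(size, dx, dy, rx, ry):
    h, w = size, size
    center_x = size/2
    center_y = size/2
    center_x -= dx * w
    center_y -= dy * h
    box_w = np.exp(rx) * w
    box_h = np.exp(ry) * h
    xmin = int(center_x - box_w/2)
    xmax = int(center_x + box_w/2)
    ymin = int(center_y - box_h/2)
    ymax = int(center_y + box_h/2)
    return [xmin, ymin, xmax, ymax]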
# populate the bounding_box attribute of the diagram
for diagram, img, reg_output in zip(unlabeled, imgs, reg_outputs):
    box = decode_result(len(img), *reg_output)
    diagram.bounding_box = box
c3.PNGDiagram.upsertBatch(unlabeled);
As we can see, our model generates a bounding box that accurately captures the target symbol and achieves
the desired outcome.
unlabeled[0].show()
Text Recognition OCR
Now that we have demonstrated how to build an object detection model that tells us where the target symbol is, we will
next demonstrate how to use a pre-trained OCR pipe that is readily available in the platform.
Using c3.OcrPipe we will extract the text inside the bounding box, and populate
the text attribute of the symbol.
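diagram = unlabeled[0]
ocr_pipe = c3.OcrPipe()
labeled_diagram = ocr_pipe.process(diagram)
labeled_diagram.upsert().id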
As shown in the text field below, the OCR pipe correctly recognizes the id of the target symbol.
labeled_diagram.show()
Building a Production-Ready Pipeline with the Platform
Building a production-ready pipeline using the symbol detection and OCR models is very simple. C3 AI
Application Platform provides many out-of-the-box Types to convert TensorFlow, Keras, or PyTorch models
created in Python to instances of MLPipe.
Keras Pipe
As the first step, we encapsulate our trained Keras model as an instance of
a c3.KerasPipe. In one line, the native Python model is converted to an instance
of a C3 AI Type and persisted to the platform.
# you can directly save a trained model
keras_pipe = c3.KerasPipe().upsertNativeModel(mdl)
Using a KerasPipe, the trained model, along with its hyperparameters, can easily
be persisted. This simplifies keeping track of the details of all of your modeling iterations. Similar to
any other C3 type, we can fetch these pipes, or update or remove them with convenient APIs.
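keras_pipe.get('id, meta.created, typeVersion')
print('Part of the Keras Model Parameters:\n', keras_pipe.technique.modelDef[:500], '...')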
C3 AI Application Platform also provides utility types and functionalities to simplify the development of
a specific application. To simplify the development of a diagram parsing model, here we use
the SymbolDetectionPipe from the C3 AI Reliability Application. This type provides
utility functions to apply the logic for decoding the outputs from a symbol detection model and populate the
bounding box attribute of an input diagram.
We will use our KerasPipe as the core model for a
SymbolDetectionPipe.
WARNING:tensorflow:Output dense_6 missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to dense_6.
WARNING:tensorflow:Output dense_7 missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to dense_7.
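This pipe can be used for populating the bounding_box field of the diagram.
syb_pipe.get('id, meta.created')
unlabeled = c3.PNGDiagram.fetch({'filter': '!exists(bounding_box)', 'limit': 3}).objs
to_parse = unlabeled[0]
# the target diagram is empty in the beginning
to_parse = unlabeled[0].get()
to_parse.show()
With the symbol detection, the diagram now has the bounding box of the target symbol.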
With the text recognition pipe, the diagram also has the text attribute populated.
And we can sync the current state of the diagram and save everything into the database.
to_parse.merge();
Creating a Multi-Step Machine Learning Pipeline
Finally, we can very easily build an end-to-end symbol detection and text recognition pipeline that can
process our diagrams and populate their bounding box and text fields.
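step_1 = c3.MLStep(
    name="SymbolDetection",
    pipe=syb_pipe
)
step_2 = c3.MLStep(
    name="TextRecognition",
    pipe=ocr_pipe
)
pipeline = c3.MLSerialPipeline(steps=[step_1, step_2])
pipeline.id = pipeline.upsert().id
pipeline.get('id, meta.created, steps.name')
to_parse = unlabeled[2]
pipeline.process(to_parse).show()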
Now that our end-to-end pipeline is persisted, it can be used in an application to process new diagrams
and save hundreds of hours of manual work for domain experts!
About The Authors
Josh Zhang is a Senior Data Scientist at C3 AI, where he developed
algorithms for multiple large-scale AI applications. He holds an M.S. in Mechanical Engineering from
Duke University and a B.S. in Mechanical Engineering from Lafayette College. Before C3 AI, he worked
on the development of a large-scale graph deep learning framework as a software engineer.
Amir H. Delgoshaie is a Data Science Manager at C3 AI, where he has
worked on the development and deployment of multiple large-scale AI applications for the utility,
energy, and manufacturing sectors. He holds a Ph.D. in Energy Resources Engineering from Stanford
University and master’s and bachelor’s degrees in Mechanical Engineering from ETH Zurich and Sharif
UT. Prior to C3 AI, he developed algorithms and software at various research and industrial
institutions.