This chapter explains how to use deep-learning-based optical character recognition (Deep OCR).
With Deep OCR we want to detect and/or recognize text in an image. Deep OCR detects and recognizes connected characters, which will be referred to as 'words' (in contrast to OCR methods which are used to read single characters).
A Deep OCR model can contain two components, each dedicated to a distinct task: the detection, i.e., the localization of words, and the recognition of words. By default, a model with both components is created, but the model can also be restricted to only one of the two tasks.
HALCON already provides pretrained components, which are suited for a multitude of applications without additional training, as the model is trained on a varied dataset and can therefore cope with many different fonts. Information on the available character set and model parameters can be retrieved using get_deep_ocr_param.
To further adjust the reading to a specific task it is possible to retrain
the recognition component on a given application domain using deep
learning operators.
This paragraph describes the workflow for localizing and reading words using a Deep OCR model. An application scenario can be seen in the HDevelop example deep_ocr_workflow.hdev.
Create a Deep OCR model containing either one or both of the two model components detection_model and recognition_model using the operator create_deep_ocr.
To use a retrained model component instead of the provided one, adjust the created model by setting the retrained model component as 'recognition_model' using set_deep_ocr_param. Model parameters regarding, e.g., the used devices, image dimensions, or minimum scores can also be set using set_deep_ocr_param.
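The model creation and configuration steps above can be sketched as follows in HDevelop (a minimal sketch; the retrained model file name is a hypothetical placeholder):

```
* Create a Deep OCR model with both components (the default).
create_deep_ocr ([], [], DeepOcrHandle)
* Replace the recognition component by a retrained one
* ('my_retrained_recognition.hdl' is a hypothetical file name).
set_deep_ocr_param (DeepOcrHandle, 'recognition_model', 'my_retrained_recognition.hdl')
```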
The Deep OCR model is applied to your acquired images using apply_deep_ocr. The inference results depend on the used model components. See the operator reference of apply_deep_ocr for details regarding which dictionary entries are computed for each model composite.
The inference results can be retrieved from the dictionary DeepOCRResult. Some procedures are provided in order to visualize results and score maps:
- dev_display_deep_ocr_results: shows the location and/or the recognized word.
- dev_display_deep_ocr_results_preprocessed: shows the location (and, if inferred, the recognized word) on the preprocessed image (if the model contains detection_model).
- dev_display_deep_ocr_score_maps: shows the score maps (if the model contains detection_model).
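The inference steps above can be sketched as follows (a sketch only: the available result entries depend on the contained model components, and the visualization procedure's parameter list is an assumption; see apply_deep_ocr and the procedure documentation):

```
* Apply the Deep OCR model to an acquired image
* ('my_image' is a hypothetical file name).
read_image (Image, 'my_image')
apply_deep_ocr (Image, DeepOcrHandle, 'auto', DeepOcrResult)
* Retrieve the word results; the available entries are
* listed in the reference of apply_deep_ocr.
get_dict_tuple (DeepOcrResult, 'words', WordResults)
get_dict_tuple (WordResults, 'word', Words)
* Visualize location and recognized words (procedure
* parameters sketched; see the procedure documentation).
dev_display_deep_ocr_results (Image, DeepOcrResult, [], [])
```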
This paragraph describes the retraining and evaluation of the recognition
component of a Deep OCR model using custom data. See also the
HDevelop example deep_ocr_recognition_training_workflow.hdev
for an application scenario.
This part is about how to preprocess your data. The information about what is to be read in which image of your training dataset needs to be transferred. This is done by the procedure read_dl_dataset_ocr_recognition. It creates a dictionary DLDataset which serves as a database and stores all necessary information about your data.
For more information about the data and the way it is transferred, see
the section “Data” below and the chapter
Deep Learning / Model.
Split the dataset represented by the dictionary DLDataset. This can be done using the procedure split_dl_dataset.
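The reading and splitting steps can be sketched as follows (the label source 'ocr_dataset.hdict' and the reading procedure's parameter list are assumptions; see the procedure reference for the exact signature):

```
* Read the labeled data into a DLDataset dictionary
* (parameter list sketched; 'ocr_dataset.hdict' is hypothetical).
read_dl_dataset_ocr_recognition ('ocr_dataset.hdict', [], DLDataset)
* Split into 70 % training, 15 % validation, and the rest for testing.
split_dl_dataset (DLDataset, 70, 15, [])
```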
The network imposes several requirements on the images. These requirements (for example the image size and gray value range) can be retrieved with get_dl_model_param. For this you need to read the model first by using read_dl_model.
Now you can preprocess your dataset. For this, you can use the procedure preprocess_dl_dataset. To use this procedure, specify the preprocessing parameters, e.g., the image size. Store all the parameters with their values in a dictionary DLPreprocessParam, for which you can use the procedure create_dl_preprocess_param_from_model. We recommend saving this dictionary DLPreprocessParam in order to have access to the preprocessing parameter values later during the inference phase.
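Put together, the preprocessing steps can be sketched as follows (the model file name, the output directory, and the exact parameter lists of the procedures are assumptions; see the procedure reference):

```
* Read the pretrained, trainable recognition model
* (the file name is an assumption; see read_dl_model).
read_dl_model ('pretrained_deep_ocr_recognition.hdl', DLModelHandle)
* Derive the preprocessing parameters from the model
* (parameter list sketched; see the procedure reference).
create_dl_preprocess_param_from_model (DLModelHandle, 'none', 'full_domain', [], [], [], DLPreprocessParam)
* Preprocess the dataset; preprocessed samples are written
* to the given directory ('ocr_data_preprocessed' is hypothetical).
create_dict (GenParam)
preprocess_dl_dataset (DLDataset, 'ocr_data_preprocessed', DLPreprocessParam, GenParam, DLDatasetFileName)
* Save the preprocessing parameters for the inference phase.
write_dict (DLPreprocessParam, 'preprocess_param.hdict', [], [])
```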
This part explains how to train the recognition component of a Deep OCR model.
Set the training parameters and store them in the dictionary TrainParam. This can be done using the procedure create_dl_train_param.

Train the model. This can be done using the procedure train_dl_model.
The procedure expects:
- the model handle DLModelHandle,
- the dictionary DLDataset containing the data information,
- the dictionary TrainParam containing the training parameters.
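These two steps can be sketched as follows (the number of epochs, the evaluation interval, and the seed are example values; the parameter lists are sketched from the procedure reference):

```
* Collect the training parameters: 100 epochs, evaluate once
* per epoch, display the evaluation, seed 42 (example values).
create_dl_train_param (DLModelHandle, 100, 1, 'true', 42, [], [], TrainParam)
* Train the recognition model, starting at epoch 0.
train_dl_model (DLDataset, DLModelHandle, TrainParam, 0, TrainResults, TrainInfos, EvaluationInfos)
```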
In this part, we evaluate the Deep OCR model. Set the model parameters which may influence the evaluation. The evaluation can be done conveniently using the procedure evaluate_dl_model. This procedure expects a dictionary GenParamEval with the evaluation parameters. The dictionary EvaluationResult holds the accuracy measures. To assess how the retrained model performs compared to the pretrained model, you can compare their accuracy values.
After a successful evaluation the retrained model can be used for inference (see section “General Workflow for Deep OCR Inference” above).
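The evaluation can be sketched as follows, here for the samples of the test split (the parameter list is sketched from the procedure reference):

```
* Evaluate the retrained model on the samples of the test split.
create_dict (GenParamEval)
evaluate_dl_model (DLDataset, DLModelHandle, 'split', 'test', GenParamEval, EvaluationResult, EvalParams)
```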
This section gives information on the data that needs to be provided in different stages of the Deep OCR workflow.
We distinguish between data used for training and evaluation, consisting of images with their information about the instances, and data for inference, which are bare images. How the data needs to be provided is explained in the according sections below.
As a basic concept, the model handles data over dictionaries, meaning it receives the input data over a dictionary DLSample and returns a dictionary DLResult and DLTrainResult, respectively. More information on the data handling can be found in the chapter Deep Learning / Model.
The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below.
The training data is used to train and evaluate a network for your specific recognition scenario. With the aid of this data the network can learn to read text samples that resemble text that occurs during inference. The necessary information is given by providing the depicted word for each image.
How the data has to be formatted in HALCON for a DL model is explained
in the chapter Deep Learning / Model.
In short, a dictionary DLDataset serves as a database for the information needed by the training and evaluation procedures. The data for DLDataset can be provided in two different ways. In both cases the dataset can be read using read_dl_dataset_ocr_recognition and will be converted as required.
In this case, images with words that are labeled with rotated bounding boxes need to be provided. You can label your data using the MVTec Deep Learning Tool, available from the MVTec website. The dataset must be built as follows:
- 'class_ids': class IDs
- 'class_names': class names (needs to contain the class 'word'; all other classes are ignored)
- 'image_dir': path to the image directory
- 'samples': tuple of dictionaries, one for each sample, each containing:
  - 'image_file_name': name of the image file
  - 'image_id': image ID
  - 'bbox_col': bounding box column coordinate
  - 'bbox_row': bounding box row coordinate
  - 'bbox_phi': bounding box angle
  - 'bbox_length1': first half edge length of the bounding box
  - 'bbox_length2': second half edge length of the bounding box
  - 'label_custom_data': list of dictionaries containing custom label data for each bounding box
  - 'text': word to be read
In this case, only images that are cropped to a single word each are included in the dataset. The dataset must be built as follows:
- 'image_dir': path to the image directory
- 'samples': tuple of dictionaries, one for each sample, each containing:
  - 'image_file_name': name of the image file
  - 'image_id': image ID
  - 'word': word to be read in the image
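A minimal dataset dictionary for the cropped-word variant could be built as follows (the directory, file name, and word are hypothetical examples):

```
* Build a minimal DLDataset dictionary for the cropped-word variant.
create_dict (DLDataset)
set_dict_tuple (DLDataset, 'image_dir', 'images')
* One sample: a cropped image showing the word 'example'.
create_dict (Sample)
set_dict_tuple (Sample, 'image_file_name', 'word_0001.png')
set_dict_tuple (Sample, 'image_id', 1)
set_dict_tuple (Sample, 'word', 'example')
set_dict_tuple (DLDataset, 'samples', Sample)
```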
The example program deep_ocr_prelabel_dataset.hdev can assist you by prelabeling your data.
Your training data should cover the full range of characters that might occur during inference. If a character is not or only very rarely contained in the training dataset, the model might not properly learn to recognize that character. To keep track of the character distribution within the dataset, the procedure gen_dl_dataset_ocr_recognition_statistics is provided, which generates statistics on how often each character is contained in your dataset.
You also want enough training data to split it into three subsets, used for training, validation and testing the network. These subsets are preferably independent and identically distributed, see the section “Data” in the chapter Deep Learning.
The model poses requirements on the images, such as the dimensions, the gray value range, and the type. See the documentation of read_dl_model for the specific values of the trainable Deep OCR model. For a read model they can be queried with get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing of an entire sample, including the image, is implemented in preprocess_dl_samples.
Requirements for images used for inference are described in apply_deep_ocr.
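Querying the image requirements of a read model can be sketched as follows (the model file name is an assumption):

```
* Read the trainable recognition model and query its
* image requirements via get_dl_model_param.
read_dl_model ('pretrained_deep_ocr_recognition.hdl', DLModelHandle)
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', ImageNumChannels)
```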
The network output depends on the task:
- During training, the operator returns a dictionary DLTrainResult with the current value of the total loss as well as values for all other losses included in your model.
- During inference, the network returns a dictionary DLResult for every sample. This dictionary includes the recognized word as well as the candidates and their confidences for every character of the word.
apply_deep_ocr
create_deep_ocr
get_deep_ocr_param
read_deep_ocr
set_deep_ocr_param
write_deep_ocr