This chapter explains how to use semantic segmentation based on deep learning, both for the training and inference phases.
With semantic segmentation we assign each pixel of the input image to a class using a deep learning (DL) network.
The result of semantic segmentation is an output image, in which the pixel value signifies the assigned class of the corresponding pixel in the input image. Thus, in HALCON the output image is of the same size as the input image. For general DL networks the deeper feature maps, representing more complex features, are usually smaller than the input image (see the section “The Network and the Training Process” in Deep Learning). To obtain an output of the same size as the input, HALCON uses segmentation networks with two components: an encoder and a decoder. The encoder determines features of the input image as done, e.g., for deep-learning-based classification. As this information is 'encoded' in a compressed format, the decoder is needed to reconstruct the information to the desired outcome, which, in this case, is the assignment of each pixel to a class. Note that, because individual pixels are classified, overlapping instances of the same class are not distinguished from one another.
Edge extraction is a special case of semantic segmentation, where the
model is trained to distinguish two classes: 'edge' and 'background'.
For more information, see the “Solution Guide I - Basics”.
Semantic segmentation with deep learning is implemented within the more
general deep learning model of HALCON.
For more information on the latter, see the chapter
Deep Learning / Model.
For the specific system requirements for applying deep learning,
please refer to the HALCON “Installation Guide”.
The following sections introduce the general workflow for semantic segmentation, describe the data and parameters involved, and explain the evaluation measures.
In this section, we describe the general workflow for a semantic
segmentation task based on deep learning.
It is subdivided into four parts:
preprocessing of the data, training of the model,
evaluation of the trained model, and inference on new images.
We assume that your dataset is already labeled; see also the section
“Data” below.
Have a look at the HDevelop example series
segment_pill_defects_deep_learning for an application.
The example segment_edges_deep_learning_with_retraining shows the
complete workflow for an edge extraction application.
This part is about how to preprocess your data.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_1_preprocess.hdev.
The information about what is to be found in which image of your training dataset needs to be transferred. This is done by the procedure read_dl_dataset_segmentation.
It creates a dictionary DLDataset, which serves as
a database and stores all necessary information about your data.
For more information about the data and the way it is transferred,
see the section “Data” below and the chapter
Deep Learning / Model.
Split the dataset represented by the dictionary DLDataset.
This can be done using the procedure split_dl_dataset.
The resulting split is saved under the key split in
each sample entry of DLDataset.
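As a rough illustration of what storing a split per sample means, the following Python sketch shuffles a list of sample dictionaries and writes a 'split' entry into each one. It is not the implementation of split_dl_dataset: the function name, the percentages, the seed, and the plain random strategy are assumptions made for this example only.

```python
import random

def split_dataset(samples, train_pct=70, val_pct=15, seed=42):
    """Store 'train', 'validation', or 'test' under the key 'split' of each sample.

    Hypothetical helper for illustration; the remaining samples form the test set.
    """
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # reproducible shuffle of the sample indices
    n_train = len(samples) * train_pct // 100
    n_val = len(samples) * val_pct // 100
    for rank, index in enumerate(order):
        if rank < n_train:
            samples[index]["split"] = "train"
        elif rank < n_train + n_val:
            samples[index]["split"] = "validation"
        else:
            samples[index]["split"] = "test"
    return samples

# Usage: 20 placeholder samples, each a dictionary as in a dataset's sample list.
samples = [{"image_id": i} for i in range(20)]
split_dataset(samples)
counts = {}
for sample in samples:
    counts[sample["split"]] = counts.get(sample["split"], 0) + 1
print(counts)
```

A purely random split like this one does not balance the classes across the subsets; whether and how that is handled is up to the procedure actually used.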
Now you can preprocess your dataset. For this, you can use the procedure preprocess_dl_dataset.
This procedure also offers guidance on how to implement a customized preprocessing procedure.
To use this procedure, specify the preprocessing parameters, e.g., the image size.
For the image size, you should select the smallest possible
size at which the regions to segment are still well recognizable.
Store all parameters with their values in a dictionary DLPreprocessParam,
for which you can use the procedure create_dl_preprocess_param.
We recommend saving this dictionary DLPreprocessParam
in order to have access to the preprocessing parameter values
later during the inference phase.
During the preprocessing of your dataset, preprocess_dl_dataset
also generates the images weight_image for the training dataset.
They assign each class the weight ('class weights') its
pixels get during training (see the section “Model Parameters and
Hyperparameters” below).
This part is about how to train a DL semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_2_train.hdev.
A network has to be read using the operator read_dl_model.
The model parameters need to be set via the operator set_dl_model_param.
Such parameters are, e.g., 'image_dimensions' and 'class_ids';
see the documentation of get_dl_model_param.
You can always retrieve the current parameter values using the operator
get_dl_model_param.
Set the training parameters and store them
in the dictionary TrainParam.
These parameters include:
the hyperparameters (for an overview, see the section “Model Parameters and Hyperparameters” below and the chapter Deep Learning),
parameters for possible data augmentation (optional),
parameters for the evaluation during training,
parameters for the visualization of training results,
parameters for serialization.
This can be done using the procedure create_dl_train_param.
Train the model. This can be done using the procedure train_dl_model.
The procedure expects:
the model handle DLModelHandle,
the dictionary with the data information DLDataset,
the dictionary with the training parameters TrainParam,
the number of epochs over which the training shall run.
If the procedure train_dl_model is used, the total loss
as well as optional evaluation measures are visualized.
In this part, we evaluate the semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_3_evaluate.hdev.
Set the model parameters which may influence the evaluation,
e.g., 'batch_size', using the operator set_dl_model_param.
The evaluation can conveniently be done using the procedure evaluate_dl_model.
The dictionary EvaluationResult holds the requested
evaluation measures.
You can visualize your evaluation results using the procedure
dev_display_segmentation_evaluation.
This part covers the application of a DL semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_4_infer.hdev.
Set the parameters, e.g., 'batch_size', using the operator set_dl_model_param.
Generate a data dictionary DLSample for each image.
This can be done using the procedure gen_dl_samples_from_images.
Preprocess the image as done for the training. We recommend doing this using the procedure preprocess_dl_samples.
If you saved the dictionary DLPreprocessParam during
the preprocessing step, you can directly use it as input to specify
all parameter values.
Apply the model using the operator apply_dl_model.
Retrieve the results from the dictionary 'DLResultBatch'.
The regions of the particular classes can be selected using, e.g.,
the operator threshold on the segmentation image.
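Conceptually, selecting the region of a class means collecting all pixels of the segmentation image whose value equals the class ID. The following Python sketch shows this idea on a small nested list; it only mimics the effect of thresholding with identical lower and upper bounds and is not HALCON code. The function name and the example image are made up for illustration.

```python
def select_class_pixels(segmentation_image, class_id):
    """Return the (row, column) coordinates of all pixels assigned to class_id."""
    return [
        (r, c)
        for r, row in enumerate(segmentation_image)
        for c, value in enumerate(row)
        if value == class_id
    ]

# Usage: a 3x3 segmentation image with the class IDs 0, 1, and 2.
segmentation_image = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 0],
]
print(select_class_pixels(segmentation_image, 1))  # [(0, 2), (1, 2)]
```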
We distinguish between data used for training and evaluation, and data
for inference.
The latter consists of bare images.
The former consists of images with their information and ground truth
annotations. You provide this information by defining for each pixel
to which class it belongs (via the segmentation_image, see
below for further explanations).
As a basic concept, the model handles data via dictionaries, meaning it
receives the input data via a dictionary DLSample and
returns a dictionary DLResult (inference) or DLTrainResult (training),
respectively. More information on the
data handling can be found in the chapter Deep Learning / Model.
The training data is used to train a network for your specific task.
The dataset consists of images and corresponding information.
They have to be provided in a way the model can process them.
Concerning the image requirements, find more information in the
section “Images” below.
The information about the images and their ground truth annotations
is provided via the dictionary DLDataset and, for every
sample, the respective segmentation_image, defining the class
for every pixel.
The different classes are the sets or categories differentiated by
the network.
They are set in the dictionary DLDataset and are passed
to the model via the operator set_dl_model_param.
In semantic segmentation, we call your attention to two special cases: the class 'background' and classes declared as 'ignore':
'background' class:
The network treats the background class like any other
class. It is also not necessary to have a background class.
But if your dataset contains different classes you are
not interested in, although they have to be learned by the network,
you can set them all as 'background'.
As a result, the class background will be more diverse.
See the procedure preprocess_dl_samples for more
information.
'ignore' classes:
There is the possibility to declare one or multiple classes as
'ignore'. Pixels assigned to an 'ignore' class are ignored by the
loss as well as for all measures and evaluations.
Please see the section “The Network and the Training Process” in
the chapter Deep Learning for more information about the
loss. The network does not classify any pixel into a class declared
as 'ignore'. Also, the pixels labeled as belonging to such a class
will be classified by the network, like every other pixel, into a
non-'ignore' class.
In the example given in the image below, this means the network will
also classify the pixels of the class 'border', but it will not
classify any pixel into the class 'border'.
You can declare a class as 'ignore' using the parameter
'ignore_class_ids' of set_dl_model_param.
In edge extraction only two classes are distinguished: 'edge' and 'background'. The class 'edge' is labeled just like a normal class. Thus, only one class is labeled and this class is called 'edge'.
DLDataset
This dictionary serves as a database, meaning it stores all
information about your data necessary for the network, e.g., the
names and paths to the images, the classes, ...
Please see the documentation of Deep Learning / Model for the
general concept and key entries.
Keys only applicable for semantic segmentation concern the
segmentation_image (see the entry below).
Via the keys segmentation_dir and segmentation_file_name
you provide the information on how they
are named and where they are saved.
segmentation_image
So that the network can learn what the members of different
classes look like, you specify for each pixel of every image in the
training dataset to which class it belongs.
This is done by storing, for every pixel of the input image, the class
encoded as pixel value in the corresponding segmentation_image.
These annotations are the ground truth annotations.
You need enough training data to split it into three subsets, one used
for training, one for validation and one for testing the network. These
subsets are preferably independent and identically distributed
(see the section “Data” in the chapter Deep Learning).
For the splitting you can use the procedure split_dl_dataset.
Regardless of the application, the network poses requirements on the
images regarding the image dimensions, the gray value range, and the
type.
The specific values depend on the network itself, see the documentation
of read_dl_model for the specific values of different networks.
For a loaded network they can be queried with get_dl_model_param.
In order to fulfill these requirements, you may have to preprocess your
images.
Standard preprocessing of an entire sample, and thereby also the
image, is implemented in preprocess_dl_samples.
This procedure also offers guidance on how to implement a customized
preprocessing procedure.
The network output depends on the task:
During training, the operator returns as output a dictionary DLTrainResult
with the current value of the total loss as
well as values for all other losses included in your model.
During inference and evaluation, the network returns as output a dictionary
DLResult for every sample.
For semantic segmentation, this dictionary includes for each input
image the handles of the following two images:
segmentation_image: An image where each pixel has a
value corresponding to the class its corresponding pixel has been
assigned to (see the illustration below).
segmentation_confidence: An image where each pixel has
the confidence value resulting from the classification of the corresponding pixel
in the input image (see the illustration below).
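To make the relation between the two result images more tangible, the following Python sketch derives both from per-pixel class scores. It assumes the common convention that each pixel is assigned the class with the highest score and that this score is reported as the confidence; the text above does not state how the network computes these values internally, so treat this purely as an illustration. The function name and the score layout are hypothetical.

```python
def segment_from_scores(scores, class_ids):
    """Derive a class image and a confidence image from per-pixel class scores.

    scores[r][c] is a list with one score per class, ordered like class_ids.
    Assumption for this sketch: the class with the highest score wins.
    """
    segmentation_image = []
    segmentation_confidence = []
    for row in scores:
        seg_row = []
        conf_row = []
        for pixel_scores in row:
            best = max(range(len(pixel_scores)), key=lambda i: pixel_scores[i])
            seg_row.append(class_ids[best])
            conf_row.append(pixel_scores[best])
        segmentation_image.append(seg_row)
        segmentation_confidence.append(conf_row)
    return segmentation_image, segmentation_confidence

# Usage: a 2x2 image with scores for three classes.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.2, 0.3, 0.5], [0.9, 0.05, 0.05]],
]
seg, conf = segment_from_scores(scores, class_ids=[0, 1, 2])
print(seg)   # [[0, 1], [2, 0]]
print(conf)  # [[0.7, 0.6], [0.5, 0.9]]
```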
Next to the general DL hyperparameters explained in Deep Learning, there is a further hyperparameter relevant for semantic segmentation:
'class weights', see the explanations below.
For a semantic segmentation model, the model parameters as well as the
hyperparameters (with the exception of 'class weights') are set using
set_dl_model_param.
The model parameters are explained in more detail in
get_dl_model_param.
Note that, due to large memory usage, typically only small batch sizes are
possible for training. As a consequence, training is rather slow and we
advise using a higher momentum than, e.g., for classification.
The HDevelop example
segment_pill_defects_deep_learning_2_train.hdev provides
good initial parameter values for the training of a segmentation network
in HALCON.
With the hyperparameter 'class weights' you can assign each class the weight its pixels get during training. By giving the individual classes different weights, it is possible to force the network to learn the classes with different importance. This is useful in cases where one class dominates the images, e.g., in defect detection, where the defects take up only a small fraction of an image. In such a case, a network classifying every pixel as background (thus, 'not defect') would generally achieve good loss results. Assigning different weights to the distinct classes helps to re-balance the distribution. In short, you can focus the loss to train especially on those pixels you determine to be important.
The network obtains these weights via weight_image, an image
which is created for every training sample.
In weight_image, every pixel value corresponds to
the weight the corresponding pixel of the input image gets during
training.
You can create these images with the help of the following two
procedures:
calculate_dl_segmentation_class_weights helps you to
create the class weights.
The procedure uses the concept of inverse class frequency weights.
gen_dl_segmentation_weight_images uses the class
weights and generates the weight_image.
This step has to be done before the training. Usually it is done during
the preprocessing and it is part of the procedure
preprocess_dl_dataset.
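The two steps can be sketched in Python as follows. This is one common variant of inverse class frequency weighting and not necessarily the normalization used by calculate_dl_segmentation_class_weights; the function names, the normalization by the number of occurring classes, and the choice to give 'ignore' pixels the weight 0.0 are assumptions of this example.

```python
from collections import Counter

def inverse_frequency_weights(segmentation_images, ignore_ids=()):
    """Compute one weight per class, inversely proportional to its pixel count.

    Assumed normalization: weight = total / (number of occurring classes * count),
    so a perfectly balanced dataset yields the weight 1.0 for every class.
    """
    counts = Counter()
    for image in segmentation_images:
        for row in image:
            for class_id in row:
                if class_id not in ignore_ids:
                    counts[class_id] += 1
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def make_weight_image(segmentation_image, weights):
    """Map every annotated pixel to the weight of its class.

    Classes without a weight (e.g., 'ignore' classes) get 0.0 in this sketch.
    """
    return [[weights.get(c, 0.0) for c in row] for row in segmentation_image]

# Usage: class 0 (background) dominates, class 1 (defect) is rare.
annotation = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
weights = inverse_frequency_weights([annotation])
print(weights)
print(make_weight_image(annotation, weights))
```

In this example the rare class 1 receives three times the weight of class 0, which mirrors the 6:2 ratio of their pixel counts.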
Note, this hyperparameter is referred to as class_weights or
ClassWeights within procedures.
An illustration of what such an image with different weights looks like
is shown in the figure below.
Note that pixels in a part of the image that is given the weight 0.0 do not contribute to the loss (see the section “The Network and the Training Process” in Deep Learning for more information about the loss).
For semantic segmentation, the following evaluation measures are supported
in HALCON.
Note that for computing such a measure for an image, the related ground
truth information is needed.
All the measure values explained below for a single image
(e.g., mean_iou) can also be calculated for an arbitrary number
of images.
For this, imagine a single, large image formed by the ensemble of the
output images, for which the measure is computed.
Note that all pixels of a class declared as 'ignore' are ignored for the
computation of the measures.
pixel_accuracy
The pixel accuracy is simply the ratio of all pixels that have been predicted with the correct class-label to the total number of pixels.
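A minimal Python sketch of this ratio, using nested lists as images. The function name is made up for illustration, and skipping pixels whose ground truth class is declared as 'ignore' follows the note above on 'ignore' classes.

```python
def pixel_accuracy(ground_truth, prediction, ignore_ids=()):
    """Ratio of correctly predicted pixels to all (non-ignored) pixels."""
    correct = 0
    total = 0
    for gt_row, pred_row in zip(ground_truth, prediction):
        for gt, pred in zip(gt_row, pred_row):
            if gt in ignore_ids:
                continue  # 'ignore' pixels do not enter the measure
            total += 1
            if gt == pred:
                correct += 1
    if total == 0:
        raise ValueError("no pixels left to evaluate")
    return correct / total

# Usage: 4 of 6 pixels are predicted correctly.
ground_truth = [[0, 0, 1], [1, 2, 2]]
prediction = [[0, 1, 1], [1, 2, 0]]
print(pixel_accuracy(ground_truth, prediction))
# Declaring class 2 as 'ignore' leaves 4 pixels, 3 of them correct.
print(pixel_accuracy(ground_truth, prediction, ignore_ids=(2,)))
```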
class_pixel_accuracy
The per-class pixel accuracy considers only pixels of a single class. It is defined as the ratio between the correctly predicted pixels and the total number of pixels labeled with this class.
In case a class does not occur, it gets a class_pixel_accuracy
value of -1 and does not contribute to the average value,
mean_accuracy.
mean_accuracy
The mean accuracy is defined as the averaged
per-class pixel accuracy, class_pixel_accuracy, of all
occurring classes.
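The per-class and mean accuracy can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that a class "occurs" when at least one ground truth pixel carries its label.

```python
def class_pixel_accuracy(ground_truth, prediction, class_ids, ignore_ids=()):
    """Per-class ratio of correctly predicted pixels to pixels labeled with the class.

    A class without any labeled pixel gets the value -1.0.
    """
    labeled = {c: 0 for c in class_ids}
    correct = {c: 0 for c in class_ids}
    for gt_row, pred_row in zip(ground_truth, prediction):
        for gt, pred in zip(gt_row, pred_row):
            if gt in ignore_ids or gt not in labeled:
                continue
            labeled[gt] += 1
            if gt == pred:
                correct[gt] += 1
    return {c: (correct[c] / labeled[c] if labeled[c] else -1.0) for c in class_ids}

def mean_accuracy(per_class):
    """Average over all occurring classes, i.e., those with a value other than -1."""
    occurring = [value for value in per_class.values() if value >= 0]
    if not occurring:
        raise ValueError("no class occurs")
    return sum(occurring) / len(occurring)

# Usage: class 3 is never labeled, so it gets -1.0 and is left out of the mean.
ground_truth = [[0, 0, 1], [1, 2, 2]]
prediction = [[0, 1, 1], [1, 2, 0]]
per_class = class_pixel_accuracy(ground_truth, prediction, class_ids=[0, 1, 2, 3])
print(per_class)
print(mean_accuracy(per_class))
```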
class_iou
The per-class intersection over union (IoU) gives for a specific class the ratio of correctly predicted pixels to the union of annotated and predicted pixels. Visually this is the ratio between the intersection and the union of the areas, see the image below.
In case a class does not occur, it gets a class_iou value of
-1 and does not contribute to the mean_iou.
mean_iou
The mean IoU is defined as the averaged
per-class intersection over union, class_iou, of all
occurring classes.
Note that every occurring class has the same impact on this measure,
independent of the number of pixels it contains.
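The following Python sketch computes the per-class IoU and the mean IoU on nested lists. The function names are hypothetical, and the sketch assumes that a class counts as not occurring when it is neither annotated nor predicted on any evaluated pixel (empty union); the text above does not spell out this detail.

```python
def class_iou(ground_truth, prediction, class_ids, ignore_ids=()):
    """Per-class intersection over union of annotated and predicted pixels."""
    annotated = {c: 0 for c in class_ids}
    predicted = {c: 0 for c in class_ids}
    intersection = {c: 0 for c in class_ids}
    for gt_row, pred_row in zip(ground_truth, prediction):
        for gt, pred in zip(gt_row, pred_row):
            if gt in ignore_ids:
                continue  # 'ignore' pixels do not enter the measure
            if gt in annotated:
                annotated[gt] += 1
            if pred in predicted:
                predicted[pred] += 1
            if gt == pred and gt in intersection:
                intersection[gt] += 1
    result = {}
    for c in class_ids:
        union = annotated[c] + predicted[c] - intersection[c]
        result[c] = intersection[c] / union if union else -1.0
    return result

def mean_iou(per_class):
    """Average IoU over all occurring classes (values other than -1)."""
    occurring = [value for value in per_class.values() if value >= 0]
    if not occurring:
        raise ValueError("no class occurs")
    return sum(occurring) / len(occurring)

# Usage: class 3 is neither annotated nor predicted and gets -1.0.
ground_truth = [[0, 0, 1], [1, 2, 2]]
prediction = [[0, 1, 1], [1, 2, 0]]
per_class = class_iou(ground_truth, prediction, class_ids=[0, 1, 2, 3])
print(per_class)
print(mean_iou(per_class))
```

For class 1 in this example, two pixels are both annotated and predicted, while three pixels are annotated or predicted, giving an IoU of 2/3.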
frequency_weighted_iou
As for the mean IoU, the per-class IoU is calculated first. But the contribution of each occurring class to this measure is weighted by the ratio of pixels that belong to that class. Note that classes with many pixels can dominate this measure.
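A Python sketch of this weighting, under the assumption that "the ratio of pixels that belong to that class" refers to the share of ground truth pixels of the class among all evaluated pixels. The function name is made up for illustration.

```python
def frequency_weighted_iou(ground_truth, prediction, class_ids, ignore_ids=()):
    """Sum of per-class IoU values, each weighted by the class's share of pixels."""
    annotated = {c: 0 for c in class_ids}
    predicted = {c: 0 for c in class_ids}
    intersection = {c: 0 for c in class_ids}
    total = 0
    for gt_row, pred_row in zip(ground_truth, prediction):
        for gt, pred in zip(gt_row, pred_row):
            if gt in ignore_ids:
                continue
            total += 1
            if gt in annotated:
                annotated[gt] += 1
            if pred in predicted:
                predicted[pred] += 1
            if gt == pred and gt in intersection:
                intersection[gt] += 1
    if total == 0:
        raise ValueError("no pixels left to evaluate")
    score = 0.0
    for c in class_ids:
        if annotated[c] == 0:
            continue  # classes without annotated pixels have weight zero
        union = annotated[c] + predicted[c] - intersection[c]
        score += (annotated[c] / total) * (intersection[c] / union)
    return score

# Usage: class 0 covers 4 of 6 pixels (IoU 0.8), class 1 covers 2 of 6 (IoU 0.5).
ground_truth = [[0, 0, 0], [0, 1, 1]]
prediction = [[0, 0, 0], [0, 0, 1]]
print(frequency_weighted_iou(ground_truth, prediction, class_ids=[0, 1]))
```

Here the unweighted mean of the two IoU values would be 0.65, whereas the frequency-weighted value is pulled towards the IoU of the larger class 0.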
pixel_confusion_matrix
The concept of a confusion matrix is explained in the section “Supervising the training” within the chapter Deep Learning. It also applies to semantic segmentation, where the instances are single pixels.
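On the pixel level, such a matrix can be sketched in Python as follows. The orientation (rows for the ground truth class, columns for the predicted class) and the function name are assumptions of this example; check the evaluation result of your model for the orientation actually used.

```python
def pixel_confusion_matrix(ground_truth, prediction, class_ids, ignore_ids=()):
    """Count pixels per (ground truth class, predicted class) pair.

    matrix[i][j] is the number of pixels labeled class_ids[i] and predicted
    as class_ids[j]. All non-ignored values must be contained in class_ids.
    """
    index = {c: i for i, c in enumerate(class_ids)}
    matrix = [[0] * len(class_ids) for _ in class_ids]
    for gt_row, pred_row in zip(ground_truth, prediction):
        for gt, pred in zip(gt_row, pred_row):
            if gt in ignore_ids:
                continue  # 'ignore' pixels do not enter the matrix
            matrix[index[gt]][index[pred]] += 1
    return matrix

# Usage: the diagonal holds the correctly predicted pixels of each class.
ground_truth = [[0, 0, 1], [1, 2, 2]]
prediction = [[0, 1, 1], [1, 2, 0]]
for row in pixel_confusion_matrix(ground_truth, prediction, class_ids=[0, 1, 2]):
    print(row)
```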