This chapter explains how to use semantic segmentation based on deep learning, both for the training and inference phases.
With semantic segmentation we assign each pixel of the input image to a class using a deep learning (DL) network.
The result of semantic segmentation is an output image, in which the pixel value signifies the assigned class of the corresponding pixel in the input image. Thus, in HALCON the output image is of the same size as the input image. For general DL networks the deeper feature maps, representing more complex features, are usually smaller than the input image (see the section “The Network and the Training Process” in Deep Learning). To obtain an output of the same size as the input, HALCON uses segmentation networks with two components: an encoder and a decoder. The encoder determines features of the input image as done, e.g., for deep-learning-based classification. As this information is 'encoded' in a compressed format, the decoder is needed to reconstruct the information to the desired outcome, which, in this case, is the assignment of each pixel to a class. Note that, as pixels are classified, overlapping instances of the same class are not distinguished as distinct.
Semantic segmentation with deep learning is implemented within the more general deep learning model of HALCON. For more information on the latter, see the chapter Deep Learning / Model.
The following sections are introductions to the general workflow needed for semantic segmentation, information related to the involved data and parameters, and explanations of the evaluation measures.
In this paragraph, we describe the general workflow for a semantic
segmentation task based on deep learning.
We assume that your dataset is already labeled, see also the section
“Data” below.
Have a look at the HDevelop example series
segment_pill_defects_deep_learning
for an application.
Note that this example is split into the four parts
'Preprocess', 'Training', 'Evaluation', and 'Inference', which
give guidance on possible implementations.
This part is about how to preprocess your data.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_1_preprocess.hdev
.
The information about what is to be found in which image of your training dataset needs to be transferred. This is done by the procedure
read_dl_dataset_segmentation.
Thereby a dictionary DLDataset is created, which serves as
a database and stores all necessary information about your data.
For more information about the data and the way it is transferred,
see the section “Data” below and the chapter
Deep Learning / Model.
Split the dataset represented by the dictionary
DLDataset. This can be done using the procedure
split_dl_dataset.
The resulting split will be saved over the key split in
each sample entry of DLDataset.
Now you can preprocess your dataset. For this, you can use the procedure
preprocess_dl_dataset
.
In case of custom preprocessing, this procedure offers guidance on the implementation.
To use this procedure,
specify the preprocessing parameters, e.g., the image size.
For the latter, you should select the smallest possible image
size at which the regions to segment are still well recognizable.
Store all the parameters with their values in a dictionary
DLPreprocessParam, for which you can use the procedure
create_dl_preprocess_param.
We recommend saving this dictionary DLPreprocessParam
in order to have access to the preprocessing parameter values
later during the inference phase.
During the preprocessing of your dataset, the images
weight_image will also be generated for the training dataset by
preprocess_dl_dataset.
They assign each class the weight ('class weights') its
pixels get during training (see the section “Model Parameters and
Hyperparameters” below).
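Put together, the preprocessing steps above can be sketched in HDevelop as follows. The directory names, class information, image size, and split percentages are placeholders, and the exact procedure signatures may differ between versions, so consult the procedure documentation and the example segment_pill_defects_deep_learning_1_preprocess.hdev:

```hdevelop
* Read the labeled dataset into the dictionary DLDataset
* (directories and class information are placeholders).
read_dl_dataset_segmentation (ImageDir, SegmentationDir, ClassNames, ClassIDs, [], [], [], DLDataset)
* Split into training, validation, and test subsets (here 70/15/15).
split_dl_dataset (DLDataset, 70, 15, [])
* Collect the preprocessing parameters, e.g., the image size ...
create_dl_preprocess_param ('segmentation', 400, 400, 3, -127, 128, 'none', 'full_domain', [], false, [], [], DLPreprocessParam)
* ... and preprocess the dataset; for the training dataset this also
* generates the weight images.
preprocess_dl_dataset (DLDataset, 'preprocessed_dataset', DLPreprocessParam, [], DLDatasetFileName)
* Save the preprocessing parameters for the inference phase.
write_dict (DLPreprocessParam, 'dl_preprocess_param.hdict', [], [])
```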
This part is about how to train a DL semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_2_train.hdev
.
A network has to be read using the operator read_dl_model.
The model parameters need to be set via the operator set_dl_model_param.
Such parameters are, e.g., 'image_dimensions' and 'class_ids';
see the documentation of get_dl_model_param.
You can always retrieve the current parameter values using the operator get_dl_model_param.
Set the training parameters and store them
in the dictionary 'TrainParam'.
These parameters include:
the hyperparameters, for an overview see the section “Model Parameters and Hyperparameters” below and the chapter Deep Learning.
parameters for possible data augmentation (optional).
parameters for the evaluation during training.
parameters for the visualization of training results.
parameters for serialization.
This can be done using the procedure
create_dl_train_param
.
Train the model. This can be done using the procedure
train_dl_model
.
The procedure expects:
the model handle DLSegmentationHandle,
the dictionary with the data information DLDataset,
the dictionary with the training parameters
'TrainParam',
the information over how many epochs the training shall run.
In case the procedure train_dl_model
is used, the total loss
as well as optional evaluation measures are visualized.
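A minimal training sketch following the steps above; the pretrained network file name and the concrete parameter values are assumptions taken over from the example series:

```hdevelop
* Read a pretrained segmentation network.
read_dl_model ('pretrained_dl_segmentation_compact.hdl', DLModelHandle)
* Set the model parameters, e.g., the classes and the image dimensions.
set_dl_model_param (DLModelHandle, 'class_ids', ClassIDs)
set_dl_model_param (DLModelHandle, 'image_dimensions', [ImageWidth,ImageHeight,ImageNumChannels])
* Collect the training parameters: here 100 epochs, evaluation every
* epoch, display enabled, random seed 42, no further generic parameters.
create_dl_train_param (DLModelHandle, 100, 1, 'true', 42, [], [], TrainParam)
* Train the model; the total loss and optional evaluation measures
* are visualized during the training.
train_dl_model (DLDataset, DLModelHandle, TrainParam, 0, TrainResults, TrainInfos, EvaluationInfos)
```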
In this part we evaluate the semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_3_evaluate.hdev
.
Set the model parameters which may influence the evaluation,
e.g., 'batch_size', using the operator set_dl_model_param.
The evaluation can conveniently be done using the procedure
evaluate_dl_model
.
The dictionary EvaluationResults holds the requested
evaluation measures.
You can visualize your evaluation results using the procedure
dev_display_segmentation_evaluation.
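The evaluation steps can be sketched as follows; the generic parameter key 'measures' and the exact procedure signatures are assumptions based on the example series:

```hdevelop
* Set model parameters influencing the evaluation.
set_dl_model_param (DLModelHandle, 'batch_size', 1)
* Request the desired evaluation measures.
create_dict (GenParamEval)
set_dict_tuple (GenParamEval, 'measures', ['mean_iou','pixel_accuracy'])
* Evaluate the model on the test split.
evaluate_dl_model (DLDataset, DLModelHandle, 'split', 'test', GenParamEval, EvaluationResult, EvalParams)
* Visualize the evaluation results.
dev_display_segmentation_evaluation (EvaluationResult, EvalParams, [], WindowHandleDict)
```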
This part covers the application of a DL semantic segmentation model.
The single steps are also shown in the HDevelop example
segment_pill_defects_deep_learning_4_infer.hdev
.
Set the parameters, e.g., 'batch_size',
using the operator set_dl_model_param.
Generate a data dictionary DLSample
for each image.
This can be done using the procedure
gen_dl_samples_from_images.
Preprocess the images as done for the training. We recommend doing this using the procedure
preprocess_dl_samples.
When you saved the dictionary DLPreprocessParam during
the preprocessing step, you can directly use it as input to specify
all parameter values.
Apply the model using the operator apply_dl_model.
Retrieve the results from the dictionary
'DLResultBatch'.
The regions of the particular classes can be selected using, e.g.,
the operator threshold on the segmentation image.
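A minimal inference sketch of the steps above; the file names and the class ID used for threshold are placeholders:

```hdevelop
* Restore the preprocessing parameters saved during preprocessing.
read_dict ('dl_preprocess_param.hdict', [], [], DLPreprocessParam)
set_dl_model_param (DLModelHandle, 'batch_size', 1)
* Generate and preprocess a sample for the image to be segmented.
read_image (Image, 'image_to_segment')
gen_dl_samples_from_images (Image, DLSampleBatch)
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
* Apply the model and retrieve the segmentation image from the result.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)
get_dict_object (SegmentationImage, DLResultBatch[0], 'segmentation_image')
* Select the region of, e.g., the class with ID 1.
threshold (SegmentationImage, ClassRegion, 1, 1)
```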
We distinguish between data used for training and evaluation, and data
used for inference.
The latter consists of bare images.
The former consists of images with their information and ground truth
annotations. You provide this information by defining for each pixel
to which class it belongs (over the segmentation_image, see
below for further explanations).
As a basic concept, the model handles data over dictionaries, meaning it
receives the input data over a dictionary DLSample and
returns a dictionary DLTrainResult and DLResult,
respectively. More information on the
data handling can be found in the chapter Deep Learning / Model.
The training data is used to train a network for your specific task.
The dataset consists of images and corresponding information.
They have to be provided in a way the model can process them.
Concerning the image requirements, find more information in the
section “Images” below.
The information about the images and their ground truth annotations
is provided over the dictionary DLDataset and for every
sample the respective segmentation_image, defining the class
for every pixel.
The different classes are the sets or categories differentiated by
the network.
They are set in the dictionary DLDataset and are passed
to the model via the operator set_dl_model_param.
In semantic segmentation, we call your attention to two special cases: the class 'background' and classes declared as 'ignore':
'background' class:
The network treats the background class like any other
class. It is also not necessary to have a background class.
But if your dataset contains different classes you are
not interested in, although they have to be learned by the network,
you can set them all to 'background'.
As a result, the background class will be more diverse.
See the procedure preprocess_dl_samples for more
information.
'ignore' classes:
There is the possibility to declare one or multiple classes as
'ignore'. Pixels assigned to an 'ignore' class are ignored by the
loss as well as by all measures and evaluations.
Please see the section “The Network and the Training Process” in
the chapter Deep Learning for more information about the
loss. The network does not classify any pixel into a class declared
as 'ignore'. Also, the pixels labeled as belonging to such a class
will be classified by the network like every other pixel into a
non-'ignore' class.
In the example given in the image below, this means the network will
also classify the pixels of the class 'border', but it will not
classify any pixel into the class 'border'.
You can declare a class as 'ignore' using the parameter
'ignore_class_ids' of set_dl_model_param.
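As a sketch, assuming the class 'border' has the class ID 3, it can be declared as 'ignore' like this:

```hdevelop
* Declare the class with ID 3 as 'ignore'; its pixels are then
* excluded from the loss and from all evaluation measures.
set_dl_model_param (DLModelHandle, 'ignore_class_ids', [3])
```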
DLDataset
This dictionary serves as a database; that is, it stores all
information about your data necessary for the network, e.g., the
names and paths to the images, the classes, and so on.
Please see the documentation of Deep Learning / Model for the
general concept and key entries.
Keys only applicable for semantic segmentation concern the
segmentation_image (see the entry below).
Over the keys segmentation_dir and
segmentation_file_name you provide the information on how they
are named and where they are saved.
segmentation_image
In order that the network can learn how the members of the different
classes look, you tell for each pixel of every image in the
training dataset to which class it belongs.
This is done by storing for every pixel of the input image the class,
encoded as pixel value, in the corresponding segmentation_image.
These annotations are the ground truth annotations.
You need enough training data to split it into three subsets, one used
for training, one for validation and one for testing the network. These
subsets are preferably independent and identically distributed
(see the section “Data” in the chapter Deep Learning).
For the splitting you can use the procedure split_dl_dataset.
Regardless of the application, the network poses requirements on the
images regarding the image dimensions, the gray value range, and the
type.
The specific values depend on the network itself; see the documentation
of read_dl_model for the specific values of different networks.
For a loaded network they can be queried with
get_dl_model_param.
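For example, the image requirements of a loaded model can be queried like this (the parameter names are the general DL model parameter names):

```hdevelop
* Query the image requirements of a loaded model.
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', ImageNumChannels)
get_dl_model_param (DLModelHandle, 'image_range_min', ImageRangeMin)
get_dl_model_param (DLModelHandle, 'image_range_max', ImageRangeMax)
```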
In order to fulfill these requirements, you may have to preprocess your
images.
Standard preprocessing of an entire sample and therewith also the
image is implemented in preprocess_dl_samples.
In case of custom preprocessing this procedure offers guidance on the
implementation.
As training output, the operator will return a dictionary
DLTrainResult with the current value of the total loss as
well as values for all other losses included in your model.
As inference and evaluation output, the network will return a dictionary
DLResult for every sample.
For semantic segmentation, this dictionary will include for each input
image the handles of the two following images:
segmentation_image: An image in which each pixel has a
value corresponding to the class its corresponding pixel in the input
image has been assigned to (see the illustration below).
segmentation_confidence: An image in which each pixel has
the confidence value from the classification of the corresponding pixel
in the input image (see the illustration below).
Next to the general DL hyperparameters explained in Deep Learning, there is a further hyperparameter relevant for semantic segmentation:
'class weights', see the explanations below.
For a semantic segmentation model, the model parameters as well as the
hyperparameters (with the exception of 'class weights') are set using
set_dl_model_param.
The model parameters are explained in more detail in
get_dl_model_param.
Note that, due to large memory usage, typically only small batch sizes are
possible for training. As a consequence, training is rather slow and we
advise using a higher momentum than, e.g., for classification.
The HDevelop example
segment_pill_defects_deep_learning_2_train.hdev
provides
good initial parameter values for the training of a segmentation network
in HALCON.
With the hyperparameter 'class weights' you can assign each class the weight its pixels get during training. By giving the unique classes different weights, it is possible to force the network to learn the classes with different importance. This is useful in cases where a class dominates the images, such as defect detection, where the defects take up only a small fraction within an image. In such a case a network classifying every pixel as background (thus, 'not defect') would achieve generally good loss results. Assigning different weights to the distinct classes helps to re-balance the distribution. In short, you can focus the loss to train especially on those pixels you determine to be important.
The network obtains these weights over weight_image, an image
which is created for every training sample.
In weight_image, every pixel value corresponds to
the weight the corresponding pixel of the input image gets during
training.
You can create these images with the help of the following two
procedures:
calculate_class_weights_segmentation helps you to
create the class weights.
The procedure uses the concept of inverse class frequency weights.
gen_dl_segmentation_weight_images uses the class
weights and generates the weight_image.
This step has to be done before the training. Usually it is done during
the preprocessing and it is part of the procedure
preprocess_dl_dataset
.
Note that this hyperparameter is referred to as 'class_weights' or
ClassWeights within procedures.
An illustration of how such an image with different weights looks
is shown in the figure below.
Note that giving a specific part of the image the weight 0.0 means these pixels do not contribute to the loss (see the section “The network and its training” in Deep Learning for more information about the loss).
For semantic segmentation, the following evaluation measures are supported
in HALCON.
Note that for computing such a measure for an image, the related ground
truth information is needed.
All the measure values explained below for a single image
(e.g., mean_iou) can also be calculated for an arbitrary number
of images.
For this, imagine a single, large image formed by the ensemble of the
output images, for which the measure is computed.
Note that all pixels of a class declared as 'ignore' are ignored for the
computation of the measures.
pixel_accuracy
The pixel accuracy is the ratio of all pixels that have been predicted with the correct class to the total number of pixels.
class_pixel_accuracy
The per-class pixel accuracy considers only pixels of a single class. It is defined as the ratio between the correctly predicted pixels and the total number of pixels labeled with this class.
In case a class does not occur, it gets a
class_pixel_accuracy value of -1 and does not contribute
to the average value, mean_accuracy.
mean_accuracy
The mean accuracy is defined as the averaged
per-class pixel accuracy, class_pixel_accuracy, of all
occurring classes.
class_iou
The per-class intersection over union (IoU) gives for a specific class the ratio of correctly predicted pixels to the union of annotated and predicted pixels. Visually this is the ratio between the intersection and the union of the areas, see the image below.
In case a class does not occur, it gets a
class_iou value of -1 and does not contribute to the
mean_iou.
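As an illustration of the definition, the per-class IoU can be computed with basic region operators, where GroundTruthRegion and PredictedRegion are assumed to contain the annotated and the predicted pixels of one class:

```hdevelop
* Intersection and union of annotated and predicted pixels.
intersection (GroundTruthRegion, PredictedRegion, RegionIntersection)
union2 (GroundTruthRegion, PredictedRegion, RegionUnion)
* The per-class IoU is the ratio of the two areas.
area_center (RegionIntersection, AreaIntersection, Row1, Column1)
area_center (RegionUnion, AreaUnion, Row2, Column2)
ClassIoU := real(AreaIntersection) / AreaUnion
```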
mean_iou
The mean IoU is defined as the averaged
per-class intersection over union, class_iou, of all
occurring classes.
Note that every occurring class has the same impact on this measure,
independent of the number of pixels it contains.
frequency_weighted_iou
As for the mean IoU, the per-class IoU is calculated first. But the contribution of each occurring class to this measure is weighted by the ratio of pixels that belong to that class. Note that classes with many pixels can dominate this measure.
pixel_confusion_matrix
The concept of a confusion matrix is explained in the section “Supervising the training” within the chapter Deep Learning. It applies to semantic segmentation, where the instances are single pixels.