This chapter explains the general concept of the deep learning model in HALCON and the data handling.
By concept, a deep learning model in HALCON is a deep neural network. Each deep neural network has an architecture defining its function, i.e., the tasks it can be used for. There can be several possible network architectures for one functionality. Currently, networks for the following functionalities, also referred to as methods or types, are implemented in HALCON as models:
Object detection, see Deep Learning / Object Detection.
Semantic segmentation, see Deep Learning / Semantic Segmentation.
For the implemented methods you can find further information about the specific workflow, data requirements, and validation measures in the corresponding chapters. Information about deep learning (DL) in general can be found in the chapter Deep Learning.
This chapter describes which data a DL model needs and returns, as well as how this data is transferred.
Independent of the deep learning method used, the data has to be provided to the model following certain conventions.
As a basic concept, the model handles data over dictionaries.
More precisely, the model receives for every input image a dictionary DLSample.
Such a dictionary contains the image and, in the case of training and evaluation, information such as the ground truth annotations.
As output, the model returns a dictionary DLResult with the results. An illustration is given in the figure below.
Figure:
(1) Training and evaluation: DLSample includes the image as well as information about the image and its content. For visibility, BatchSize is set to three and only a few entries are registered.
(2) Inference: DLSample contains only the bare image.
These dictionaries can be passed one at a time or within a tuple DLSampleBatch.
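In HDevelop, such a batch is simply a tuple of dictionary handles. A minimal sketch, assuming three DLSample dictionaries already exist:

```
* Collect existing DLSample dictionaries into a batch tuple
* (DLSample1, DLSample2, DLSample3 are assumed to exist).
DLSampleBatch := [DLSample1, DLSample2, DLSample3]
```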
During training and evaluation, an additional dictionary DLDataset serves as a database and collects all the individual image information dictionaries (stored in the key samples).
Out of this database the model input dictionaries DLSample are created, see the illustration below and the section "Training and evaluation input data".
Figure: Creation of the model input out of DLDataset. BatchSize is set to three. In this example the dataset contains several samples; out of these, a tuple DLSampleBatch is created containing the DLSample dictionaries of the three randomly chosen samples.
Although not necessary for the model itself, the dictionary DLDataset is used by the training and evaluation procedures.
Therefore, we highly recommend creating a dictionary DLDataset out of your data.
Its necessary entries are described below.
This dictionary is directly created when labeling your data using the
MVTec Deep Learning Tool.
Alternatively, it is created when reading in your data using one of the following procedures:
read_dl_dataset_from_coco (object detection with 'instance_type' = 'rectangle1') and
read_dl_dataset_segmentation (semantic segmentation).
Please see the respective procedure documentation for the requirements on
the data in order to use these procedures.
In case you create DLDataset in another way, it has to contain at least the entries not marked with a number in the description below.
During the preprocessing of your dataset, the respective procedures add the further entries of the dictionary DLDataset.
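The following sketch shows how such a dictionary could be assembled by hand with the generic dictionary operators. The paths, class names, and values are hypothetical placeholders, and only a few of the entries described below are set:

```
* Create a minimal DLDataset dictionary by hand (all values are
* placeholders).
create_dict (DLDataset)
set_dict_tuple (DLDataset, 'image_dir', 'path/to/images')
set_dict_tuple (DLDataset, 'class_names', ['background', 'defect'])
set_dict_tuple (DLDataset, 'class_ids', [0, 1])
* Each sample gets its own description dictionary; all of them are
* collected under the key 'samples'.
create_dict (SampleDescription)
set_dict_tuple (SampleDescription, 'image_file_name', 'image_001.png')
set_dict_tuple (SampleDescription, 'image_id', 1)
set_dict_tuple (DLDataset, 'samples', SampleDescription)
```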
In the following, we explain the different data and the involved dictionaries. Thereby, the following abbreviations mark for which methods (m) the entry applies:
'A': any method
'D': object detection
In case an entry is only applicable for a certain 'instance_type', the specification 'r1' ('rectangle1') or 'r2' ('rectangle2') is added
'S': semantic segmentation
The entries only applicable for certain methods are described more extensively in the corresponding chapter reference.
Training and evaluation input data
The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section "Images" below.
The information about the images and the dataset is represented in a dictionary DLDataset, which serves as a database.
More precisely, it stores the general information about the dataset and the dictionaries of the individual samples collected under the key samples.
When the actual image data is needed, a dictionary DLSample is created (or read if it already exists) for each image required.
The relation of these dictionaries is illustrated in the figure above.
In the following we explain these dictionaries with their key/value pairs in more detail.
DLDataset
The dictionary DLDataset stores general information about the dataset and collects the dictionaries of the individual samples. Thereby, iconic data is not included in DLDataset, but the paths to the respective images.
Depending on the model type, this dictionary can have the following entries:
| key | description | format | m |
| image_dir | common base path to all images | string | A |
| dlsample_file_path [1] | common base path of all sample files (if present) | string | A |
| class_names | names of all classes that are to be distinguished | tuple of strings | A |
| class_ids | IDs of all classes that are to be distinguished (range: 0-65534) | tuple of integers | A |
| preprocess_param [1] | all parameter values used during preprocessing | dictionary | A |
| samples | collection of sample descriptions | tuple of dictionaries | A |
| class_weights [1] | weights of the different classes | tuple of reals | S |
| segmentation_dir | common base path of all segmentation images | string | S |
This dictionary is created by the procedures read_dl_dataset_from_coco (object detection with 'instance_type' = 'rectangle1') and read_dl_dataset_segmentation (semantic segmentation), respectively.
The entries marked with [1] are added by the preprocessing procedures.
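As DLDataset is an ordinary HALCON dictionary, its entries can be inspected with the generic dictionary operators, for example:

```
* Query general dataset information from DLDataset.
get_dict_tuple (DLDataset, 'class_names', ClassNames)
get_dict_tuple (DLDataset, 'class_ids', ClassIDs)
* 'samples' yields the tuple of sample description dictionaries.
get_dict_tuple (DLDataset, 'samples', DatasetSamples)
NumSamples := |DatasetSamples|
```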
samples
The DLDataset key samples gets a tuple of dictionaries as value, one for each sample in the dataset.
These dictionaries contain the information concerning an individual sample of the dataset.
Depending on the model type, such a dictionary can have the following entries:
| key | description | format | m |
| image_file_name | file name of the image and its path relative to image_dir | string | A |
| image_id | unique image ID (encoding format: UINT8) | integer | A |
| split [2] | specifies the assigned split subset ('train', 'validation', 'test') | string | A |
| dlsample_file_name [3] | file name of the corresponding dictionary DLSample and its path relative to dlsample_file_path | string | A |
| segmentation_file_name | file name of the ground truth segmentation image and its path relative to segmentation_dir | string | S |
| bbox_label_id | ground truth labels for the bounding boxes (in the form of class_ids) | tuple of integers | D |
| bbox_row1 [4] | BBoxGT: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [4] | BBoxGT: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [4] | BBoxGT: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [4] | BBoxGT: lower right corner, column coordinate | tuple of reals | D:r1 |
| coco_raw_annotations | optional; contains for every bounding box within this image a dictionary with all raw COCO annotation information | tuple of dictionaries | D:r1 |
| bbox_row [4] | BBoxGT: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [4] | BBoxGT: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [4] | BBoxGT: angle phi | tuple of reals | D:r2 |
| bbox_length1 [4] | BBoxGT: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [4] | BBoxGT: half length of edge 2 | tuple of reals | D:r2 |
These dictionaries are part of DLDataset and thus they are created concurrently.
An exception are the entries marked in the table:
[2]: the procedure split_dl_dataset adds split,
[3]: the procedure preprocess_dl_samples adds dlsample_file_name.
[4]: For the parameters of the ground truth bounding boxes (BBoxGT), pixel centered, subpixel accurate coordinates are used.
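A short sketch of the split step, assuming the parameter order split_dl_dataset (DLDataset, TrainingPercent, ValidationPercent, GenParam); the percentages are example values:

```
* Assign 70% of the samples to 'train', 15% to 'validation', and the
* rest to 'test'; this adds the 'split' entry to each sample.
split_dl_dataset (DLDataset, 70, 15, [])
* Check the split of the first sample.
get_dict_tuple (DLDataset, 'samples', DatasetSamples)
get_dict_tuple (DatasetSamples[0], 'split', Split)
```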
DLSample
The dictionary DLSample serves as input for the model.
For a batch, these dictionaries are handed over as the entries of the tuple DLSampleBatch.
A dictionary DLSample is created out of DLDataset for every image sample by the procedure gen_dl_samples.
It contains all ground truth annotations of an image.
If preprocessing is done using the standard procedure preprocess_dl_samples, they are created automatically therein.
Note that preprocessing steps may lead to an update of the corresponding dictionary DLSample.
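A minimal sketch of this step; it assumes DLSampleBatch has already been created from DLDataset by gen_dl_samples and that DLPreprocessParam holds the preprocessing parameters (e.g., created by create_dl_preprocess_param):

```
* Preprocess the samples so they meet the model requirements; the
* DLSample dictionaries may be updated in place.
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
```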
Depending on the model type, the dictionary DLSample can have the following entries:
| key | description | format | m |
| image | input image | image | A |
| image_id | unique image ID (as in DLDataset) | integer | A |
| segmentation_image | image with the ground truth segmentations, read from segmentation_file_name | image | S |
| weight_image [5] | image with the pixel weights | image | S |
| bbox_label_id | ground truth labels for the image part within the bounding box (in the form of class_ids) | tuple of integers | D |
| bbox_row1 [4] | BBoxGT: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [4] | BBoxGT: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [4] | BBoxGT: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [4] | BBoxGT: lower right corner, column coordinate | tuple of reals | D:r1 |
| bbox_row [4] | BBoxGT: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [4] | BBoxGT: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [4] | BBoxGT: angle phi | tuple of reals | D:r2 |
| bbox_length1 [4] | BBoxGT: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [4] | BBoxGT: half length of edge 2 | tuple of reals | D:r2 |
These entries are created by the procedure gen_dl_samples.
An exception is the entry marked in the table above, [5]: it is created by the procedure gen_dl_segmentation_weights.
[4]: For the parameters of the ground truth bounding boxes (BBoxGT), pixel centered, subpixel accurate coordinates are used.
Note, in case these dictionaries DLSample should be stored, use the procedure write_dl_samples. You can read them with the procedure read_dl_samples.
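A sketch of re-reading stored samples; the parameter list shown is indicative only, see the procedure documentation for details:

```
* Read back the DLSample dictionaries of selected samples, assuming
* they were written with write_dl_samples beforehand.
SampleIndices := [0, 1, 2]
read_dl_samples (DLDataset, SampleIndices, DLSampleBatch)
```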
Inference input data
The inference input data consists of bare images. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section "Images" below.
The model is set up to hand over all data through a dictionary DLSample.
For the inference, such a dictionary containing only the image can be created using the procedure gen_dl_samples_from_images.
These dictionaries can be passed one at a time or within a tuple DLSampleBatch.
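A sketch of preparing inference input; the file names are placeholders and DLPreprocessParam is assumed to be the parameter dictionary used when preprocessing the dataset:

```
* Read the bare images (file names are placeholders).
read_image (ImageBatch, ['image_01.png', 'image_02.png'])
* Wrap them into DLSample dictionaries.
gen_dl_samples_from_images (ImageBatch, DLSampleBatch)
* Preprocess them the same way as the training data.
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
```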
Training output data
As output from the operator train_dl_model_batch, the model returns a dictionary DLTrainResult.
In this dictionary you find the current value of the total loss as the value for the key total_loss, as well as the values for all other losses included in your model.
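For illustration, a single training iteration and the retrieval of the total loss could look as follows (model and batch are assumed to exist):

```
* Train on one batch and get the result dictionary.
train_dl_model_batch (DLModelHandle, DLSampleBatch, DLTrainResult)
* Retrieve the current value of the total loss.
get_dict_tuple (DLTrainResult, 'total_loss', TotalLoss)
```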
Inference output data
As output from the operator apply_dl_model, the model returns a dictionary DLResult for each sample.
Depending on the model type, this dictionary can have the following entries (a usage sketch follows the table):
| key | description | format | m |
| segmentation_image | image with the segmentation result | image | S |
| segmentation_confidence | image with the confidences of the segmentation result | image | S |
| bbox_class_id | inferred class for the bounding box (in the form of class_ids) | tuple of integers | D |
| bbox_confidence | confidence value of the inference for the bounding box | tuple of reals | D |
| bbox_row1 [6] | BBoxInf: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [6] | BBoxInf: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [6] | BBoxInf: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [6] | BBoxInf: lower right corner, column coordinate | tuple of reals | D:r1 |
| bbox_row [6] | BBoxInf: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [6] | BBoxInf: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [6] | BBoxInf: angle phi | tuple of reals | D:r2 |
| bbox_length1 [6] | BBoxInf: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [6] | BBoxInf: half length of edge 2 | tuple of reals | D:r2 |
[6]: For the parameters of the inferred bounding boxes (BBoxInf), pixel centered, subpixel accurate coordinates are used.
For a further explanation of the output values, we refer to the chapters Deep Learning / Semantic Segmentation and Deep Learning / Object Detection, respectively.
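A sketch of the inference call and of accessing a result entry, here for semantic segmentation (the model is assumed to be read and configured):

```
* Apply the model; [] requests the default outputs.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)
* E.g., get the segmentation result image of the first sample.
get_dict_object (SegmentationImage, DLResultBatch[0], 'segmentation_image')
```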
Images
Regardless of the application, the network poses requirements on the images. The specific values depend on the network itself and can be queried using get_dl_model_param.
In order to fulfill these requirements, you may have to preprocess your images.
Standard preprocessing of the entire dataset and therewith also of the images is implemented in preprocess_dl_samples.
In case of custom preprocessing, this procedure offers guidance on the implementation.
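The requirements can be queried, for example, as follows (the parameter names 'image_width', 'image_height', and 'image_num_channels' are assumed here; see the get_dl_model_param documentation for the full list):

```
* Query the image requirements of the model.
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', ImageNumChannels)
```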
Operators
apply_dl_model
clear_dl_model
deserialize_dl_model
get_dl_model_param
read_dl_model
serialize_dl_model
set_dl_model_param
train_dl_model_batch
write_dl_model