This chapter explains the general concept of the deep learning model in HALCON and the data handling.
By concept, a deep learning model in HALCON is a deep neural network. Each deep neural network has an architecture defining its function, i.e., the tasks it can be used for. There can be several possible network architectures for one functionality. Currently, networks for the following functionalities, also referred to as methods or types, are implemented in HALCON as models:
Object detection, see Deep Learning / Object Detection.
Semantic segmentation, see Deep Learning / Semantic Segmentation.
For the implemented methods you can find further information about the specific workflow, data requirements, and validation measures in the corresponding chapters. Information about deep learning (DL) in general can be found in the chapter Deep Learning.
This chapter describes which data a DL model needs and returns, as well as how this data is transferred.
Independent of the deep learning method used, the data has to be provided to the model following certain conventions.
As a basic concept, the model handles data over dictionaries.
More precisely, the model receives for every input image a dictionary DLSample.
Such a dictionary contains the image and, in the case of training and evaluation, information such as the ground truth annotations.
As output, the model returns a dictionary DLResult with the results. An illustration is given in the figure below.
Figure:
(1) Training and evaluation: DLSample includes the image as well as information about the image and its content. For visibility, BatchSize is set to three and only a few entries are registered.
(2) Inference: DLSample contains only the bare image.
These dictionaries can be passed one at a time or within a tuple DLSampleBatch.
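In HDevelop, such a batch is simply a tuple of dictionary handles. A minimal sketch, assuming three DLSample dictionaries already exist:

```
* Collect existing DLSample dictionaries into a batch tuple
* (DLSample1, DLSample2, DLSample3 are assumed to exist).
DLSampleBatch := [DLSample1, DLSample2, DLSample3]
```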
During training and evaluation, an additional dictionary DLDataset serves as a database and collects all the individual image information dictionaries (stored in the key samples).
Out of this database the model input dictionaries DLSample are created, see the illustration below and the section "Training and evaluation input data".
Figure: Creation of the model input out of DLDataset. BatchSize is set to three. In this example the dataset contains several samples; out of these, a tuple DLSampleBatch is created containing the DLSample dictionaries of the three randomly chosen samples.
Although not necessary for the model itself, the dictionary DLDataset is used by the training and evaluation procedures.
Therefore, we highly recommend creating a dictionary DLDataset out of your data.
Its necessary entries are described below.
This dictionary is directly created when labeling your data using the
MVTec Deep Learning Tool.
Alternatively, it is created when reading in your data using one of the following procedures:
read_dl_dataset_from_coco (object detection with 'instance_type' = 'rectangle1') and
read_dl_dataset_segmentation (semantic segmentation).
Please see the respective procedure documentation for the requirements on
the data in order to use these procedures.
In case you create DLDataset in another way, it has to contain at least the entries not marked with a number in the description below.
During the preprocessing of your dataset, the respective procedures add the further entries of the dictionary DLDataset.
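The following sketch shows how such a dictionary could be assembled by hand with the generic dictionary operators. The paths, class names, and values are hypothetical placeholders, and only a few of the entries described below are set:

```
* Create a minimal DLDataset dictionary by hand (all values are
* placeholders).
create_dict (DLDataset)
set_dict_tuple (DLDataset, 'image_dir', 'path/to/images')
set_dict_tuple (DLDataset, 'class_names', ['background', 'defect'])
set_dict_tuple (DLDataset, 'class_ids', [0, 1])
* Each sample gets its own description dictionary; all of them are
* collected under the key 'samples'.
create_dict (SampleDescription)
set_dict_tuple (SampleDescription, 'image_file_name', 'image_001.png')
set_dict_tuple (SampleDescription, 'image_id', 1)
set_dict_tuple (DLDataset, 'samples', SampleDescription)
```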
In the following, we explain the different data and the involved dictionaries. Thereby, the following abbreviations mark for which methods (m) the entry applies:
'A': any method
'D': object detection
In case an entry is only applicable for a certain 'instance_type', the specification 'r1' ('rectangle1') or 'r2' ('rectangle2') is added
'S': semantic segmentation
The entries only applicable for certain methods are described more extensively in the corresponding chapter reference.
Training and evaluation input data
The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section "Images" below.
The information about the images and the dataset is represented in a dictionary DLDataset, which serves as a database.
More precisely, it stores the general information about the dataset and the dictionaries of the individual samples collected under the key samples.
When the actual image data is needed, a dictionary DLSample is created (or read if it already exists) for each image required.
The relation of these dictionaries is illustrated in the figure above.
In the following we explain these dictionaries with their key/value pairs in more detail.
DLDataset
The dictionary DLDataset stores general information about the dataset and collects the dictionaries of the individual samples. Thereby, iconic data is not included in DLDataset, but the paths to the respective images.
Depending on the model type, this dictionary can have the following entries:
| key | description | format | m |
| image_dir | common base path to all images | string | A |
| dlsample_file_path [1] | common base path of all sample files (if present) | string | A |
| class_names | names of all classes that are to be distinguished | tuple of strings | A |
| class_ids | IDs of all classes that are to be distinguished (range: 0-65534) | tuple of integers | A |
| preprocess_param [1] | all parameter values used during preprocessing | dictionary | A |
| samples | collection of sample descriptions | tuple of dictionaries | A |
| class_weights [1] | weights of the different classes | tuple of reals | S |
| segmentation_dir | common base path of all segmentation images | string | S |
This dictionary is created by the procedures read_dl_dataset_from_coco (object detection with 'instance_type' = 'rectangle1') and read_dl_dataset_segmentation (semantic segmentation), respectively.
The entries marked with [1] are added by the preprocessing procedures.
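As DLDataset is an ordinary HALCON dictionary, its entries can be inspected with the generic dictionary operators, for example:

```
* Query general dataset information from DLDataset.
get_dict_tuple (DLDataset, 'class_names', ClassNames)
get_dict_tuple (DLDataset, 'class_ids', ClassIDs)
* 'samples' yields the tuple of sample description dictionaries.
get_dict_tuple (DLDataset, 'samples', DatasetSamples)
NumSamples := |DatasetSamples|
```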
samples
The DLDataset key samples gets a tuple of dictionaries as value, one for each sample in the dataset.
These dictionaries contain the information concerning an individual sample of the dataset.
Depending on the model type, such a dictionary can have the following entries:
| key | description | format | m |
| image_file_name | file name of the image and its path relative to image_dir | string | A |
| image_id | unique image ID (encoding format: UINT8) | integer | A |
| split [2] | specifies the assigned split subset ('train', 'validation', 'test') | string | A |
| dlsample_file_name [3] | file name of the corresponding dictionary DLSample and its path relative to dlsample_file_path | string | A |
| segmentation_file_name | file name of the ground truth segmentation image and its path relative to segmentation_dir | string | S |
| bbox_label_id | ground truth labels for the bounding boxes (in the form of class_ids) | tuple of integers | D |
| bbox_row1 [4] | BBoxGT: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [4] | BBoxGT: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [4] | BBoxGT: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [4] | BBoxGT: lower right corner, column coordinate | tuple of reals | D:r1 |
| coco_raw_annotations | optional; contains for every bounding box within this image a dictionary with all raw COCO annotation information | tuple of dictionaries | D:r1 |
| bbox_row [4] | BBoxGT: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [4] | BBoxGT: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [4] | BBoxGT: angle phi | tuple of reals | D:r2 |
| bbox_length1 [4] | BBoxGT: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [4] | BBoxGT: half length of edge 2 | tuple of reals | D:r2 |
These dictionaries are part of DLDataset and thus they are created concurrently.
An exception are the entries marked in the table:
[2]: the procedure split_dl_dataset adds split,
[3]: the procedure preprocess_dl_samples adds dlsample_file_name.
[4]: For the parameters of the ground truth bounding boxes (BBoxGT), pixel centered, subpixel accurate coordinates are used.
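A short sketch of the split step, assuming the parameter order split_dl_dataset (DLDataset, TrainingPercent, ValidationPercent, GenParam); the percentages are example values:

```
* Assign 70% of the samples to 'train', 15% to 'validation', and the
* rest to 'test'; this adds the 'split' entry to each sample.
split_dl_dataset (DLDataset, 70, 15, [])
* Check the split of the first sample.
get_dict_tuple (DLDataset, 'samples', DatasetSamples)
get_dict_tuple (DatasetSamples[0], 'split', Split)
```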
DLSample
The dictionary DLSample serves as input for the model.
For a batch, these dictionaries are handed over as the entries of the tuple DLSampleBatch.
A dictionary DLSample is created out of DLDataset for every image sample by the procedure gen_dl_samples.
It contains all ground truth annotations of an image.
If preprocessing is done using the standard procedure preprocess_dl_samples, they are created automatically therein.
Note that preprocessing steps may lead to an update of the corresponding dictionary DLSample.
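A minimal sketch of this step; it assumes DLSampleBatch has already been created from DLDataset by gen_dl_samples and that DLPreprocessParam holds the preprocessing parameters (e.g., created by create_dl_preprocess_param):

```
* Preprocess the samples so they meet the model requirements; the
* DLSample dictionaries may be updated in place.
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
```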
Depending on the model type, the dictionary DLSample can have the following entries:
| key | description | format | m |
| image | input image | image | A |
| image_id | unique image ID (as in DLDataset) | integer | A |
| segmentation_image | image with the ground truth segmentations, read from segmentation_file_name | image | S |
| weight_image [5] | image with the pixel weights | image | S |
| bbox_label_id | ground truth labels for the image part within the bounding box (in the form of class_ids) | tuple of integers | D |
| bbox_row1 [4] | BBoxGT: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [4] | BBoxGT: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [4] | BBoxGT: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [4] | BBoxGT: lower right corner, column coordinate | tuple of reals | D:r1 |
| bbox_row [4] | BBoxGT: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [4] | BBoxGT: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [4] | BBoxGT: angle phi | tuple of reals | D:r2 |
| bbox_length1 [4] | BBoxGT: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [4] | BBoxGT: half length of edge 2 | tuple of reals | D:r2 |
These entries are created by the procedure gen_dl_samples.
An exception is the entry marked in the table above, [5]: it is created by the procedure gen_dl_segmentation_weights.
[4]: For the parameters of the ground truth bounding boxes (BBoxGT), pixel centered, subpixel accurate coordinates are used.
Note, in case these dictionaries DLSample should be stored, use the procedure write_dl_samples. You can read them with the procedure read_dl_samples.
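A sketch of re-reading stored samples; the parameter list shown is indicative only, see the procedure documentation for details:

```
* Read back the DLSample dictionaries of selected samples, assuming
* they were written with write_dl_samples beforehand.
SampleIndices := [0, 1, 2]
read_dl_samples (DLDataset, SampleIndices, DLSampleBatch)
```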
Inference input data
The inference input data consists of bare images. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section "Images" below.
The model is set up to hand over all data through a dictionary DLSample.
For the inference, such a dictionary containing only the image can be created using the procedure gen_dl_samples_from_images.
These dictionaries can be passed one at a time or within a tuple DLSampleBatch.
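A sketch of preparing inference input; the file names are placeholders and DLPreprocessParam is assumed to be the parameter dictionary used when preprocessing the dataset:

```
* Read the bare images (file names are placeholders).
read_image (ImageBatch, ['image_01.png', 'image_02.png'])
* Wrap them into DLSample dictionaries.
gen_dl_samples_from_images (ImageBatch, DLSampleBatch)
* Preprocess them the same way as the training data.
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
```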
Training output data
As output from the operator train_dl_model_batch, the model returns a dictionary DLTrainResult.
In this dictionary you find the current value of the total loss as the value for the key total_loss, as well as the values for all other losses included in your model.
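For illustration, a single training iteration and the retrieval of the total loss could look as follows (model and batch are assumed to exist):

```
* Train on one batch and get the result dictionary.
train_dl_model_batch (DLModelHandle, DLSampleBatch, DLTrainResult)
* Retrieve the current value of the total loss.
get_dict_tuple (DLTrainResult, 'total_loss', TotalLoss)
```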
Inference output data
As output from the operator apply_dl_model, the model returns a dictionary DLResult for each sample.
Depending on the model type, this dictionary can have the following entries (a usage sketch follows the table):
| key | description | format | m |
| segmentation_image | image with the segmentation result | image | S |
| segmentation_confidence | image with the confidences of the segmentation result | image | S |
| bbox_class_id | inferred class for the bounding box (in the form of class_ids) | tuple of integers | D |
| bbox_confidence | confidence value of the inference for the bounding box | tuple of reals | D |
| bbox_row1 [6] | BBoxInf: upper left corner, row coordinate | tuple of reals | D:r1 |
| bbox_col1 [6] | BBoxInf: upper left corner, column coordinate | tuple of reals | D:r1 |
| bbox_row2 [6] | BBoxInf: lower right corner, row coordinate | tuple of reals | D:r1 |
| bbox_col2 [6] | BBoxInf: lower right corner, column coordinate | tuple of reals | D:r1 |
| bbox_row [6] | BBoxInf: center point, row coordinate | tuple of reals | D:r2 |
| bbox_col [6] | BBoxInf: center point, column coordinate | tuple of reals | D:r2 |
| bbox_phi [6] | BBoxInf: angle phi | tuple of reals | D:r2 |
| bbox_length1 [6] | BBoxInf: half length of edge 1 | tuple of reals | D:r2 |
| bbox_length2 [6] | BBoxInf: half length of edge 2 | tuple of reals | D:r2 |
[6]: For the parameters of the inferred bounding boxes (BBoxInf), pixel centered, subpixel accurate coordinates are used.
For a further explanation of the output values, we refer to the chapters Deep Learning / Semantic Segmentation and Deep Learning / Object Detection, respectively.
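A sketch of the inference call and of accessing a result entry, here for semantic segmentation (the model is assumed to be read and configured):

```
* Apply the model; [] requests the default outputs.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)
* E.g., get the segmentation result image of the first sample.
get_dict_object (SegmentationImage, DLResultBatch[0], 'segmentation_image')
```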
Images
Regardless of the application, the network poses requirements on the images. The specific values depend on the network itself and can be queried using get_dl_model_param.
In order to fulfill these requirements, you may have to preprocess your images.
Standard preprocessing of the entire dataset and therewith also of the images is implemented in preprocess_dl_samples.
In case of custom preprocessing, this procedure offers guidance on the implementation.
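The requirements can be queried, for example, as follows (the parameter names 'image_width', 'image_height', and 'image_num_channels' are assumed here; see the get_dl_model_param documentation for the full list):

```
* Query the image requirements of the model.
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', ImageNumChannels)
```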
Operators
apply_dl_model
clear_dl_model
deserialize_dl_model
get_dl_model_param
read_dl_model
serialize_dl_model
set_dl_model_param
train_dl_model_batch
write_dl_model