Model [HALCON Operator Reference / Version 20.05.0.0]

Model

This chapter explains the general concept of the deep learning (DL) model in HALCON and the data handling.

Conceptually, a deep learning model in HALCON is an internal representation of a deep neural network. Each deep neural network has an architecture defining its function, i.e., the tasks it can be used for. There can be several possible network architectures for one functionality. Currently, networks for the following functionalities are implemented in HALCON as models:

Anomaly Detection, see Deep Learning / Anomaly Detection.
Classification, see Deep Learning / Classification.
Object detection, see Deep Learning / Object Detection.
Semantic segmentation, see Deep Learning / Semantic Segmentation.

Each functionality is identified by its unique model type. For the implemented methods you can find further information about the specific workflow, data requirements, and validation measures in the corresponding chapters. General information on deep learning is given in the chapter Deep Learning.

This chapter describes which data a DL model needs and returns, as well as how this data is transferred.

Data

Deep learning applications involve different types of data that need to be distinguished. Roughly speaking, these are: the raw images with possible annotations, data preprocessed in a way suitable for the model, and output data.

Before the different types of data and the entries of the specific dictionaries are explained, we look at how the data is connected. The symbols and colors mentioned refer to the schematic overviews given below.

In brief, the data structure for training or evaluation starts with the raw images and their ground truth annotations (gray frames). From the data read in, the following dictionaries are created: A dictionary DLDataset (red), which serves as a database and refers to a specific dictionary (yellow) for every input image. The dictionary DLSample (orange) contains the data for a sample in the way the network can process it. A batch of DLSample is handed to the model in DLSampleBatch. For evaluation, DLResultBatch is returned, a tuple of dictionaries DLResult (dark blue), one for every sample. They are needed to obtain the evaluation results EvaluationResults. For training, the training results (e.g., loss values) are returned in the dictionary DLTrainResult (light blue). The most important steps that modify or create a dictionary are:

reading the raw data (symbol: paper with arrow)
preprocessing the data (symbol: cogs)
training (symbol: transparent brain in an arc)
evaluation of the model (symbol: graph)
evaluation of a sample (symbol: loupe)

Schematic overview of the data structure during training and evaluation.

For inference, no annotations are needed. Thus, the data structure starts with the raw images (gray frames). The dictionary DLSample (orange) contains the data for a sample in the way the network can process it. The results for a sample are returned in a dictionary DLResult (dark blue). The most important steps that modify or create a dictionary are:

reading the raw data (symbol: paper with arrow)
preprocessing the data (symbol: cogs)
inference (symbol: brain in a circle)
evaluation of a sample (symbol: loupe)

Schematic overview of the data connection during inference.

In order for the model to process the data, the data needs to follow certain conventions about what is needed and how it is given to the model. As visible from the figures above, in HALCON the data is transferred using dictionaries.

In the following we explain the involved dictionaries, how they can be created, and their entries. We group them according to the main step of a deep learning application they are created in and whether they serve as input or output data. The following abbreviations mark the methods to which an entry applies:

'Any': any method
'AD': anomaly detection
'CL': classification
'OD': object detection. In case the entry is only applicable for a certain 'instance_type', the specification 'r1': 'rectangle1' or 'r2': 'rectangle2' is added.
'SE': semantic segmentation

The entries only applicable for certain methods are described more extensively in the corresponding chapter reference.

Training and evaluation input data

The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below.

The information about the images and the dataset is represented in a dictionary DLDataset, which serves as a database. More precisely, it stores the general information about the dataset and the dictionaries of the individual samples collected under the key samples. When the actual image data is needed, a dictionary DLSample is created (or read if it already exists) for each image required. The relation of these dictionaries is illustrated in the figure below.

Schematic illustration of the different dataset dictionaries used for training and evaluation. For the sake of clarity, only a few entries are shown and BatchSize is set to three. From the samples in this example, three are chosen randomly: i, j, and k. The corresponding dictionaries DLSample are created and joined in the tuple DLSampleBatch.

In the following we look at these dictionaries.

DLDataset

The dictionary DLDataset serves as a database. It stores general information about the dataset and collects the dictionaries of the individual samples. Iconic data is not included in DLDataset, only the paths to the respective images. The dictionary DLDataset is used by the training and evaluation procedures. It is not necessary for the model, but we highly recommend creating it. Its necessary entries are described below. This dictionary is created directly when labeling your data using the MVTec Deep Learning Tool. Alternatively, it is created when reading in your data using one of the following procedures:

read_dl_dataset_anomaly (anomaly detection)
read_dl_dataset_classification (classification)
read_dl_dataset_from_coco (object detection with 'instance_type' = 'rectangle1')
read_dl_dataset_segmentation (semantic segmentation).

Please see the respective procedure documentation for the requirements on the data in order to use these procedures. In case you create DLDataset in another way, it has to contain at least the entries not marked with a number in the description below. During the preprocessing of your dataset, the respective procedures add the further entries to the dictionary DLDataset.

Depending on the model type, this dictionary can have the following entries:

image_dir: Any

Common base path to all images.

format: string

dlsample_dir: Any [1]

Common base path of all sample files (if present).

format: string

class_names: Any

Names of all classes that are to be distinguished.

format: tuple of strings

class_ids: Any

IDs of all classes that are to be distinguished (range: 0-65534).

format: tuple of integers

preprocess_param: Any [1]

All parameter values used during preprocessing.

format: dictionary

samples: Any

Collection of sample descriptions.

format: tuple of dictionaries

class_weights: CL, SE [1]

Weights of the different classes.

format: tuple of reals

anomaly_dir: AD

Common base path of all anomaly regions (regions indicating anomalies in the image).

format: string

segmentation_dir: SE

Common base path of all segmentation images.

format: string

As mentioned above, this dictionary is created directly when labeling your data using the MVTec Deep Learning Tool or by the reading procedures listed above. The entries marked with [1] are added by the preprocessing procedures.
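The entries required for a self-made DLDataset can be illustrated with a small sketch. It is written in Python and uses plain dicts as stand-ins for HALCON dictionaries, so it is not HALCON API. The helper check_dl_dataset and the file names are hypothetical. The checked keys are the ones listed above for 'Any' method without a numbered mark.

```python
# Illustration only: plain Python dicts stand in for HALCON dictionaries.
REQUIRED_DATASET_KEYS = ("image_dir", "class_names", "class_ids", "samples")
REQUIRED_SAMPLE_KEYS = ("image_file_name", "image_id")


def check_dl_dataset(dl_dataset):
    """Return a list of problems found in a DLDataset-like dict (empty if none)."""
    problems = [
        f"missing dataset key: {key}"
        for key in REQUIRED_DATASET_KEYS
        if key not in dl_dataset
    ]
    if problems:
        return problems
    if len(dl_dataset["class_names"]) != len(dl_dataset["class_ids"]):
        problems.append("class_names and class_ids differ in length")
    seen_ids = set()
    for index, sample in enumerate(dl_dataset["samples"]):
        for key in REQUIRED_SAMPLE_KEYS:
            if key not in sample:
                problems.append(f"sample {index}: missing key {key}")
        image_id = sample.get("image_id")
        if image_id is not None:
            # The image ID has to be unique within the dataset.
            if image_id in seen_ids:
                problems.append(f"sample {index}: duplicate image_id {image_id}")
            seen_ids.add(image_id)
    return problems


# Hypothetical classification dataset with two samples.
dl_dataset = {
    "image_dir": "images",
    "class_names": ["good", "scratch"],
    "class_ids": [0, 1],
    "samples": [
        {"image_file_name": "good/img_001.png", "image_id": 1, "image_label_id": 0},
        {"image_file_name": "scratch/img_002.png", "image_id": 2, "image_label_id": 1},
    ],
}

problems = check_dl_dataset(dl_dataset)
```

Method-specific entries such as anomaly_dir or segmentation_dir would need additional checks, which this sketch omits.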

samples

The DLDataset key samples gets a tuple of dictionaries as value, one for each sample in the dataset. These dictionaries contain the information concerning an individual sample of the dataset. Depending on the model type, each of these dictionaries can have the following entries:

image_file_name: Any

File name of the image and its path relative to image_dir.

format: string

image_id: Any

Unique image ID (encoding format: UINT8).

format: integer

split: Any [2]

Specifies the assigned split subset ('train', 'validation', 'test').

format: string

dlsample_file_name: Any [3]

File name of the corresponding dictionary DLSample and its path relative to dlsample_dir.

format: string

anomaly_file_name: AD

Optional. Path to region files with ground truth annotations (relative to anomaly_dir).

format: string

anomaly_label: AD

Ground truth anomaly label on image level (in the form of class_names).

format: string

image_label_id: CL

Ground truth label for the image (in the form of class_ids).

format: tuple of integers

bbox_label_id: OD

Ground truth labels for the bounding boxes (in the form of class_ids).

format: tuple of integers

bbox_row1: OD:r1 [4]

Ground truth bounding boxes: upper left corner, row coordinate.

format: tuple of reals

bbox_col1: OD:r1 [4]

Ground truth bounding boxes: upper left corner, column coordinate.

format: tuple of reals

bbox_row2: OD:r1 [4]

Ground truth bounding boxes: lower right corner, row coordinate.

format: tuple of reals

bbox_col2: OD:r1 [4]

Ground truth bounding boxes: lower right corner, column coordinate.

format: tuple of reals

coco_raw_annotations: OD:r1

Optional. It contains for every bbox_label_id within this image a dictionary with all raw COCO annotation information.

format: tuple of dictionaries

bbox_row: OD:r2 [4]

Ground truth bounding boxes: center point, row coordinate.

format: tuple of reals

bbox_col: OD:r2 [4]

Ground truth bounding boxes: center point, column coordinate.

format: tuple of reals

bbox_phi: OD:r2 [4]

Ground truth bounding boxes: angle phi.

format: tuple of reals

bbox_length1: OD:r2 [4]

Ground truth bounding boxes: half length of edge 1.

format: tuple of reals

bbox_length2: OD:r2 [4]

Ground truth bounding boxes: half length of edge 2.

format: tuple of reals

segmentation_file_name: SE

File name of the ground truth segmentation image and its path relative to segmentation_dir.

format: string

These dictionaries are part of DLDataset and are thus created together with it. Exceptions are the entries marked [2] and [3] in the list above: the procedure split_dl_dataset adds split ([2]), and the procedure preprocess_dl_samples adds dlsample_file_name ([3]). The mark [4] denotes the coordinates used: pixel-centered, subpixel-accurate coordinates.
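How a split entry ends up in every sample dictionary can be sketched as follows. This is a simplified Python illustration with plain dicts, not the implementation of split_dl_dataset: the function assign_split and its parameters are hypothetical, and the sketch ignores the class distribution, which the actual procedure may treat differently.

```python
import random


def assign_split(samples, training_percent, validation_percent, seed=42):
    """Add a 'split' entry ('train', 'validation', or 'test') to each sample dict."""
    if training_percent + validation_percent > 100:
        raise ValueError("training and validation percentages exceed 100")
    # Shuffle the sample indices reproducibly.
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    num_train = len(samples) * training_percent // 100
    num_validation = len(samples) * validation_percent // 100
    for position, index in enumerate(order):
        if position < num_train:
            split = "train"
        elif position < num_train + num_validation:
            split = "validation"
        else:
            # Everything not used for training or validation is test data.
            split = "test"
        samples[index]["split"] = split
    return samples


# Ten hypothetical samples, split 70 % / 20 % / 10 %.
samples = [
    {"image_file_name": f"img_{i:03d}.png", "image_id": i} for i in range(10)
]
assign_split(samples, training_percent=70, validation_percent=20)
```

A fixed seed keeps the assignment reproducible, which is useful when training and evaluation run in separate sessions.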

DLSample

The dictionary DLSample serves as input for the model. For a batch, they are handed over as the entries of the tuple DLSampleBatch. They are created out of DLDataset for every image sample by the procedure gen_dl_samples. If preprocessing is done using the standard procedure preprocess_dl_samples, they are created automatically therein. Note that preprocessing steps may lead to an update of the corresponding DLSample dictionary.

DLSample contains the preprocessed image and, in case of training and evaluation, all ground truth annotations. Depending on the model type, it can have the following entries:

image: Any

Input image.

format: image

image_id: Any

Unique image ID (as in DLDataset).

format: integer

anomaly_ground_truth: AD

Anomaly image or region, read from anomaly_file_name.

format: image or region

anomaly_label: AD

Ground truth anomaly label on image level (in the form of class_names).

format: string

anomaly_label_id: AD

Ground truth anomaly label ID on image level (in the form of class_ids).

format: integer

image_label_id: CL

Ground truth label for the image (in the form of class_ids).

format: tuple of integers

bbox_label_id: OD

Ground truth labels for the image part within the bounding box (in the form of class_ids).

format: tuple of integers

bbox_row1: OD:r1 [4]

Ground truth bounding boxes: upper left corner, row coordinate.

format: tuple of reals

bbox_col1: OD:r1 [4]

Ground truth bounding boxes: upper left corner, column coordinate.

format: tuple of reals

bbox_row2: OD:r1 [4]

Ground truth bounding boxes: lower right corner, row coordinate.

format: tuple of reals

bbox_col2: OD:r1 [4]

Ground truth bounding boxes: lower right corner, column coordinate.

format: tuple of reals

bbox_row: OD:r2 [4]

Ground truth bounding boxes: center point, row coordinate.

format: tuple of reals

bbox_col: OD:r2 [4]

Ground truth bounding boxes: center point, column coordinate.

format: tuple of reals

bbox_phi: OD:r2 [4]

Ground truth bounding boxes: angle phi.

format: tuple of reals

bbox_length1: OD:r2 [4]

Ground truth bounding boxes: half length of edge 1.

format: tuple of reals

bbox_length2: OD:r2 [4]

Ground truth bounding boxes: half length of edge 2.

format: tuple of reals

segmentation_image: SE

Image with the ground truth segmentations, read from segmentation_file_name.

format: image

weight_image: SE [5]

Image with the pixel weights.

format: image

These dictionaries are created by the procedure gen_dl_samples. An exception is the entry marked [5] in the list above, which is created by the procedure gen_dl_segmentation_weights. The mark [4] denotes the coordinates used: pixel-centered, subpixel-accurate coordinates.

Note that if these DLSample dictionaries should be stored, use the procedure write_dl_samples. You can read them with the procedure read_dl_samples.
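The relation between DLDataset, DLSample, and DLSampleBatch can be sketched in a few lines. The following Python illustration uses plain dicts and a tuple as stand-ins for the HALCON data structures. The helpers gen_sample and gen_sample_batch are hypothetical, and the image entry only holds the resolved file path, whereas an actual DLSample holds the preprocessed image itself.

```python
import os
import random


def gen_sample(dl_dataset, sample_entry):
    """Build a DLSample-like dict from one entry of dl_dataset['samples']."""
    dl_sample = {
        # Placeholder: path of the image instead of the image data.
        "image": os.path.join(dl_dataset["image_dir"], sample_entry["image_file_name"]),
        "image_id": sample_entry["image_id"],
    }
    # Copy the ground truth annotation if present (classification shown here).
    if "image_label_id" in sample_entry:
        dl_sample["image_label_id"] = sample_entry["image_label_id"]
    return dl_sample


def gen_sample_batch(dl_dataset, batch_size, seed=0):
    """Randomly choose batch_size samples and join them in a tuple."""
    rng = random.Random(seed)
    chosen = rng.sample(dl_dataset["samples"], batch_size)
    return tuple(gen_sample(dl_dataset, entry) for entry in chosen)


# Hypothetical dataset with five samples and a batch size of three.
dl_dataset = {
    "image_dir": "images",
    "class_names": ["good", "scratch"],
    "class_ids": [0, 1],
    "samples": [
        {"image_file_name": f"img_{i:03d}.png", "image_id": i, "image_label_id": i % 2}
        for i in range(5)
    ],
}
dl_sample_batch = gen_sample_batch(dl_dataset, batch_size=3)
```

Keeping only paths in the dataset and resolving them per batch mirrors the idea described above that iconic data is not stored in DLDataset.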

Inference input data

The inference input data consists of bare images. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the subsection “Images” below.

The model is set up to receive all data through a dictionary DLSample. For inference, such a dictionary containing only the image can be created using the procedure gen_dl_samples_from_images. These dictionaries can be passed one at a time or within a tuple DLSampleBatch.

Training output data

The training output data is given in the dictionary DLTrainResult. Its entries depend on the model and thus on the operator used (for further information see the documentation of the corresponding operator):

CL, OD, SE:

The operator train_dl_model_batch returns

total_loss
possible further losses included in your model

AD:

The operator train_dl_model_anomaly_dataset returns

final_error
final_epoch

Inference and evaluation output data

As output from the operator apply_dl_model, the model returns a dictionary DLResult for each sample. An illustration is given in the figure below. The evaluation is based on these results and the annotations. Evaluation results are stored in the dictionary EvaluationResults.


Schematic illustration of the dictionaries serving as model input: (1) Evaluation: DLSample includes the image as well as information about the image and its content. This data serves as the basis for the evaluation. For the sake of clarity, BatchSize is set to three (containing the randomly chosen samples i, j, and k, see above) and only a few entries are shown. (2) Inference: DLSample contains only the image. These dictionaries can be passed one at a time or within a tuple.

Depending on the model type, the dictionary DLResult can have the following entries:

anomaly_image: AD

Single channel image whose gray values are scores, indicating how likely the corresponding pixel in the input image belongs to an anomaly.

format: image

anomaly_score: AD

Anomaly score on image level calculated from anomaly_image.

format: real

classification_class_ids: CL

Inferred class IDs for the image sorted by confidence values.

format: tuple of integers

classification_class_names: CL

Inferred class names for the image sorted by confidence values.

format: tuple of strings

classification_confidences: CL

Confidence values of the image inference for each class.

format: tuple of reals

bbox_class_id: OD

Inferred class for the bounding box (in the form of class_ids).

format: tuple of integers

bbox_confidence: OD

Confidence value of the inference for the bounding box.

format: tuple of reals

bbox_row1: OD:r1 [6]

Inferred bounding boxes: upper left corner, row coordinate.

format: tuple of reals

bbox_col1: OD:r1 [6]

Inferred bounding boxes: upper left corner, column coordinate.

format: tuple of reals

bbox_row2: OD:r1 [6]

Inferred bounding boxes: lower right corner, row coordinate.

format: tuple of reals

bbox_col2: OD:r1 [6]

Inferred bounding boxes: lower right corner, column coordinate.

format: tuple of reals

bbox_row: OD:r2 [6]

Inferred bounding boxes: center point, row coordinate.

format: tuple of reals

bbox_col: OD:r2 [6]

Inferred bounding boxes: center point, column coordinate.

format: tuple of reals

bbox_phi: OD:r2 [6]

Inferred bounding boxes: angle phi.

format: tuple of reals

bbox_length1: OD:r2 [6]

Inferred bounding boxes: half length of edge 1.

format: tuple of reals

bbox_length2: OD:r2 [6]

Inferred bounding boxes: half length of edge 2.

format: tuple of reals

segmentation_image: SE

Image with the segmentation result.

format: image

segmentation_confidence: SE

Image with the confidence values of the segmentation result.

format: image

[6]: Used coordinates: Pixel centered, subpixel accurate coordinates.

For a further explanation of the output values we refer to the chapters of the respective method, e.g., Deep Learning / Semantic Segmentation.
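As an illustration of the classification entries, the following Python sketch builds a DLResult-like dict in which class IDs and class names are sorted by descending confidence, and then reads the best prediction. Plain dicts stand in for HALCON dictionaries, and the helper functions and values are hypothetical. The sketch assumes that classification_confidences is stored in the same sorted order as the IDs and names.

```python
def make_classification_result(class_ids, class_names, confidences):
    """Build a DLResult-like dict with entries sorted by descending confidence."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return {
        "classification_class_ids": [class_ids[i] for i in order],
        "classification_class_names": [class_names[i] for i in order],
        "classification_confidences": [confidences[i] for i in order],
    }


def top_prediction(dl_result):
    """Return (class ID, class name, confidence) of the most confident class."""
    return (
        dl_result["classification_class_ids"][0],
        dl_result["classification_class_names"][0],
        dl_result["classification_confidences"][0],
    )


# Hypothetical raw confidences for three classes.
dl_result = make_classification_result(
    class_ids=[0, 1, 2],
    class_names=["good", "scratch", "dent"],
    confidences=[0.1, 0.7, 0.2],
)
best = top_prediction(dl_result)
```

Because the entries are sorted, the first element of each tuple always belongs to the most confident class, so no further search is needed.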

Images

Regardless of the application, the network poses requirements on the images. The specific values depend on the network itself and can be queried using get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing of the entire dataset, and thus also of the images, is implemented in preprocess_dl_samples. In case of custom preprocessing, this procedure offers guidance on the implementation.
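A check of an image against such requirements could look like the following Python sketch. It assumes that the requirements have already been queried from the model and stored in a plain dict. The key names image_width, image_height, image_num_channels, image_range_min, and image_range_max are assumptions made for this illustration, as are the keys gray_min and gray_max and the helper check_image.

```python
def check_image(image_info, requirements):
    """Compare image properties with the model requirements; return found problems."""
    problems = []
    for key in ("image_width", "image_height", "image_num_channels"):
        if image_info[key] != requirements[key]:
            problems.append(
                f"{key}: expected {requirements[key]}, got {image_info[key]}"
            )
    # The gray values have to lie within the range expected by the network.
    if (
        image_info["gray_min"] < requirements["image_range_min"]
        or image_info["gray_max"] > requirements["image_range_max"]
    ):
        problems.append("gray values outside the expected range")
    return problems


# Hypothetical requirements and a raw image that has not been preprocessed yet.
requirements = {
    "image_width": 224,
    "image_height": 224,
    "image_num_channels": 3,
    "image_range_min": -127.0,
    "image_range_max": 128.0,
}
raw_image_info = {
    "image_width": 640,
    "image_height": 480,
    "image_num_channels": 3,
    "gray_min": 0.0,
    "gray_max": 255.0,
}
problems = check_image(raw_image_info, requirements)
```

In this example the raw image fails the size and gray value range checks, which is the situation in which preprocessing becomes necessary.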

List of Operators

apply_dl_model: Apply a deep-learning-based network on a set of images for inference.

clear_dl_model: Clear a deep learning model.

deserialize_dl_model: Deserialize a deep learning model.

gen_dl_model_heatmap: Infer the sample and generate a heatmap.

get_dl_model_param: Return the parameters of a deep learning model.

read_dl_model: Read a deep learning model from a file.

serialize_dl_model: Serialize a deep learning model.

set_dl_model_param: Set the parameters of a deep learning model.

train_dl_model_batch: Train a deep learning model.

write_dl_model: Write a deep learning model in a file.
