create_class_mlp — Create a multilayer perceptron for classification or regression.
create_class_mlp( : : NumInput, NumHidden, NumOutput, OutputFunction, Preprocessing, NumComponents, RandSeed : MLPHandle)
create_class_mlp creates a neural net in the form of a multilayer perceptron (MLP), which can be used for classification or regression (function approximation), depending on how OutputFunction is set. The MLP consists of three layers: an input layer with NumInput input variables (units, neurons), a hidden layer with NumHidden units, and an output layer with NumOutput output variables. The MLP performs the following steps to calculate the activations $z_j$ of the hidden units from the input data $x_i$ (the so-called feature vector):

$$z_j = \tanh\Bigl(\sum_{i=1}^{\mathit{NumInput}} w_{ij}^{(1)} x_i + b_j^{(1)}\Bigr), \qquad j = 1, \ldots, \mathit{NumHidden}$$

Here, the matrix $w_{ij}^{(1)}$ and the vector $b_j^{(1)}$ are the weights of the input layer. In the output layer, linear combinations of the hidden activations are computed analogously:

$$a_k = \sum_{j=1}^{\mathit{NumHidden}} w_{jk}^{(2)} z_j + b_k^{(2)}, \qquad k = 1, \ldots, \mathit{NumOutput}$$

where $w_{jk}^{(2)}$ and $b_k^{(2)}$ are the weights of the output layer.
The activation function used in the output layer can be selected by setting OutputFunction. For OutputFunction = 'linear', the data are simply copied:

$$y_k = a_k$$
For OutputFunction = 'logistic', the activations are computed as follows:

$$y_k = \frac{1}{1 + \exp(-a_k)}$$
For OutputFunction = 'softmax', the activations are computed as follows:

$$y_k = \frac{\exp(a_k)}{\sum_{l=1}^{\mathit{NumOutput}} \exp(a_l)}$$
This type of activation function should be used for common classification problems with multiple (NumOutput) mutually exclusive classes as output. In particular, OutputFunction = 'softmax' must be used for the classification of pixel data with classify_image_class_mlp.
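For illustration, the following sketch shows how such a classifier might be set up for pixel classification on a three-channel color image. The image name, the tuple of training regions TrainingRegions, and all numeric values are assumptions for this example, not prescribed values:

* Sketch: pixel classification with OutputFunction = 'softmax'.
* Image name, TrainingRegions, and all numeric values are placeholders.
read_image (Image, 'some_image')
* Three input features (the color channels), four classes
create_class_mlp (3, 5, 4, 'softmax', 'normalization', 3, 42, MLPHandle)
add_samples_image_class_mlp (Image, TrainingRegions, MLPHandle)
train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
classify_image_class_mlp (Image, ClassRegions, MLPHandle, 0.5)
clear_class_mlp (MLPHandle)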
The parameters Preprocessing and NumComponents can be used to specify a preprocessing of the feature vectors. For Preprocessing = 'none', the feature vectors are passed unaltered to the MLP. NumComponents is ignored in this case.
For all other values of Preprocessing, the training data set is used to compute a transformation of the feature vectors. This transformation is applied during the training as well as later in the classification or evaluation.
For Preprocessing = 'normalization', the feature vectors are normalized by subtracting the mean of the training vectors and dividing the result by the standard deviation of the individual components of the training vectors. Hence, the transformed feature vectors have a mean of 0 and a standard deviation of 1. The normalization does not change the length of the feature vector. NumComponents is ignored in this case. This transformation can be used if the mean and standard deviation of the feature vectors differ substantially from 0 and 1, respectively, or for data in which the components of the feature vectors are measured in different units (e.g., if some of the data are gray value features and some are region features, or if region features are mixed, e.g., 'circularity' (unit: scalar) and 'area' (unit: pixel squared)). In these cases, the training of the net will typically require fewer iterations than without normalization.
For Preprocessing = 'principal_components', a principal component analysis is performed. First, the feature vectors are normalized (see above). Then, an orthogonal transformation (a rotation in the feature space) that decorrelates the training vectors is computed. After the transformation, the mean of the training vectors is 0 and the covariance matrix of the training vectors is a diagonal matrix. The transformation is chosen such that the components with the largest variation are contained in the first components of the transformed feature vector. With this, it is possible to omit the last components of the transformed feature vector, which typically are mainly influenced by noise, without losing a large amount of information. The parameter NumComponents can be used to determine how many of the transformed feature vector components should be used. Up to NumInput components can be selected. The operator get_prep_info_class_mlp can be used to determine how much information each transformed component contains, and hence aids the selection of NumComponents. Like data normalization, this transformation can be used if the mean and standard deviation of the feature vectors differ substantially from 0 and 1, respectively, or for feature vectors in which the components of the data are measured in different units. In addition, this transformation is useful if it can be expected that the features are highly correlated.
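As a sketch of how NumComponents might be selected in practice (the 90% threshold and the elided sample data are illustrative assumptions; the same pattern applies to Preprocessing = 'canonical_variates'):

* Sketch: choosing NumComponents with get_prep_info_class_mlp.
* The 0.9 threshold and the sample data are illustrative assumptions.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'principal_components', NumIn, 42, MLPHandle)
* Add the training samples
for J := 0 to NumData-1 by 1
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Determine the (cumulative) information content of the components
get_prep_info_class_mlp (MLPHandle, 'principal_components', \
                         InformationCont, CumInformationCont)
* Keep the leading components that carry, e.g., 90% of the information
NumComp := 1
while (CumInformationCont[NumComp - 1] < 0.9)
    NumComp := NumComp + 1
endwhile
* Re-create the MLP with the reduced number of components
* (the samples must be added again before training)
clear_class_mlp (MLPHandle)
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'principal_components', NumComp, 42, MLPHandle)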
In contrast to the above three transformations, which can be used for all MLP types, the transformation specified by Preprocessing = 'canonical_variates' can only be used if the MLP is used as a classifier with OutputFunction = 'softmax'. The computation of the canonical variates is also called linear discriminant analysis. In this case, a transformation that first normalizes the training vectors and then decorrelates the training vectors on average over all classes is computed. At the same time, the transformation maximally separates the mean values of the individual classes. As for Preprocessing = 'principal_components', the transformed components are sorted by information content, and hence transformed components with little information content can be omitted. For canonical variates, up to min(NumOutput - 1, NumInput) components can be selected. Also in this case, the information content of the transformed components can be determined with get_prep_info_class_mlp. Like principal component analysis, canonical variates can be used to reduce the amount of data without losing a large amount of information, while additionally optimizing the separability of the classes after the data reduction.
For the last two types of transformations ('principal_components' and 'canonical_variates'), the actual number of input units of the MLP is determined by NumComponents, whereas NumInput determines the dimensionality of the input data (i.e., the length of the untransformed feature vector). Hence, by using one of these two transformations, the number of input variables, and thus usually also the number of hidden units, can be reduced. With this, the time needed to train the MLP and to evaluate and classify a feature vector is typically reduced.
Usually, NumHidden should be selected on the order of magnitude of NumInput and NumOutput. In many cases, much smaller values of NumHidden already lead to very good classification results. If NumHidden is chosen too large, the MLP may overfit the training data, which typically leads to poor generalization, i.e., the MLP learns the training data very well but does not return good results on unknown data; one illustrative way to select NumHidden is sketched below.
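One illustrative way to choose NumHidden is to train several candidates and compare their performance on a held-out validation set. The loop below is only a sketch under that assumption; the candidate range is arbitrary, and the data handling is elided in the same style as the example further down:

* Sketch: selecting NumHidden on a validation set (illustrative).
BestError := 1e30
for NumHidden := 2 to 20 by 2
    create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                      'normalization', NumIn, 42, MLPHandle)
    * Add the training samples
    * [...]
    train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
    * Classify the validation samples and compute ValidationError
    * [...]
    if (ValidationError < BestError)
        BestError := ValidationError
        BestNumHidden := NumHidden
    endif
    clear_class_mlp (MLPHandle)
endfor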
create_class_mlp initializes the weights described above with random numbers. To ensure that the results of training the classifier with train_class_mlp are reproducible, the seed value of the random number generator is passed in RandSeed. If the training results in a relatively large error, it may sometimes be possible to achieve a smaller error by selecting a different value for RandSeed and retraining the MLP, as the sketch below illustrates.
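A minimal sketch of this retry strategy (the seed list, the sample file name 'samples.mtf', and the error threshold are assumptions):

* Sketch: retraining with different seeds if the error is large.
* Seed values, file name, and threshold are placeholders.
Seeds := [42, 4711, 12345]
for I := 0 to |Seeds| - 1 by 1
    create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                      'normalization', NumIn, Seeds[I], MLPHandle)
    read_samples_class_mlp (MLPHandle, 'samples.mtf')
    train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
    if (Error < 0.5)
        * Keep this MLP
        break
    endif
    clear_class_mlp (MLPHandle)
endfor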
After the MLP has been created, training samples are typically added to the MLP by repeatedly calling add_sample_class_mlp or read_samples_class_mlp. After this, the MLP is typically trained using train_class_mlp. Thereafter, the MLP can be saved using write_class_mlp. Alternatively, the MLP can be used immediately after training to evaluate data using evaluate_class_mlp or, if the MLP is used as a classifier (i.e., for OutputFunction = 'softmax'), to classify data using classify_class_mlp.
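A minimal sketch of this life cycle, assuming previously written training samples (all file names are placeholders):

* Offline phase: create, fill, train, and save the MLP.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
read_samples_class_mlp (MLPHandle, 'samples.mtf')
train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
write_class_mlp (MLPHandle, 'classifier.gmc')
clear_class_mlp (MLPHandle)
* Online phase: reload the trained MLP and classify feature vectors.
read_class_mlp ('classifier.gmc', MLPHandle)
* Features = [...]
classify_class_mlp (MLPHandle, Features, 1, Class, Confidence)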
The training of the MLP will usually result in very sharp boundaries between the different classes, i.e., the confidence for one class will drop from close to 1 (within the region of the class) to close to 0 (within the region of a different class) within a very narrow “band” in the feature space. If the classes do not overlap, this transition happens at a suitable location between the classes; if the classes overlap, the transition happens at a suitable location within the overlapping area. While this sharp transition is desirable in many applications, in some applications a smoother transition between different classes (i.e., a transition within a wider “band” in the feature space) is desirable to reflect a level of uncertainty within the region in the feature space between the classes. Furthermore, as described above, it may be desirable to prevent overfitting of the MLP to the training data. For these purposes, the MLP can be regularized by using set_regularization_params_class_mlp.
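An illustrative sketch (the value 1.0 for the 'weight_prior' regularization parameter is an arbitrary assumption; see set_regularization_params_class_mlp for the available parameters and their meaning):

* Sketch: regularizing the MLP before training.
* The 'weight_prior' value of 1.0 is an arbitrary assumption.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 1.0)
* Add samples and train as usual; the regularization takes effect
* in train_class_mlp.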
An MLP, as defined above, has no inherent capability for novelty detection, i.e., it will classify a random feature vector into one of the classes with a confidence close to 1 (unless the random feature vector happens to lie in a region of the feature space in which the training samples of different classes overlap). In some applications, however, it is desirable to reject feature vectors that do not lie close to any class, where “closeness” is defined by the proximity of the feature vector to the feature vectors in the training set. To provide an MLP with the ability for novelty detection, i.e., to reject feature vectors that do not belong to any class, an explicit rejection class can be created by setting NumOutput to the number of actual classes plus 1. Then, set_rejection_params_class_mlp can be used to configure train_class_mlp to automatically generate samples for this rejection class.
The combination of regularization and an automatic generation of a rejection class is useful in many applications since it provides a smooth transition between the actual classes and from the actual classes to the rejection class. This reflects the requirement of these applications that only feature vectors within the area of the feature space that corresponds to the training samples of each class should have a confidence close to 1, whereas random feature vectors not belonging to any class should have a confidence close to 0, and that transitions between the classes should be smooth, reflecting a growing degree of uncertainty the farther a feature vector lies from the respective class. In particular, OCR applications sometimes have this requirement (see create_ocr_class_mlp).
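A sketch of this combination (NumClasses, the 'weight_prior' value, and the rejection parameters are assumptions; consult set_rejection_params_class_mlp for the exact generic parameter names and values):

* Sketch: explicit rejection class combined with regularization.
* One extra output unit serves as the rejection class.
create_class_mlp (NumIn, NumHidden, NumClasses + 1, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 1.0)
* Configure automatic sample generation for the rejection class
* ('sampling_strategy' / 'hyperbox_around_all_classes' is assumed to
*  be the default strategy; verify with the reference page of
*  set_rejection_params_class_mlp)
set_rejection_params_class_mlp (MLPHandle, 'sampling_strategy', \
                                'hyperbox_around_all_classes')
* Add the samples of the actual classes and train as usual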
A comparison of the MLP and the support vector machine (SVM) (see create_class_svm) typically shows that SVMs are generally faster to train, especially for huge training sets, and achieve slightly better recognition rates than MLPs. The MLP is faster at classification and should therefore be preferred in time-critical applications. Please note that this guideline assumes optimal tuning of the parameters.
This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.
Parameters:

NumInput: Number of input variables (features) of the MLP.
Default value: 20
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumInput >= 1
NumHidden: Number of hidden units of the MLP.
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150
Restriction: NumHidden >= 1
NumOutput: Number of output variables (classes) of the MLP.
Default value: 5
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150
Restriction: NumOutput >= 1
OutputFunction: Type of the activation function in the output layer of the MLP.
Default value: 'softmax'
List of values: 'linear', 'logistic', 'softmax'
Preprocessing: Type of preprocessing used to transform the feature vectors.
Default value: 'normalization'
List of values: 'canonical_variates', 'none', 'normalization', 'principal_components'
NumComponents: Preprocessing parameter: number of transformed features (ignored for Preprocessing = 'none' and Preprocessing = 'normalization').
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumComponents >= 1
RandSeed: Seed value of the random number generator that is used to initialize the MLP with random values.
Default value: 42
MLPHandle: Handle of the MLP.
Example:

* Use the MLP for regression (function approximation)
create_class_mlp (1, NumHidden, 1, 'linear', 'none', 1, 42, MLPHandle)
* Generate the training data
* D = [...]
* T = [...]
* Add the training data
for J := 0 to NumData-1 by 1
    add_sample_class_mlp (MLPHandle, D[J], T[J])
endfor
* Train the MLP
train_class_mlp (MLPHandle, 200, 0.001, 0.001, Error, ErrorLog)
* Generate test data
* X = [...]
* Compute the output of the MLP on the test data
for J := 0 to N-1 by 1
    evaluate_class_mlp (MLPHandle, X[J], Y)
endfor
clear_class_mlp (MLPHandle)

* Use the MLP for classification
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
* Generate and add the training data
for J := 0 to NumData-1 by 1
    * Generate training features and classes
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Train the MLP
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Use the MLP to classify unknown data
for J := 0 to N-1 by 1
    * Extract features
    * Features = [...]
    classify_class_mlp (MLPHandle, Features, 1, Class, Confidence)
endfor
clear_class_mlp (MLPHandle)
If the parameters are valid, the operator create_class_mlp returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
Possible successors: add_sample_class_mlp, set_regularization_params_class_mlp, set_rejection_params_class_mlp
Alternatives: read_dl_classifier, create_class_svm, create_class_gmm
See also: clear_class_mlp, train_class_mlp, classify_class_mlp, evaluate_class_mlp
References:
Christopher M. Bishop: “Neural Networks for Pattern Recognition”; Oxford University Press, Oxford; 1995.
Andrew Webb: “Statistical Pattern Recognition”; Arnold, London; 1999.
Module: Foundation