select_feature_set_gmm (Operator)
Name
select_feature_set_gmm: Selects an optimal combination from a set of
features to classify the provided data.
Signature
void SelectFeatureSetGmm(const HTuple& ClassTrainDataHandle, const HTuple& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* GMMHandle, HTuple* SelectedFeatureIndices, HTuple* Score)
HTuple HClassGmm::SelectFeatureSetGmm(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)
HTuple HClassGmm::SelectFeatureSetGmm(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HString& GenParamName, double GenParamValue, HTuple* Score)
HTuple HClassGmm::SelectFeatureSetGmm(const HClassTrainData& ClassTrainDataHandle, const char* SelectionMethod, const char* GenParamName, double GenParamValue, HTuple* Score)
HTuple HClassGmm::SelectFeatureSetGmm(const HClassTrainData& ClassTrainDataHandle, const wchar_t* SelectionMethod, const wchar_t* GenParamName, double GenParamValue, HTuple* Score)
(Windows only)
HClassGmm HClassTrainData::SelectFeatureSetGmm(const HString& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassGmm HClassTrainData::SelectFeatureSetGmm(const HString& SelectionMethod, const HString& GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassGmm HClassTrainData::SelectFeatureSetGmm(const char* SelectionMethod, const char* GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassGmm HClassTrainData::SelectFeatureSetGmm(const wchar_t* SelectionMethod, const wchar_t* GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
(Windows only)
static void HOperatorSet.SelectFeatureSetGmm(HTuple classTrainDataHandle, HTuple selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple GMMHandle, out HTuple selectedFeatureIndices, out HTuple score)
HTuple HClassGmm.SelectFeatureSetGmm(HClassTrainData classTrainDataHandle, string selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple score)
HTuple HClassGmm.SelectFeatureSetGmm(HClassTrainData classTrainDataHandle, string selectionMethod, string genParamName, double genParamValue, out HTuple score)
HClassGmm HClassTrainData.SelectFeatureSetGmm(string selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple selectedFeatureIndices, out HTuple score)
HClassGmm HClassTrainData.SelectFeatureSetGmm(string selectionMethod, string genParamName, double genParamValue, out HTuple selectedFeatureIndices, out HTuple score)
Description
select_feature_set_gmm selects an optimal subset from a set of
features to solve a given classification problem.
The classification problem has to be specified with annotated training data
in ClassTrainDataHandle and is classified by a Gaussian Mixture Model (GMM).
Details of the properties of this classifier can be found in
create_class_gmm.
The result of the operator is a trained classifier that is returned in
GMMHandle. Additionally, the list of indices or names of the selected
features is returned in SelectedFeatureIndices. To use this classifier,
calculate for new input data all features mentioned in
SelectedFeatureIndices and pass them to the classifier.
One application of this operator is to compare different parameter sets
for a given feature extraction technique. Another is to search for a
feature that discriminates between different classes.
To define the features that should be selected from ClassTrainDataHandle,
the dimensions of the feature vectors in ClassTrainDataHandle can be grouped
into subfeatures by calling set_feature_lengths_class_train_data.
A subfeature can contain several consecutive elements of a feature vector.
select_feature_set_gmm decides for each of these subfeatures whether it is
better to use it for the classification or to leave it out.
The indices of the selected subfeatures are returned in
SelectedFeatureIndices.
If names were set in set_feature_lengths_class_train_data, these
names are returned instead of the indices.
If set_feature_lengths_class_train_data was not called for
ClassTrainDataHandle before, each element of the feature vector
is considered as a subfeature.
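The grouping described above can be pictured with a small Python sketch. It
does not call HALCON: the helper split_into_subfeatures is a hypothetical
name introduced only for illustration. It mimics how consecutive vector
elements would be assigned to subfeatures by their lengths, including the
fallback in which every element is its own subfeature.

```python
def split_into_subfeatures(vector, lengths=None, names=None):
    """Group consecutive elements of a feature vector into subfeatures.

    Hypothetical helper for illustration only; not part of HALCON.
    """
    if lengths is None:
        # No grouping defined: every element is its own subfeature.
        lengths = [1] * len(vector)
    if sum(lengths) != len(vector):
        raise ValueError("subfeature lengths must sum to the vector length")
    if names is None:
        # Without names, subfeatures are identified by their index.
        names = list(range(len(lengths)))
    groups = {}
    start = 0
    for name, length in zip(names, lengths):
        groups[name] = vector[start:start + length]
        start += length
    return groups


sample = [1, 1, 1, 2, 1]
grouped = split_into_subfeatures(sample, [3, 2], ["Good Feature", "Bad Feature"])
ungrouped = split_into_subfeatures(sample)
print(grouped)
print(ungrouped)
```

With lengths [3, 2], the first three elements form one subfeature and the
last two form another, so the selection would consider two candidates
instead of five.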
The selection method SelectionMethod is either a greedy search 'greedy'
(iteratively add the feature with the highest gain)
or the dynamically oscillating search 'greedy_oscillating'
(add the feature with the highest gain, then test whether any of the already
added features can be left out without great loss).
The method 'greedy' is generally preferable, since it is faster.
Only when the subfeatures are low-dimensional or redundant
should the method 'greedy_oscillating' be chosen.
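The difference between the two strategies can be sketched in plain Python.
This is not the HALCON implementation: the function names, the toy score
table, and the stopping rule (stop when no candidate improves the score) are
assumptions made for illustration. The backward step here removes a feature
only if the score does not drop at all, a stricter rule than "without great
loss", chosen so that the sketch is guaranteed to terminate.

```python
def greedy_select(candidates, score):
    """Forward selection: repeatedly add the candidate with the highest gain."""
    selected, best = [], score([])
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        top = max(remaining, key=lambda c: score(selected + [c]))
        top_score = score(selected + [top])
        if top_score <= best:
            break  # no candidate improves the score any further
        selected.append(top)
        best = top_score
    return selected, best


def greedy_oscillating_select(candidates, score):
    """Forward selection with a backward check after every addition."""
    selected, best = [], score([])
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        top = max(remaining, key=lambda c: score(selected + [c]))
        top_score = score(selected + [top])
        if top_score <= best:
            break
        selected.append(top)
        best = top_score
        # Backward step: drop earlier features that are no longer needed.
        for old in selected[:-1]:
            reduced = [f for f in selected if f != old]
            reduced_score = score(reduced)
            if reduced_score >= best:
                selected, best = reduced, reduced_score
    return selected, best


def toy_score(features):
    """Made-up scores; 'a' becomes redundant once 'b' and 'c' are present."""
    table = {
        frozenset(): 0.50,
        frozenset({"a"}): 0.70,
        frozenset({"b"}): 0.65,
        frozenset({"c"}): 0.60,
        frozenset({"a", "b"}): 0.80,
        frozenset({"a", "c"}): 0.75,
        frozenset({"b", "c"}): 0.95,
        frozenset({"a", "b", "c"}): 0.95,
    }
    return table[frozenset(features)]


plain = greedy_select(["a", "b", "c"], toy_score)
oscillating = greedy_oscillating_select(["a", "b", "c"], toy_score)
print(plain)
print(oscillating)
```

In this toy setting both strategies reach the same score, but the
oscillating variant ends with a smaller set because it discards the
redundant candidate. The price is the additional score evaluations in the
backward step.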
The optimization criterion is the classification rate of
a two-fold cross-validation of the training data.
The best achieved value is returned in Score.
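To make the criterion concrete, the following Python sketch computes a
two-fold cross-validation classification rate for a chosen set of vector
columns. Several parts are assumptions: a nearest-class-mean classifier
stands in for the GMM, the samples of each class are alternated between the
two folds, and the function names are hypothetical. How the operator splits
the data internally is not stated here.

```python
def class_means(train):
    """Mean vector per class; a deliberately simple stand-in for a GMM."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(means, x):
    """Return the class whose mean is closest to x (squared distance)."""
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))


def two_fold_score(samples, labels, columns):
    """Classification rate of a two-fold cross-validation on the given columns."""
    folds, seen = ([], []), {}
    for x, y in zip(samples, labels):
        k = seen.get(y, 0)
        # Alternate the samples of each class between the two folds.
        folds[k % 2].append(([x[c] for c in columns], y))
        seen[y] = k + 1
    correct = 0
    # Train on one fold, test on the other, then swap the roles.
    for train, test in ((folds[0], folds[1]), (folds[1], folds[0])):
        means = class_means(train)
        correct += sum(predict(means, x) == y for x, y in test)
    return correct / len(samples)


samples = [
    [1, 1, 1, 2, 1], [2, 2, 2, 2, 1],
    [1, 1, 1, 3, 4], [2, 2, 2, 3, 4],
    [0, 0, 1, 5, 6], [2, 3, 2, 5, 6],
    [0, 0, 1, 5, 6], [2, 3, 2, 5, 6],
    [0, 0, 1, 5, 6], [2, 3, 2, 5, 6],
]
labels = [0, 1] * 5

score_first_three = two_fold_score(samples, labels, [0, 1, 2])
score_last_two = two_fold_score(samples, labels, [3, 4])
print(score_first_three, score_last_two)
```

The first three columns separate the two classes, while the last two columns
are identical for both classes, so the score for the first group is higher.
A selection procedure driven by this score would therefore keep the first
group and leave out the second.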
The following generic parameters can be set in GenParamName and
GenParamValue:
- 'min_centers':
  Minimal number of clusters to represent a class in the training data.
  Possible values: '1', '2'
  Default value: '1'
- 'max_center':
  Maximal number of clusters to represent a class in the training data.
  Possible values: '1', '5', '10'
  Default value: '1'
- 'covar_type':
  Type of the covariance to represent the size of a cluster.
  Possible values: 'spherical', 'diag', 'full'
  Default value: 'spherical'
- 'random_seed':
  Random seed.
  Default value: '42'
- 'threshold':
  Training threshold.
  Default value: '0.001'
- 'regularize':
  Regularization value.
  Default value: '0.0001'
- 'randomize':
  Randomize the input vector.
  Default value: '0'
- 'class_priors':
  Mode to determine the a-priori probabilities of the classes.
  Possible values: 'training', 'uniform'
  Default value: 'training'
A more detailed description of these parameters can be found in
create_class_gmm and train_class_gmm.
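GenParamName and GenParamValue are parallel tuples: the i-th value belongs
to the i-th name. The following Python sketch shows one way to assemble such
a pair from a dictionary of overrides. The helper build_gen_params and the
DEFAULTS table are hypothetical; the names and defaults are copied from the
list above, and writing the numeric defaults as numbers rather than strings
is an assumption based on the suggested values of GenParamValue.

```python
# Names and defaults copied from the parameter list of this operator.
# Numeric values are written as numbers (an assumption, see the lead-in).
DEFAULTS = {
    "min_centers": 1,
    "max_center": 1,
    "covar_type": "spherical",
    "random_seed": 42,
    "threshold": 0.001,
    "regularize": 0.0001,
    "randomize": 0,
    "class_priors": "training",
}


def build_gen_params(overrides):
    """Turn a dict of overrides into parallel name and value lists.

    Hypothetical helper; it only rejects names that are not listed above.
    """
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown generic parameter(s): {unknown}")
    names = list(overrides)
    values = [overrides[name] for name in names]
    return names, values


gen_param_name, gen_param_value = build_gen_params(
    {"covar_type": "full", "max_center": 5}
)
print(gen_param_name, gen_param_value)
```

The two resulting lists would then be passed as GenParamName and
GenParamValue. Parameters that are not overridden keep their defaults, so
empty lists correspond to the default configuration.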
Attention
This operator may take considerable time, depending on the size of the
training data set and the number of features.
Note that this operator should not be called if only a small
set of training data is available. Because of the risk of overfitting,
select_feature_set_gmm may then deliver a classifier with
a very high score that nevertheless performs poorly on new data.
Execution Information
- Multithreading type: reentrant (runs in parallel with non-exclusive operators).
- Multithreading scope: global (may be called from any thread).
- Automatically parallelized on internal data level.
This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.
Parameters
ClassTrainDataHandle
(input_control) class_train_data →
HClassTrainData, HTuple, HHandle, HTuple, Htuple (handle) (IntPtr) (HHandle) (handle)
Handle of the training data.
SelectionMethod
(input_control) string →
HTuple, str, HTuple, Htuple (string) (string) (HString) (char*)
Method to perform the selection.
Default value: 'greedy'
List of values: 'greedy', 'greedy_oscillating'
GenParamName
(input_control) string(-array) →
HTuple, MaybeSequence[str], HTuple, Htuple (string) (string) (HString) (char*)
Names of generic parameters to configure the classifier.
Default value: []
List of values: 'class_priors', 'covar_type', 'max_center', 'min_centers', 'random_seed', 'randomize', 'regularize', 'threshold'
GenParamValue
(input_control) number(-array) →
HTuple, MaybeSequence[Union[int, str, float]], HTuple, Htuple (real / integer / string) (double / int / long / string) (double / Hlong / HString) (double / Hlong / char*)
Values of generic parameters to configure the classifier.
Default value: []
Suggested values: 1, 2, 3, 'spherical', 'diag', 'full', 42, 0.001, 0.0001, 0
GMMHandle
(output_control) class_gmm →
HClassGmm, HTuple, HHandle, HTuple, Htuple (handle) (IntPtr) (HHandle) (handle)
A trained GMM classifier using only the selected features.
SelectedFeatureIndices
(output_control) string-array →
HTuple, Sequence[str], HTuple, Htuple (string) (string) (HString) (char*)
The selected feature set; contains indices or names.
Score
(output_control) real-array →
HTuple, Sequence[float], HTuple, Htuple (real) (double) (double) (double)
The achieved score using two-fold cross-validation.
Example (HDevelop)
* Find out which of the two features distinguishes two Classes
NameFeature1 := 'Good Feature'
NameFeature2 := 'Bad Feature'
LengthFeature1 := 3
LengthFeature2 := 2
* Create training data
create_class_train_data (LengthFeature1+LengthFeature2,\
ClassTrainDataHandle)
* Define the features which are in the training data
set_feature_lengths_class_train_data (ClassTrainDataHandle, [LengthFeature1,\
LengthFeature2], [NameFeature1, NameFeature2])
* Add training data
* |Feat1| |Feat2|
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1, 2,1 ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2, 2,1 ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1, 3,4 ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2, 3,4 ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6 ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6 ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6 ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6 ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6 ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6 ], 1)
* Add more data
* ...
* Select the better feature with a GMM
select_feature_set_gmm (ClassTrainDataHandle, 'greedy', [], [], GMMHandle,\
SelectedFeatureGMM, Score)
* Use the classifier
* ...
Result
If the parameters are valid, the operator select_feature_set_gmm
returns the value TRUE. If necessary, an exception is raised.
Possible Predecessors
create_class_train_data, add_sample_class_train_data,
set_feature_lengths_class_train_data
Possible Successors
classify_class_gmm
Alternatives
select_feature_set_mlp, select_feature_set_knn, select_feature_set_svm
See also
create_class_gmm, gray_features, region_features
Module
Foundation