select_feature_set_svm / T_select_feature_set_svm / SelectFeatureSetSvm (Operator)
Name
select_feature_set_svm - Selects an optimal combination of features to classify the provided data.
Signature
void SelectFeatureSetSvm(const HTuple& ClassTrainDataHandle, const HTuple& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* SVMHandle, HTuple* SelectedFeatureIndices, HTuple* Score)
HTuple HClassSvm::SelectFeatureSetSvm(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)
HTuple HClassSvm::SelectFeatureSetSvm(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HString& GenParamName, double GenParamValue, HTuple* Score)
HTuple HClassSvm::SelectFeatureSetSvm(const HClassTrainData& ClassTrainDataHandle, const char* SelectionMethod, const char* GenParamName, double GenParamValue, HTuple* Score)
HTuple HClassSvm::SelectFeatureSetSvm(const HClassTrainData& ClassTrainDataHandle, const wchar_t* SelectionMethod, const wchar_t* GenParamName, double GenParamValue, HTuple* Score)  (Windows only)
HClassSvm HClassTrainData::SelectFeatureSetSvm(const HString& SelectionMethod, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassSvm HClassTrainData::SelectFeatureSetSvm(const HString& SelectionMethod, const HString& GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassSvm HClassTrainData::SelectFeatureSetSvm(const char* SelectionMethod, const char* GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const
HClassSvm HClassTrainData::SelectFeatureSetSvm(const wchar_t* SelectionMethod, const wchar_t* GenParamName, double GenParamValue, HTuple* SelectedFeatureIndices, HTuple* Score) const  (Windows only)
 
static void HOperatorSet.SelectFeatureSetSvm(HTuple classTrainDataHandle, HTuple selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple SVMHandle, out HTuple selectedFeatureIndices, out HTuple score)
HTuple HClassSvm.SelectFeatureSetSvm(HClassTrainData classTrainDataHandle, string selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple score)
HTuple HClassSvm.SelectFeatureSetSvm(HClassTrainData classTrainDataHandle, string selectionMethod, string genParamName, double genParamValue, out HTuple score)
HClassSvm HClassTrainData.SelectFeatureSetSvm(string selectionMethod, HTuple genParamName, HTuple genParamValue, out HTuple selectedFeatureIndices, out HTuple score)
HClassSvm HClassTrainData.SelectFeatureSetSvm(string selectionMethod, string genParamName, double genParamValue, out HTuple selectedFeatureIndices, out HTuple score)
 
Description
select_feature_set_svm selects an optimal subset from a set of
features to solve a given classification problem.
The classification problem has to be specified with annotated training data 
in ClassTrainDataHandle and will be classified by a 
support vector machine (SVM). Details of the properties of this 
classifier can be found in create_class_svm. 
The result of the operator is a trained classifier that is returned in 
SVMHandle. Additionally, the list of indices or names of the 
selected features is returned in SelectedFeatureIndices. To use this classifier, 
calculate all features listed in SelectedFeatureIndices for new input data
and pass them to the classifier.
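As a minimal sketch of this last step, the following HDevelop lines classify one new sample. The sketch assumes the training data of the example further below, that the operator returned only the subfeature 'Good Feature' (length 3) in SelectedFeatureIndices, and that NewGoodFeature holds hypothetical measurements for that subfeature.

```hdevelop
* Assumption: SelectedFeatureIndices contains only 'Good Feature', so only
* the three elements of this subfeature are needed for classification.
NewGoodFeature := [2,2,2]
* Ask the classifier for the single best class of the reduced feature vector
classify_class_svm (SVMHandle, NewGoodFeature, 1, Class)
```

If several subfeatures are selected, the reduced feature vector has to contain all of them. For data that is already stored in a class_train_data handle, select_sub_feature_class_train_data can extract the selected subfeatures.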
One possible application of this operator is the comparison of 
different parameter sets for a given feature extraction technique. Another
application is the search for a feature that discriminates between 
different classes.
Additionally, the values for 'nu' and 
'gamma' can be estimated for the SVM. To only estimate these 
two parameters without altering the feature set, 
the feature vector has to be specified as one large subfeature.
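The following HDevelop lines sketch how such a pure parameter estimation might look. NumDim is a hypothetical variable that has to match the dimension passed to create_class_train_data, and the name 'all' is chosen freely. Reading the result with get_params_class_svm assumes that the estimated 'gamma' is reported as the kernel parameter of the returned classifier.

```hdevelop
* Assumption: the feature vectors in ClassTrainDataHandle have NumDim elements
NumDim := 5
* Declare the whole feature vector as one single subfeature named 'all'
set_feature_lengths_class_train_data (ClassTrainDataHandle, NumDim, 'all')
* Estimate 'nu' and 'gamma'; the feature set cannot change because there
* is only one subfeature to choose from (this can take a long time)
select_feature_set_svm (ClassTrainDataHandle, 'greedy', ['nu','gamma'],\
  ['auto','auto'], SVMHandle, SelectedFeatureIndices, Score)
* Read the parameters of the trained classifier
get_params_class_svm (SVMHandle, NumFeatures, KernelType, KernelParam, Nu,\
  NumClasses, Mode, Preprocessing, NumComponents)
```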
To define the features that should be selected from
ClassTrainDataHandle, the dimensions of the 
feature vectors in ClassTrainDataHandle can be grouped into 
subfeatures by calling set_feature_lengths_class_train_data. 
A subfeature can contain several subsequent elements of a feature vector.
The operator decides for each of these subfeatures whether it is better to 
use it for the classification or to leave it out.
The indices of the selected subfeatures are returned in 
SelectedFeatureIndices. 
If names were set in set_feature_lengths_class_train_data, these
names are returned instead of the indices.
If set_feature_lengths_class_train_data was not called for 
ClassTrainDataHandle before, each element of the feature vector 
is considered as a subfeature. 
The selection method 
SelectionMethod is either a greedy search 'greedy'
(iteratively add the feature with the highest gain)
or the dynamically oscillating search 'greedy_oscillating'
(add the feature with the highest gain and then test whether any of the already added 
features can be left out without great loss).
The method 'greedy' is generally preferable, since it is faster.
Only in cases when the subfeatures are low-dimensional or redundant,
the method 'greedy_oscillating' should be chosen.
The optimization criterion is the classification rate of 
a two-fold cross-validation of the training data.  
The best achieved value is returned in Score.
The parameters 'nu' and 'gamma' for the SVM that is used 
to classify can be set to 'auto' by using the 
parameters GenParamName and GenParamValue. If they are 
set to 'auto', the optimal 'nu' and/or
'gamma' is estimated. The automatic estimation of 'nu' 
and 'gamma' can take a substantial amount of time (up to days,
depending on the data set and the number of features).
Additionally, there 
is the parameter 'mode', which can be set to either 
'one-versus-all' or 'one-versus-one'. An explanation of 
the two modes as well as of the parameters 'nu' and 
'gamma' as the kernel parameter of the radial basis function (RBF)
kernel can be found in create_class_svm. 
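The two ways of passing these generic parameters can be sketched as follows. The numeric values are taken from the suggested values of GenParamValue; which value suits 'nu' and which suits 'gamma' depends on the data, so the pairing shown here is an arbitrary illustration. The output variable names with the suffix Auto are hypothetical.

```hdevelop
* Fixed 'mode', 'nu' and 'gamma' (illustrative values only)
select_feature_set_svm (ClassTrainDataHandle, 'greedy',\
  ['mode','nu','gamma'], ['one-versus-one',0.05,0.02], SVMHandle,\
  SelectedFeatureIndices, Score)
* Alternative: let the operator estimate 'nu' and 'gamma' (can be very slow)
select_feature_set_svm (ClassTrainDataHandle, 'greedy_oscillating',\
  ['nu','gamma'], ['auto','auto'], SVMHandleAuto,\
  SelectedFeatureIndicesAuto, ScoreAuto)
```

Each entry of GenParamName is paired with the entry at the same position in GenParamValue, so both tuples need the same length.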
Attention
This operator may take considerable time, depending on the size of the 
training data set and the number of features.
Please note that this operator should not be called if only a small
set of training data is available. Due to the risk of overfitting, the 
operator select_feature_set_svm may deliver a classifier with 
a very high score. However, the classifier may perform poorly when tested.
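One way to check for this kind of overfitting is to evaluate the returned classifier on samples that were not used for the selection. The following is only a sketch under stated assumptions: TestDataHandle is a hypothetical class_train_data handle holding held-out samples, it has the same feature layout as ClassTrainDataHandle, and the same subfeature lengths and names were set for it with set_feature_lengths_class_train_data.

```hdevelop
* Reduce the held-out samples to the selected subfeatures
select_sub_feature_class_train_data (TestDataHandle, SelectedFeatureIndices,\
  TestSelectedHandle)
get_sample_num_class_train_data (TestSelectedHandle, NumSamples)
NumCorrect := 0
for I := 0 to NumSamples - 1 by 1
    * Fetch one reduced sample and its annotated class
    get_sample_class_train_data (TestSelectedHandle, I, Features, ClassID)
    classify_class_svm (SVMHandle, Features, 1, Class)
    if (Class == ClassID)
        NumCorrect := NumCorrect + 1
    endif
endfor
* Classification rate on the held-out samples, to be compared with Score
TestRate := real(NumCorrect) / NumSamples
```

A TestRate that is much lower than Score suggests that the selected feature set fits the training data too closely.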
Execution Information
- Multithreading type: reentrant (runs in parallel with non-exclusive operators).
- Multithreading scope: global (may be called from any thread).
- Automatically parallelized on internal data level.
This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.
Parameters
  
ClassTrainDataHandle (input_control)  class_train_data → HClassTrainData, HTuple, Htuple (handle) (IntPtr) (HHandle) (handle)
 
Handle of the training data.
 
  
SelectionMethod (input_control)  string → HTuple, Htuple (string) (string) (HString) (char*)
 
Method to perform the selection.
Default value: 'greedy'
List of values: 'greedy', 'greedy_oscillating'
 
  
GenParamName (input_control)  string(-array) → HTuple, Htuple (string) (string) (HString) (char*)
 
Names of generic parameters to configure the 
selection process and the classifier.
Default value: []
List of values: 'gamma', 'mode', 'nu'
 
  
GenParamValue (input_control)  number(-array) → HTuple, Htuple (real / integer / string) (double / int / long / string) (double / Hlong / HString) (double / Hlong / char*)
 
Values of generic parameters to configure the 
selection process and the classifier.
Default value: []
Suggested values: 0.02, 0.05, 'auto', 'one-versus-one', 'one-versus-all'
 
  
SVMHandle (output_control)  class_svm → HClassSvm, HTuple, Htuple (handle) (IntPtr) (HHandle) (handle)
 
A trained SVM classifier using only the selected 
features.
 
  
SelectedFeatureIndices (output_control)  string-array → HTuple, Htuple (string) (string) (HString) (char*)
 
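The selected feature set. Contains the indices of the selected 
subfeatures, or their names if names were set with 
set_feature_lengths_class_train_data.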
 
  
Score (output_control)  real-array → HTuple, Htuple (real) (double) (double) (double)
 
The achieved score using two-fold cross-validation.
 
Example (HDevelop)
* Find out which of the two features distinguishes the two classes
NameFeature1 := 'Good Feature'
NameFeature2 := 'Bad Feature'
LengthFeature1 := 3
LengthFeature2 := 2
* Create training data
create_class_train_data (LengthFeature1+LengthFeature2,\
  ClassTrainDataHandle)
* Define the features which are in the training data
set_feature_lengths_class_train_data (ClassTrainDataHandle, [LengthFeature1,\
  LengthFeature2], [NameFeature1, NameFeature2])
* Add training data
*                                                         |Feat1| |Feat2|
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1,  2,1  ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2,  2,1  ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1,  3,4  ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2,  3,4  ], 1)
* Add more data 
* ...
* Select the better feature with an SVM
select_feature_set_svm (ClassTrainDataHandle, 'greedy', [], [], SVMHandle,\
  SelectedFeatureSVM, Score)
* Use the classifier
* ...
Result
If the parameters are valid, the operator select_feature_set_svm
returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
Possible Predecessors
create_class_train_data, 
add_sample_class_train_data, 
set_feature_lengths_class_train_data
Possible Successors
classify_class_svm
Alternatives
select_feature_set_mlp, 
select_feature_set_knn, 
select_feature_set_gmm
See also
select_feature_set_trainf_svm, 
gray_features, 
region_features
Module
Foundation