Institute of Microbiology
 

N - SVM-window: Training a Model

>> Output >> SVM

A new graphical user interface opens. This window deals with all training issues and the settings of the SVM.

Objects trained in current session

This box in the upper left corner refers to objects trained in the current session of Enhanced CellClassifier (CC); in the program these are sometimes also called "original cells". Objects loaded from previous CC sessions are handled in the right part of the SVM-window.

Objects are generally divided into a training population and a validation population. When a model is trained, only the information from the objects of the training population is used; the model is then validated using the objects of the validation population.

Objects of the current session can be divided into these two populations using the big slider or by entering a number in the field "Number Objects Training". The field "Number Objects Validation" cannot be modified; Enhanced CellClassifier assigns all remaining objects not used for training to validation. By default, 75% of the objects are used for training and 25% for validation.
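The split described above can be sketched in a few lines of Python. This is only an illustration, not CC's own code: the function name `split_objects`, the random shuffling, and the fixed seed are assumptions made for the example.

```python
import random

def split_objects(objects, n_training=None, train_fraction=0.75, seed=0):
    """Split objects into a training and a validation population.

    If n_training is not given, train_fraction (default 75%) decides.
    Shuffling before the split is an assumption of this sketch.
    """
    objects = list(objects)
    if n_training is None:
        n_training = round(len(objects) * train_fraction)
    if not 0 <= n_training <= len(objects):
        raise ValueError("n_training must be between 0 and the number of objects")
    random.Random(seed).shuffle(objects)
    # All objects not used for training go to validation.
    return objects[:n_training], objects[n_training:]

training, validation = split_objects(range(200))
print(len(training), len(validation))  # 150 50
```

As in the SVM-window, only the training number is chosen; the validation number follows from it.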

All objects from the current session can be disabled completely with the respective checkbox. Training and validation are then limited to the objects loaded into the boxes in the right part of the SVM-window.

Other Objects

BROWSE: CC allows loading up to 10 different sets of objects that were already trained and saved in the CC main window. Use the browse button to select the respective files of saved cells.

CLEAR: clears already loaded sets of objects. A loaded set needs to be cleared before a new set can be loaded in its place.

Training and Validation windows: Please enter the number of objects you want to use for training and for validation. Invalid entries are not accepted.

The total numbers of objects selected for training and for validation are summarized in the windows in the lower left corner.

TRAIN

Training and Validation set: Objects can come from the current session (box in the upper left of the window) and from loaded objects (10 groups of boxes in the right part of the SVM-window). The training set and the validation set are each the sum of both sources: the selected objects from the current session plus the selected loaded objects.

After training, a figure containing 3x3 plots is generated. The three plots in the upper row describe the training population and the training accuracy; the three plots in the middle row describe the validation objects and the validation accuracy.

The plots on the left show the distribution of the 5 possible classes among the objects (the absolute number of objects is plotted). In the plots in the middle column, the model is used to predict the objects of the respective classes; the percentage of correctly predicted objects is plotted in green, that of incorrectly predicted objects in red. On the right, the confusion matrix is plotted: for each class, CC shows which class the model predicts each object to be (correctly or incorrectly). The most important information is contained in the upper middle plot (training accuracy) and in the center plot (validation accuracy).
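To make the relation between the confusion matrix and the per-class accuracy concrete, here is a minimal Python sketch. The function names and the toy labels are hypothetical, and classes are numbered from 0 for convenience; CC's own plots are produced internally and may be computed differently.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes=5):
    """matrix[t][p] counts objects of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Percentage of correctly predicted objects for each true class."""
    accuracies = []
    for cls, row in enumerate(matrix):
        total = sum(row)
        # Classes without any objects get None instead of a percentage.
        accuracies.append(100.0 * row[cls] / total if total else None)
    return accuracies

# Toy example with three classes (labels are made up for illustration).
true_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2, 0, 2]
matrix = confusion_matrix(true_labels, predicted, n_classes=3)
for row in matrix:
    print(row)
print(per_class_accuracy(matrix))
```

The diagonal of the matrix holds the correctly predicted objects (the green bars); everything off the diagonal in a row contributes to the red bar of that class.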

The image can now be saved. The model should be given a name; this name will later be displayed in the "SVM"-window and the "SaveExcelFile" window. After the name is entered, the model is available in other CC-windows.

TEST

"TEST" requires a model to be available (either directly trained or loaded). This model is now used to test the "objects from the current session" and all other loaded object populations. For each object population a new image is generated. This image shows in several plots: 1) the distribution of the classes for the whole object population, 2) the percentage of correctly and incorrectly predicted objects (green and red, respectively), 3) the percentage of correctly and incorrectly predicted objects as numbers, 4) the confusion matrix as a color matrix, 5) the confusion matrix as numbers - percentage of input, and 6) the confusion matrix as absolute numbers. All images can be saved.

Return

Return closes the SVM-window and returns to the main Enhanced CellClassifier window. The user is asked whether the current model and other information should be kept. If the answer is yes, the model is passed back to the main window and is also available in other CC-windows.

Note: As long as the SVM-window has not been closed, the model is not available anywhere outside the SVM-window. In CC no information is passed between simultaneously open windows.

Load and Save Model

>> Model >> Load Model

>> Model >> Save Model

Any trained model can be saved and loaded again. Before loading, CC checks whether the object features of the model (the selected CellProfiler measurements) are compatible with the object features of the current session. An incompatible model is not loaded.

Parameter optimization

>> Model >> Parameter Optimization

The SVM training procedure can be influenced by 2 parameters: C and gamma. Training success depends critically on a good choice of these 2 parameters. Enhanced CellClassifier uses an RBF kernel (radial basis function kernel); for a more detailed discussion see: www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
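For orientation, the roles of the two parameters can be written out, assuming the standard soft-margin SVM formulation with an RBF kernel as described in the guide linked above (the symbols x_i, y_i, w, b, xi_i and phi are the usual textbook notation, not names used by CC):

```latex
% RBF kernel: gamma controls how quickly similarity decays with distance
K(x_i, x_j) = \exp\left(-\gamma \,\lVert x_i - x_j \rVert^{2}\right), \qquad \gamma > 0

% Soft-margin objective: C is the penalty for misclassified training objects
\min_{w,\, b,\, \xi} \; \frac{1}{2}\, w^{T} w + C \sum_{i} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

A large C punishes training errors heavily, and a large gamma makes the kernel very local; both extremes tend to fit the training objects closely and may generalize poorly.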

For a model to be useful, it needs to correctly predict unknown objects (objects which have not been used for training). One commonly used way to estimate this is 5-fold cross-validation. The data set is divided into five parts of equal size; four parts (80% of the objects) form the training set and the remaining part (20% of the objects) forms the validation set. The model is trained with the training set and then used to predict the validation set. This is done five times, so that each part serves as validation set once; the average percentage of correctly predicted cells is the 5-fold cross-validation (5x-cv) accuracy.
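The procedure can be sketched as follows. This is an illustration in Python, not CC's implementation; `train_and_score` is a hypothetical callback that would train a model on the training indices and return the percentage of correctly predicted validation objects.

```python
import random

def k_fold_indices(n_objects, k=5, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validation_accuracy(n_objects, train_and_score, k=5):
    """Average the score of the hypothetical train_and_score over k folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_objects, k)]
    return sum(scores) / len(scores)

# With 100 objects, every fold trains on 80 and validates on 20 objects.
for training, validation in k_fold_indices(100):
    print(len(training), len(validation))
```

Each object is used for validation exactly once, so the averaged accuracy is based on predictions for all objects.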

For "Parameter optimization" CC takes the selected training objects and performs a grid search for the best C- and gamma values for the 5x-cv.

CC asks for the lower and upper values of C and gamma for the search and the step size (as log2 values). The results of the grid search are presented in three ways: as a heat map, as lines and as a surface. The maximum 5x-cv value is selected, and the respective parameters are displayed.
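A grid search over log2-spaced values can be sketched like this. The sketch is not CC's code: `cv_accuracy` stands for any function returning the 5x-cv accuracy for a given C and gamma, the toy function below merely imitates one with a single peak, and the ranges are example values only.

```python
from itertools import product
from math import log2

def log2_grid(lower, upper, step):
    """Return 2**e for the exponents lower, lower+step, ..., up to upper."""
    n_steps = int((upper - lower) / step + 1e-9)
    return [2.0 ** (lower + i * step) for i in range(n_steps + 1)]

def grid_search(cv_accuracy, c_range, gamma_range):
    """Try every (C, gamma) pair; return (best accuracy, C, gamma)."""
    best = None
    for c, gamma in product(log2_grid(*c_range), log2_grid(*gamma_range)):
        accuracy = cv_accuracy(c, gamma)
        if best is None or accuracy > best[0]:
            best = (accuracy, c, gamma)
    return best

def toy_cv_accuracy(c, gamma):
    """Stand-in for a real 5x-cv run; peaks at C = 2**3, gamma = 2**-5."""
    return 100.0 - (log2(c) - 3) ** 2 - (log2(gamma) + 5) ** 2

# Ranges are (lower, upper, step) as log2 values.
best = grid_search(toy_cv_accuracy, c_range=(-5, 15, 2), gamma_range=(-15, 3, 2))
print(best)  # (100.0, 8.0, 0.03125)
```

The number of models to train is the product of the two grid sizes (here 11 x 10), which is why a coarse first pass followed by a finer search around the best region saves time.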

CC now allows the user to save these parameters; they are then displayed in the SVM-window. They are written to the settings file after choosing "Save Settings" in the CC "Main window". Reasonable C and gamma values can also be entered as default values in the "Define Settings" window.

Please note that an extensive grid search with many cells might take a long time. It might be reasonable to start with a coarse grid and then refine the search in a promising region.

Enter Parameters C and gamma

>> Model >> Enter Parameters C and gamma

Parameters C and gamma (perhaps calculated in a previous session with similar objects) can also be entered directly.

Adjust weights for classes

>> Model >> Adjust weights for classes

If your training set is unbalanced (i.e. it contains many more objects of one type than of any other), classes with few objects might not be sufficiently well reflected in the resulting model after training. Adjusting the weights for classes might help in these instances. The default value for each weight is 1. Values larger than 1 give objects of the respective class a higher weight, values lower than 1 a lower weight. Technically speaking, the penalty the classifier receives for misclassifying objects of that class is increased or decreased.
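One possible starting point for choosing weights is to scale each class inversely to its frequency. The following Python sketch illustrates this heuristic; it is an assumption of this example and not something CC computes for you, and the function name is hypothetical.

```python
from collections import Counter

def suggest_class_weights(labels):
    """Suggest weights so that rare classes count as much as the largest one.

    The largest class keeps the default weight 1; a class with a tenth
    of its objects gets weight 10. This is only a heuristic.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in sorted(counts.items())}

# Toy training set: 300 objects of class 1, 30 objects of class 2.
labels = [1] * 300 + [2] * 30
print(suggest_class_weights(labels))  # {1: 1.0, 2: 10.0}
```

Such values are best treated as a first guess; the validation accuracy of the small classes shows whether the weights actually help.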

Object number optimization

>> Model >> Object Number Optimization

How many objects need to be trained at least to calculate a robust model? With this function CC calculates the 5-fold cross-validation accuracy (see above) as a function of the number of objects used.

First you decide how many different conditions (object numbers) you want to test; useful numbers are between 5 and 20 (a higher number is not accepted). Then you decide on the number of objects for each condition; CC makes some suggestions. CC then randomly selects objects out of the training set for each group and calculates the 5x-cv accuracy. The object number at which the 5x-cv accuracy becomes stable is an estimate of the minimum number of objects needed for a useful classification. Please repeat this test several times and interpret the results with caution.
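The idea can be sketched in Python as follows. The evenly spaced object numbers and the callback `cv_accuracy` (which would return the 5x-cv accuracy for a given subset of objects) are assumptions of this sketch; the numbers CC actually suggests may differ.

```python
import random

def object_number_conditions(n_training_objects, n_conditions=10):
    """Evenly spaced object numbers, ending at the full training set."""
    if not 1 <= n_conditions <= 20:
        raise ValueError("n_conditions must be between 1 and 20")
    return [round(n_training_objects * (i + 1) / n_conditions)
            for i in range(n_conditions)]

def accuracy_by_object_number(objects, cv_accuracy, n_conditions=10, seed=0):
    """Return (object number, accuracy) pairs for random subsets."""
    objects = list(objects)
    rng = random.Random(seed)
    results = []
    for size in object_number_conditions(len(objects), n_conditions):
        subset = rng.sample(objects, size)  # random selection without repeats
        results.append((size, cv_accuracy(subset)))
    return results

print(object_number_conditions(1000, 5))  # [200, 400, 600, 800, 1000]
```

Because the subsets are drawn at random, repeating the run with a different seed gives an impression of how much the curve fluctuates.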

Feature Elimination

>> Model >> Feature_Elimination

CC loops over all selected features and, for each feature, tests the quality of the model without this feature. The feature whose elimination yields the best model was the least useful one and is eliminated. If several features give the same 5x-cv result, the choice is made randomly. A feature collects points the longer it survives the elimination process.

First you have a choice between 5x-cv (see above) and simple predictions for testing the quality of the model. After this you can decide on the number of rounds of testing (how often the test will be repeated) and on the number of features that will be displayed.

High points indicate useful features, low points indicate features which can possibly be eliminated (for instance with the data usage mask, settings file).
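A single round of such a backward elimination can be sketched in Python. The scoring scheme (a feature's points equal the round in which it was eliminated), the callback `model_quality`, and the feature names are assumptions chosen for illustration; CC's actual point scheme is not documented here.

```python
import random

def backward_elimination(features, model_quality, seed=0):
    """Eliminate features one by one; return points per feature.

    model_quality(feature_list) is a hypothetical callback returning
    e.g. the 5x-cv accuracy of a model trained on those features.
    """
    rng = random.Random(seed)
    remaining = list(features)
    points = {}
    round_number = 0
    while len(remaining) > 1:
        # Quality of the model when each feature in turn is left out.
        scores = {f: model_quality([g for g in remaining if g != f])
                  for f in remaining}
        best = max(scores.values())
        # Ties are broken randomly.
        candidates = [f for f in remaining if scores[f] == best]
        eliminated = rng.choice(candidates)
        points[eliminated] = round_number
        remaining.remove(eliminated)
        round_number += 1
    points[remaining[0]] = round_number  # the last survivor scores highest
    return points

# Toy quality function: every feature contributes a fixed usefulness.
usefulness = {"Area": 3, "Intensity": 5, "Texture": 1}
points = backward_elimination(usefulness, lambda fs: sum(usefulness[f] for f in fs))
print(points)  # {'Texture': 0, 'Area': 1, 'Intensity': 2}
```

With n features, one round needs on the order of n squared model trainings, which explains why the test can take very long.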

Caution: this test might take a very long time for large cell numbers. Moreover, results are not always conclusive. This feature will be replaced in the future by something more useful.

Plot Histograms

>> Model >> Plot Histograms

Each object has been measured by CellProfiler. Sometimes it is useful to visualize how the measurements differ between classes. This helps to identify useful (or useless) features for the "Data usage mask" and provides feedback on the performance of the currently used CellProfiler object measurements. Histograms of the data from the currently selected Training objects (solid line) and Validation objects (dashed line) are plotted to visualize the differences. Each class gets a different color. All images can be saved.

Data usage mask

>> Model >> Data usage mask

This feature allows for exclusion of apparently useless object measurement features while training the model. An indication for a useless feature might be a complete overlap of the curves for two classes in the "Plot Histograms" window. To exclude features, simply uncheck the boxes with the names of the features. The model then needs to be re-trained. Excluding features accelerates training.

Fuse Training Objects

>> Objects >> Fuse Training Objects

All cell groups (object groups) which have been loaded into the 10 slots in the right part of the SVM-window will be fused (combined) into one group of cells (objects). It does not matter whether cells have been selected for training or validation. Original cells (upper left in the window) are ignored in the fusion process. For the fusion, the objects are first reloaded and then combined object by object. After this, the combined objects can be saved under a new name.

Fuse cells allows the user to overcome the limit of 10 different sets of objects that can be loaded in the SVM-window. Moreover, combined cells can be compared as a group with other trained cell groups (for instance with Test).

Note: the combined (fused) objects cannot be loaded with Load cells in the main window; they can only be loaded into the SVM-window.

Export to WEKA

>> Objects >> Export to WEKA

WEKA is an open source program that incorporates several classifiers. It allows for a comparison of the performance of different classifiers, feature selection and much more. After selecting this option you can choose a file name; the currently selected training set is then exported as a text file which can be imported into WEKA.
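The exact layout of the file CC writes is not described here. As an illustration of what a WEKA-readable text file can look like, the following Python sketch writes data in WEKA's ARFF format; the function name, relation name, feature names and values are all hypothetical.

```python
def to_arff(relation, feature_names, rows, labels):
    """Return ARFF text with numeric features and a nominal class attribute.

    Assumes feature names without spaces; names containing spaces
    would need to be quoted in ARFF.
    """
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines.append("@attribute class {" + ",".join(str(c) for c in classes) + "}")
    lines += ["", "@data"]
    for row, label in zip(rows, labels):
        # One object per line: feature values followed by its class.
        lines.append(",".join(str(v) for v in row) + f",{label}")
    return "\n".join(lines) + "\n"

text = to_arff(
    "training_objects",
    ["Area", "MeanIntensity"],
    [[120.0, 0.5], [80.0, 0.9]],
    [1, 2],
)
print(text)
```

The header declares each measurement and the list of classes; the data section then holds one comma-separated line per object.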

 

© 2014 Microbiology ETH Zürich | Imprint | Disclaimer | 23 December 2009