SVM parameter tuning and number of SVs (Matlab libsvm)
30-10-2009, 04:12 PM (This post was last modified: 02-11-2009 11:17 AM by Scribe.)
Post: #1
Hello,

Several questions have popped up during some classification experiments I have been running lately. I tried to follow the guidelines in the LIBSVM documentation on how to apply SVMs (for beginners, which I still am), but these questions were not answered there.

Some info about the data
The input data has 35 features (dimensions), 945 points, and 5 classes (all points labelled). The data is split in half randomly into a training set and a testing set (the random selection is done class by class).
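
For readers who want to reproduce this kind of split, here is a minimal sketch in Python (standard library only). The function name, the seed, and the toy labels are illustrative assumptions, not the code used for the experiments described in this thread.

```python
import random
from collections import defaultdict

def stratified_half_split(labels, seed=0):
    """Return (train_idx, test_idx), splitting each class in half at random."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        half = (len(idx) + 1) // 2  # an odd-sized class gives its extra point to training
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)

# Toy labels only (not the thread's data): three large classes, two small ones.
labels = [1] * 40 + [2] * 40 + [3] * 40 + [4] * 9 + [5] * 7
train_idx, test_idx = stratified_half_split(labels)
```

Splitting class by class keeps the small classes represented on both sides, which a plain random split over all points does not guarantee.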

Q1
Feature selection and SVM parameter tuning (C, gamma) both have to be done, but in which order? I started with cross-validation on the training set to choose C and gamma, using all the features. Then I performed feature selection with my own implementation of Sequential Forward Selection, using the cross-validation accuracy returned by svmtrain as the selection criterion.
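
The greedy wrapper described here can be sketched as follows. This is not the implementation used in the thread: `cv_score` is a hypothetical callback standing in for "run cross-validation with fixed C and gamma on these columns and return the accuracy", and the toy scorer exists only so the example runs.

```python
def sequential_forward_selection(n_features, cv_score, max_features=None):
    """Greedy wrapper: at each step add the feature that maximizes cv_score(subset).

    cv_score is a caller-supplied function (hypothetical here) that takes a list
    of feature indices and returns a cross-validation accuracy.
    Returns the selection order and the score reached after each addition.
    """
    selected, history = [], []
    remaining = list(range(n_features))
    limit = n_features if max_features is None else min(max_features, n_features)
    while len(selected) < limit:
        best_feature, best_score = None, float("-inf")
        for f in remaining:
            score = cv_score(selected + [f])
            if score > best_score:  # strict '>' keeps the first feature on ties
                best_feature, best_score = f, score
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(best_score)
    return selected, history

# Toy scorer standing in for SVM cross-validation: only features 2 and 0 help.
toy_gain = {0: 0.10, 2: 0.25}

def toy_cv_score(subset):
    return 0.5 + sum(toy_gain.get(f, 0.0) for f in subset)

order, scores = sequential_forward_selection(5, toy_cv_score, max_features=3)
```

Plotting `scores` against the number of selected features gives the accuracy curve used to decide where to stop. Because C and gamma were tuned with all features, re-running the parameter search on the selected subset is a reasonable check, and the held-out test set should stay out of both steps.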

Q2
I chose the 10 best features out of 35 and trained the SVM again with that set of features and the chosen C and gamma parameters. Training and testing accuracy are both between 75% and 80% (satisfactory for my application), but I end up with about half of the training points as support vectors. Is that too many support vectors in the model? Is there a way or rule of thumb to tell whether the model is overfitting? The consistency between training and testing accuracy doesn't suggest that there is overfitting.

Thanks in advance for your help.

French-speaking people in Finland
03-11-2009, 10:36 AM
Post: #2
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
229 views and no replies, was my question so stupid?

03-11-2009, 12:12 PM
Post: #3
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
No, your questions are very interesting, but they don't have obvious answers. By the way, a question for you: what were the training and testing accuracies before you reduced the number of features? Do you see a significant improvement in testing accuracy? And earlier, with all 35 features, did you see a large difference between training and validation accuracy?
About your questions:
1. You can use PCA (or your own method) before applying the SVM, but you can also use the SVM results themselves to reduce the number of features (e.g., this is very simple with a linear kernel). But I don't understand (see my question above) whether it is really necessary to reduce the number of features. From my point of view, 35 is not a large number compared with the number of points (35/945), but it also depends on the positive/negative balance of each class (imbalanced or not).
2. "Half of training points as support vectors. Is that too many Support Vectors in the model?" Generally speaking, no, it is okay. But it depends on your actual situation and on the kernel type.
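
One way to put the support-vector count in context, offered here as background rather than as something stated in this thread, is the leave-one-out argument: removing a training point that is not a support vector leaves the trained model unchanged, so only support vectors can be misclassified in leave-one-out. With n training points and n_SV support vectors (symbols introduced here for illustration):

```latex
\hat{e}_{\mathrm{LOO}} \;\le\; \frac{n_{SV}}{n}
```

With about half of the training points as support vectors the bound is about 0.5, far looser than the 20-25% error actually observed, so the count alone neither confirms nor rules out overfitting. In a soft-margin SVM every misclassified training point is itself a support vector, so a training error of 20-25% already accounts for a sizeable share of them.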
03-11-2009, 02:59 PM
Post: #4
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
Thanks Sasha for your time and answers.

Sasha Wrote: By the way, a question for you: what were the training and testing accuracies before you reduced the number of features? Do you see a significant improvement in testing accuracy? And earlier, with all 35 features, did you see a large difference between training and validation accuracy?

Using my own Sequential Forward Feature Selection algorithm based on an SVM (Gaussian kernel), I plotted a curve of cross-validation error versus number of features. It looked as if it were taken from a textbook on feature selection: a strong increase in accuracy with the first features, and no significant improvement after 10 features. We chose the 10 or 11 "best" features after that experiment.

PCA is excluded in this application. The input features have physical meaning; we can keep some and discard others, but recombining them or computing principal components is not an option.

Still on feature selection, a conceptual question. It seems to me that one should not select features with one classifier (say Linear Discriminant Analysis, LDA) and then train another classifier (here an SVM with a Gaussian kernel) on the selected features. What would justify the assumption that features that are good for LDA are also good for an SVM? In a first series of experiments the features were selected in SPSS, but they did not give the best results with my SVM classifier. This makes me think it is better to use the same classification scheme for feature selection and model building. Is that correct, or irrelevant?

Quote: is it really necessary to reduce the number of features? From my point of view, 35 is not a large number compared with the number of points (35/945), but it also depends on the positive/negative balance of each class (imbalanced or not).

Precisely, I forgot to mention that the classes are imbalanced. The first three classes have roughly equal numbers of samples and constitute most of the data. The two remaining classes may have 20-30 samples in the training set.
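
With classes this imbalanced, an overall accuracy of 75-80% can hide poor results on the two small classes, so it may help to look at accuracy per class as well. A minimal sketch, with a made-up function name and toy labels (not results from these experiments):

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in sorted(total)}

# Toy labels only: two larger classes and one small class (labelled 5).
y_true = [1] * 10 + [2] * 10 + [5] * 4
y_pred = [1] * 10 + [2] * 8 + [1] * 2 + [1] * 3 + [5] * 1

acc = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the overall accuracy is about 79% while the small class is recognised only a quarter of the time, which is the pattern worth checking for.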

Quote: "Half of training points as support vectors. Is that too many Support Vectors in the model?" Generally speaking, no, it is okay. But it depends on your actual situation and on the kernel type.

The kernel was Gaussian, with C = 256 and g = 2^(-8). The data has been scaled to [-1, 1].
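
To make these settings concrete, here is a small sketch of the scaling step and of the Gaussian kernel value for the quoted gamma. The helper names and toy numbers are assumptions for illustration; the actual scaling in these experiments may have been done differently.

```python
import math

def fit_minmax(rows):
    """Per-feature (min, max), computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale_to_unit_range(rows, ranges):
    """Map each feature linearly so the training min/max land on -1 and +1."""
    scaled = []
    for row in rows:
        scaled.append([
            0.0 if hi == lo else -1.0 + 2.0 * (x - lo) / (hi - lo)
            for x, (lo, hi) in zip(row, ranges)
        ])
    return scaled

def rbf_kernel(x, z, gamma):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

C = 2 ** 8        # 256, as quoted in the post
gamma = 2 ** -8   # 0.00390625

train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]  # toy values
test = [[2.5, 35.0]]                              # toy values

ranges = fit_minmax(train)
train_s = scale_to_unit_range(train, ranges)
test_s = scale_to_unit_range(test, ranges)  # test values may fall outside [-1, 1]
k = rbf_kernel(train_s[0], train_s[2], gamma)
```

The ranges are taken from the training set and reused for the test set, so the two are mapped consistently. With 10 features in [-1, 1] the squared distance between two points is at most 40, so with gamma = 2^(-8) the kernel value never drops below exp(-40/256), about 0.86: the kernel is quite flat, and the large C does much of the fitting.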

03-11-2009, 03:22 PM
Post: #5
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
Hi, there are two types of feature selection: filter and wrapper. What you did with SPSS was a filter, and what you did with SVM cross-validation was a wrapper (if I got your point). Wrappers are generally preferable if time is not an issue. Another option would be Recursive Feature Elimination (SVM-RFE), which eliminates the least important features recursively. In its simplest form this requires a linear kernel. If you want to stick with Gaussians, then I would say use wrappers. You can also use a randomized algorithm (e.g. a Genetic Algorithm) for that. Look at Yiming Yang's publications on feature selection to see how different strategies affect SVM (e.g. http://nyc.lti.cs.cmu.edu/yiming/Publications/icml97.ps ).
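
For the linear-kernel case, the elimination loop of SVM-RFE can be sketched like this. `fit_linear_weights` is a hypothetical callback that would train a linear SVM on the given feature indices and return its weight vector; the toy weights below only make the example runnable, and for a multi-class problem the per-classifier weights would first need to be combined, which this sketch leaves to the caller.

```python
def recursive_feature_elimination(n_features, fit_linear_weights, n_keep):
    """Drop the feature with the smallest squared weight until n_keep remain.

    fit_linear_weights is a caller-supplied function (hypothetical here) that
    trains a linear SVM on the given feature indices and returns one weight
    per index, in the same order.
    Returns the surviving features and the order in which the rest were removed.
    """
    active = list(range(n_features))
    eliminated = []
    while len(active) > n_keep:
        weights = fit_linear_weights(active)  # retrain on the current subset
        worst_pos = min(range(len(active)), key=lambda i: weights[i] ** 2)
        eliminated.append(active.pop(worst_pos))
    return active, eliminated

# Toy stand-in for training: fixed weights, looked up per active feature.
toy_w = [0.9, -0.05, 0.4, 0.01, -1.2]

def toy_fit(active):
    return [toy_w[f] for f in active]

kept, dropped = recursive_feature_elimination(5, toy_fit, n_keep=2)
```

Unlike forward selection, this starts from the full feature set and retrains after every removal, so its cost is one training run per eliminated feature.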
03-11-2009, 10:30 PM (This post was last modified: 06-11-2009 01:24 PM by Scribe.)
Post: #6
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
Thanks a bunch kap, I will look into this.

04-11-2009, 05:33 PM
Post: #7
RE: SVM parameter tuning and number of SVs (Matlab libsvm)
Hello,
Can anyone tell me which kernels are best to use for support vector machines, and what their limitations are? If possible, please tell me where to find this information.