Use of webpage interface for C5.0

  1. Upload the positive and negative files and choose flags for training. Each file contains amino acid sequences, one per line, and all sequences in both files must have the same length (e.g. AILYR, of length 5). In addition to the 20 single-letter amino acid codes, a sequence may contain X (uncertain residue). The scripts could be modified to allow ? in place of X, but this has not yet been done.
  2. When you train on the positive and negative files, you must NOT select the rules option (-r), though you may select boosting. If you intend to use the result for prediction, you should also not select cross validation, since the decision trees are not saved when cross validation is run.
  3. To evaluate the algorithm's performance, you can use the training webpage to perform cross validation, apply boosting, and output confusion matrices. Read the tutorial for more examples of C5.0 use.
  4. To use C5.0 for prediction, use the test webpage. There you upload a single test file in the same format as the training files: one sequence per line, each of the same length as the sequences in the training files.
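
Because the training and test files must follow the format described above, it can help to check them locally before uploading. The following is a minimal sketch only, not part of the web interface or of C5.0: the function name check_sequences and its expected_length parameter are hypothetical, and the sketch assumes uppercase codes and treats blank lines as errors, neither of which is stated by the interface.

```python
# Hypothetical pre-upload check; not part of the web interface or C5.0.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
ALLOWED = AMINO_ACIDS | {"X"}              # X marks an uncertain residue


def check_sequences(lines, expected_length=None):
    """Return (sequences, length); raise ValueError on the first bad line.

    lines is any iterable of strings, for example an open text file.
    If expected_length is None, the first sequence sets the length.
    """
    sequences = []
    for number, raw in enumerate(lines, start=1):
        seq = raw.strip()
        if not seq:
            # Assumption: blank lines are treated as errors here.
            raise ValueError(f"line {number}: blank line")
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            raise ValueError(f"line {number}: unexpected character(s) {bad}")
        if expected_length is None:
            expected_length = len(seq)
        elif len(seq) != expected_length:
            raise ValueError(
                f"line {number}: length {len(seq)}, expected {expected_length}"
            )
        sequences.append(seq)
    return sequences, expected_length


# Example with in-memory data; real files would be passed as open(...) handles.
positive, length = check_sequences(["AILYR\n", "GXKLM\n"])
# Reuse the length so the negative (or test) file is checked against it.
negative, _ = check_sequences(["WWPQT\n"], expected_length=length)
```

Passing the length found in the positive file as expected_length when checking the negative and test files mirrors the requirement that all three share one sequence length.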