
Configs

Przemysław edited this page Apr 12, 2019 · 6 revisions

Configuration Parameters

Changing options

  • All options can be set in the configuration file.
  • Options that are not read-only can be changed during the program run using the conf command.

List of options

TrainingFileName

  • Path to the file with training data.
  • The only obligatory option in the configuration file.
  • Read-only: cannot be changed with the conf command during the program run.

TestFileName

  • Path to the file with test data.
  • Not obligatory, either in the configuration file or for a run.
    • If empty, the program splits the training data using the ratio set in the TestExtractPercentage option.
  • Read-only: cannot be changed with the conf command during the program run.

OutputFolder

  • Path to the folder where the save command writes results.
  • Not obligatory, either in the configuration file or for a run.
    • If not provided, results are written only to the path given to the save command.
  • Can be changed for every run command.

TreeCount

  • Number of trees to be generated in a forest.
  • Not obligatory in the configuration file.
  • Cannot be less than 1.
  • Can be changed for every run command.

MinSplitCount

  • Minimum number of elements a tree node must contain before it can be split into two leaves.
  • Not obligatory in the configuration file.
  • Cannot be less than 2.
    • Make sure this value does not exceed the number of training elements, as that can result in an infinite loop [TO BE CHANGED].
  • Can be changed for every run command.

MinElemsInLeaf

  • Minimum number of elements a tree node/leaf can contain.
  • Not obligatory in the configuration file.
  • Cannot be less than 1.
    • Make sure this value does not exceed the number of training elements, as that can result in an infinite loop [TO BE CHANGED].
  • Can be changed for every run command.
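
MinSplitCount and MinElemsInLeaf can be read together as one check on whether a node may be split. The sketch below is a hypothetical illustration, not the program's code: it assumes a binary split that sends left_size elements to one child and the rest to the other, and the function name split_allowed is invented for this example.

```python
def split_allowed(node_size: int, left_size: int,
                  min_split_count: int = 2, min_elems_in_leaf: int = 1) -> bool:
    """Return True if a node holding node_size elements may be split so that
    left_size elements go to one child and the remainder to the other.

    Hypothetical sketch: MinSplitCount bounds the parent node,
    MinElemsInLeaf bounds each of the two resulting children.
    """
    if min_split_count < 2 or min_elems_in_leaf < 1:
        raise ValueError("MinSplitCount must be >= 2 and MinElemsInLeaf >= 1")
    right_size = node_size - left_size
    return (node_size >= min_split_count
            and left_size >= min_elems_in_leaf
            and right_size >= min_elems_in_leaf)


# A 10-element node split 3/7 passes MinSplitCount=5 and MinElemsInLeaf=2.
print(split_allowed(10, 3, min_split_count=5, min_elems_in_leaf=2))
```

If either value is larger than the whole training set, no node can ever satisfy the check, which is why the warnings above ask to keep both below the number of training elements.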

MaxFeaturesPerNode

  • How many features may be used when splitting each node.
  • Not obligatory in the configuration file.
  • Cannot be less than 1.
    • Can be set higher than the number of features provided; however, the algorithm only uses distinct features anyway.
  • Can be changed for every run command.
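
Random forests commonly draw a random subset of features for each node. Assuming the program does something similar (the page does not describe how features are chosen), the note about distinct features can be pictured as sampling without replacement and capping the count at the number of available features. The helper pick_node_features is hypothetical.

```python
import random


def pick_node_features(n_features: int, max_features_per_node: int,
                       rng: random.Random) -> list[int]:
    """Choose the feature indices considered when splitting one node.

    Hypothetical sketch: features are drawn without replacement, so asking
    for more features than exist simply yields every feature exactly once.
    """
    if max_features_per_node < 1:
        raise ValueError("MaxFeaturesPerNode must be >= 1")
    k = min(max_features_per_node, n_features)
    return rng.sample(range(n_features), k)


# Requesting 10 features out of 4 returns all 4, each once, in random order.
print(pick_node_features(4, 10, random.Random(0)))
```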

MaxDeepness

  • Maximum depth a tree in the forest can reach.
  • Not obligatory in the configuration file.
  • Cannot be less than 1.
  • Can be changed for every run command.
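
The lower bounds listed for TreeCount, MinSplitCount, MinElemsInLeaf, MaxFeaturesPerNode and MaxDeepness can be collected in one table and checked before a run. This is a hypothetical helper, not part of the program; it also flags MinSplitCount or MinElemsInLeaf values larger than the training set, following the warnings above. Options absent from the dictionary are skipped because none of them is obligatory.

```python
from typing import Optional

# Lower bounds stated on this page for the tree-shape options.
MIN_VALUES = {
    "TreeCount": 1,
    "MinSplitCount": 2,
    "MinElemsInLeaf": 1,
    "MaxFeaturesPerNode": 1,
    "MaxDeepness": 1,
}


def check_tree_options(options: dict[str, int],
                       n_training: Optional[int] = None) -> list[str]:
    """Return error messages for options that break the documented limits."""
    errors = []
    for name, minimum in MIN_VALUES.items():
        if name in options and options[name] < minimum:
            errors.append(f"{name} cannot be less than {minimum} (got {options[name]})")
    # Values above the training-set size can lead to the infinite loop noted above.
    if n_training is not None:
        for name in ("MinSplitCount", "MinElemsInLeaf"):
            if name in options and options[name] > n_training:
                errors.append(f"{name} exceeds the number of training elements ({n_training})")
    return errors


print(check_tree_options({"TreeCount": 0, "MinSplitCount": 1}))
```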

TestExtractPercentage

  • Percentage of the training data elements that will be extracted and used for testing.
    • If TestFileName is provided, this option is inactive (unless ForceTestExtract is on).
  • Not obligatory in the configuration file.
  • Cannot be less than 1; cannot be more than 50.
  • Can be changed for every run command.

ForceTestExtract

  • When on, even if TestFileName is set, the algorithm merges both data sets and uses TestExtractPercentage to extract the test data.
    • Be aware that this cannot be undone during the program run: the data from TestFileName stays merged with the training data until the program ends.
  • Not obligatory in the configuration file.
  • Can be set to 0 (off) or to 1 (on).
  • Can be changed for every run command.
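
The interplay of TestFileName, TestExtractPercentage and ForceTestExtract can be summarized in a short sketch. It is hypothetical: test stands for the rows read from TestFileName (None or empty when no file is set), the extraction is assumed to be random, and the number of extracted elements is assumed to be rounded down; the page does not say how the program shuffles or rounds.

```python
import random
from typing import Optional


def select_test_data(train: list, test: Optional[list], extract_percentage: int,
                     force_test_extract: bool,
                     rng: random.Random) -> tuple[list, list]:
    """Return (training, testing) lists for one run."""
    if not 1 <= extract_percentage <= 50:
        raise ValueError("TestExtractPercentage must be between 1 and 50")
    if test and not force_test_extract:
        # Test file used as-is; TestExtractPercentage is inactive.
        return list(train), list(test)
    # No test file, or ForceTestExtract on: merge everything and extract.
    pool = list(train) + list(test or [])
    rng.shuffle(pool)
    n_test = len(pool) * extract_percentage // 100
    return pool[n_test:], pool[:n_test]


train_rows, test_rows = select_test_data(list(range(100)), None, 20, False,
                                         random.Random(42))
print(len(train_rows), len(test_rows))
```

With 100 training rows, no test file and a 20% ratio, 80 rows remain for training and 20 are extracted for testing.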

VotingType

  • If set to 0, we sum up the class probabilities from the leaf that each tree selects while classifying an element. More about summing up probabilities here
  • If set to 1, each tree gives 1 point to the most common class in its leaf and 0 to the others; we then sum up the points from all trees, and the class with the most points is assigned to the element being classified.
  • Not obligatory in the configuration file.
  • Can be set to 0 (summing) or to 1 (voting).
  • Can be changed for every run command.
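
The two voting types can give different answers for the same element. In the sketch below, each entry of leaf_distributions is assumed to map a class to its probability (share of elements) in the leaf one tree reached; that representation, the function name classify, and the tie-breaking (first class seen wins) are assumptions, not taken from the program.

```python
from collections import Counter


def classify(leaf_distributions: list[dict[str, float]], voting_type: int) -> str:
    """Combine the leaves reached in each tree into one predicted class.

    voting_type 0: sum the class probabilities over all trees.
    voting_type 1: one point per tree for the most common class in its leaf.
    """
    totals: Counter = Counter()
    for dist in leaf_distributions:
        if voting_type == 0:
            for cls, p in dist.items():
                totals[cls] += p
        elif voting_type == 1:
            totals[max(dist, key=dist.get)] += 1
        else:
            raise ValueError("VotingType must be 0 or 1")
    return max(totals, key=totals.get)


leaves = [{"a": 0.6, "b": 0.4}, {"a": 0.6, "b": 0.4}, {"a": 0.0, "b": 1.0}]
print(classify(leaves, 0), classify(leaves, 1))  # summing vs voting
```

Here summing gives "b" (1.8 against 1.2) while voting gives "a" (two trees against one), because summing keeps the confidence of the third, pure leaf.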

CvType (Cross-Validation Type)

  • If set to 0, cross-validation is off.
  • If set to 1, leave-one-out (LOO) cross-validation is performed on the training data.
  • If set to a value between 2 and 10 inclusive, k-fold cross-validation is performed on the training data, with k equal to that value.
  • Be aware that a cross-validation run changes the order of the data from its original order.
  • Not obligatory in the configuration file.
  • Can be changed for every run command.
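
How CvType maps to test folds can be sketched as follows. This is a hypothetical illustration: the program's own fold assignment is not documented, and shuffling the indices here is only one way the data order could change during a cross-validation run. Each returned list holds the indices used for testing in one fold; the remaining indices would be used for training.

```python
import random


def cv_folds(n: int, cv_type: int, rng: random.Random) -> list[list[int]]:
    """Return the test-index groups for one cross-validation run.

    cv_type 0: off (no folds); 1: leave-one-out, n folds of one element;
    2..10: k-fold with k = cv_type.
    """
    if cv_type == 0:
        return []
    if cv_type == 1:
        return [[i] for i in range(n)]
    if not 2 <= cv_type <= 10:
        raise ValueError("CvType must be 0, 1, or between 2 and 10")
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    return [indices[f::cv_type] for f in range(cv_type)]


print(cv_folds(10, 3, random.Random(0)))
```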

Default values

Default values can be found here