R – What does seed do in random forest

machine learning, r, random-forest

I know that, in general, a seed is set so that the same result can be reproduced. But what does setting the seed actually do in a random forest? Does it change any of the arguments of the randomForest() function in R, such as ntree or sampsize?

I am using different seeds for my random forest model each time, but want to know how different seeds affect a random forest model.

Best Answer

Trees grow from seeds and so do forests ;-) (scnr)

There are different ways to build a random forest, but what they all have in common is that multiple trees are built. To improve classification accuracy over a single decision tree, the individual trees in a random forest need to differ; otherwise you would just have the same tree ntree times. This difference is achieved by introducing randomness into the generation of the trees. The seed fixes the state of the random number generator that produces this randomness, and the most important property of the seed is that using the same seed always generates the same result. Setting the seed does not change any argument such as ntree or sampsize; it only determines which random draws are made.
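To make the reproducibility point concrete, here is a minimal sketch. It assumes the randomForest package is installed and uses the built-in iris data; the helper `fit_forest` and the choice of 50 trees are only for illustration.

```r
library(randomForest)  # assumes the randomForest package is installed

# Hypothetical helper: set the seed, then grow a small forest on iris.
fit_forest <- function(seed) {
  set.seed(seed)  # fixes the random number stream used while growing the trees
  randomForest(Species ~ ., data = iris, ntree = 50)
}

rf_a <- fit_forest(1)
rf_b <- fit_forest(1)  # same seed as rf_a
rf_c <- fit_forest(2)  # different seed

# Same seed: the out-of-bag predictions match exactly.
identical(rf_a$predicted, rf_b$predicted)

# The seed does not alter the arguments: each forest still has 50 trees
# and the same number of candidate variables per split.
c(rf_a$ntree, rf_c$ntree)
c(rf_a$mtry, rf_c$mtry)

# Different seed: the out-of-bag error estimates may differ slightly.
rf_a$err.rate[50, "OOB"]
rf_c$err.rate[50, "OOB"]
```

With a different seed you get a different, but equally valid, forest. The differences between seeds tend to shrink as the number of trees grows.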

How does the randomness influence the tree building? There are multiple ways:

- Build each tree on a random subset: for each individual tree of the forest, a subset of the training examples is drawn, and the tree is built on this subset.
- Randomize the splits: at each decision point in the tree, the decision attribute is selected randomly (in Breiman's random forest, from a random subset of candidate attributes, among which the best split is chosen).

Often these two elements are combined.
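The two sources of randomness can be sketched in base R. This is not how randomForest() is implemented internally, only an illustration of the kind of random draws that the seed controls; the seed value 123 and `mtry <- 2` are arbitrary choices.

```r
set.seed(123)  # arbitrary seed chosen for this illustration

n_rows <- nrow(iris)
predictors <- setdiff(names(iris), "Species")

# 1. Random subset of training examples: a bootstrap sample of row indices
#    (drawn with replacement) on which one tree would be grown.
boot_rows <- sample(n_rows, size = n_rows, replace = TRUE)

# 2. Random attribute selection: the candidate predictors for one split.
mtry <- 2  # hypothetical number of candidates per split
split_candidates <- sample(predictors, size = mtry)

# Resetting the same seed reproduces both draws exactly.
set.seed(123)
boot_rows_again <- sample(n_rows, size = n_rows, replace = TRUE)
split_candidates_again <- sample(predictors, size = mtry)

identical(boot_rows, boot_rows_again)
identical(split_candidates, split_candidates_again)
```

Every tree in the forest consumes draws like these, which is why a fixed seed reproduces the whole forest and a new seed gives a slightly different one.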

http://link.springer.com/article/10.1023%2FA%3A1010933404324#page-1
