Decision Trees (part 2)
Ensemble learning is one of the most interesting and most widely used concepts in machine learning applied to finance. It consists of two main ideas:
Boosting (we'll cover this concept in later posts)
Bagging
Bagging is short for bootstrap aggregation. In statistics, bootstrapping means drawing random samples from a data set with replacement. The main idea is to improve performance and reduce over-fitting. This is done by training many models, each on its own bootstrap sample, and then combining their predictions by majority vote (or by averaging, for regression).
Remember that Decision Tree models are particularly prone to over-fitting, more so than many other models. That is why we use Ensemble Learning to reduce over-fitting.
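The bagging procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the toy data set, the 10% label noise, the choice of 25 estimators, and the helper names `bootstrap_sample`, `fit_stump`, and `bagging_predict` are all hypothetical, and a one-split tree (a "stump") on a single feature stands in for a full Decision Tree.

```python
import random
from collections import Counter

def bootstrap_sample(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]

def fit_stump(xs, ys, idx):
    """Fit a one-split tree on the rows in idx.

    The rule is "predict 1 when x > t"; we keep the threshold t
    that makes the fewest mistakes on the sampled rows.
    """
    candidates = sorted({xs[i] for i in idx})
    return min(candidates, key=lambda t: sum((xs[i] > t) != ys[i] for i in idx))

def bagging_predict(thresholds, x):
    """Combine the stumps' predictions for x by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data (assumption): label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5) for x in xs]

# Train 25 stumps, each on its own bootstrap sample (odd count avoids tied votes).
thresholds = [fit_stump(xs, ys, bootstrap_sample(len(xs), rng)) for _ in range(25)]

print("prediction for x=0.95:", bagging_predict(thresholds, 0.95))
print("prediction for x=0.05:", bagging_predict(thresholds, 0.05))
```

Each stump sees a slightly different version of the data, so the individual thresholds differ, while the majority vote smooths out those differences.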
Another important concept in Bagging is its evaluation metric, the Out of Bag (OOB) score. Each estimator is evaluated on the samples that were left out of its own bootstrap sample (the out-of-bag samples), which may still have been used to train the other estimators. The idea is to test each estimator on samples it has not seen.
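To make the OOB idea concrete, here is a small sketch that scores each estimator on the rows missing from its own bootstrap sample and then averages those scores. The toy data, the stump model, and all names are assumptions made for illustration; libraries may compute their OOB score differently, for example by aggregating votes per sample.

```python
import random

def fit_stump(xs, ys, idx):
    """Pick the threshold t (rule: predict 1 when x > t) with the fewest errors on rows idx."""
    candidates = sorted({xs[i] for i in idx})
    return min(candidates, key=lambda t: sum((xs[i] > t) != ys[i] for i in idx))

# Toy data (assumption): label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(1)
n = 200
xs = [rng.random() for _ in range(n)]
ys = [int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5) for x in xs]

oob_fractions, oob_accuracies = [], []
for _ in range(25):
    in_bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
    oob = sorted(set(range(n)) - set(in_bag))       # rows this estimator never saw
    t = fit_stump(xs, ys, in_bag)                   # train on in-bag rows only
    correct = sum((xs[i] > t) == ys[i] for i in oob)
    oob_fractions.append(len(oob) / n)
    oob_accuracies.append(correct / len(oob))

mean_oob_fraction = sum(oob_fractions) / len(oob_fractions)
mean_oob_accuracy = sum(oob_accuracies) / len(oob_accuracies)
print("average share of rows left out of a bootstrap sample:", round(mean_oob_fraction, 3))
print("average out-of-bag accuracy:", round(mean_oob_accuracy, 3))
```

Because sampling is done with replacement, roughly a third of the rows are left out of each bootstrap sample (the share tends to 1/e, about 37%, as the data set grows), so every estimator gets a held-out set without a separate validation split.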
Another advantage of Bagging is that the estimators are independent of each other, so they can be trained in parallel, for example on a cluster of servers. This is very handy for large-scale computations.
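The independence of the estimators can be illustrated with a small sketch. Each training task below needs only the data and its own seed, so the tasks can be handed to a pool of workers in any order. The function `train_one`, the toy data, and the worker count are hypothetical. A thread pool is used here only to keep the example simple: in CPython, threads will not speed up pure-Python training, and a process pool or a cluster would be the realistic choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_one(seed, xs, ys):
    """Train one stump on its own bootstrap sample, derived only from `seed`."""
    rng = random.Random(seed)
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
    candidates = sorted({xs[i] for i in idx})
    # Rule: predict 1 when x > t; keep the t with the fewest errors.
    return min(candidates, key=lambda t: sum((xs[i] > t) != ys[i] for i in idx))

# Toy data (assumption): label is 1 when x > 0.5, with about 10% of labels flipped.
data_rng = random.Random(2)
xs = [data_rng.random() for _ in range(200)]
ys = [int(x > 0.5) if data_rng.random() > 0.1 else int(x <= 0.5) for x in xs]

seeds = list(range(8))

# Parallel version: no task depends on the result of another task.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(lambda s: train_one(s, xs, ys), seeds))

# Sequential version for comparison: same seeds, same models.
sequential = [train_one(s, xs, ys) for s in seeds]

print("same models either way:", parallel == sequential)
```

Boosting does not parallelize this way, since each new model there depends on the errors of the previous ones.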
The concepts above lay the groundwork for a new concept: Random Forests.