Decision Trees (part 1)
In this post we’ll learn basic but very powerful concepts that will guide us in the quest to find a powerful (and profitable) machine learning model, which in later posts we’ll apply to a real-life Algorithmic Trading Bot. We’ll learn the basic foundations of:
- Decision Trees
- Random Forest
- Regression Trees
- Mathematics Involved (Simplified)
Also we’ll be reviewing important concepts such as:
- Information Theory
- Machine Learning
One of the most widely used Machine Learning (ML) models in finance is the Decision Tree. It’s a classic tool for rule-based decisions (such as trade entries): based on certain conditions, the tree keeps branching accordingly until it reaches a prediction. For example, imagine that we want to classify fruits using this model; then we’d have:
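A tree like that is just nested if/else rules. Here's a minimal sketch in Python; the specific features (`color`, `size_cm`) and thresholds are made up for illustration, not learned from data:

```python
def classify_fruit(color, size_cm):
    # Each if/else corresponds to an internal node of the decision tree;
    # each return is a leaf. Rules and thresholds are purely illustrative.
    if color == "green":
        if size_cm > 6:
            return "watermelon"
        return "apple"
    elif color == "yellow":
        if size_cm > 10:
            return "banana"
        return "lemon"
    return "unknown"

print(classify_fruit("green", 12))   # watermelon
print(classify_fruit("yellow", 4))   # lemon
```

A trained decision tree works the same way, except the algorithm chooses the split features and thresholds automatically from the training data.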
Now we’ll list some of the advantages that make Decision Trees one of the favorite ML models:
- It’s easy to explain why the model ended up with a certain conclusion or result.
- It’s one of the fastest models for classification and regression.
- Unlike many other ML models, it can be trained with little data.
However, we have to be careful, because this model can easily over-fit: it memorizes its training set and then fails to generalize to new data samples it wasn’t trained on.
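A tiny toy sketch of what over-fitting looks like, using made-up data: a "memorizer" (like an unpruned tree with one leaf per training sample) is perfect on the training set but stumbles on unseen points, while a single-split "stump" (like a pruned tree) accepts some training error and generalizes better. All names and numbers here are illustrative:

```python
def memorizer_predict(train, x):
    # Overfit model: one "leaf" per training point; predicts the label
    # of the nearest memorized feature value, noise included.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def stump_predict(x, threshold=2.5):
    # Pruned model: a single split that ignores the noisy sample.
    return "A" if x < threshold else "B"

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# True rule: x < 2.5 -> "A", else "B"; (2.1, "B") is label noise.
train = [(1.0, "A"), (2.0, "A"), (2.1, "B"), (3.0, "B")]
test = [(1.4, "A"), (2.2, "A"), (2.8, "B")]

print(accuracy(lambda x: memorizer_predict(train, x), train))  # 1.0  (perfect fit)
print(accuracy(lambda x: memorizer_predict(train, x), test))   # ~0.67 (fails to generalize)
print(accuracy(stump_predict, train))                          # 0.75 (accepts the noise)
print(accuracy(stump_predict, test))                           # 1.0
```

This is why real decision-tree libraries expose controls such as maximum depth or minimum samples per leaf: they trade a little training accuracy for better generalization.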
Bias and Variance
Bias: the model is not learning enough about the data due to its low capacity, so it fails even on the training data. This can be reduced by increasing the model’s capacity (its size and number of parameters).
High bias is another way of saying the model is under-fitting.
Variance: the model has learned its training data very closely, but it fails to generalize well to new data that it wasn’t trained on.
High variance corresponds to over-fitting.
What we desire is Low Bias and Low Variance.
Bias and Variance generally trade off against each other: reducing one tends to increase the other, except at the sweet spot where their sum is at its minimum.
The total expected error is, roughly, the sum of the bias (squared), the variance, and an irreducible noise term.
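For squared-error loss this statement can be made precise with the standard bias–variance decomposition, where $f$ is the true function, $\hat{f}$ the trained model, and $\sigma^2$ the irreducible noise in the data:

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Noise}}
```

The expectations are taken over different training sets: bias measures how far the average trained model is from the truth, and variance measures how much the model jumps around from one training set to another.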
Next, we’re going to learn an important concept closely related to Decision Trees, called Ensemble Learning, which we’ll cover in part 2.