
Decision Trees (part 1)

In this post we’ll learn some basic but very powerful concepts that will guide us in the quest for a powerful (and profitable) machine learning model, which in future posts we’ll implement in a real-life Algorithmic Trading Bot. We’ll learn the basic foundations of:

  • Decision Trees
  • Random Forest
  • Bagging
  • Regression Trees
  • Mathematics Involved (Simplified) 😉

Also, we’ll be reviewing important concepts such as:

  • Probability
  • Statistics
  • Information Theory
  • Machine Learning

Let’s start!

Decision Trees

One of the most widely used Machine Learning (ML) models in finance is the Decision Tree. It is a classic tool for rule-based decisions: based on certain conditions, the tree keeps growing (branching) accordingly. For example, imagine that we want to classify fruits using this model; then we’d have:

As we see in the above graph, the tree starts by asking the data a simple question, in this case about its color; then, based on the answer(s), the tree keeps asking more questions until it reaches the desired outcome (classifying our batch of fruits).
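
A minimal Python sketch of such a hand-written tree: each `if` is one question, each `return` is a leaf, and the recorded path of answers shows exactly why the tree reached its conclusion. Apart from the first question about color, the questions, thresholds and fruit labels here (including the `Fruit` class and its `diameter_cm` field) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    color: str          # hypothetical feature: "red", "yellow", "green", ...
    diameter_cm: float  # hypothetical feature: size of the fruit


def classify(fruit: Fruit) -> tuple[str, list[str]]:
    """Walk a hand-written decision tree.

    Returns the predicted label plus the list of answers that led to it,
    which is what makes a tree easy to explain.
    """
    path = [f"color == {fruit.color!r}"]  # root question: which color?
    if fruit.color == "red":
        if fruit.diameter_cm < 4:
            path.append("diameter_cm < 4")
            return "cherry", path
        path.append("diameter_cm >= 4")
        return "apple", path
    if fruit.color == "yellow":
        return "banana", path  # leaf: no further question needed
    if fruit.color == "green":
        if fruit.diameter_cm >= 15:
            path.append("diameter_cm >= 15")
            return "watermelon", path
        path.append("diameter_cm < 15")
        return "grape", path
    return "unknown", path  # color not covered by this toy tree


for f in [Fruit("red", 2.0), Fruit("red", 8.0), Fruit("yellow", 3.5), Fruit("green", 25.0)]:
    print(classify(f))
```

A learned tree has the same shape; the difference is that an algorithm picks the questions and thresholds from data instead of a human writing them by hand.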

Here are some of the reasons why Decision Trees are one of the favorite ML models:

  • Easy to explain: we can trace exactly why the model reached a certain conclusion or result.

  • Fast to train and to evaluate, for both classification and regression.

  • Unlike many other ML models, it can be trained with relatively little data.

However, we have to be careful, because this model can easily over-fit: the tree will fail to generalize well to new data samples that it wasn’t trained on.
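
To make over-fitting concrete, here is a small pure-Python sketch (not a production library) of a one-feature classification tree. It chooses each split by Gini impurity, a common criterion that we assume here, and `max_depth` is a parameter of this sketch. The toy data set is made up: the true rule is "label 1 when x >= 20", and four training labels are deliberately flipped to act as noise.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return the midpoint threshold with the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t  # None when every x is identical


def build_tree(xs, ys, max_depth=None, depth=0):
    """Grow a tree on one feature; a node is (threshold, left, right), a leaf is a label."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    t = best_split(xs, ys)
    if t is None:
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree(*zip(*left), max_depth=max_depth, depth=depth + 1),
            build_tree(*zip(*right), max_depth=max_depth, depth=depth + 1))


def predict(tree, x):
    """Follow the questions (x <= threshold?) down to a leaf label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def accuracy(tree, xs, ys):
    return sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(ys)


# Made-up data: true rule is "label 1 when x >= 20"; four training labels are flipped (noise).
NOISY = {3, 11, 27, 35}
train_x = list(range(40))
train_y = [int(x >= 20) ^ (x in NOISY) for x in train_x]
test_x = [x + 0.25 for x in range(40)]  # unseen points with noise-free labels
test_y = [int(x >= 20) for x in test_x]

deep = build_tree(train_x, train_y)                # grown until every leaf is pure
stump = build_tree(train_x, train_y, max_depth=1)  # a single question

for name, tree in [("deep", deep), ("stump", stump)]:
    print(name, "train:", accuracy(tree, train_x, train_y),
          "test:", accuracy(tree, test_x, test_y))
```

With this data, the fully grown tree scores 100% on the training points but only 90% on the unseen test points, because it memorizes the four flipped labels. The depth-1 tree scores 90% on training and 100% on test. Limiting depth is one simple way to keep a tree from over-fitting.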

Bias and Variance

Bias: high bias means that the model is not learning enough from the data, usually because its capacity is too low. It can be reduced by increasing the model’s capacity (its size and number of parameters; for a Decision Tree, its depth). A high-bias model performs poorly even on its own training data.

High bias is another way of saying under-fitting.

Variance: high variance means that the model has learned its training data, but fails to generalize well to new data that it wasn’t trained on.

High variance is another way of saying over-fitting.
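
A common rule of thumb turns these definitions into a quick check: compare the error on the training data with the error on data the model has not seen. The sketch below is hypothetical; the function name `diagnose` and the two thresholds (`max_train_error`, `max_gap`) are assumptions you would tune for your own problem.

```python
def diagnose(train_error: float, validation_error: float,
             max_train_error: float = 0.10, max_gap: float = 0.05) -> str:
    """Rough bias/variance diagnosis from two error rates (thresholds are assumptions).

    Bias is checked first; a model can suffer from both problems at once.
    """
    if train_error > max_train_error:
        # Fails even on the data it was trained on: under-fitting.
        return "high bias (under-fitting)"
    if validation_error - train_error > max_gap:
        # Learned the training data but does not generalize: over-fitting.
        return "high variance (over-fitting)"
    return "low bias and low variance"


print(diagnose(0.30, 0.32))
print(diagnose(0.00, 0.10))
print(diagnose(0.03, 0.05))
```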

What we desire is Low Bias and Low Variance.

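Bias and Variance usually trade off against each other: making a model more complex (for a Decision Tree, letting it grow deeper) lowers its bias but raises its variance, and vice versa. They are not literally inversely proportional, and the best model is not the point where each one is individually lowest; it is the complexity at which the total error reaches its minimum, the best compromise between the two.

The total error combines both: for squared error, it is the squared bias plus the variance, plus an irreducible noise term that no model can remove.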
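
For squared-error loss this is the standard bias-variance decomposition. Writing $f$ for the true function, $\hat f$ for the model trained on a random training set, and $\sigma^2$ for the variance of the noise $\varepsilon$ in $y = f(x) + \varepsilon$ (expectations taken over training sets and noise):

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

So, strictly, it is the squared bias (not the bias itself) that adds to the variance, and even a perfect model keeps the $\sigma^2$ term.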

Next, we’re going to learn an important concept closely related to Decision Trees, called Ensemble Learning, which we cover in the next post.