Decision Trees vs. Clustering Algorithms vs. Linear Regression

The goal of someone learning ML should be to use it to improve everyday tasks—whether work-related or personal. To do this, it's important to first understand algorithms.

The topic of this article is credited to DZone’s excellent Editorial team. It is part of DZone’s recently launched Bounty Board, an initiative that helps writers work on topics suggested by the DZone editors. I think it is somewhat like an extempore: it challenges writers to go beyond their favorite, well-crafted topics. So, if you are struggling to think of a topic to write about, or want to stretch your imagination and win some exciting gifts, join the Bounty Hunter Contest (goes until October 2).

Machine learning is alive. The whole world is talking about it, and everyone aspires to be a data scientist or machine learning engineer. But real-life applications still seem rare and limited. The reason? Most people are not learning it with the end purpose in mind. The ultimate goal of a person learning machine learning should be to use it to improve the things we do every day, whether at work or in our personal lives. If we just learn statistics, study machine learning algorithms, and practice R/Python programming, we’ll be an ML taskmaster, but not an ML jobmaster. To become an ML jobmaster, we need to ask three questions when we start studying machine learning: why to use a certain machine learning algorithm, which machine learning algorithm to choose, and when to use it.

“Your job in a world of intelligent machines is to keep making sure they do what you want, both at the input (setting the goals) and at the output (checking that you got what you asked for).” — Pedro Domingos

In this article, I will try to explain three important algorithms: decision trees, clustering, and linear regression. These are extensively used and readily accepted for enterprise implementations.

We can distinguish and summarize these three algorithms as follows:

If we have no idea about the data and want to group data points to understand their collective behavior, clustering is one of the go-to methods.
If we want to predict numbers before they occur, then regression methods are used. Linear regression is one of the regression methods, and one of the algorithms tried out first by most machine learning professionals.
If there is a need to classify objects into categories based on their historical classifications and attributes, then classification methods like decision trees are used.
- Note: Decision trees can be utilized for regression, as well.

Let’s dive a little deeper.

Clustering Algorithms (Unsupervised Learning)

Clustering techniques group data points into a few segments so that points within a group are similar to each other and distinct from points in other groups. Clustering is an unsupervised learning process that finds logical relationships and patterns in the structure of the data. It can be used for cases that involve:

Discovering the underlying rules that collectively define a cluster (e.g. topic generation)
Partitioning (e.g. customer segmentation or market segmentation)
Discovering the internal structure of the data (e.g. gene clustering)
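
To make the partitioning use case concrete, here is a minimal sketch of k-means, one common clustering algorithm (the article does not single out a specific one, so this choice is an assumption). The function name kmeans, the sample points, and the choice of k = 2 are all hypothetical and only for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: centroids stopped moving
        centroids = updated
    return centroids, labels


# Hypothetical customers described by two made-up, unlabeled features.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (8.0, 8.5), (8.3, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

No labels are supplied anywhere: the two segments emerge from the distances between points alone, which is what makes this an unsupervised method. In practice the number of clusters k has to be chosen by the analyst.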

Decision Trees (Supervised Learning: Classification)

Decision trees are widely used classifiers in enterprises/industries because they are transparent about the rules that lead to a classification/prediction. They are arranged in a hierarchical, tree-like structure and are simple to understand and interpret. They are also relatively robust to outliers in the input features, because each split depends only on whether a value falls above or below a threshold. Decision trees are well-suited for cases in which we need to explain the reason for a particular decision. For example, sales and marketing departments might need a complete description of the rules that influence the acquisition of a customer before they start their campaign activities.
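
The transparency described above can be illustrated with a small sketch that learns a tree and then prints it as readable rules. It assumes binary threshold splits chosen by Gini impurity, which is one common approach and not necessarily what any particular library does. The helper names (gini, best_split, build, predict, rules), the feature names, and the customer data are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Grow a tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def rules(tree, names, indent=""):
    """Render the tree as indented if/else rules, one string per line."""
    if not isinstance(tree, tuple):
        return [f"{indent}-> {tree}"]
    f, t, left, right = tree
    return (
        [f"{indent}if {names[f]} <= {t:g}:"] + rules(left, names, indent + "  ")
        + [f"{indent}else:"] + rules(right, names, indent + "  ")
    )


# Hypothetical history: (age, site visits) and whether the customer bought.
rows = [(22, 1), (25, 6), (30, 2), (35, 8), (41, 7), (52, 1), (46, 9), (28, 3)]
labels = ["no", "yes", "no", "yes", "yes", "no", "yes", "no"]
tree = build(rows, labels)
print("\n".join(rules(tree, ["age", "visits"])))
print(predict(tree, (33, 7)))
```

The printed rules are the explanation: anyone can trace which conditions led to a given prediction, which is harder to do with most other model types.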

Linear Regression (Supervised Learning: Regression)

Linear regression is one of the oldest and most widely used forms of regression analysis. It has been studied rigorously and is used extensively in practical applications. Linear regression is an approach for modeling the relationship between a dependent variable (Y) and one or more independent/explanatory variables (X). Once fitted, this relationship can be used to predict an unknown Y from known Xs.
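
For the single-variable case, the relationship can be written out as follows. The symbols (the coefficients, the error term, and the bar notation for sample means) are standard textbook notation rather than anything defined in this article, and the estimates shown are the ordinary least squares ones, which is the usual but not the only way to fit the line.

```latex
% Simple linear regression model with one explanatory variable
Y = \beta_0 + \beta_1 X + \varepsilon

% Ordinary least squares estimates from n observed pairs (x_i, y_i)
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% Prediction for a new value x
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
```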

Linear regression has many functional use cases, but most applications fall into one of the following two broad categories:

If the goal is prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of dependent (Y) and independent (X) values.
Linear regression analysis can be applied to quantify the change in Y for a given change in X, which helps in determining the strength of the relationship between the dependent (Y) and independent (X) values.
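
Both uses can be sketched in a few lines for the single-variable case. This is a minimal illustration using ordinary least squares; the function names (fit_line, r_squared), the variable names, and the numbers are hypothetical, not data from a real study.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations: X = ad spend, Y = sales (made-up units).
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(ad_spend, sales)
r2 = r_squared(ad_spend, sales, slope, intercept)

# Use 1, prediction: estimate Y for an X that has not been observed yet.
forecast = intercept + slope * 6

# Use 2, quantification: the slope is the expected change in Y per unit of X,
# and R^2 indicates how strong the linear relationship is.
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.4f}")
print(f"forecast for X=6: {forecast:.2f}")
```

The slope answers "how much does Y change when X changes by one unit", while the forecast line answers "what Y should we expect for a new X". The same fitted model serves both purposes.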

Categories: Machine Learning
