• Hristo Piyankov

Artificial intelligence, machine learning, predictive and prescriptive analytics, inference –

Updated: Nov 23, 2020

… and this is just the “marketing” subset of terms. If we dive deeper into the technical details, you will hear about classification, clustering, neural nets, supervised and unsupervised learning and much more. While each of those bears some distinction from the others, I would argue that in the end they all boil down to one thing – discovering patterns in large data sets. This is done using a set of data analysis techniques which, at their very core, are also not very different. At the risk of oversimplifying, each of those techniques tries to describe the data in a certain way in order to minimise a value. This value is usually a measure of the “difference” between the technique’s output (its “description” of the data) and the actual data. Other times it is a measure of how two variables within the data interact with one another.
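
To make the idea of “minimising a value” concrete, here is a minimal sketch using the simplest technique I can think of: fitting a straight line. The data points and function names are made up for illustration. The value being minimised is the mean squared difference between the line’s output and the actual data.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between the line's output and the actual data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Closed-form least-squares fit: the slope and intercept that minimise the error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data points, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best_error = mean_squared_error(xs, ys, slope, intercept)
print(f"y = {slope:.2f} * x + {intercept:.2f}, error = {best_error:.4f}")
```

Any other slope or intercept gives a larger error on the same data, which is all that “fitting” means here. More complex techniques mostly change the shape of the description and the definition of the error, not the principle.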

This article will not focus on explaining the differences between the definitions of those terms. Any Google search can do this for you, although admittedly if you open ten different sites, you will likely end up with at least five different definitions. Instead, I would like to classify groups of techniques by their purpose, and then compare them to one another based on those purposes. In the end, this can give you an idea of what you need to focus on, depending on what you want to achieve.

Classification of models

The most straightforward way to classify models is by what they are trying to achieve (their final added value). It is important to note that one technique can be classified into more than one group of models. A great example of this is regression. It is so fundamental, and so many other techniques build on top of it, that it can be classified as a statistical, predictive, machine learning and artificial intelligence model all at once. Its real classification does not depend on the technique itself, but rather on how it is being used.

In order to illustrate the difference in models better, we will focus on a practical example. Let’s take a look at what a particular group of models might output if they were analysing cancer data.

Relation models. The main focus of those models is describing the relationship between different variables in the data or, alternatively, between different samples of the data. Those kinds of models are most commonly used in scientific studies and controlled experiments such as drug testing. In our cancer example, a typical output would look like: “red meat causes cancer with an odds ratio of 1.67”. It is important to note here that the main goal is not a prediction for an individual person and its accuracy. The main goal is describing the relation between variables or samples of data as accurately as possible.

Terms which typically relate to this group are:

  1. Statistical models

  2. Regression

  3. Pattern recognition

  4. Inference

  5. Odds ratios

  6. Association (market basket) analysis
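
As a rough illustration of the kind of output a relation model produces, the sketch below computes an odds ratio from a 2x2 table of counts. The counts are invented and were picked only so that the result lands on the 1.67 quoted above. The interval uses the common normal approximation on the log scale.

```python
import math


def odds_ratio(exposed_cases, exposed_healthy, unexposed_cases, unexposed_healthy):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    return (exposed_cases / exposed_healthy) / (unexposed_cases / unexposed_healthy)


def confidence_interval(exposed_cases, exposed_healthy, unexposed_cases, unexposed_healthy, z=1.96):
    """Approximate 95% interval, computed on the log-odds scale."""
    estimate = odds_ratio(exposed_cases, exposed_healthy, unexposed_cases, unexposed_healthy)
    std_error = math.sqrt(
        1 / exposed_cases + 1 / exposed_healthy + 1 / unexposed_cases + 1 / unexposed_healthy
    )
    log_estimate = math.log(estimate)
    return math.exp(log_estimate - z * std_error), math.exp(log_estimate + z * std_error)


# Invented counts: (high red meat & cancer, high red meat & healthy,
#                   low red meat & cancer,  low red meat & healthy)
counts = (50, 150, 50, 250)

ratio = odds_ratio(*counts)
lower, upper = confidence_interval(*counts)
print(f"odds ratio = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Note that nothing here predicts the outcome for a single person. The output describes the strength of a relation across the whole sample, together with how uncertain that description is.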

Predictive models. This set of models generally predicts an individual outcome and also provides at least a partial explanation of the reasons behind it. Example: “because you are male, 35 years old, smoke and your red meat consumption is 2 times higher than the average, the probability of you having cancer within the next 5 years is 2%”. The focus here is the individual prediction and the main drivers behind it. This, however, means that you cannot take the drivers in a vacuum (compare them to the outcome individually). Likewise, you cannot draw any generalised conclusions regarding the causal effect.

In the area of predictive modelling, you usually hear about:

  1. Regression

  2. Scorecards

  3. Probabilities

  4. Predictions

  5. Correlation & causation

  6. Decision trees
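
A minimal sketch of what such a model might look like once it is built, assuming a logistic regression in scorecard form. Every coefficient below is hypothetical and was not estimated from any real data. The point is only the shape of the output: one probability for one person, plus the contribution of each driver.

```python
import math

# Hypothetical coefficients on the log-odds scale; not estimated from real data.
INTERCEPT = -6.0
COEFFICIENTS = {
    "is_male": 0.3,
    "age_decades": 0.4,
    "smoker": 0.7,
    "red_meat_ratio": 0.25,  # consumption relative to the average
}


def predict(person):
    """Return the predicted probability and each driver's contribution to the log-odds."""
    contributions = {name: COEFFICIENTS[name] * person[name] for name in COEFFICIENTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


# The person from the example: male, 35 years old, smoker, twice the average red meat.
person = {"is_male": 1, "age_decades": 3.5, "smoker": 1, "red_meat_ratio": 2.0}

probability, contributions = predict(person)
print(f"probability = {probability:.1%}")
for name, value in sorted(contributions.items(), key=lambda item: -item[1]):
    print(f"  {name}: {value:+.2f}")
```

The contributions only make sense together, inside this model. Reading a single coefficient as “the effect of smoking” would be exactly the kind of generalised causal conclusion that this group of models does not support.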

Machine learning models. At their core, machine learning models focus exclusively on the accuracy of the individual prediction. As a result, they largely ignore any transparency of the modelling technique or the drivers behind the prediction. This is also why they are more commonly used in situations where all the data is available and there are no variables which might be unknown.

It is possible to have a machine learning model saying “within the next 5 years there is a 2.345% probability you will have cancer”. However, it is much more common to have a model analyse a picture of a tumour and try to classify it as malignant or benign. The reason is that in the first case, we do not really know all the factors which cause cancer in general, and there might also be a lot of data which is not available to the model. As a result, we cannot fully trust it without understanding the drivers behind the prediction. In the second example, we have already taken out a lot of the unknown variables and we are just focusing on classifying a subset of the data – a single picture. In this case, there is no additional unknown data.

Machine learning generally refers to:

  1. Neural networks

  2. Support vector machines

  3. Random forests

  4. Deep learning

Forecasting models. Forecasting is about observing change over time and extrapolating this change from past to future periods. Forecasting is actually the category which is most different from the rest. All other models generally focus on a prediction for a pre-defined future period. If they need to evaluate a second, third, etc. period, this means creating separate models. Forecasting, on the other hand, focuses precisely on the change over time. You know that something will happen; the question is what its magnitude will be in relation to time. Furthermore, it works better on pre-aggregated clusters of data, while all other models generally perform better on the lowest level of raw data available. In our cancer example, a forecast would look like “the spending on cancer medications is expected to increase 1.2, 1.4 and 1.7 times over the next 1, 2 and 3 years”.

It is also worth noting that the techniques used in forecasting are a lot different from those used in other models. They tend more towards the descriptive side than the predictive side. The premise is that we already know the main drivers behind an event. As such, we do not need to find them, just to estimate the event in future periods.

Terms usually associated with forecasting are:

  1. Moving averages

  2. Exponential smoothing

  3. Seasonality

  4. Budgeting

  5. Business cases
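
The first two terms on this list are simple enough to sketch in a few lines. The spending figures below are made up, and the smoothing factor of 0.5 is an arbitrary choice for illustration.

```python
def moving_average(series, window):
    """Average of the last `window` values, computed at every point where it is defined."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each value blends the new observation
    with the previous smoothed value, so older periods fade out gradually."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up yearly spending, already aggregated to one number per period.
spending = [100, 110, 120, 130]

averaged = moving_average(spending, window=2)
smoothed = exponential_smoothing(spending, alpha=0.5)

print(averaged)  # [105.0, 115.0, 125.0]
print(smoothed)  # [100, 105.0, 112.5, 121.25]
```

In its simple form, the last smoothed value serves as the forecast for the next period. Notice that both methods lag behind a steadily growing series, which is why variants that model trend and seasonality explicitly exist. Also notice that neither method looks at any driver: the only input is the series itself.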

Action-based models. The goal of those models is obtaining an accurate prediction for a given outcome and making a decision (next best action) based on it. Alternatively, the prediction might be the action itself. AI takes this one step further, by taking the action and evaluating the result in order to optimise future actions. This is generally done through a combination of different models implemented in a decision-making structure. The structure chooses which model to use based on environmental conditions and then decides what to do with the result.

As an example, this could be a website which, based on a series of guided user inputs, tries to determine the presence, type and severity of a disease and then recommends follow-up actions. While theoretically this can be done with only one model (in case the subject is very narrow), this implementation calls for an algorithm which both guides the user inputs (as they cannot be standard) and chooses the appropriate model for each classification task. Finally, the follow-up action should also differ on a case-by-case basis. This calls for a separate decision engine behind the models.

Terms in this group include:

  1. Artificial intelligence

  2. Prescriptive analytics
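
The decision-making structure described above can be sketched very roughly as follows. Both “models” are trivial stand-ins, and the names, the threshold and the actions are all hypothetical. The only point is the shape: pick a model based on the situation, score, then turn the score into an action.

```python
from typing import Callable, Dict


def image_model(case: dict) -> float:
    """Stand-in for a model which scores a picture; returns a made-up risk score."""
    return case["image_score"]


def questionnaire_model(case: dict) -> float:
    """Stand-in for a model which scores guided user answers."""
    return min(1.0, 0.1 * case["symptom_count"])


MODELS: Dict[str, Callable[[dict], float]] = {
    "image": image_model,
    "questionnaire": questionnaire_model,
}


def choose_model(case: dict) -> str:
    """Pick the model based on what information is available for this case."""
    return "image" if "image_score" in case else "questionnaire"


def next_best_action(case: dict, threshold: float = 0.5) -> str:
    """Score the case with the chosen model and translate the score into an action."""
    risk = MODELS[choose_model(case)](case)
    return "refer to specialist" if risk >= threshold else "routine follow-up"


print(next_best_action({"image_score": 0.8}))   # refer to specialist
print(next_best_action({"symptom_count": 2}))   # routine follow-up
```

A prescriptive setup would stop here and hand the recommendation to a person. An AI setup would go on to take the action, observe the result and feed it back into the models and the thresholds.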

And what about the rest?

As you might have noticed, I left some of the terms unclassified.

  1. Data mining – this is a term which actually encompasses everything mentioned so far. On top of that, I would even put data scraping, extraction, transformation and collection in this category. As such, I usually use it as an umbrella term for everything model-related.

  2. Data modelling – this is again too broad a term, even more so than data mining. You could say that any kind of data manipulation is data modelling, even calculating a simple average. This is why I generally refrain from using it. Some might argue that it is a synonym for “Statistical models”. I believe it is broader than this.

Model characteristics

As you can see, there is a certain degree of overlap between the different kinds of models. The choice of which one to use depends on a series of model characteristics and how they fit our final model application. While there are lots and lots of characteristics by which we can classify models, in the end, for me it boils down to these three:

  1. Predictive power – the capability of the model to output accurate predictions with regards to a target variable or period.

  2. Transparency – how easy it is to understand how the model prediction was obtained. Also the capability of the model to accurately assess the interaction and dependency between variables and/or samples.

  3. Automation – the level of autonomy from human involvement after the model is deployed and during the re-calibration of the model with new data.

It is also important to note that the models’ performance will vary depending on the completeness of the observed data. We can classify the data into two groups:

  1. Full – either data obtained in a specialised environment such as a controlled experiment, or data which is complete with regards to the task. For example, when classifying the content of a picture, both the task and the data are part of the picture itself.

  2. Partial – also referred to as observational, this is data where some significant variables might be missing, or data which was not obtained in a controlled environment. For example, if we are examining bank statements to determine the financial situation of a person, we are definitely missing other significant information such as investments, available cash, spouse’s income, other assets and accounts in other banks.

Predictive power vs transparency

[Figure: predictive power vs transparency for different kinds of models]

When the data is complete, each model performs best at what it was designed for. Relation models give you a good sense of how variables interact. Machine learning gives the most accurate predictions. I have ranked AI slightly higher than machine learning because, when properly implemented, it can adjust to the environmental or situational conditions and thus deliver slightly better performance. At its core, however, the principal algorithms are the same.

Although I previously put AI and prescriptive models in the same category, it is worth noting that they differ significantly. Their end goal is still the same: to predict the next best action and, in the case of AI, to even take it. The way they achieve it, however, is different. AI is all about automation and autonomy, while prescriptive models aim to expose the result to further human interpretation and decision making.

At the highest possible level, predictive models and forecasts strive, in essence, for the same thing – accurate predictions for future periods, while keeping some transparency about how they were obtained. This is why they rank close to each other on this scale. In detail, however, they are fundamentally different in scope and in the means of obtaining the prediction.

As soon as our data becomes observational, both characteristics take a significant hit. This goes to show the importance of data in the process of model building. For better or worse, the better model is usually the one which has access to more data, not necessarily the one which is built the best way and fine-tuned the most.

It is important to note here that transparency will generally drop more than predictive power when we start missing data. This does not mean that the model results would be harder to read and understand. Rather, it means that the uncovered dependencies might be missing important pieces of the general picture and might be unreliable. This is why in medical studies, findings based on observational data are always met with scepticism until controlled experiments confirm them. This is not to say that observational studies are not valuable, quite the opposite. However, they cannot be taken as definitive findings.
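
A toy example of how a missing variable makes an uncovered dependency unreliable. The four records below are invented and deliberately extreme: by construction, risk depends only on smoking, but the smokers in this sample also eat more red meat.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


# Invented records: (red_meat, smoker, risk). Risk is driven by smoking only.
records = [(0, 0, 0.0), (1, 0, 0.0), (1, 1, 3.0), (2, 1, 3.0)]

# Observational view: the smoker column is missing, so risk is regressed on red meat alone.
naive = slope([r[0] for r in records], [r[2] for r in records])

# Complete view: hold smoking fixed and look at red meat within each group.
within = {
    s: slope(
        [r[0] for r in records if r[1] == s],
        [r[2] for r in records if r[1] == s],
    )
    for s in (0, 1)
}

print(naive)   # 1.5
print(within)  # {0: 0.0, 1: 0.0}
```

Without the smoking column, red meat appears to drive risk strongly. With it, the apparent effect disappears. The naive model’s output is still perfectly easy to read, and it may even predict reasonably well on similar data. It is the described relation that cannot be trusted.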

It is interesting to see that forecasts are pretty resilient to data incompleteness. This is due to the fact that by definition they operate on a subset of the data. Furthermore, the areas where forecasts are generally applied (finance for example) by definition operate with incomplete data.

[Figure: predictive power vs transparency for different kinds of models]

Predictive power vs automation

[Figure: predictive power vs automation for different kinds of models]

Machine learning is king here. This should come as no surprise, as this is what those algorithms are designed for. The human interaction during the modelling process is more related to picking the correct data structure & formats and the correct model architecture & parameters. Understanding the model outputs and the data transformation & calculation is far less important in this particular instance. Furthermore, if your data is complete, there is a low probability that you will need to re-tune your algorithm in the near future, unless the data itself changes significantly.

The reason why AI ranks lower is that it has more “moving parts”. We have algorithms on top of other algorithms, so small changes in one part could have larger repercussions in another one.

Forecasts can, surprisingly, be automated to a high degree as well. While it depends on the level of prediction granularity, overall they tend to have stable outputs. Stability, in turn, translates into the possibility of automation.

Predictive, prescriptive and (most of all) relation models usually depend heavily on the interpretation and subject-area knowledge of the person building them. As such, the level of automation is somewhat limited. With those models (and all others, for that matter), the practical application of the model usually changes the modelled area. As such, the models depreciate themselves over time. However, as they rely not on “machine” but on “human” learning, not a lot of automation can be achieved.

On the flip side, when the completeness of data takes a hit, AI and machine learning suffer more in terms of automation than the others. Any significant change or noise in the data means the model has to be re-run. Re-running with a different data structure is tedious, since the data structure is at the core of the model’s architecture.

Forecasting retains its title as one of the more robust approaches and does not take a big hit. This is driven by the fact that by definition forecasts are built on a smaller subset of reliable data.

The reason why predictive, prescriptive and relation models also do not suffer a lot in terms of automation is that they did not have a lot of it to begin with. With more data or with less, those models would generally have to be re-done after a certain amount of time.

[Figure: predictive power vs automation for different kinds of models]

All that said, it is clear that there is no one approach to solving every modelling problem. Do you want to predict which customer will buy your product? Or what is driving them to do so? Maybe it’s the total number of sales within 1, 2 and 3 years? Do you need to automate your sales? Or give your sales advisors the right tools to close the deals?

All those questions call for different kinds of models with a different focus on the outputs produced. It could be that your job focuses predominantly on one of those. Whatever the case, make sure you have a basic understanding of the principles and applications of all of them, in order to have a tool for every scenario.
