Machine Learning in Insurance Pricing

Have you ever wondered whether you should be applying Machine Learning in Insurance Pricing? According to Gartner,  Machine Learning is one of the hottest technology trends of 2016 and is revolutionising the way many companies do business. In this blog post I take a look at machine learning from an insurance pricing stand point, highlighting the advantages and challenges of applying machine learning in insurance pricing. In order to help you get started I also provide free R code, so you can try these exciting algorithms on your own insurance data.
machine learning in insurance pricing

Statistical Modelling vs. Machine learning in Insurance Pricing

Machine Learning has gained in popularity in recent years – but wait… isn’t Machine Learning just Statistical Modelling!? Lets start with some definitions:

  • In Statistical Modelling we estimate the model parameters which describe the relationship between the explanatory variables and the dependent variable.
  • There are several different categories of Machine Learning. Supervised Learning is the Machine Learning task of learning a function which takes a set of inputs and produces a prediction.

Clearly then there are similarities between supervised learning and statistical modelling. There are several notable differences however, which are easy to see once we consider the two competing goals of a modelling project:

  1. Prediction – to be able to predict what the responses are going to be for future input variables
  2. Information – to extract some information about how nature is associating the response variables to the input variables.

Statistical Modelling

Statistical modelling is focused on extracting information and allows the user to infer the process that generated the data. The analysis starts with assuming that the observed data is generated by taking independent samples from a stochastic data model. In Insurance Pricing our data model of choice is typically the Generalised Linear Model (GLM). During model fitting the parameters are estimated from the data and the model is then used for information and/or prediction.

Machine Learning

Machine learning is focused on prediction. The approach is to find a function f(x) – an algorithm that operates on x to predict the responses y. We are not restricted to using data models such as GLM’s but can use any algorithm. Typical algorithm choices include Decision Trees, Random Forests, Gradient Boosting Machines, Neural Networks and Support Vector Machines. These algorithms are very flexible and can offer improvements in predictive accuracy compared to statistical data models, especially with high dimensional data and non-linear relationships.

Challenges with applying Machine Learning in Insurance Pricing


The traditional tools used within Insurance Pricing such as Emblem and SAS (Stat) are really good at what they do and have dominated the market in recent years. However these tools do not provide the same breadth of machine learning algorithms that are available through open source projects such as R, Python and Spark.

machine learning in insurance pricing tools


The most predictive algorithms need lots of computer processing power, requiring investment in powerful servers or knowledge of cloud computing. In order to fully explore the machine learning landscape insurers will need to transition to these new technologies and infrastructures, either by investing in the development of their own staff or bringing in external consultants

Regulatory Concerns

The ability to justify and explain your pricing models is important from a regulatory point of view. Generalised Linear Models are easy to explain, as the effect of each input variable on the response is encoded directly in the model coefficients.  However the unfortunate reality is that as we push towards higher accuracy models, algorithms become more complex and harder to interpret. The classic example is the Neural Network which has excellent predictive power but provides little in the way of information. This is almost always the trade-off we make when prediction accuracy is the goal. Insurers will need to decide how much interpretability they are willing to sacrifice when pushing for more accurate pricing models.

Advantages of Machine Learning in Insurance Pricing

As insurers look to gain competitive advantage through more accurate pricing models, they supplement their databases with external data. As a result the datasets used to build pricing models have become unmanageably wide. The pricing professional is often required to test several hundred variables which is a daunting and time consuming task.

Shorter Model Build Times

Statistical models rely on the user to select which explanatory variables should appear in the model, and what degree of smoothing/grouping should take place. With machine learning model complexity is determined in an automated way, which results in much quicker model builds. The ability to build models in less time is one reason why Insurers are looking to utilise Machine Learning in Insurance Pricing.

More Accurate Models

The primary driver behind the adoption of Machine Learning in Insurance Pricing is a desired improvement in predictive accuracy over more traditional methods. As I show in this benchmark test, the estimated improvement in loss ratio performance from adopting these methods is at least 1%.

Cost Saving AND Cutting Edge Pricing Models

As well as the obvious saving in expensive software license fees, transitioning to open source software has other benefits.You become closer to the cutting edge. When an academic paper is published with a new algorithm often an implementation is provided as an R package, allowing you to immediately try the algorithm on your own data.

How do I get started?!

Rather than  jumping straight in with the most powerful black box algorithms such as Neural Networks, I would suggest experimenting with a technique called regularisation. Often referred to as Penalised Regression, regularisation can be used to fit Generalised Linear Models in an automated way, and can provide significant increases in predictive performance whilst retaining the interpretability of a GLM.

To get up to speed quickly in these powerful techniques the following one day interactive courses are now available.

All courses come complete with free R code so you can start using these methods on your own data right away!