I am delighted to announce that the article I co-authored on Text Mining has been named article of the month!
Unstructured text data, such as customer comments and social media posts, contains valuable insights, but obtaining those insights is difficult unless you have the right tools at your disposal. Text Analytics is an exciting area that lets data scientists extract high-quality information from unstructured text. I first started experimenting with Text Analytics back in 2011, using R to draw out the key themes from an insurance claim dataset. That work was published in the May 2011 edition of The Actuary magazine and can be downloaded here: The Actuary May 2011 (the article starts on page 34).
I used various clustering algorithms (k-means and hierarchical clustering) to group claims together based on the information in the claim description. However, in all honesty, I was never really that happy with the results…
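To give a flavour of what clustering claim descriptions involves, here is a minimal sketch in Python (the original work was done in R, and this is not that code). The claim texts are made up for illustration, the function names are hypothetical, and the approach shown is a simple cosine-based k-means on TF-IDF word weights, seeded with the first k documents so the result is repeatable.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Convert each document into a unit-length TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, n_iter=10):
    """Cosine-based k-means; the first k documents seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = []
    for _ in range(n_iter):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid from its members, then renormalise.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)
            if total:
                centroids[j] = normalise(total)
    return labels


# Hypothetical claim descriptions, invented for this example.
claims = [
    "water leak damaged kitchen ceiling",
    "car collision rear bumper damaged",
    "burst pipe water flooded kitchen",
    "vehicle collision on motorway car written off",
    "water leak from pipe in bathroom",
    "car bumper scratched in collision",
]

labels = kmeans(tfidf_vectors(claims), k=2)
for label, claim in zip(labels, claims):
    print(label, claim)
```

On a toy dataset like this the water and motor claims separate cleanly. Real claim descriptions are far messier, which is one reason clustering results on them can be hard to interpret.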
Four years on, I discovered an approach to Text Analytics that was new to me, called Topic Modelling (most commonly implemented with Latent Dirichlet Allocation), and I have been blown away by the results! I will be posting a blog update about this topic in a few weeks, but until then you can read about how I applied this technique for a client in the gaming industry here.