An article published by the Wall Street Journal on Jan. 30, 2019 got me thinking about the challenges of using unstructured data in modeling. The article discusses how New York’s Department of Financial Services is allowing life insurers to use social media, as well as other nontraditional sources, to set premium rates. The crux: the data cannot unfairly discriminate.
I finished the article with three questions on my mind. The first: How does a company convert unstructured data into something useful? The article mentions that insurers are leveraging public information – like motor vehicle records and bankruptcy documents – in addition to social media. Surely, though, this information is not in a structured format to facilitate querying and model builds.
Second: How does a company ensure the data is good quality? Quality here doesn’t only mean the data is clean and useful, it also means the data is complete and unbiased. A lot of effort will be required to take this information and make it model ready. Otherwise, the models will at best provide spurious output and at worst provide biased output.
The third: With all this data available what “new” modeling techniques can be leveraged? I suspect many people read that last sentence and thought AI. That is one option. However, the key is to make sure the model does not unfairly discriminate. Using a powerful machine learning algorithm right from the start might not be the best option. Just ask Amazon about its AI recruiting tool.
The answers to these questions are not simple, and they do require a blend of technological aptitude and machine learning sophistication. Stay tuned for future blog posts as we provide answers to these questions.
Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.