CECL – Data (As Usual) Drives Everything

To appropriately prepare for CECL a financial institution (FI) must have a hard heart-to-heart with itself about its data. Almost always, simply collecting data in a worksheet, reviewing it for gaps, and then giving it the thumbs up is insufficient.

Data drives all parts of the CECL process. The sections below, by no means exhaustive, provide key areas where your data, simply being by your data, constrains your options.


Paragraph 326-20-30-2 of the Financial Accounting Standards Board (FASB) standards update[1] states: “An entity shall measure expected credit losses of financial assets on a collective (pool) basis when similar risk characteristic(s) exist.” It then points to paragraph 326-20-55-5 which provides examples of risk characteristics, some of which are: risk rating, financial asset type, and geographical location.

Suggestion: prior to reviewing your data consider what risk profiles are in your portfolio. After that, review your data to see if it can adequately capture those risk profiles. As part of that process consider reviewing:

  • Frequency of missing values in important variables
  • Consistency in values of variables
  • Definitional consistency[2]
Methodology Selection

The FASB standard update does not provide guidance as to which methodologies to use[3]. That decision is entirely up to the FI[4]. However, the methodologies that are available to the FI are limited by the data it has. For example, if an FI has limited history then any of the methodologies that are rooted in historical behavior (e.g., vintage analysis or loss component) are likely out of the question.

Suggestion: review the historical data and ask yourself these questions: 1) do I have sufficient data to capture the behavior for a given risk profile?; 2) is my historical data of good quality?; 3) are there gaps in my history?

Granularity of Model

Expected credit loss can be determined on three different levels of granularity: loan, segment (i.e., risk profile), and portfolio. Each granularity level has a set of pros and cons but which level an FI can use depends on the data.

Suggestion: review variables that are account specific (e.g., loan-to-value, credit score, number of accounts with institution) and ask yourself: are the sources of these variables reliable? Do they get refreshed often enough to capture changes in customer or macroeconomic environment behavior?

Hopefully, this post has started you critically thinking about your data. While data review might seem daunting, I cannot stress enough—it’s needed, it’s critical, it’s worth the effort.


Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.


[1] You can find the update here

[2] More on what these mean in a future blog post

[3] Paragraph 326-20-30-3

[4] A future blog post will cover some questions to ask to guide in this decision.



CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

Avoiding Discrimination in Unstructured Data

An article published by the Wall Street Journal on Jan. 30, 2019  got me thinking about the challenges of using unstructured data in modeling. The article discusses how New York’s Department of Financial Services is allowing life insurers to use social media, as well as other nontraditional sources, to set premium rates. The crux: the data cannot unfairly discriminate.  

I finished the article with three questions on my mind. The first: How does a company convert unstructured data into something useful? The article mentions that insurers are leveraging public information – like motor vehicle records and bankruptcy documents – in addition to social media. Surely, though, this information is not in a structured format to facilitate querying and model builds.

Second: How does a company ensure the data is good quality? Quality here doesn’t only mean the data is clean and useful, it also means the data is complete and unbiased. A lot of effort will be required to take this information and make it model ready. Otherwise, the models will at best provide spurious output and at worst provide biased output.

The third: With all this data available what “new” modeling techniques can be leveraged? I suspect many people read that last sentence and thought AI. That is one option. However, the key is to make sure the model does not unfairly discriminate. Using a powerful machine learning algorithm right from the start might not be the best option. Just ask Amazon about its AI recruiting tool.[1]

The answers to these questions are not simple, and they do require a blend of technological aptitude and machine learning sophistication. Stay tuned for future blog posts as we provide answers to these questions.


[1] Amazon scraps secret AI recruiting tool that showed bias against women


Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

Subscribe to our blog!