Handling Missing Data for CECL Requirements


Most financial institutions (FIs) find that data is the biggest hurdle when it comes to regulatory requirements: they don’t have enough information, they have the wrong information, or the information is simply missing. With the CECL accounting standard, the range of data required to estimate expected credit losses (e.g., reasonable and supportable forecasts) grew beyond what was previously required. While this is a good thing in the long run (the requirements gradually help FIs build up their inventory of clean, model-ready data), many FIs are finding it difficult to address data problems right now. In particular, how to handle missing data is a big concern.

Missing data becomes a larger issue because not all missing data is the same. Classifications based on the root cause of the missing data are used as guidance in choosing the appropriate method for data replacement. The classifications are:

  1. Not missing at random (NMAR) – the cause of the missing data is related to the missing values themselves
    • For example, CLTV values are missing when previous values have exceeded 100.
  2. Missing at random (MAR) – the cause of the missing data is related to observed values of other variables
    • For example, DTI values are missing when the number of borrowers is 2 or more.
  3. Missing completely at random (MCAR) – the cause of the missing data is unrelated to values of the variable or other variables; data is missing due to an entirely random process
    • For example, LTV values are missing because a system outage caused recently loaded data to be reset to default value of missing.

Once the reason for the missing data has been classified, it is easier to determine a resolution. For example, if the data is MCAR there is no pattern to the missingness, so dropping the observations with missing values involves no loss of information. Unfortunately, data is rarely MCAR.

The following table presents some methods (not meant to be all-inclusive) an FI may use to handle the other, more common, missing-data classifications.

| Method | Description | Pros | Cons |
| --- | --- | --- | --- |
| Last observation carried forward / backward | For a given account, use a non-missing value in that variable to fill missing values before and/or after it | Simple; uses an actual value the account has had; useful for origination variables | Assumes stability in account behavior; assumes data is MCAR |
| Mean imputation | Use the average of the observed values in place of the missing value | Simple | Distorts the empirical distribution of the data; does not use all information in the data set |
| Hot decking and cold decking | Replace missing values with a value from a similar observation in the sample (cold decking uses a similar observation from outside the sample) | Conceptually straightforward; uses existing relationships in the data | Can be difficult to define the characteristics of a “similar” observation; continuous data can be problematic; assumes data is MAR |
| Regression | Use univariate or multivariate regression models to impute the missing value, with the missing variable as the dependent variable | Fairly easy to implement; uses existing relationships in the data | Can lead to overstating relationships among the variables; estimated values may fall outside accepted ranges; assumes data is MAR |
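
To make a couple of the simpler methods concrete, here is a minimal sketch in Python (pandas). The data frame and column names are hypothetical; it illustrates last-observation-carried-forward/backward and mean imputation, not a prescription for any particular portfolio.

```python
# Minimal sketch of two imputation methods from the table above.
# The data and column names (account_id, period, ltv, dti) are hypothetical.
import pandas as pd

loans = pd.DataFrame({
    "account_id": [1, 1, 1, 2, 2, 2],
    "period":     [1, 2, 3, 1, 2, 3],
    "ltv":        [0.85, None, 0.80, None, 0.95, 0.93],
    "dti":        [0.30, 0.31, None, 0.40, None, 0.42],
})

# Last observation carried forward / backward, within each account
loans["ltv_locf"] = loans.groupby("account_id")["ltv"].transform(
    lambda s: s.ffill().bfill()
)

# Mean imputation: replace missing DTI with the mean of the observed values
loans["dti_mean"] = loans["dti"].fillna(loans["dti"].mean())

print(loans)
```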

Understanding why the data is missing is an important first step in resolving the issue. The imputation methods outlined above can provide a temporary solution for creating clean historical data for methodology development. In the long run, however, FIs will benefit from a more permanent solution: establishing data standards and procedures and implementing a robust ongoing monitoring process to ensure the data is accurate, clean, and consistent.

 

Resources:

  1. FASB Accounting Standards Update, No. 2016-13, Financial Instruments – Credit Losses (Topic 326).

Samantha Zerger, business analytics consultant with FRG, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

Stress Testing Private Equity


FRG, in partnership with Preqin, has developed a system for simulating cash flows for private capital investments (PCF). PCF allows the analyst to change assumptions about future economic scenarios and investigate the resulting changes in projected cash flows. This post picks a venture fund, shocks the economy with a mild recession over the following quarters, and examines the change in cash flow projections.

FRG develops scenarios for our clients.  Our most often used scenarios are the “Growth” or “Base” scenario, and the “Recession” scenario.  Both scenarios are based on the Federal Reserve’s CCAR scenarios “Base” and “Adverse”, published yearly and used for banking stress tests.

The “Growth” scenario (using the FED “Base” scenario) assumes economic growth more or less in line with recent experience.

The “Recession” scenario (FED “Adverse”) contains a mild recession starting late 2019, bottoming in Q2 2020.  GDP recovers back to its starting value in Q2 2021.  The recovery back to trend line (potential) GDP goes through Q2 2023.

Real GDP Growth chart

 

The economic drawdown is mild; the economy loses only 1.4% from its high.

| Start Date | Trough Date | Recovery Date | Full Potential | Depth |
| --- | --- | --- | --- | --- |
| Q4 2019 | Q2 2020 | Q2 2021 | Q2 2023 | -1.4% |

Equity market returns are a strong driver of performance in private capital. The total equity market returns in the scenarios include a 34% drawdown in the index. The market fully bottoms in Q1 2022 and has recovered to new highs by Q1 2024.

This drawdown is shallow compared to previous history, and the recovery period is shorter:

| Begin Date | Trough Date | Recovery Date | Depth | Total Length (qtrs) | Peak to Trough (qtrs) | Trough to Recovery (qtrs) |
| --- | --- | --- | --- | --- | --- | --- |
| 06/30/2000 | 09/30/2002 | 12/31/2006 | -47% | 27 | 10 | 17 |
| 12/31/2007 | 03/31/2009 | 03/31/2013 | -49% | 22 | 6 | 16 |
| 12/31/2019 | 03/31/2022 | 03/31/2024 | -34% | 18 | 10 | 8 |

The .COM and Global Financial Crisis (GFC) recessions took off nearly 50% of the market value. This recession only draws down 34%. The time from the peak to the trough is 10 and 6 quarters for the .COM and GFC respectively; here we are in line with the .COM crash with a 10-quarter peak-to-trough period. The recovery, at 8 quarters versus 17 and 16, is nearly twice as fast as either of the recent large drawdowns.
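
For readers who want to reproduce these drawdown statistics from their own scenario paths, the sketch below shows one way to compute depth and durations from a quarterly index series. The index values are illustrative only, not the actual scenario data.

```python
# Minimal sketch: drawdown depth and durations (in quarters) from a quarterly
# equity index path. The index values below are illustrative, not scenario output.
import numpy as np

index = np.array([100, 98, 92, 85, 80, 70, 66, 72, 80, 90, 101, 105], dtype=float)

running_max = np.maximum.accumulate(index)
trough = int(np.argmin(index / running_max))            # deepest point vs. prior peak
peak = int(np.argmax(index[: trough + 1]))              # peak preceding the trough
recovery = trough + int(np.argmax(index[trough:] >= index[peak]))  # first new high

depth = index[trough] / index[peak] - 1
print(f"Depth {depth:.0%}, peak-to-trough {trough - peak} qtrs, "
      f"trough-to-recovery {recovery - trough} qtrs")
```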

We start by picking a 2016 vintage venture capital fund.  This fund has called around 89% of its committed capital, has an RVPI of 0.85 and currently sports about an 18% IRR.  For this exercise, we assume a $10,000,000 commitment.

Feeding the two scenarios, this fund, and a few other estimates into the PCF engine, we can see a dramatic shift in the expected J-curve.

Under the “Growth” scenario, the fund’s payback date (the date when cumulative net cash flow turns positive) is Q1 2023. The recession prolongs the payback period, pushing the expected payback date to Q3 2025, an additional 2.5 years. Further, the total cash returned to investors is much lower.

This lower cash returned as well as the lengthening of the payback period have a dramatic effect on the fund IRR.

That small recession drops the expected IRR of the fund a full 7% annualized.  The distribution shown in the box and whisker plot above illustrates the dramatic shift in possible outcomes.  Whereas before, there were only a few scenarios where the fund returned a negative IRR, in the recession nearly a quarter of all scenarios produced a negative return.  There are more than a few cases where the fund’s IRR is well below -10% annually!
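
As a rough illustration of the payback and IRR calculations discussed above, the sketch below computes the payback quarter and an annualized IRR from a made-up quarterly net cash flow series. It uses the numpy-financial package; the cash flows are hypothetical, not output from the PCF engine.

```python
# Minimal sketch: payback quarter and annualized IRR from quarterly net cash
# flows to the investor (capital calls negative, distributions positive).
import numpy as np
import numpy_financial as npf

cash_flows = np.array([-2.5, -2.0, -1.5, -1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])

cumulative = cash_flows.cumsum()
payback_qtr = int(np.argmax(cumulative > 0))   # first quarter where total cash flow is positive

quarterly_irr = npf.irr(cash_flows)
annual_irr = (1 + quarterly_irr) ** 4 - 1      # annualize the quarterly rate

print(f"Payback in quarter {payback_qtr}, annualized IRR {annual_irr:.1%}")
```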

This type of analysis should provide investors in private capital food for thought.  How well do your return expectations hold up during an economic slowdown?  What does the distribution of expected cash flows and returns tell you about the risk in your portfolio?

At FRG, we specialize in helping people answer these questions.  If you would like to learn more, please visit www.frgrisk.com/vor-pcf  or contact us.

Dominic Pazzula is a Director with FRG, specializing in asset allocation and risk management. He has more than 15 years of experience evaluating risk at a portfolio level and managing asset allocation funds. He is responsible for product design of FRG’s asset allocation software offerings and consults with clients helping to apply the latest technologies to solve their risk, reporting, and allocation challenges.

 

 

How Embracing SR 11-7 Guidelines Can Support the CECL Process

The Board of Governors of the Federal Reserve System’s SR 11-7 supervisory guidance (2011) provides an effective model risk management framework for financial institutions (FIs). SR 11-7 covers everything from the definition of a model to the robust policies and procedures that should exist within a model risk management framework. To reduce model risk, any FI should consider following the guidance throughout its internal and regulatory processes, as its guidelines are comprehensive and reflect a banking industry standard.

The following items and quotations represent an overview of the SR 11-7 guidelines (Board of Governors of the Federal Reserve System, 2011):

  1. The definition of a model – “the term model refers to a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.”
  2. A focus on the purpose/use of a model – “even a fundamentally sound model producing accurate outputs consistent with the design objective of the model may exhibit high model risk if it is misapplied or misused.”
  3. The three elements of model risk management:
    • Robust model development, implementation, and use – “the design, theory, and logic underlying a model should be well documented and generally supported by published research and sound industry practice.”
    • Sound model validation process – “an effective validation framework should include three core elements: evaluation of conceptual soundness, …, ongoing monitoring, including … benchmarking, …, and outcomes analysis, …”
    • Governance – “a strong governance framework provides explicit support and structure to risk management functions through policies defining relevant risk management activities, procedures that implement those policies, allocation of resources, and mechanisms for evaluating whether policies and procedures are being carried out as specified.”

Much of what the SR 11-7 guidelines discuss applies to the new aspects of the CECL accounting standard (FASB, 2016). Any FI subject to CECL must provide explanations, justifications, and rationales for the entire CECL process, including (but not limited to) model development, validation, and governance. The SR 11-7 guidelines will help FIs develop effective CECL processes that limit model risk.

Some considerations from the SR 11-7 guidelines regarding the components of CECL include (but are not limited to):

  • Determining appropriateness of data and models for CECL purposes. Existing processes may need to be modified due to some differing CECL requirements (e.g., life of loan loss estimation).
  • Completing comprehensive documentation and testing of model development processes. Existing documentation may need to be updated to comply with CECL (e.g., new models or implementation processes).
  • Accounting for model uncertainty and inaccuracy through the understanding of potential limitations/assumptions. Existing model documentation may need to be re-evaluated to determine if new limitations/assumptions exist under CECL.
  • Ensuring validation independence from model development. Existing validation groups may need to be further separated from model development (e.g., external validators).
  • Developing a strong governance framework specifically for CECL purposes. Existing policies/procedures may need to be modified to ensure CECL processes are being covered.

The SR 11-7 guidelines can provide FIs with the information they need to start their CECL process. Although not mandated, following these guidelines is important for reducing model risk and for establishing standards that teams within and across FIs can follow and regard as a true industry standard.

Resources:

  1. Board of Governors of the Federal Reserve System. “SR 11-7 Guidance on Model Risk Management”. April 4, 2011.
  2. Daniel Brown and Dr. Craig Peters. “New Impairment Model: Governance Considerations”. Moody’s Analytics Risk Perspectives. The Convergence of Risk, Finance, and Accounting: CECL. Volume VIII. November 2016.
  3. Financial Accounting Standards Board (FASB). Financial Instruments – Credit Losses (Topic 326). No. 2016-13. June 2016.

Samantha Zerger, business analytics consultant with FRG, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

 

Improve Your Problem-Solving Skills

This is the fifth post in an occasional series about the importance of technical communication in the workplace.

 “Work organisations are not only using and applying knowledge produced in the university but they are also producing, transforming, and managing knowledge by themselves to create innovations (Tynjälä, Slotte, Nieminen, Lonka, & Olkinuora, 2006)”.

Problem-solving skills are rooted in the fact that you must learn how to think, not what to think. Most classes in high schools and colleges teach you what to think (e.g., history dates, mathematical equations, grammar rules), but you must learn further problem-solving skills in order to help you learn how to think.

In the technical workplace, you are expected to take a problem and come up with a solution, possibly one that has never been thought of before. Employers are looking for people who have the right skills to do that very thing. Because of this, most interview processes will inevitably include at least one problem-solving question.

  • “How have you handled a problem in your past? What was the result?”
  • “How would you settle the concerns of a client?”
  • “How would you handle a tight deadline on a project?”

The way you answer the problem-solving question usually gives the interviewer a good sense of your problem-solving skills. Unfortunately for the interviewee, problem solving draws on a BROAD skill set made up of:

  • Active listening: in order to identify that there is a problem
  • Research: in order to identify the cause of the problem
  • Analysis: in order to fully understand the problem
  • Creativity: in order to come up with a solution, either based on your current knowledge (intuitively) or using creative thinking skills (systematically)
  • Decision making: in order to make a decision on how to solve the problem
  • Communication: in order to communicate the issue or your solution to others
  • Teamwork: in order to work with others to solve the problem
  • Dependability: in order to solve the problem in a timely manner

So how do you, as the interviewee, convey that you have good problem-solving skills? First, acknowledge the skill set needed to solve the problem relating to each step in the problem-solving process:

| Step in Problem Solving | Skill Set Needed |
| --- | --- |
| 1. Identifying the problem | Active listening, research |
| 2. Understanding and structuring the problem | Analysis |
| 3. Searching for possible solutions or coming up with your own solution | Creativity, communication |
| 4. Making a decision | Decision making |
| 5. Implementing a solution | Teamwork, dependability, communication |
| 6. Monitoring the problem and seeking feedback | Active listening, dependability, communication |

Then, note how you are improving, or plan to improve, your problem-solving skills. This may include gaining more technical knowledge in your field, putting yourself in new situations where you may need to problem solve, observing others who are known for their good problem-solving skills, or simply practicing problems on your own. Problem solving involves a diverse skill set and is key to surviving in a technical workplace.

Resources:

  1. Problem-Solving Skills: Definitions and Examples. Indeed Career Guide.
  2. Tynjälä, Päivi & Slotte, Virpi & Nieminen, Juha & Lonka, Kirsti & Olkinuora, Erkki. (2006). From university to working life: Graduates’ workplace skills in practice.

 

Samantha Zerger, business analytics consultant with the Financial Risk Group, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

 

 

Documenting CECL

The CECL standard requires more than just another update to the calculation of a financial institution’s (FI’s) allowance for credit losses; the new standard also pushes institutions to be more involved in the entire allowance process, especially at the management/executive level. From explanations, justifications, and rationales to policies and procedures, the standard requires them all. The FI needs to discuss them, understand them, and document them.

The first step is to discuss all decisions that must be made regarding the CECL process. This includes everything from the definition of default to the justification of which methodology to use for which segment of the data. Although these discussions may be onerous, the CECL standard requires that all decisions be fully understood and complete. Once that understanding exists, all decisions must be documented for disclosure purposes:

CECL Topic 326-20-50-10: An entity shall provide information that enables a financial statement user to do the following:

  1. Understand management’s method for developing its allowance for credit losses.
  2. Understand the information that management used in developing its current estimate of expected credit losses.
  3. Understand the circumstances that caused changes to the allowance for credit losses, thereby affecting the related credit loss expense (or reversal) reported for the period.

CECL Topic 326-20-50-11: To meet the objectives in paragraph 326-20-50-10, an entity shall disclose all of the following by portfolio segment and major security type:

  1. A description of how expected loss estimates are developed
  2. A description of the entity’s accounting policies and methodology to estimate the allowance for credit losses, as well as discussion of the factors that influenced management’s current estimate of expected credit losses, including:
    • Past events
    • Current conditions
    • Reasonable and supportable forecasts about the future
  3. A discussion of risk characteristics relevant to each portfolio segment
  4. Etc.

Although these may seem like surprising jumps in requirements, they are simply more clearly defined versions of the requirements under existing ALLL guidance. Note that some of the general requirements under the existing guidance will remain relevant under CECL, such as:

  • “the need for institutions to appropriately support and document their allowance estimates”
  • the “…responsibility for developing, maintaining, and documenting a comprehensive, systematic, and consistently applied process for determining the amounts of the ACL and the provision for credit losses.”
  • the requirement “…that allowances be well documented, with clear explanations of the supporting analyses and rationale.”

As you can see, documentation is an important component of the CECL standard. While the documentation will, at least initially, require more effort to produce, it will also give the FI the opportunity to fully understand the inner workings of its CECL process.

Lastly, some advice to avoid headaches—take the time to document throughout the entire CECL process. As my math professor always said, “the due date is not the do date.”

Resources:

  1. FASB Accounting Standards Update, No. 2016-13, Financial Instruments – Credit Losses (Topic 326).
  2. Frequently Asked Questions on the New Accounting Standard on Financial Instruments – Credit Losses. FIL-20-2019. April 3, 2019.

Samantha Zerger, business analytics consultant with FRG, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

CECL – The Power of Vintage Analysis

I would argue that a critical step in getting ready for CECL is to review the vintage curves of the segments that have been identified. Not only do the resulting graphs provide useful information, but the process itself also requires thought about how to prepare the data.

Consider the following graph of auto loan losses for different vintages of Not-A-Real-Bank bank[1]:

 

While this is a highly stylized depiction of vintage curves, its intent is to illustrate what information can be gleaned from such a graph. Consider the following:

  1. A clear end to the seasoning period can be determined (period 8)
  2. Outlier vintages can be identified (2015Q4)
  3. Visual confirmation that segmentation captures risk profiles (there is not a substantial number of vintages behaving oddly)

But that’s not all! To get to this graph, some important questions need to be asked about the data. For example:

  1. Should prepayment behavior be captured when deriving the loss rates? If so, what’s the definition of prepayment?
  2. At what time period should the accumulation of losses be stopped (e.g., contractual term)?
  3. Is there enough loss[2] behavior to model on the loan level?
  4. How should accounts that renew be treated (e.g., put in new vintage)?
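
Once those questions are answered and the data has been prepared, the curve construction itself is fairly mechanical. Below is a minimal sketch (pandas, with hypothetical input files and column names) of building cumulative loss-rate curves by vintage.

```python
# Minimal sketch: cumulative loss rates by vintage and age, the raw material for
# a vintage-curve graph. File and column names are hypothetical.
import pandas as pd

losses = pd.read_csv("vintage_losses.csv")      # columns: vintage, age_qtrs, net_loss
balances = pd.read_csv("vintage_balances.csv")  # columns: vintage, orig_balance

curves = (
    losses.sort_values(["vintage", "age_qtrs"])
          .merge(balances, on="vintage")
          .assign(cum_loss_rate=lambda d: d.groupby("vintage")["net_loss"].cumsum()
                                          / d["orig_balance"])
          .pivot(index="age_qtrs", columns="vintage", values="cum_loss_rate")
)

curves.plot(title="Cumulative loss rate by vintage")  # one line per vintage
```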

In conclusion, performing vintage analysis is more than just creating a picture with many different colors. It provides insight into the segments, makes one consider the data, and, if the data is appropriately constructed, positions one for subsequent analysis and/or modeling.

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] Originally I called this bank ACME Bank but when I searched to see if one existed I got this, this, and this…so I changed the name. I then did a search of the new name and promptly fell into a search engine rabbit hole that, after a while, I climbed out with the realization that for any 1 or 2 word combination I come up with, someone else has already done the same and then added bank to the end.

[2] You can also build vintage curves on defaults or prepayment.

 

RELATED:

CECL—Questions to Consider When Selecting Loss Methodologies

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

CECL—Data (As Usual) Drives Everything

CECL—Questions to Consider When Selecting Loss Methodologies

Paragraph 326-20-30-3 of the Financial Accounting Standards Board (FASB) standards update[1] states: “The allowance for credit losses may be determined using various methods”. I’m not sure any statement, other than “We need to talk”, can be as fear-inducing. Why is it scary? Because in the world of details and accuracy, this statement is remarkably vague and not prescriptive.

Below are some questions to consider when determining the appropriate loss methodology approaches for a given segment.

How much history do you have?

If a financial institution (FI) has limited history[2], then the options available to it are, well, limited. To build a model, one needs sufficient data to capture the behavior (e.g., performance or payment) of accounts. Without enough data the probability of successfully building a model is low. Worse yet, even if one builds a model, the likelihood of it being useful and robust is minimal. As a result, loss methodology approaches that do not need a lot of data should be considered (e.g., discounted cash flow or a qualitative factor approach based on industry information).

Have relevant business definitions been created?

The loss component approach (decomposing loss into probability of default (PD), loss given default (LGD), and exposure at default (EAD)) is considered a leading practice at banks[3]. However, in order to use this approach, definitions of default and, arguably, paid-in-full need to be created for each segment being modeled. (Note: these definitions can be the same or different across segments.) Without these definitions, one does not know when an account has defaulted or paid off.
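
As a simple reminder of what the loss component approach computes, the sketch below multiplies the three components for a single hypothetical account; the parameter values are purely illustrative.

```python
# Minimal sketch of the loss component approach for one account:
# expected loss = PD x LGD x EAD. Values are illustrative only.
pd_estimate = 0.02   # probability of default
lgd = 0.45           # loss given default (share of exposure lost)
ead = 150_000.0      # exposure at default

expected_loss = pd_estimate * lgd * ead
print(f"Expected loss: ${expected_loss:,.0f}")   # $1,350
```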

Is there a sufficient number of losses or defaults in the data?

Many of the loss methodologies available for consideration (e.g., loss component or vintage loss rates) require enough losses to discern a pattern. As a result, banks that are blessed with infrequent losses can feel cursed when they try to implement one of those approaches. While low losses do not necessarily rule out these approaches, they do make for a more challenging process.

Are loan level attributes available, accurate, and updated appropriately?

This question tackles the granularity of an approach rather than the approach itself. As mentioned in the post CECL – Data (As Usual) Drives Everything, there are three different levels of data granularity a model can be built on. Typically, the decision is between the loan level and the segment level. Loan-level models are great for capturing sensitivities to loan characteristics and macroeconomic events, provided the loan characteristics are accurate and updated (if needed) at a regular interval.

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] The FASB accounting standards update can be found here

[2] There is no consistent rule, at least that I’m aware of, that defines “limited history”. That said, we typically look for clean data reaching back through an economic cycle.

[3] See: Capital Planning at Large Bank Holding Companies: Supervisory Expectations and Range of Current Practice August 2013

RELATED:

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

CECL—Data (As Usual) Drives Everything

The Importance of Technical Communication

This is the introduction to a new blog series, The Importance of Technical Communication, which will focus on topics such as verbal and written communication, workplace etiquette, and teamwork in the workplace.

Soft skills, as a general term, include interpersonal skills, leadership, dependability, willingness to learn, and effective communication skills that can be used in any career. Sociologists and anthropologists regard these as skills generally required to become a functioning member of society. Yet many articles point out a lack of these soft skills among college graduates and cite it as a main reason why many cannot get hired.

Results from a survey by the Workforce Solutions Group at St. Louis Community College characterize these deficiencies specifically as applicant shortcomings. The St. Louis regional survey states that poor work habits, a lack of critical thinking and problem-solving skills, a lack of teamwork or collaboration, and a lack of communication or interpersonal skills rank highest among applicant shortcomings in both the technology and finance domains.

| Applicant Shortcoming | Technology | Finance |
| --- | --- | --- |
| Poor work habits | 56% | 50% |
| Lack of critical thinking and problem solving skills | 44% | 50% |
| Lack of teamwork or collaboration | 49% | 43% |
| Lack of communication or interpersonal skills | 58% | 38% |

Table 1: Applicant Shortcomings – 2018 State of St. Louis Workforce Report to the Region

In today’s society, with tools at our fingertips, communication is key. In the workplace, interpersonal skills are needed at a rapid, daily pace. Often other workplace issues, such as a lack of collaboration skills, arise from communication issues. Given these alarming statistics, how do we, in the technology and finance domains, encourage the improvement of these skills within our companies and deal with applicants who lack them? This blog series will discuss these questions and provide tips on how to communicate technical information correctly in the workplace.

Samantha Zerger, business analytics consultant with the Financial Risk Group, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

 

CECL – Data (As Usual) Drives Everything

To prepare appropriately for CECL, a financial institution (FI) must have a hard heart-to-heart with itself about its data. Almost always, simply collecting data in a worksheet, reviewing it for gaps, and then giving it the thumbs up is insufficient.

Data drives all parts of the CECL process. The sections below, by no means exhaustive, cover key areas where your data, simply by being your data, constrains your options.

Segmentation

Paragraph 326-20-30-2 of the Financial Accounting Standards Board (FASB) standards update[1] states: “An entity shall measure expected credit losses of financial assets on a collective (pool) basis when similar risk characteristic(s) exist.” It then points to paragraph 326-20-55-5 which provides examples of risk characteristics, some of which are: risk rating, financial asset type, and geographical location.

Suggestion: prior to reviewing your data, consider what risk profiles are in your portfolio. Then review your data to see whether it can adequately capture those risk profiles. As part of that process, consider reviewing the items below (a brief sketch of these checks follows the list):

  • Frequency of missing values in important variables
  • Consistency in values of variables
  • Definitional consistency[2]
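
A brief sketch of those checks in Python (pandas), using hypothetical file and column names:

```python
# Minimal sketch of a quick data review: missing-value frequency and value
# consistency for key variables. File and column names are hypothetical.
import pandas as pd

loans = pd.read_csv("loan_history.csv")

key_vars = ["risk_rating", "ltv", "dti", "credit_score"]

# Frequency of missing values in important variables (percent missing)
print(loans[key_vars].isna().mean().mul(100).round(1))

# Consistency in values: look for unexpected codes or out-of-range values
print(loans["risk_rating"].value_counts(dropna=False))
print(loans["ltv"].describe())
```
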
Methodology Selection

The FASB standards update does not provide guidance as to which methodologies to use[3]. That decision is entirely up to the FI[4]. However, the methodologies available to the FI are limited by the data it has. For example, if an FI has limited history, then any of the methodologies rooted in historical behavior (e.g., vintage analysis or loss component) are likely out of the question.

Suggestion: review the historical data and ask yourself these questions: 1) do I have sufficient data to capture the behavior for a given risk profile?; 2) is my historical data of good quality?; 3) are there gaps in my history?

Granularity of Model

Expected credit loss can be determined at three different levels of granularity: loan, segment (i.e., risk profile), and portfolio. Each granularity level has a set of pros and cons, but which level an FI can use depends on the data.

Suggestion: review variables that are account specific (e.g., loan-to-value, credit score, number of accounts with the institution) and ask yourself: are the sources of these variables reliable? Do they get refreshed often enough to capture changes in customer behavior or the macroeconomic environment?

Hopefully, this post has started you thinking critically about your data. While a data review might seem daunting, I cannot stress it enough—it’s needed, it’s critical, it’s worth the effort.

 

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] You can find the update here

[2] More on what these mean in a future blog post

[3] Paragraph 326-20-30-3

[4] A future blog post will cover some questions to ask to guide in this decision.

 

RELATED:

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

Avoiding Discrimination in Unstructured Data

An article published by the Wall Street Journal on Jan. 30, 2019  got me thinking about the challenges of using unstructured data in modeling. The article discusses how New York’s Department of Financial Services is allowing life insurers to use social media, as well as other nontraditional sources, to set premium rates. The crux: the data cannot unfairly discriminate.  

I finished the article with three questions on my mind. The first: How does a company convert unstructured data into something useful? The article mentions that insurers are leveraging public information – like motor vehicle records and bankruptcy documents – in addition to social media. Surely, though, this information is not in a structured format that facilitates querying and model builds.

Second: How does a company ensure the data is good quality? Quality here doesn’t only mean the data is clean and useful; it also means the data is complete and unbiased. A lot of effort will be required to take this information and make it model ready. Otherwise, the models will at best provide spurious output and at worst provide biased output.

The third: With all this data available what “new” modeling techniques can be leveraged? I suspect many people read that last sentence and thought AI. That is one option. However, the key is to make sure the model does not unfairly discriminate. Using a powerful machine learning algorithm right from the start might not be the best option. Just ask Amazon about its AI recruiting tool.[1]

The answers to these questions are not simple, and they do require a blend of technological aptitude and machine learning sophistication. Stay tuned for future blog posts as we provide answers to these questions.

 

[1] Amazon scraps secret AI recruiting tool that showed bias against women

 

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.
