AI in FIs: Foundations of Machine Learning in Financial Risk

This blog series focuses on Financial Institutions as a premier business use case for Machine Learning, viewed through the lens of financial risk.

Today, opportunities exist for professionals to delegate time-intensive, dense, and complex tasks to machines. Machine Learning (ML), a branch of Artificial Intelligence (AI), allows machines to learn tasks directly from data, and it is becoming much more robust as technological advances ease resource constraints.

Financial Institutions (FIs) are constantly under pressure to keep up with evolving technology and regulatory requirements. Modern tools are more user-friendly and flexible than their predecessors, and they integrate easily with existing systems. This evolution is enabling advanced tools such as ML to regain relevance across industries, including finance.

So, how does ML work? Imagine someone learning to throw a football. Over time, the would-be quarterback is trained to adjust the speed of the ball, the strength of the throw, and the trajectory to meet the expected routes of the receivers. In a similar way, a machine is trained to perform a specific task, such as clustering, by means of an ML algorithm; the algorithm plays the role of the coach. This expands the ways technology can be used to add value to the business.
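To make that concrete, here is a minimal sketch of training a machine on a clustering task with scikit-learn's KMeans; the account features and data are made up purely for illustration:

```python
# Minimal sketch: "training" a machine on a clustering task.
# The features (average balance, monthly transaction count) and data are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
retail = rng.normal(loc=[5_000, 20], scale=[1_500, 5], size=(100, 2))    # smaller, busier accounts
private = rng.normal(loc=[50_000, 5], scale=[10_000, 2], size=(100, 2))  # larger, quieter accounts
accounts = np.vstack([retail, private])

# the algorithm learns the segment centers from the data alone
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(accounts)
print(model.cluster_centers_)       # learned segment profiles
print(model.predict(accounts[:5]))  # segment assignment for individual accounts
```

Given only the data, the algorithm learns where the segments sit and can then assign any account, new or existing, to one of them.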

What does this mean for FIs? The benefit of ML is that it can add value in the areas where efficiency, prediction, and accuracy are most critical. To accomplish this, a company must align four components: data, applications, infrastructure, and business needs.

 

A flow chart showing the relationship between technology and data

 

An FI's level of data maturity determines its capacity to make effective use of both structured and unstructured data. A well-established data governance framework lays the foundation for proper use of data across the company. Once structured data is effectively governed, sourced, analyzed, and managed, an FI can employ more advanced tools such as ML to supplement its internal operations. Unstructured data can also be used, but the company must first have the tools and computing power capable of handling it.

Many companies are turning to cloud computing both for business-as-usual processes and for deploying machine learning. Cloud infrastructure can be hosted on-premises (a private cloud) or through public cloud services; the choice is largely a matter of preference. Either approach provides scalable computing power, which is essential when using ML algorithms to unlock the potential value that massive amounts of data provide.

Interested in reading more? Subscribe to the FRG blog to keep up with AI in FIs.

Hannah Wiser is an assistant consultant with FRG. After graduating with her Master’s in Quantitative Economics and Econometrics from East Carolina University in 2019, she joined FRG and has worked on projects focusing on technical communication and data governance.

 

 

 

Data Management – Leveraging Data for a Competitive Advantage

This is the first in a series of blogs that explore how data can be an asset or a risk to organizations in an uncertain economic climate.

Humanity has always valued novelty. Since the advent of the Digital Age, this preference has driven change at an astronomical pace. For example, more data was generated in the last two years than in all of prior human history, a fact made more staggering by the Machine Learning and Artificial Intelligence tools that allow users to access and analyze data as never before. The question now is: how can business leaders and investors best make sense of this information and use it to their competitive advantage?

Traditionally, access to good data has been a limiting factor, and revolutionary business strategies were reserved for those who knew how to obtain, prepare, and analyze it. While top-tier decision making is still data- and insight-driven, today's data challenges are characterized more by glut than scarcity, both in the sheer volume of information and in the tools available to make sense of it. Even so, only about 0.5% of the data produced today is ever analyzed.

This overabundance of information and tech tools has, ironically, led to greater uncertainty for business leaders. Massive data sets and powerful, user-friendly tools often mask underlying issues: many firms maintain and process duplicate copies of their data, creating silos of critical but unconnected data that must be sorted and reconciled. Analysts still spend roughly 80% of their time collecting and preparing their data and only 20% analyzing it.

Global interconnectivity is making the world smaller and more competitive, and regulators, who understand the power of data, are increasing controls over it. Now, more than ever, it is critical for firms to take action. To remain competitive, organizations must understand the critical data that drives their business so they can combine it with alternative data sets for future decision making; otherwise they face obsolescence. These are not just internal concerns. Clients are requesting more customized services and demanding to know how firms use their information. Firms must identify their critical data and recognize that not all data is, or should be, treated the same, so they can extract the full power of the information and meet client and regulatory requirements.

Let’s picture data as an onion. As the core of the onion supports its outer layers, the ‘core’, or critical enterprise, data supports all the functions of a business. When the core is strong, so is the rest of the structure. When the core is contaminated or rotten, that is a problem, for the onion and for your company.

A comparison picture showing an onion with healthy core data vs. an onion with a contaminated core.

Data that is core to a business – information like client IDs, addresses, products and positions, to name a few examples – must be solid and healthy enough to support the outer layers of data use and reporting in the organization. This enterprise data must be defined, clean and unique, or the firm will waste time cleaning and reconciling it, and the client, business and regulatory reports that it supports will be inaccurate.

How do you source, define, and store your data to cleanly extract the pieces you need? Look at the onion again. You could take the chainsaw approach to slicing the onion, which would give you instant access to everything inside, good and contaminated, and would probably spoil your dish. Likewise, if you use bad data at the core, any calculations you perform on it, or reports aggregating it, will not be correct. If you need a clean slice of onion for a specific recipe (or calculated data for a particular report), precision and cleanliness of the slice (good core data and a unique contextual definition) are key.

Once your core data is unique, supported, and available, clients, business, and corporate users can combine it with alternative and non-traditional data sets to extract information, enhance it, and add value. As demand for new “recipes” of data (for management, client, or regulatory reporting) is ever increasing, firms that do not clean up and leverage their core data effectively will become obsolete. These demands range from data needed for instant access and client reporting across different form factors (e.g., web, iOS, and Android apps), to data visualization and manipulation tools for employees analyzing new and enhanced information to determine trends, to the numerous requirements of the complex patchwork of regional financial regulations across the globe. Many different users, many different recipes, all reliant on the health of the core data.

What is the actionable advice when you read a headline like “A recent study in the Harvard Business Review found that over 92% of surveyed firms agreed that data analytics for decision making will be more important two years from now”? We have some ideas. In this blog series, FRG Data Advisory & Analytics will walk through several use cases to outline what data is foundational, or core, to business operations and how to achieve the contextual precision demanded by the market and regulators in our current environment of uncertainty, highlighting how data can be an asset, or, if not treated appropriately, a potential risk.

Dessa Glasser, Ph.D., is an FRG Principal Consultant with 30+ years’ experience designing and implementing innovative solutions and organizations in data, risk, and analytics. She leads FRG’s Data Advisory & Analytics Team and focuses on data, analytics, and regulatory solutions for clients.

Edward Hanlon is a Senior Consultant and Engagement Manager on FRG’s Data Advisory & Analytics Team. He focuses on development and implementation of data strategy solutions for FRG, leveraging previous experience launching new Digital products and reengineering operational models as a Digital Technology platform owner and program lead in financial services.

 

The Financial Risk Group Is Now FRG

We’re making it official: After more than a decade of operating as “The Financial Risk Group,” we’re changing our name to reflect what our clients have called us since the early days. We are excited to formally debut our streamlined “FRG” brand and logo.

Our new look is a natural progression from where we started 14 years ago, when the three founding partners of this company set a lofty goal. We wanted to become the premier risk management consulting company. It seemed ambitious, considering we were operating out of Ron Holanek’s basement at the time, but we knew we had at least two things going for us: a solid business plan and a drive to do whatever it took to deliver success for our clients.

And look at us now. It would take a while to list everything we’ve accomplished over the last decade-plus, but here’s a quick rundown of some of the items we’ve crossed off the company bucket list since 2006.

  • We’ve grown our numbers from the original three to more than 50 talented risk consultants, analysts, and developers.
  • We moved out of the basement (it would have been a tight fit, considering). We settled in historic downtown Cary in 2008, but quickly spilled out of our main office there and into several satellite locations. In 2018 we bought an older building a few blocks away and renovated it to a gleaming modern office hub for our US headquarters.
  • We opened offices in Toronto, Canada, and Kuala Lumpur, Malaysia, to better serve our clients around the world.
  • We opened several new business units, expanding on our original core focus of delivering automated technology solutions. Adding dedicated Data and Risk, Business Analytics, and Platform Hosting teams enlarged our wheelhouse, so that we have experts who can walk our clients through the entire lifecycle of risk management programs. (Shameless plug: you can learn more about a number of them via a series of videos that are sprinkled throughout the website.) We now also work with institutional investors on innovative models and product offerings to help streamline processes and drive excess returns.
  • We formalized our NEET (New Employee Excellence Training) apprenticeship program, so we can nurture and enhance the specific blend of skills that risk management professionals need to solve real-world business challenges. The program has struck a chord with our clients, so we built a program to recruit and develop risk management talent for them, as well.

Obviously, we couldn’t have done any of this without continued trust and support from our clients. Our clientele represents a cross section of the world’s largest banking, capital markets, insurance, energy and commodity firms – stretching across continents and across industries – and we recognize that they’re some very smart people. When they talk, we listen, and what they have been saying for a few years now is that the brand we started with in 2006 should evolve with the evolution of the company.

It is natural for people to streamline words into acronyms. In our industry there are many, and knowing them is an important part of the job. Our clients, partners, and even our internal teams have used “FRG” from day one, but now is the time to make it official. By rebranding and fully embracing the FRG name, we hope that it, too, becomes a well-known acronym in the risk management space, one that people equate with integrity and quality of work.

So we’re celebrating 2020 with the new name, a new look, and a new logo. But it’s like they say. The more things change, the more they stay the same. That’s why you can be sure that our core values, our core principle – to fulfill our clients’ needs, while surpassing their expectations – still guide us every day. We are our reputation. We are FRG.

Mike Forno is a Partner and Senior Director of Sales with FRG.

 

CECL Preparation: Handling Missing Data for CECL Requirements


Most financial institutions (FI’s) find that data is the biggest hurdle when it comes to regulatory requirements: they don’t have enough information, they have the wrong information, or they simply have missing information. With the CECL accounting standard, the range of data required to estimate expected credit losses (e.g., reasonable and supportable forecasts) grew from what was previously required. While this is a good thing in the long run (as the requirements gradually help FI’s build up their inventory of clean, model-ready data), many FI’s are finding it difficult to address data problems right now. In particular, how to handle missing data is a big concern.

Missing data becomes a larger issue because not all missing data is the same. Classifications, based on the root causes of the missing data, are used as guidance in choosing the appropriate method for data replacement. The classifications consist of:

  1. Not missing at random – the cause of the missing data is related to the missing values
    • For example, CLTV values are missing when previous values have exceeded 100.
  2. Missing at random (MAR) – the cause of the missing data is related to observed values of other variables
    • For example, DTI values are missing when the number of borrowers is 2 or more.
  3. Missing completely at random (MCAR) – the cause of the missing data is unrelated to values of the variable or other variables; data is missing due to an entirely random process
    • For example, LTV values are missing because a system outage caused recently loaded data to be reset to a default value of missing.

Once a classification is made for the reason the data is missing, it is easier to determine a resolution. For example, if the data is MCAR there is no pattern, and dropping the observations with missing values therefore involves no loss of information. Unfortunately, data is rarely MCAR.

The following table presents some methods (not meant to be all-inclusive) an FI may use to handle other, more common, missing-data issues.

Last observation carried forward / backward
  • Description: For a given account, use a non-missing value of that variable to fill the missing values before and/or after it.
  • Pros: Simple; uses an actual value the account has; useful for origination variables.
  • Cons: Assumes stability in account behavior; assumes data is MCAR.

Mean imputation
  • Description: Use the average of the observed values of the variable in place of the missing value.
  • Pros: Simple.
  • Cons: Distorts the empirical distribution of the data; does not use all information in the data set.

Hot decking and cold decking
  • Description: Replace missing values with a value from a similar observation in the sample (cold decking uses a similar observation from outside the sample).
  • Pros: Conceptually straightforward; uses existing relationships in the data.
  • Cons: Can be difficult to define the characteristics of a “similar” observation; continuous data can be problematic; assumes data is MAR.

Regression
  • Description: Use univariate or multivariate regression models to impute the missing value, with the variable that is missing as the dependent variable.
  • Pros: Fairly easy to implement; uses existing relationships in the data.
  • Cons: Can lead to overstating relationships among the variables; estimated values may fall outside accepted ranges; assumes data is MAR.
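As a rough illustration of two of the methods above, here is a minimal pandas sketch of last-observation-carried-forward (and backward) and mean imputation on a hypothetical loan history; the column names are assumptions, not a prescribed layout:

```python
# Minimal sketch of two imputation methods on hypothetical loan-level data.
# Column names (account_id, period, dti, ltv) are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "account_id": [1, 1, 1, 2, 2, 2],
    "period":     [1, 2, 3, 1, 2, 3],
    "dti":        [0.32, np.nan, 0.35, np.nan, 0.41, 0.40],
    "ltv":        [0.80, 0.78, np.nan, 0.95, np.nan, 0.90],
})

# Last observation carried forward (then backward) within each account
df["ltv_locf"] = df.groupby("account_id")["ltv"].transform(lambda s: s.ffill().bfill())

# Mean imputation: fill missing DTI with the average of the observed values
df["dti_mean"] = df["dti"].fillna(df["dti"].mean())

print(df)
```

Whichever method is chosen, the choice, and the missing-data classification behind it, should be documented, since each method carries its own assumptions.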

Understanding why the data is missing is an important first step in resolving the issue. Using the imputation methods outlined above can provide a temporary solution for creating clean historical data for methodology development. In the long run, however, FI’s will benefit from establishing a more permanent solution by constructing data standards and procedures and implementing a robust, ongoing monitoring process to ensure the data is accurate, clean, and consistent.

 

Resources:

  1. FASB Accounting Standards Update, No. 2016-13, Financial Instruments – Credit Losses (Topic 326).

Samantha Zerger, business analytics consultant with FRG, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

CECL Preparation: How Embracing SR 11-7 Guidelines Can Support the CECL Process

The Board of Governors of the Federal Reserve System’s SR 11-7 supervisory guidance (2011) provides an effective model risk management framework for financial institutions (FI’s). SR 11-7 covers everything from the definition of a model to the robust policies and procedures that should exist within a model risk management framework. Because its guidelines are comprehensive and reflect a banking industry standard, any FI seeking to reduce model risk should consider following them throughout its internal and regulatory processes.

The following items and quotations represent an overview of the SR 11-7 guidelines (Board of Governors of the Federal Reserve System, 2011):

  1. The definition of a model – “the term model refers to a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.”
  2. A focus on the purpose/use of a model – “even a fundamentally sound model producing accurate outputs consistent with the design objective of the model may exhibit high model risk if it is misapplied or misused.”
  3. The three elements of model risk management:
    • Robust model development, implementation, and use – “the design, theory, logic underlying the model should be well documented and generally supported by published research and sound industry practice.”
    • Sound model validation process – “an effective validation framework should include three core elements: evaluation of conceptual soundness, …, ongoing monitoring, …, and benchmarking, outcomes analysis, …”
    • Governance – “a strong governance framework provides explicit support and structure to risk management functions through policies defining relevant risk management activities, procedures that implement those policies, allocation of resources, and mechanisms for evaluating whether policies and procedures are being carried out as specified.”

Much of what the SR 11-7 guidelines discuss applies to the new aspects of the CECL accounting standard (FASB, 2016). Any FI subject to CECL must provide explanations, justifications, and rationales for the entirety of the CECL process, including (but not limited to) model development, validation, and governance. The SR 11-7 guidelines will help FI’s develop effective CECL processes that limit model risk.

Some considerations from the SR 11-7 guidelines with regard to the components of CECL include (but are not limited to):

  • Determining appropriateness of data and models for CECL purposes. Existing processes may need to be modified due to some differing CECL requirements (e.g., life of loan loss estimation).
  • Completing comprehensive documentation and testing of model development processes. Existing documentation may need to be updated to comply with CECL (e.g., new models or implementation processes).
  • Accounting for model uncertainty and inaccuracy through the understanding of potential limitations/assumptions. Existing model documentation may need to be re-evaluated to determine if new limitations/assumptions exist under CECL.
  • Ensuring validation independence from model development. Existing validation groups may need to be further separated from model development (e.g., external validators).
  • Developing a strong governance framework specifically for CECL purposes. Existing policies/procedures may need to be modified to ensure CECL processes are being covered.

The SR 11-7 guidelines can provide FI’s with the information they need to start their CECL process. Although not mandated, following these guidelines is important for reducing model risk and for establishing practices that teams within and across FI’s can follow and regard as a true industry standard.

Resources:

  1. Board of Governors of the Federal Reserve System. “SR 11-7 Guidance on Model Risk Management”. April 4, 2011.
  2. Daniel Brown and Dr. Craig Peters. “New Impairment Model: Governance Considerations”. Moody’s Analytics Risk Perspectives. The Convergence of Risk, Finance, and Accounting: CECL. Volume VIII. November 2016.
  3. Financial Accounting Standards Board (FASB). Financial Instruments – Credit Losses (Topic 326). No. 2016-13. June 2016.

Samantha Zerger, business analytics consultant with FRG, is skilled in technical writing. Since graduating from the North Carolina State University’s Financial Mathematics Master’s program in 2017 and joining FRG, she has taken on leadership roles in developing project documentation as well as improving internal documentation processes.

 

CECL Preparation: The Power of Vintage Analysis

I would argue that a critical step in getting ready for CECL is to review the vintage curves of the segments that have been identified. Not only do the resulting graphs provide useful information but the process itself also requires thought on how to prepare the data.

Consider the following graph of auto loan losses for different vintages of Not-A-Real-Bank bank[1]:

 

While this is a highly stylized depiction of vintage curves, its intent is to illustrate the information that can be gleaned from such a graph. Consider the following:

  1. A clear end to the seasoning period can be determined (period 8)
  2. Outlier vintages can be identified (2015Q4)
  3. Visual confirmation that segmentation captures risk profiles (there is not a substantial number of vintages behaving oddly)

But that’s not all! To get to this graph, some important questions need to be asked about the data. For example:

  1. Should prepayment behavior be captured when deriving the loss rates? If so, what’s the definition of prepayment?
  2. At what time period should the accumulation of losses be stopped (e.g., contractual term)?
  3. Is there enough loss[2] behavior to model on the loan level?
  4. How should accounts that renew be treated (e.g., put in new vintage)?
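Answering these questions determines how the underlying data set gets built. Once it is, deriving the curves themselves is mostly a grouping exercise. Below is a minimal sketch, assuming loan-level loss records with hypothetical column names (vintage, age, loss_amt) and a table of originated balances:

```python
# Minimal sketch of building vintage loss curves from hypothetical loan-level data.
# Column names (vintage, age, loss_amt) and the originated balances are illustrative.
import pandas as pd

losses = pd.DataFrame({
    "vintage":  ["2015Q4", "2015Q4", "2015Q4", "2016Q1", "2016Q1"],
    "age":      [1, 2, 3, 1, 2],            # periods since origination
    "loss_amt": [0.0, 1_200.0, 2_500.0, 0.0, 800.0],
})
orig_bal = pd.Series({"2015Q4": 1_000_000.0, "2016Q1": 1_500_000.0})

# cumulative losses by vintage and age, scaled by each vintage's originated balance
cum_loss = (losses.sort_values(["vintage", "age"])
                  .groupby("vintage")["loss_amt"].cumsum())
losses["cum_loss_rate"] = cum_loss / losses["vintage"].map(orig_bal)

# one curve per vintage (rows = age, columns = vintage), ready to plot
curves = losses.pivot(index="age", columns="vintage", values="cum_loss_rate")
print(curves)
```

Each column of the resulting frame is one vintage curve, ready to be plotted or carried into subsequent analysis and modeling.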

In conclusion, performing vintage analysis is more than just creating a picture with many different colors. It provides insight into the segments, makes one consider the data, and, if the data is appropriately constructed, positions one for subsequent analysis and/or modeling.

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] Originally I called this bank ACME Bank, but when I searched to see if one existed I got this, this, and this…so I changed the name. I then did a search of the new name and promptly fell into a search-engine rabbit hole, from which, after a while, I climbed out with the realization that for any one- or two-word combination I come up with, someone else has already done the same and then added “bank” to the end.

[2] You can also build vintage curves on defaults or prepayment.

 

RELATED:

CECL—Questions to Consider When Selecting Loss Methodologies

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

CECL—Data (As Usual) Drives Everything

CECL Preparation: Questions to Consider When Selecting Loss Methodologies

Paragraph 326-20-30-3 of the Financial Accounting Standards Board (FASB) standards update[1] states: “The allowance for credit losses may be determined using various methods.” I’m not sure any statement, other than “We need to talk,” can be as fear-inducing. Why is it scary? Because in a world of details and accuracy, this statement is remarkably vague and non-prescriptive.

Below are some questions to consider when determining the appropriate loss methodology approaches for a given segment.

How much history do you have?

If a financial institution (FI) has limited history[2], then the options available to it are, well, limited. To build a model, one needs sufficient data to capture the behavior (e.g., performance or payment) of accounts. Without enough data, the probability of successfully building a model is low. Worse yet, even if one builds a model, the likelihood of it being useful and robust is minimal. As a result, loss methodology approaches that do not need a lot of data should be considered (e.g., discounted cash flow or a qualitative factor approach based on industry information).

Have relevant business definitions been created?

The loss component approach (decomposing loss into PD, LGD, and EAD) is considered a leading practice at banks[3]. However, in order to use this approach, definitions of default and, arguably, paid-in-full need to be created for each segment being modeled. (Note: these definitions can be the same or different across segments.) Without these definitions, one does not know when an account has defaulted or paid off.
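For reference, in its simplest form the loss component approach combines its pieces multiplicatively. A minimal sketch follows, with made-up parameter values that are not benchmarks:

```python
# Minimal sketch of the loss component approach: expected loss = PD x LGD x EAD.
# The parameter values below are made up for illustration only, not benchmarks.
pd_est  = 0.02       # probability of default over the period, per the chosen default definition
lgd_est = 0.45       # loss given default (share of exposure not recovered)
ead_est = 250_000.0  # exposure at default

expected_loss = pd_est * lgd_est * ead_est
print(f"Expected loss: {expected_loss:,.2f}")   # 2,250.00
```

This is exactly why the definitions matter: PD cannot be estimated until “default” is pinned down, and the paid-in-full definition determines when an account leaves the at-risk population.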

Is there a sufficient number of losses or defaults in the data?

Many of the loss methodologies available for consideration (e.g., loss component or vintage loss rates) require enough losses to discern a pattern. As a result, banks that are blessed with infrequent losses can feel cursed when they try to implement one of those approaches. While low losses do not necessarily rule out these approaches, they do make for a more challenging process.

Are loan level attributes available, accurate, and updated appropriately?

This question tackles the granularity of an approach rather than the approach itself. As mentioned in the post CECL – Data (As Usual) Drives Everything, there are three different levels of data granularity a model can be built on. Typically, the decision is between the loan level and the segment level. Loan-level models are great for capturing sensitivities to loan characteristics and macroeconomic events, provided the loan characteristics are accurate and updated (when needed) at regular intervals.

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] FASB accounting standards update can be found here

[2] There is no consistent rule, at least that I’m aware of, that defines “limited history”. That said, we typically look for clean data reaching back through an economic cycle.

[3] See: Capital Planning at Large Bank Holding Companies: Supervisory Expectations and Range of Current Practice, August 2013

RELATED:

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

CECL—Data (As Usual) Drives Everything

CECL Preparation: The Caterpillar to Butterfly Evolution of Data for Model Development

I don’t know about you, but I find caterpillars to be a bit creepy[1]. On the other hand, I find butterflies to be beautiful[2]. Oddly enough, this aligns to my views on the different stages of data in relation to model development.

As a financial institution (FI) prepares for CECL, it is strongly suggested (by me at least) to know which stage the data falls into. Knowing its stage provides one with guidance on how to proceed.

The Ugly

At FRG we use the term dirty data to describe data that is ugly. Dirty data typically has the following characteristics (the list is not comprehensive):

  • Unexplainable missing values: The key word is unexplainable. Missing values can mean something (e.g., a value has not been captured yet) but often they indicate a problem. See this article for more information.
  • Inconsistent values: For example, a character variable that holds values for state might have Missouri, MO, or MO. as values. A numeric variable for interest rate might hold a value as a percent (7.5) or as a decimal (0.075).
  • Poor definitional consistency: This occurs when a rule used to classify some attribute of an account changes partway through the history. For example, at one point in history a line of credit might be indicated by a nonzero original commitment amount, but at a different point it might be indicated by whether a revolving flag is non-missing.

The Transition

You should not model or perform analysis using dirty data. Therefore, the next step in the process is to transition dirty data into clean data.

Transitioning to clean data, as the name implies, requires scrubbing the information. The main purpose of this step is to address the issues identified in the dirty data. That is, one would want to fix missing values (e.g., through imputation), standardize variable values (e.g., all states identified by a two-character code), and correct inconsistent definitions (e.g., a line indicator always based on a nonzero original commitment amount).
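A minimal sketch of that scrubbing step is below, using pandas on hypothetical columns (state, int_rate, orig_commitment); the cleaning rules are illustrative, not a prescribed standard:

```python
# Minimal sketch of moving dirty data toward clean data.
# Column names and cleaning rules are illustrative assumptions.
import numpy as np
import pandas as pd

dirty = pd.DataFrame({
    "state":           ["Missouri", "MO", "MO.", "NC"],
    "int_rate":        [7.5, 0.075, 6.0, 0.055],        # mixed percent and decimal
    "orig_commitment": [100_000, 0, 50_000, np.nan],
})

clean = dirty.copy()
# standardize state values to two-character codes
clean["state"] = clean["state"].str.strip(".").replace({"Missouri": "MO"}).str.upper()
# put interest rates on a single (decimal) scale
clean["int_rate"] = np.where(clean["int_rate"] > 1, clean["int_rate"] / 100, clean["int_rate"])
# apply one consistent definition of a line of credit
clean["line_indicator"] = clean["orig_commitment"].fillna(0).gt(0)
print(clean)
```

Each rule should trace back to a documented issue found in the dirty data, so the scrubbing itself stays auditable.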

The Beautiful

A final step must be taken before data can be used for modeling. This step takes clean data and converts it to model-ready data.

At FRG we use the term model-ready to describe clean data with the relevant business definitions applied. An example of a relevant business definition would be how an FI defines default[3]. Once the definition has been created, the corresponding logic needs to be applied to the clean data in order to create, say, a default indicator variable.
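A minimal sketch of that last step is below, assuming a 90+ DPD / bankruptcy / non-accrual definition of default (one common choice, but the definition is ultimately the FI’s own) and hypothetical column names:

```python
# Minimal sketch of turning clean data into model-ready data by applying
# an assumed default definition: 90+ DPD, bankruptcy, or non-accrual.
import pandas as pd

clean = pd.DataFrame({
    "account_id":  [1, 2, 3],
    "dpd":         [0, 95, 30],
    "bankruptcy":  [False, False, True],
    "non_accrual": [False, True, False],
})

model_ready = clean.copy()
model_ready["default_flag"] = (
    (clean["dpd"] >= 90) | clean["bankruptcy"] | clean["non_accrual"]
).astype(int)
print(model_ready)
```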

Just like a caterpillar metamorphosing to a butterfly, dirty data needs to morph to model-ready for an FI to enjoy its true beauty. And, only then, can an FI move forward on model development.

 

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] Yikes!

[2] Pretty!

[3] E.g., is it 90+ days past due (DPD) or 90+ DPD or in bankruptcy or in non-accrual or …?

 

RELATED:

CECL—Questions to Consider When Selecting Loss Methodologies

CECL—Data (As Usual) Drives Everything

CECL Preparation: Data (As Usual) Drives Everything

To appropriately prepare for CECL a financial institution (FI) must have a hard heart-to-heart with itself about its data. Almost always, simply collecting data in a worksheet, reviewing it for gaps, and then giving it the thumbs up is insufficient.

Data drives all parts of the CECL process. The sections below, which are by no means exhaustive, identify key areas where your data, simply by being your data, constrains your options.

Segmentation

Paragraph 326-20-30-2 of the Financial Accounting Standards Board (FASB) standards update[1] states: “An entity shall measure expected credit losses of financial assets on a collective (pool) basis when similar risk characteristic(s) exist.” It then points to paragraph 326-20-55-5 which provides examples of risk characteristics, some of which are: risk rating, financial asset type, and geographical location.

Suggestion: prior to reviewing your data, consider what risk profiles are in your portfolio. After that, review your data to see whether it can adequately capture those risk profiles. As part of that process, consider reviewing the following (a rough sketch of such checks appears after the list):

  • Frequency of missing values in important variables
  • Consistency in values of variables
  • Definitional consistency[2]
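Here is that sketch, run against a hypothetical loan extract; the column names are assumptions, not a required layout:

```python
# Minimal sketch of reviewing missing-value frequency and value consistency
# on a hypothetical loan extract; column names are illustrative assumptions.
import numpy as np
import pandas as pd

loans = pd.DataFrame({
    "risk_rating": ["3", "4", None, "3"],
    "asset_type":  ["C&I", "CRE", "c&i", "CRE"],
    "state":       ["NC", "NC", "MO.", "MO"],
    "balance":     [250_000, np.nan, 100_000, 75_000],
})

# 1. Frequency of missing values in important variables
print(loans.isna().mean().sort_values(ascending=False))

# 2. Consistency in values of variables (spot variants like "C&I" vs "c&i", "MO" vs "MO.")
for col in ["asset_type", "state"]:
    print(col, sorted(loans[col].dropna().unique()))
```

The third item, definitional consistency, usually requires comparing business rules across time rather than running a single query.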

Methodology Selection

The FASB standard update does not provide guidance as to which methodologies to use[3]. That decision is entirely up to the FI[4]. However, the methodologies that are available to the FI are limited by the data it has. For example, if an FI has limited history then any of the methodologies that are rooted in historical behavior (e.g., vintage analysis or loss component) are likely out of the question.

Suggestion: review the historical data and ask yourself these questions: 1) Do I have sufficient data to capture the behavior of a given risk profile? 2) Is my historical data of good quality? 3) Are there gaps in my history?

Granularity of Model

Expected credit loss can be determined on three different levels of granularity: loan, segment (i.e., risk profile), and portfolio. Each granularity level has a set of pros and cons but which level an FI can use depends on the data.

Suggestion: review variables that are account-specific (e.g., loan-to-value, credit score, number of accounts with the institution) and ask yourself: are the sources of these variables reliable? Do they get refreshed often enough to capture changes in customer behavior and the macroeconomic environment?

Hopefully, this post has started you critically thinking about your data. While data review might seem daunting, I cannot stress enough—it’s needed, it’s critical, it’s worth the effort.

 

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

 

[1] You can find the update here

[2] More on what these mean in a future blog post

[3] Paragraph 326-20-30-3

[4] A future blog post will cover some questions to ask to guide in this decision.

 

RELATED:

CECL—The Caterpillar to Butterfly Evolution of Data for Model Development

Avoiding Discrimination in Unstructured Data

An article published by the Wall Street Journal on Jan. 30, 2019, got me thinking about the challenges of using unstructured data in modeling. The article discusses how New York’s Department of Financial Services is allowing life insurers to use social media, as well as other nontraditional sources, to set premium rates. The crux: the data cannot unfairly discriminate.

I finished the article with three questions on my mind. The first: How does a company convert unstructured data into something useful? The article mentions that insurers are leveraging public information – like motor vehicle records and bankruptcy documents – in addition to social media. Surely, though, this information is not in a structured format that facilitates querying and model builds.
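As a rough illustration of the general idea (a sketch, not a claim about what insurers actually do), raw text can be converted into numeric features that a model can consume, for example with scikit-learn’s TfidfVectorizer:

```python
# Minimal sketch of converting unstructured text into structured numeric features.
# The documents below are made up; this only illustrates the general idea.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "chapter 7 bankruptcy discharged in 2015",
    "two speeding violations on motor vehicle record",
    "no violations and completed a marathon training program",
]

vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(documents)   # sparse matrix: documents x terms
print(vectorizer.get_feature_names_out())
print(features.toarray().round(2))
```

Of course, producing features is the easy part; the harder questions follow.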

Second: How does a company ensure the data is good quality? Quality here doesn’t only mean the data is clean and useful; it also means the data is complete and unbiased. A lot of effort will be required to take this information and make it model ready. Otherwise, the models will at best provide spurious output and at worst provide biased output.

The third: With all this data available, what “new” modeling techniques can be leveraged? I suspect many people read that last sentence and thought AI. That is one option. However, the key is to make sure the model does not unfairly discriminate. Using a powerful machine learning algorithm right from the start might not be the best option. Just ask Amazon about its AI recruiting tool.[1]

The answers to these questions are not simple, and they do require a blend of technological aptitude and machine learning sophistication. Stay tuned for future blog posts as we provide answers to these questions.

 

[1] Amazon scraps secret AI recruiting tool that showed bias against women

 

Jonathan Leonardelli, FRM, Director of Business Analytics for the Financial Risk Group, leads the group responsible for model development, data science, documentation, testing, and training. He has over 15 years’ experience in the area of financial risk.

Subscribe to our blog!