Through the lens of Financial Risk, this blog series will focus on Financial Institutions as a premier business use case for Artificial Intelligence and Machine Learning.
This blog series has covered how a financial institution (FI) can use machine learning (ML) and how ML algorithms can augment existing methods for mitigating financial and non-financial risk. To tie it all together, this post focuses on the three learning types of ML algorithms:
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning
Deciding which learning type, and ultimately which algorithm, to use depends on two key factors: the data and the business use. Data exists in two “formats.” The first is structured data, which is organized and often takes the form of tables (columns and rows). The second is unstructured data, which may have a structure of its own but is not in a standardized format. Examples include PDFs, recorded voice, and video feeds. Unstructured data can provide great value, but it must be converted into a structured form before an algorithm can consume it.
Supervised learning algorithms draw inferences from input datasets that have a well-defined dependent, or target, variable. This is referred to as labeled data. Consider a scenario in which an FI wants to predict loss due to fraud. For this, it would need a labeled dataset of historical transactions with a target variable that flags each known fraudulent transaction. The FI might then use a decision tree, which iteratively splits the data into branches, to estimate the likelihood of fraud. Once the decision tree captures the relationships in the data, it can be deployed to estimate the likelihood of fraud in future transactions.
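To make the decision tree idea concrete, here is a minimal sketch in plain Python. It is not a production fraud model: the transactions, the two features (`amount_usd`, `hour_of_day`), and the helper names are all hypothetical, and the tree uses a simple Gini-impurity split chosen for illustration.

```python
from collections import Counter

# Hypothetical labeled transactions: (amount_usd, hour_of_day) -> is_fraud.
# All values are made up for illustration.
X = [(12.0, 14), (25.5, 10), (980.0, 3), (15.0, 16),
     (1200.0, 2), (40.0, 12), (875.0, 4), (30.0, 9)]
y = [0, 0, 1, 0, 1, 0, 1, 0]

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=2):
    """Split the data recursively; each leaf stores the observed fraud rate."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"fraud_rate": sum(y) / len(y)}
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth),
    }

def predict_fraud_rate(tree, row):
    """Follow the branches for one transaction and return the leaf's fraud rate."""
    while "fraud_rate" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["fraud_rate"]

tree = build_tree(X, y)
print(predict_fraud_rate(tree, (1000.0, 3)))  # a large overnight transaction
print(predict_fraud_rate(tree, (20.0, 13)))   # a small daytime transaction
```

On this toy data a single split on amount separates the classes. Real transaction data would need many more features, a held-out test set, and a limit on tree depth to avoid overfitting.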
Unsupervised learning algorithms draw inferences from input datasets that have no defined dependent variable. This is referred to as unlabeled data. These algorithms are typically used as pre-work to prepare data for another process. This work ranges from data preparation to data discovery and, at times, includes dimensionality reduction, categorization, and segmentation. Returning to our fraud example, consider the dataset without the target variable (i.e., no fraud indicator). In this scenario, the FI could use a clustering algorithm to group similar transactions and then treat those that fall outside the typical groups as the most suspicious.
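The clustering idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recommended detection method: the transactions and features are hypothetical, k-means with two clusters is one clustering choice among many, and treating the smallest cluster as "most suspicious" is a heuristic assumed here for the example.

```python
import math

# Hypothetical unlabeled transactions: (amount_usd, hour_of_day).
# There is no fraud indicator; all values are made up for illustration.
transactions = [(12.0, 14), (25.5, 10), (980.0, 3), (15.0, 16), (1200.0, 2),
                (40.0, 12), (875.0, 4), (30.0, 9), (22.0, 11), (18.0, 15)]

def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then update."""
    centroids = list(initial_centroids)

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda k: math.dist(p, centroids[k]))

    for _ in range(iterations):
        assignments = [nearest(p) for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return [nearest(p) for p in points]

points = standardize(transactions)

# Deterministic start: the first point and the point farthest from it.
first = points[0]
second = max(points, key=lambda p: math.dist(p, first))
assignments = kmeans(points, [first, second])

# Assumed heuristic: the smaller cluster holds the unusual transactions.
sizes = [assignments.count(k) for k in range(2)]
smallest = sizes.index(min(sizes))
suspicious = [t for t, a in zip(transactions, assignments) if a == smallest]
print(suspicious)
```

The flagged transactions are candidates for review rather than confirmed fraud, since no label is available to check them against.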
Sometimes, a dataset will have both labeled and unlabeled observations, meaning a value for the target variable is known for only a portion of the data. Such data can be used for semi-supervised learning, an iterative process that combines supervised and unsupervised learning algorithms. In our fraud example, a neural net may be used to predict the likelihood of fraud based on the labeled data (supervised learning). The process can then use this model, along with a clustering algorithm (unsupervised learning), to assign a value to the fraud indicator for the most suspicious transactions in the unlabeled data.
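The iterative flavor of semi-supervised learning can be illustrated with a simplified self-training loop. Note the substitutions: instead of the neural net and clustering algorithm described above, this sketch uses a nearest-centroid rule as a stand-in for both, purely to keep the example short. The data, the pre-scaled features, and the `min_margin` cutoff are hypothetical.

```python
import math

# Hypothetical, pre-scaled features:
# (amount in thousands of USD, night-time indicator). Label 1 means fraud.
labeled = [((0.02, 0.0), 0), ((0.03, 0.0), 0),
           ((0.95, 1.0), 1), ((1.10, 1.0), 1)]
unlabeled = [(0.04, 0.0), (0.90, 1.0), (0.50, 0.0), (1.30, 1.0), (0.01, 1.0)]

def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    return tuple(sum(v) / len(points) for v in zip(*points))

def fraud_margin(x, labeled):
    """Positive when x is closer to the fraud centroid than the legit one."""
    legit = centroid([f for f, lab in labeled if lab == 0])
    fraud = centroid([f for f, lab in labeled if lab == 1])
    return math.dist(x, legit) - math.dist(x, fraud)

def self_train(labeled, unlabeled, min_margin=0.5):
    """Repeatedly pseudo-label the most confident unlabeled observation.

    Stops when no remaining observation clears the (assumed) confidence
    cutoff, leaving ambiguous observations unlabeled.
    """
    labeled, remaining = list(labeled), list(unlabeled)
    while remaining:
        best = max(remaining, key=lambda x: abs(fraud_margin(x, labeled)))
        margin = fraud_margin(best, labeled)
        if abs(margin) < min_margin:
            break
        labeled.append((best, 1 if margin > 0 else 0))
        remaining.remove(best)
    return labeled, remaining

final_labeled, still_unlabeled = self_train(labeled, unlabeled)
pseudo_labels = dict(final_labeled[len(labeled):])
print(pseudo_labels)
print(still_unlabeled)
```

Each newly labeled observation shifts the class centroids, so later decisions depend on earlier ones. That feedback is what makes the process iterative, and it is also why a confidence cutoff matters: a wrong pseudo-label would propagate into every later round.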
To learn more about ML algorithms and their applications for risk mitigation, please contact us or visit our Resources page for other ML and AI material, including the New Machinist Journal Vol. 1–5.
Hannah Wiser is an associate consultant with FRG. After graduating with her Master’s in Quantitative Economics and Econometrics from East Carolina University in 2019, she joined FRG and has worked on projects focusing on technical communication and data governance.