Reducing Model Risk with Testing

Sep 17, 2025 | Model Development

This blog series explores how applying a set of coding best practices can help reduce errors during model development. We’ve captured these practices in the acronym MATHS: Modularized, Archived, Tested, Honed, and Standardized. This post focuses on the Tested element.

One of the greatest sources of model risk is poorly written code.

What does poorly written code mean? There are two general classifications of errors: syntax and conceptual.

Syntax Errors

Syntax errors occur when a developer violates the syntax rules of a programming language (e.g., Python, R, SAS). Examples include:

  • a missing bracket,
  • improper indentation,
  • incorrect spelling of a function.

These errors are usually easy to find. In most cases, the code will not run at all. How they surface depends on the programming language, but the log will usually spew out warnings or errors indicating something is amiss. Using these indicators, the developer can correct the issue.
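
For instance, here is a minimal Python sketch (the snippet is invented for illustration) showing how loudly a missing bracket announces itself:

```python
# A deliberately broken snippet: the list's bracket is never closed.
broken_source = "values = [4.0, 8.0, 15.0, 16.0"

try:
    compile(broken_source, "<example>", "exec")
except SyntaxError as err:
    # Python points straight at the problem, e.g.:
    #   SyntaxError: '[' was never closed (<example>, line 1)
    print(f"SyntaxError: {err}")
```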

Conceptual Errors

The scary errors are the conceptual (or logical) ones. These lurk in code and rarely surface as an error or warning in the log. Examples of these include:

  • using the wrong number of time steps when lagging a variable,
  • forgetting a condition in a business rule,
  • coding a value check that only matches lowercase values of a variable that might also contain capitalized values.

These errors are challenging to find. If you are lucky, they will surface in the form of nonsensical results. If you are unlucky, they will surface while model validation is reviewing your model. And if you are really unlucky, they will not be found until an improbable edge case triggers the bug in production, causing an epic model, and possibly process, failure. We call these “black swan bugs.”
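
As a concrete illustration of the third example above, here is a hedged Python sketch (the data and column names are invented):

```python
import pandas as pd

# Hypothetical loan data; note the inconsistent capitalization.
loans = pd.DataFrame({"status": ["Default", "current", "DEFAULT", "default"]})

# Conceptual error: the check only matches the lowercase spelling, so
# "Default" and "DEFAULT" silently slip through -- no warning in the log.
n_defaults_buggy = (loans["status"] == "default").sum()           # returns 1

# Intended logic: normalize the case before comparing.
n_defaults = (loans["status"].str.lower() == "default").sum()     # returns 3
```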

Given the scary nature of conceptual errors, how do we reduce their occurrence? The simple answer: test the code.

Types of Testing

Anyone who has ever been on a software development project knows the importance of testing. On these projects, the code typically undergoes three sets of tests: unit, system integration, and user acceptance. This ensures the developers built a solution that performs as the end user expects. There is no reason code written during the model development process should not be tested as well.

What are system integration testing (SIT) and user-acceptance testing (UAT)?

SIT tests whether the parts (e.g., modules) work together as expected within the larger system. UAT tests whether the solution meets all the functional requirements the end user requested.

A model developer may not need a fully fleshed-out testing process. At a minimum, though, model development code should go through rigorous unit testing. In this type of test, the developer executes test cases against a module (i.e., a unit) to confirm it is working as intended.
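
As a minimal sketch of what a unit test might look like in Python (the lag_series function and test are our own invention, not a prescribed framework):

```python
import pandas as pd

def lag_series(s: pd.Series, steps: int = 1) -> pd.Series:
    """Shift a series back by `steps` observations."""
    return s.shift(steps)

def test_lag_series_one_step():
    s = pd.Series([10, 20, 30])
    lagged = lag_series(s, steps=1)
    # The first lagged value is undefined; the rest shift back one step.
    assert pd.isna(lagged.iloc[0])
    assert lagged.iloc[1] == 10
    assert lagged.iloc[2] == 20

test_lag_series_one_step()
```

Had the developer accidentally written steps=2 where one step was intended (the first conceptual error listed earlier), the assertions would fail immediately instead of the bug lurking until production.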

How does tested code reduce model risk?

Testing decreases model risk because it verifies the code is functioning as it should, which means there are (hopefully) no conceptual errors[1].

What is risk-based testing? 

It is a testing methodology that prioritizes the highest-risk tests based on a module’s impact on the system and its likelihood of defects.

While FRG’s unit testing process for model development code may not be formal, it is still rigorous. We use both positive and negative tests. A positive test shows that when the module receives a valid input, it produces the expected output. A negative test shows that when the module receives an invalid input, it fails loudly (e.g., raises an error) rather than silently producing incorrect output.

Those are important tests but usually not fun for the tester to construct. The tests that are the most fun to build are the edge case tests, in which you throw an extreme scenario at the module and see how it responds.
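
Continuing the hypothetical lag_series sketch from above, an edge case test might throw an empty series at the module:

```python
def test_lag_series_empty_input():
    # Edge case: an empty series should come back empty, not raise.
    lagged = lag_series(pd.Series([], dtype=float))
    assert lagged.empty
```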

How to test code: an example

Let’s look at some of the tests one might create to check that a module is functioning as intended. In this example, the purpose of the module is to standardize all the values in a variable (i.e., subtract the mean and divide by the standard deviation) based on the mean and standard deviation values entered by the user.
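
A sketch of what such a module might look like in Python follows (the signature and error handling are our assumptions, not FRG’s actual implementation):

```python
import numpy as np

def standardize(values, mean, std_dev):
    """Standardize values using a user-supplied mean and standard deviation."""
    if mean is None or std_dev is None:
        raise ValueError("Both mean and standard deviation are required.")
    if std_dev <= 0:
        raise ValueError("Standard deviation must be positive.")
    arr = np.asarray(values, dtype=float)  # raises ValueError for non-numeric values
    return (arr - mean) / std_dev
```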

  • Positive: Obtain the mean and standard deviation of a series of numeric values, then pass them into the module along with the series they are based on. Apply those same values to the series outside the module for comparison. Expected behavior: the standardized values produced by the module should equal those calculated outside the module using the same mean, standard deviation, and series.
  • Positive: Using the same mean and standard deviation, pass them into the module along with a different numeric series. Apply those same values to the new series outside the module for comparison. Expected behavior: the standardized values produced by the module should equal those calculated outside the module using the same mean, standard deviation, and series[2].
  • Positive: Pass in a mean of 0, a positive (non-zero) standard deviation, and a series of numeric values. Apply those same values to the series outside the module for comparison. Expected behavior: the standardized values produced by the module should equal those calculated outside the module using the same mean, standard deviation, and series.
  • Negative: Pass in a non-zero mean, a negative value for the standard deviation, and a series of numeric values. Expected behavior: this should cause an error; standard deviations cannot be negative.
  • Negative: Pass in a non-zero mean, a missing value for the standard deviation, and a series of numeric values. Expected behavior: this should cause an error; the standardizing process requires values for both the mean and the standard deviation.
  • Edge: Pass in valid values for both the mean and standard deviation, and a series of character values. Expected behavior: this should cause an error; the standardizing process only works on numeric values.
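
These same cases can be expressed as unit tests against the standardize sketch above (a pytest version, assuming that framework; the test names are our own):

```python
import numpy as np
import pytest

# Assumes the standardize() sketch above is available in the test's scope.

def test_matches_external_calculation():
    # Positive: results should match the same calculation done outside the module.
    series = np.array([1.0, 2.0, 3.0, 4.0])
    mean, std_dev = series.mean(), series.std()
    expected = (series - mean) / std_dev
    assert np.allclose(standardize(series, mean, std_dev), expected)

def test_negative_std_dev_raises():
    # Negative: standard deviations cannot be negative.
    with pytest.raises(ValueError):
        standardize([1.0, 2.0], mean=1.5, std_dev=-0.5)

def test_missing_std_dev_raises():
    # Negative: both mean and standard deviation are required.
    with pytest.raises(ValueError):
        standardize([1.0, 2.0], mean=1.5, std_dev=None)

def test_character_values_raise():
    # Edge: standardization only works on numeric values.
    with pytest.raises(ValueError):
        standardize(["a", "b", "c"], mean=0.0, std_dev=1.0)
```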

The above are just some of the test cases a model developer might consider when testing this module. While testing does require more work up front, the benefits exceed the time commitment. Plus, if you have embraced the Archived best practice, these unit tests can be committed to the model development repository and reused any time you change the module. This ensures that updates to the code do not result in unintended consequences.

In conclusion, the Tested best practice is about ensuring the code used for model development is performing as expected. This creates robust modules. As a result, model developers will have confidence in the code they use for their current and future model development initiatives.

Jonathan Leonardelli, FRM, Director of Business Analytics for FRG, leads the group responsible for business analytics, statistical modeling and machine learning development, documentation, and training. He has more than 20 years’ experience in the area of financial risk.

[1] We asked our firm’s testing lead whether code could ever be entirely error free. Her response: for super simple processes, possibly. In most cases, however, the answer is no, because it is impossible to consider everything that can go wrong in a complex system. This is why risk-based testing is used.

[2] This may sound like a useless test. However, it confirms that the mean and standard deviation are not “linked” to the data series, which is good to know.

RELATED:

Model Risk Management: Reduce your Model Risk through Coding Best Practices

Reducing Model Risk Through Modular Programming

Model Risk Management Through Archiving