Reducing Model Risk through Modular Programming

Sep 3, 2025 | Model Development

This blog series explores how applying a set of coding best practices can help reduce errors during model development. We’ve captured these practices in the acronym MATHS: Modularized, Archived, Tested, Honed, and Standardized. Each post will dive into one element of the framework, starting here with Modularized.

The Modularized best practice we use at FRG is rooted in modular programming. The goal of this type of programming is to write scripts that each do a single task. The model developer then brings these modules together to build a larger process. This differs from how some developers code: writing one script that does everything.

A module should serve a single purpose: for example, to retrieve data from a database, or to augment baseline data. The code within the module then carries out one or more actions to achieve that purpose. The retrieve data from a database module might have a single action of joining two database tables, while the augment baseline data module might add variables[1] based on business rules and perform transformations on some of the original variables.
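To make the distinction between purpose and actions concrete, here is a minimal Python sketch of the two modules described above. Everything in it is an assumption made for illustration: the table names, the column names, the 90% utilization rule, and the in-memory SQLite database standing in for a real one.

```python
import sqlite3


def retrieve_account_data(conn: sqlite3.Connection) -> list[dict]:
    """Single purpose: retrieve data. Single action: join two tables."""
    query = """
        SELECT a.account_id AS account_id,
               a.credit_limit AS credit_limit,
               b.balance AS balance
        FROM accounts AS a
        JOIN balances AS b ON a.account_id = b.account_id
        ORDER BY a.account_id
    """
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def augment_baseline_data(rows: list[dict]) -> list[dict]:
    """Single purpose: augment data. Two actions, both serving that purpose."""
    augmented = []
    for row in rows:
        new_row = dict(row)
        # Action 1: transform original variables into a utilization ratio.
        new_row["utilization"] = row["balance"] / row["credit_limit"]
        # Action 2: add a variable from a hypothetical business rule.
        new_row["high_utilization"] = new_row["utilization"] > 0.9
        augmented.append(new_row)
    return augmented


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows so the sketch runs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id INTEGER, credit_limit REAL);
        CREATE TABLE balances (account_id INTEGER, balance REAL);
        INSERT INTO accounts VALUES (1, 1000.0), (2, 2000.0);
        INSERT INTO balances VALUES (1, 250.0), (2, 1900.0);
        """
    )
    return conn
```

A caller would chain the two, for example `augment_baseline_data(retrieve_account_data(build_demo_connection()))`. Neither module needs to know how the other works.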

Logically, the “single purpose” makes sense. And it is not only code developers who embrace the idea of not cramming a bunch of purposes into one item. Just look around your room and you can see at least 10 examples of things that were created to serve a single purpose. Except for phones. Those are trying to be everything.

This principle does not require a visual due to its simplicity. However, we are providing one to underscore this point: modularized scripts are not necessarily short scripts (although they can be). The idea behind modularization is the focus on a single purpose. A script may be long, but all the code within it serves that one purpose.

Benefits of Modular Programming

Are there benefits to modular coding other than appealing to logic?

Modular coding makes it easier to:

  1. Test the code. The T in MATHS stands for Tested. We’ll cover that more in a later blog. For the time being we’ll say this: testing single-purpose code is ideal because the tests can be specific and detailed, ensuring more thorough testing.
  2. Update the code. If changes need to occur (e.g., a business rule changes), finding where to make the update is easier in a script that only applies business rules than in one that does everything.
  3. Read the code. Have you ever encountered a paragraph that is pages long because the author crammed in a bunch of parentheticals? You first lose the thread, you start thinking about pancakes, then cakes, then wonder if the first ever pancake was like a cake in a pan because that would’ve been awesome—then you fall asleep. Reading a script that is constantly changing purposes is similar. The narrative just gets lost.
  4. Reuse the code. This is a huge benefit that might not be obvious. A piece of code that executes one function well, and has been thoroughly tested, should be reused. Not only is there an efficiency gain by reusing code, but there’s a reduction in model risk as well. Having to write less code reduces the chance of model developers coding “black swan bugs[2]” into the process.
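As a rough illustration of the first two benefits, consider a module whose only job is to apply one business rule. The rule, its thresholds, and the function names below are hypothetical. Because the module does one thing, its test can probe every boundary of the rule, and a change to the rule touches one obvious place.

```python
def apply_delinquency_rule(days_past_due: int) -> str:
    """Hypothetical business rule: bucket an account by days past due."""
    if days_past_due < 0:
        raise ValueError("days_past_due must be non-negative")
    if days_past_due == 0:
        return "current"
    if days_past_due < 30:
        return "early"
    if days_past_due < 90:
        return "delinquent"
    return "default"


def test_apply_delinquency_rule() -> None:
    """Specific, detailed checks on each boundary of the rule."""
    assert apply_delinquency_rule(0) == "current"
    assert apply_delinquency_rule(1) == "early"
    assert apply_delinquency_rule(29) == "early"
    assert apply_delinquency_rule(30) == "delinquent"
    assert apply_delinquency_rule(89) == "delinquent"
    assert apply_delinquency_rule(90) == "default"
    # Invalid input should be rejected rather than silently bucketed.
    try:
        apply_delinquency_rule(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative input")


test_apply_delinquency_rule()
```

If the business later moves the default threshold, the update is one line in `apply_delinquency_rule` plus the matching boundary checks in the test.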

How does Modular Programming Reduce Model Risk?

Using code that is modularized decreases model risk because it:

  • Improves the readability of code.
  • Allows focused testing of the code.
  • Facilitates code reuse.

We can summarize the first two effects as: making sure the code is doing what the model developer wants. These two effects reduce conceptual and coding errors. The last effect reduces future conceptual flaws because there is a decreased need to write new code.

What is code readability?

Code readability is how easy it is for a person to follow the logic and flow of the code. It matters because readable code is simpler to understand, modify, and maintain over the long run.

An Example of Modularized Coding

Let’s demonstrate modularized coding through an example. Consider the following:

A manager asks a seasoned model developer to build a new Probability of Default (PD) model for the Credit Card portfolio. The modeler returns to her desk, fires up her favorite code editor, and begins banging out code (or asks GitHub Copilot to bang out the code).

The first thing she does is write code to query the Economics database to obtain the macroeconomic data needed to fit the model. She next writes a query to extract account-level information from the database containing the Credit Card portfolio information. She joins a couple of tables together, filters the variables to only those she needs, and then exports it all into a data file. Next, she writes code to perform variable transformations and then even more code to incorporate the business logic to derive new variables for model fitting. Lastly, she writes code to remove unnecessary variables from the data file, splits the data into two new data files (one for model training, one for model testing), and then saves the files into a location on a server.

At this point she pauses, looks at the line number in her code editor, and sees she has exceeded 500 lines. She frowns; it felt like 5,000 lines. But she finished the data prep and is now ready to start on the model build.

Applying the Modularized best practice to the scenario above, one might break the code into these modules:

  1. Pull macroeconomic data.
  2. Pull account-level data.
  3. Perform variable transformations.
  4. Apply business logic.
  5. Clean up data.
  6. Split data into train and test data files.

The first and last modules are fitting examples of reusable code. Most credit risk models in banks require macroeconomic data, so future model builds will pull this data. One will also need to split the data into train and test data files because that is standard in model development. However, with careful design, the modeler can also write the remaining modules in a way that makes them reusable in other model development work.
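What might "careful design" look like for one of the remaining modules? One possible approach, sketched below, is to have the business logic module accept the rules as an argument instead of hard-coding them. The function name, the rule names, and the variables are all hypothetical.

```python
from typing import Callable

# A rule takes one row of data and returns the value of a new variable.
Rule = Callable[[dict], object]


def apply_business_rules(rows: list[dict], rules: dict[str, Rule]) -> list[dict]:
    """Derive one new variable per rule; the caller supplies the rules."""
    result = []
    for row in rows:
        new_row = dict(row)
        for new_variable, rule in rules.items():
            # Each rule sees the original row, so rules do not depend on each other.
            new_row[new_variable] = rule(row)
        result.append(new_row)
    return result


# Hypothetical rule for a credit card model.
credit_card_rules = {
    "over_limit": lambda row: row["balance"] > row["credit_limit"],
}

# Hypothetical rule for a different portfolio, reusing the same module.
auto_loan_rules = {
    "high_ltv": lambda row: row["loan_amount"] / row["vehicle_value"] > 1.0,
}

cards = apply_business_rules(
    [{"balance": 1200.0, "credit_limit": 1000.0}], credit_card_rules)
loans = apply_business_rules(
    [{"loan_amount": 18000.0, "vehicle_value": 20000.0}], auto_loan_rules)
```

The tested module stays the same across model builds; only the small dictionary of rules changes, which keeps the amount of new code (and new risk) low.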

In summary, when writing code for model development, you should create scripts focused on a single purpose. Test the scripts. Archive the scripts (more on that in the next blog). And, most importantly, reuse the scripts.

Jonathan Leonardelli, FRM, Director of Business Analytics for FRG, leads the group responsible for business analytics, statistical modeling and machine learning development, documentation, and training. He has more than 20 years’ experience in the area of financial risk. 

 

[1] We use the word variable in this context to describe a “characteristic” of the data (e.g., credit score or property type). Some might use the words field, feature, or column. For this blog series we will only use the word variable.

[2] We will touch on this again in the blog about the Tested best practice.
