
x Model Risk and Validation - PowerPoint Presentation

Uploaded On 2018-10-23



Presentation Transcript

Slide1

Model Risk and Validation for Stress Testing

Stress Testing: Latest Developments & Best Practice

September 27-28

Martin Goldberg

Lead Consultant, Validationquant LLC

martin@validationquant.com

Slide2

The Usual Caveats

This talk expresses my own personal opinions and may not represent the views of any past, present, or future employers. It may conflict with your views. Feel free to disagree.

If models were perfect, this would be a very different universe. This talk is certainly incomplete.

This topic is hard, and a short talk will not make you an expert. It may point you in some interesting directions, but there are many devils in the details.

No proprietary or confidential information is included in this talk. You might decide afterwards that no information at all is in here.

I may go off-topic either deliberately or upon request.

This talk is intended more to suggest questions than to give answers. Regulatory changes may invalidate some or all of the current approaches.

I have been a quant for a long time, but this talk will not be quantitative. So don’t panic, it won’t have any “scary” equations.

Slide3

Outline

Where is the model risk in a stress test

Defining stresses - best practice

Risk of models under stress conditions

Regulatory compliance

Understanding model limitations/weaknesses

Effective Challenge

Quantification of model risk is an art form

Limitations of stresses and of models

Validation tools

Validation Frameworks

Slide4

Outline

Where is the model risk in a stress test

Slide5

Model Risk of the stress itself

Two distinct kinds of stress test:

Stress shocks are instantaneous changes having immediate effects, e.g. VaR.

Stress scenarios have a well-defined time period and a narrative of aftershocks and reactions, e.g. CCAR.

Plausible shocks are easier to design, since they have no plotline.

Scenarios are more informative, but it is harder to make them plausible.

You cannot just run Monte Carlo and use some of the more stressful computer-generated scenarios; they won’t make sense.

In a crisis environment many models break down:

What is the value of a product that nobody wants to buy?

How do you hedge in an illiquid market?

Can your yield curve model handle negative rates?

What happens in hyperinflation with 3-month Treasuries at 150%?

Severe stresses are rare, and usually you cannot do a traditional backtest because of sparse or no data. Aesthetics and subjective plausibility are the best you can do.

You can also look at older historical stresses, but they are not as relevant. The Panic of 1837, where 11 states in the U.S. defaulted, is not as plausible today.

The US financial panics of 1819, 1837, 1857, 1873, 1893, 1929, 1987, 1998, and 2007 were not identical.

You can use them as case studies, but not as a stress deck to draw cards from.

Does the stress include contagion and circuit breakers in longer scenarios?

Good documentation by the stress designer, explaining why the stress is that way, is crucial to getting the stress accepted by the businesses, the validators, and the regulators.

Slide6

Model Risk of the individual models

The stresses will affect the business. Since this stress is hypothetical, the effect must be modeled.

Some positions and processes will require validated models to measure the effects of the stress, but some of the hypothetical stress effects will be measured with non-model tools.

Under some stresses, the non-model tools require extra assumptions and become honest-to-goodness models needing validation.

Models make assumptions, and stress scenarios make assumptions. Are they compatible?

Slide7

Model Ecosystem Risk

Even if the stress itself is validated, and each model used for the stress test is individually validated, there is still model risk from the pieces not fitting together.

I published an article about this in the Journal of Structured Finance, Spring 2017.

The inputs to one model are the outputs from another model, or tool, with subtly incompatible assumptions, or built for slightly incompatible purposes or somewhat different markets.

Models make assumptions, and might have modified these assumptions for stress conditions. Are the models compatible?

Slide8

Outline

Defining stresses - best practice

Slide9

A bad day is more cats. Stress is when the glass breaks.

Squeeker

Slide10

Validation of Stress Plausibility

A CCAR stress must be extreme but plausible.

Similarly for other stresses: if a stress is not plausible, it does not convey useful information and will be ignored.

Plausibility is somewhat subjective, and validating plausibility involves a subjective judgment.

The key is for the stress designer to have adequate documentation on why they are comfortable with the plausibility of the scenario. This is entirely separate from whether it’s severe enough, or too severe.

The stress designer should demonstrably have put a sufficient amount of thought into the design.

The various markets being stressed must fit together; if oil goes way up, gasoline will not go way down. If the U.S. defaults, the S&P 500 would not plausibly have a rally.

Slide11

Validation of Stress Severity

This is the easier part, compared to plausibility.

Best practice is to scale the severities of all stresses to be roughly comparable:

Maybe a similar multiple of today’s volatility, or

Maybe how badly it would hurt your firm.

Do not penalize a business unit for good hedging.

If everyone knows in advance which stress will cause the worst hypothetical loss, the validation team may think you’ve done it wrong.

If some businesses are unaffected by the stress, either they have done a good job of extreme-value hedging, or you have done a bad job of stress selection.

In a longer scenario, severity and plausibility interact. There should be knock-on effects, and gradual changes after the initial shock.

The market usually over-reacts, and then over-corrects, so a shock tends to lead to wild swings and heightened volatility for a while.

Slide12

Sizing a stress test

S&P ratings are stress tests: AAA means they think you could survive the next 1930s US Great Depression, single-B means you can surely survive the coming year, with various levels in between. The Fed Severely Adverse scenario is roughly a BBB stress.

Different markets will in general react differently to the same macro-economic stress, and idiosyncratic changes that would be stressful for one market segment may be benign for another.

VaR is the 99th percentile worst ten-day period, Basel wants the loss of the 99.9th percentile worst year, and a AA rating is often assumed to be the 99.97th percentile worst year.

Of course in reality no firm or nation has ever survived the third-worst year out of ten thousand.

Show of hands: how many of you have employers that were in the same or a similar business in the year 1017? When writing was invented?

The more stressful a shock, the further out you have to extrapolate from historical data. This requires extra assumptions.

Slide13
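
The percentile arithmetic on the stress-sizing slide above can be made concrete with a small sketch. It converts a percentile into an average "one period in N" return period, which is where the "third-worst year out of ten thousand" figure comes from. The function name `return_period` is hypothetical, and the calculation assumes independent, identically distributed periods, which is itself a strong modeling assumption.

```python
def return_period(percentile: float) -> float:
    """Average number of periods between exceedances of a loss percentile.

    Assumes independent, identically distributed periods (an illustrative
    simplification, not a claim about real markets).
    """
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must be strictly between 0 and 100")
    return 1.0 / (1.0 - percentile / 100.0)


# Percentiles quoted on the slide; the period length differs by measure.
examples = [
    ("VaR, 99th percentile", 99.0, "ten-day periods"),
    ("Basel, 99.9th percentile", 99.9, "years"),
    ("AA rating, 99.97th percentile", 99.97, "years"),
]
for label, pct, unit in examples:
    print(f"{label}: roughly one in {return_period(pct):,.0f} {unit}")
```

The further out the percentile, the longer the history that would be needed to observe even one such event, which is why extra assumptions are unavoidable.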

Scenario Expansion

Since a stress shock or scenario should involve all your positions, both long and short, all the underlying factors should be stressed.

An essential part of validating a stress is deciding whether the scenario is plausible and appropriate to its intended use.

“Scenario Expansion” is one term used to describe how the thousands of factors will move when a few dozen key ones define the scenario (such as the Fed-prescribed ones in CCAR).

It may be tempting to use historical correlations to decide how far to stress other factors, but it would not be plausible.

Slide14

Tail Dependence

Markets that are not very related in good times can plummet together in bad times. For example, when times are good in equities, hedge funds do pairs trading, betting which of two related stocks will do better. In bad times, the firm may tend to close out all its equity positions and buy Treasuries.

Tail dependence is a quantitative measure of the folk saying that in a crisis, correlations go to either +1 or -1.

The only really original idea I ever published, “tonsuring,” deals with quantifying this effect. For details see http://arxiv.org/abs/1110.4648

But there are plausible stresses where a previously solid relationship between two market factors breaks down. Again, documentation is key.

Slide15

Tonsuring

This is an exploratory data analysis technique I call “tonsuring,” intended to highlight infrequent features of the observed data time series. If one assumes that future stresses will be similar to the extremes of the past, it can help with scenarios of stressful times yet to come.

By progressively throwing out “inliers” (boring days when not much happened in the market, defined by being closer to the center of a distribution), you see what happens to the “correlation.”

It can also be applied to Principal Component Analysis (PCA). Typically the first PC gets more dominant, and the shapes of the second and lower PCs change.

Slide16
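
The tonsuring idea described above can be sketched in a few lines. This is a minimal illustration, not the method as published: it assumes that "closeness to the center" is measured by the sum of squared standardized deviations from the mean, and the names `pearson` and `tonsure` and the synthetic data are all hypothetical. See the arXiv paper cited earlier for the author's actual definitions.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tonsure(xs, ys, keep_fractions=(1.0, 0.5, 0.25, 0.1)):
    """Correlation of the most extreme observations, for each fraction kept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    # Assumed inlier measure: squared distance from the center in standardized units.
    dist = [((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2 for x, y in zip(xs, ys)]
    # Most extreme observations first, so discarding inliers means keeping a prefix.
    order = sorted(range(n), key=lambda i: dist[i], reverse=True)
    result = {}
    for frac in keep_fractions:
        kept = order[: max(3, int(n * frac))]
        result[frac] = pearson([xs[i] for i in kept], [ys[i] for i in kept])
    return result


# Synthetic data: independent on calm days, driven by a shared shock on rare crisis days.
random.seed(0)
xs, ys = [], []
for _ in range(2000):
    shock = random.gauss(0, 5) if random.random() < 0.05 else 0.0
    xs.append(shock + random.gauss(0, 1))
    ys.append(shock + random.gauss(0, 1))

correlations = tonsure(xs, ys)
for frac, corr in correlations.items():
    print(f"keeping the most extreme {frac:.0%} of days: correlation {corr:.2f}")
```

In this toy data the correlation rises as more inliers are discarded, because the surviving observations are dominated by the days on which both factors were hit by the same shock.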

An example of tonsuring

Slide17

More Suggestions for Scenario Design

To expand a shock or scenario to cover all your firm’s diverse assets and liabilities, it can help to give each scenario a short meaningful name and a back-story. My own made-up examples:

Euro currency breakup

US Congress can’t pass budget - US defaults

China invades Taiwan

“Mr Fusion” - free electricity

Be sure the stress scenario has included knock-on effects on all other markets, with plausible lags:

Historical correlations are irrelevant here

Delayed shocks due to fire sales by dying firms

Slide18

Outline

Risk of models under stress conditions

Slide19

Risk of Models under Stress Conditions

Almost any model has a breaking point. If you push it far enough it falls off a cliff. Do you know where that cliff is, and is your stress past the edge of validity for each model?

You may need a different, and typically simpler, model which works under that stress.

Are you allowed to have separate models?

Do they paste together smoothly under intermediate conditions?

Best practice is for the breaking point(s) of the models to be documented by the developers and/or validators.

Slide20

Model Inputs under Stress Conditions

Models need inputs. If the stress causes a particular market to become illiquid, or shut down completely, a model that expects inputs from that market is not useful.

During much of 2008-2009, there were no trades in some previously liquid securities, such as MBS-backed CDO tranches and Auction Rate Securities.

For models to work in a stress scenario, you may have to make some heroic assumptions about dormant markets.

Marking these illiquid positions to zero is too extreme, but keeping them at the last observed price from pre-crisis is totally implausible and not valid.

Some prices, rates, etc. that would in ordinary times not need a model might require one during stresses. In accounting jargon, this is going from Level 1 valuation to Level 3. Such models of course need to be validated, but you can’t exactly backtest them, since one of their assumptions is that a normally observable market factor has gone missing.

Slide21

Outline

Regulatory compliance

Slide22

Keeping the Regulators Satisfied

It is clearly not true that you are doing all this work around stress testing out of intellectual curiosity. It is a regulatory requirement.

Nobody except rating agencies did serious stress testing before the subprime crisis. But S&P ratings are defined as stress tests.

CCAR and DFAST are probably the lion’s share of most validation team efforts at most banks. Doing a stress scenario correctly, and demonstrating to all that it is proper and suitable, is deliberately a high bar to get over.

Sufficient documentation is a key element. If you are not sure your docs are sufficient, they aren’t. If you are sure they are sufficient, they probably still aren’t.

A key is to show good intentions. Deliberately trying to tweak the scenarios, models, assumptions, etc. to make your firm look better will get you in deep trouble. Your firm will get caught. Don’t try it.

I am not, and have never been, a regulator.

Slide23

Outline

Understanding model limitations/weaknesses

Slide24

Model Limitations

“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”

There are no Laws of Finance. Financial data do not follow any stochastic process, but Wall Street uses heuristics: build models as if the models worked, so an approximate answer can be found.

Models are not intended to capture all the nuances of the real world.

Models are useful specifically because they omit part of messy reality.

All models are based on one or more assumptions. Models are just a formalized version of the model designer’s intuition.

Models are never “valid” in an absolute sense. They rely on assumptions about the behavior of people, organizations, acts of the natural world, and the use of other models by market participants.

A model is a mixture of behavioral psychology, statistics, numerical methods, and subjective opinions, and some parts of any model are dictated by law, regulation, or company policy.

Slide25

Avoiding hubris

The main point of a model validation is the qualitative judgment of whether the model is suitable for its intended use. If the intended use is a stress test, this may make the model unsuitable for normal conditions. The perfect model does not exist.

Details of the assumed stress scenario are not necessarily compatible with how the markets work in normal times. A very complicated model that is necessary for live markets may be inappropriate for stresses, but this limitation needs careful justification.

Stress test models can have healthy doses of human judgment, but the judgment needs to be backed by quantitative reasoning.

Slide26

Avoiding myopia

One way to help predict what could happen is to study history. For example, here is a graph of UK consol yields since 1729 and US long bond yields since 1798.

The UK long bond rate rose 360 bp in 1974, and fell 188 bp in 1983. Since 1999, the largest annual rise was 39 bp and the largest annual fall was 82 bp.

In the US, annual data from 1987 to the present have the change in long bond yield vary from -92 bp to +75 bp. In 1986 it went down 235 bp, and in 1980 it went up 231 bp, and a further 223 bp in 1981.

Slide27

Other Tail Dependences

Upper and lower tail dependence of 1; middle “local dependence” of -1.

The rank correlation is constructed to be exactly zero. I designed this as a counterexample.

It is more pathological than what you will ever actually find.

You can find funnel-shaped and galaxy-shaped copula densities in real data, but in a less exaggerated form than below.

[Figure panels: Extreme Funnel; Extreme Galaxy; Gaussian Copula Density (easier to model but not always plausible)]

Slide28

Long histories

“History never repeats itself, but it rhymes” (misattributed to Mark Twain).

No historical calibration using a currency with a pegged FX rate can predict the consequences of the peg breaking.

What would you predict for the Greek drachma exchange rate in 2019?

What was the effect on the Euro-GBP exchange rate of the Norman conquest of 1066? This is inside the 99.9th percentile of one-year changes.

I suggest using as long a history as you can get, and possibly using similar assets’ histories as proxies to get as many observations of the tails as possible.

Slide29

The Egg Question

Farmer Gray’s Organic Free-Range Eggs come from his small flock of ~300 hens on his small property on Long Island (note this is a fictitious example). Because of their outstanding quality, he charges $1.50 per egg, which is far more than the cost of supermarket eggs.

a. How much would a box of a dozen eggs cost?

b. How much would a truckload of a million eggs cost?

Slide30

Outline

Effective Challenge

Slide31

What Makes a Validator Challenge Effective?

The validation team has sufficient backing to avoid any chilling effect where they think maybe their job depends on the model getting a thumbs up.

Meaningful remediation of failed models actually happens.

Real life counterexample:

Business: “You can validate the model, but whether it passes or fails we are going to use it anyway.”

Validator: “But that’s what you said last year.”

The validators must make a real attempt that plausibly could cause the model to fail if it were flawed, and they should not have decided in advance of the validation whether to pass it.

In the case of insufficient documentation, there should be a firmwide standard of whether this causes a fail, a pause, or something else.

The effort must be more than a cursory “light touch, just kick the tires” review.

Slide32

Institutional Acceptance

The firm’s culture needs to be considered:

Cowboy culture - “Après moi, le déluge”

Arrogance - TBTF (too big to fail) so it doesn’t matter

Risk-averse - any loss causes panic and terminations

Asperger - “We set the risk tolerance already so it is what it is.” Most models work like this because the calculation is simpler. Very few risk managers and no high executives are like this.

For longer scenarios, will the culture be changed by the stress? Does contingency planning differ by type of stress?

Know who is being challenged.

Give a detailed non-quantitative explanation of why the stress model is or isn’t validated.

Slide33

Models are hard to build

Most of us have deadlines to meet. Very complex models are harder to implement and take longer to validate.

If the model is incomprehensible to the intended user, it may not get used. Is the model a good compromise between showing off the developer’s quant skills and giving the users an appropriate tool?

Is the stress a good match for the firm’s positions?

Remember Hofstadter’s Law, which states that everything takes longer than you think it will, even after you take Hofstadter’s Law into account.

The fundamental law of the universe is Murphy’s Law, stated by Feynman for quantum mechanics as “Anything not forbidden is compulsory.”

Slide34

Challenger Models

Especially for CCAR, there should be a challenger model for most or all production models. These should be validated as well, if they are developed by the model developer. Challenger models built by validators can’t be validated due to conflict of interest.

Challenger models should be materially different from the one they are challenging.

The challenger model should be good enough to possibly win the challenge and become the basis for the next version of the production model. This is quite difficult.

Going to conferences and training sessions like this one, and reading the literature, can be a great help in coming up with ideas for challenger models.

If both the champion and challenger agree on the output numbers, this adds credence to the stress result. If they disagree too much, it raises questions.

If they agree precisely to the penny, there is something very suspicious.

Slide35
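
The three outcomes described on the challenger-model slide above (rough agreement, too much disagreement, agreement to the penny) can be written down as a simple triage rule. This is only a sketch: the function name `compare_champion_challenger` and both thresholds are hypothetical, and a real firm would set its own tolerances in its validation framework.

```python
def compare_champion_challenger(champion_loss, challenger_loss,
                                tolerance=0.25, penny=0.01):
    """Classify how two stress-loss estimates relate to each other.

    tolerance: largest relative gap still treated as agreement (assumed value).
    penny: absolute gap below which the match looks too good (assumed value).
    """
    gap = abs(champion_loss - challenger_loss)
    if gap < penny:
        # Two materially different models should not match exactly.
        return "suspiciously identical"
    scale = max(abs(champion_loss), abs(challenger_loss))
    if gap / scale <= tolerance:
        return "broadly consistent"
    return "material disagreement"


for champion, challenger in [(100.0, 100.0), (100.0, 110.0), (100.0, 300.0)]:
    print(champion, challenger, compare_champion_challenger(champion, challenger))
```

A rule like this only flags results for discussion; deciding why the two models differ remains a judgment call.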

More ways to challenge

Check consistency with similar models

Challenge the rationale for all assumptions and developmental decisions

Check for reproducible results

Small changes to stress produce small changes to results

Except when they don’t. These are called critical parameters or critical values: a slight change in a critical parameter causes a large and/or discontinuous change in results.

Some scenarios just barely trigger, or just barely miss triggering, any knock-outs, contingencies, turbo-ing, covenants, etc., in the portfolio being modeled.

Does the scenario hover just at that breaking point?

Slide36

Outline

Quantification of model risk is an art form

Slide37

Model Risk Quantification

Model risk in a stress test is the risk of mis-measuring the severity, or making sub-optimal business decisions about how to reduce exposure to such stresses, because the model was flawed, misused, misinterpreted, ignored, or fed inappropriate inputs or misleading assumptions.

To discuss this risk, and put it on par with other risks, you need to measure model risk, market risk, credit risk, etc. in the same units. This common metric is typically dollars (or ¥, £, €, whatever your firm’s accounting currency may be).

The risk of any model comes from its use: a model that has no money flowing through it but is just sitting on the shelf has no risk. Thus model risk quantification should be tied to usage. This ideally would give the business user a financial incentive to lessen the risk.

If the quantification method is not transparent and perceived as fair, cooperation and good faith attempts to reduce model risk in stress tests will not happen.

Since these stress tests are hypothetical, all the measurements are model dependent and hopefully cannot be checked against a real future catastrophe.

Slide38

Model Risk of Model Risk Measures

There is a recursive quality to this problem, since the model for measuring model risk has its own model risk. Admit in advance that the model risk estimates are going to be fuzzy and imprecise.

If the quantification is "wrong but useful," then that is good enough to present as a "close enough" risk measure. It is better to have some measure of model risk for model risk managers to use, and eventually refine into a measure that is more useful to them, than to give them nothing until it's perfect. It will never be perfect.

Ideally there is a balance between the “model as automated software” view and the “model user responsibility” view. More complex models with more inputs and assumptions are riskier, but a model that would be more heavily used during a stress, which is not used as much in normal times, is also riskier due to being unfamiliar.

As with other operational risks, a model can have a tiny probability of a massive stress loss. How do you compare that to a high likelihood of a small stress loss?

Slide39

Outline

Limitations of stresses and of models

Slide40

Stress Scenario Limitations

There is a saying misattributed to Mark Twain that “History never repeats itself, but it rhymes.”

Your stress testing will not exactly predict the next crisis, but if you have a diverse set of different enough scenarios, you are more likely to have bracketed the flavor of the next real crisis. With luck you will never know for sure.

For a multi-period stress like CCAR, the feedback loops, contagion, delayed panic, and so forth, are even harder to predict. The Crash of 1929 was about to start in March but JP Morgan singlehandedly stopped it, for a few months.

The validator will check that you have tried to conceive of the inconceivable. I am repeating myself to say that good documentation of the thought process leading to what stresses are used and which were rejected is crucial to validation. Intention matters here.

Slide41

Model Limitations

Models are not magic crystal balls. They are a way to quantitatively explore the implications of the model builder’s assumptions. If the assumptions are not compatible with the stress designer’s intentions and assumptions, the model is not suitable for that stress. Clear, detailed communication is necessary.

Some of the model validator’s toolbox is not applicable to stress test models:

No historical backtesting

Many alternative models cannot be used as a challenger due to the market dislocations of the stress

Empirical regularities in the markets stop working during stresses

A pricing model for a market maker has a much higher precision than a stress test model. The whole stress exercise is a rough guess, not an exact calibration.

Slide42

Outline

Validation tools

Slide43

Validation Toolbox for Stresses

For validating the stresses, the validator should remember that this is an extrapolation beyond the range of sufficient historical data. A key extrapolation tool is Multivariate Extreme Value Theory.

My personal favorite 1-dimensional distribution is Tukey’s g×h, which is a skewed and stretched Gaussian. g controls skew and h controls kurtosis.

A library of suitable copulas for parametrizing contagion and tail dependence is found in Harry Joe’s monograph “Dependence Modeling with Copulas”. It is very intense reading, but worth it.

Exploratory Data Analysis methods can show graphically whether a stress is too implausible.

Understanding the human judgment that led to the stress developer’s proposal requires soft skills and communication. The business reaction to stress in period T leading to the proposed effects in period T+1 is a difficult mix of judgment, policies, historical evidence, and prestidigitation. Validation is at least partly art mixed with science.

Try to shuffle the scenario timeline, and see if the proposed reactions still make sense.

Slide44
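
To illustrate how g and h reshape a Gaussian in the Tukey distribution mentioned above, here is a minimal sketch of the standard g-and-h transformation applied to a standard normal draw. The function name `tukey_gh` and the parameter values in the demonstration are hypothetical choices for illustration, not calibrated to any market.

```python
import math
import random


def tukey_gh(z, g=0.0, h=0.0, loc=0.0, scale=1.0):
    """Map a standard normal draw z to a Tukey g-and-h variate.

    g skews the distribution (g = 0 leaves it symmetric);
    h > 0 stretches both tails (h = 0 leaves them Gaussian).
    """
    skew_part = z if g == 0.0 else (math.exp(g * z) - 1.0) / g
    return loc + scale * skew_part * math.exp(h * z * z / 2.0)


# Compare the extremes of a plain Gaussian with a skewed, fat-tailed version.
random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(10_000)]
plain = [tukey_gh(z) for z in draws]
stretched = [tukey_gh(z, g=-0.3, h=0.15) for z in draws]
print("worst plain draw:    ", min(plain))
print("worst stretched draw:", min(stretched))
```

With a negative g the left tail is lengthened, and with a positive h both tails become heavier, so the same underlying normal draws produce more extreme downside values.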

Validation Toolbox for Models

For validating the models for being suitable to the proposed stress scenarios, the validator should check if there is a continuous reaction as you gradually turn on the stress. In other words, does the model output graphed against percent of stress on inputs give a smooth curve, or is there a cliff that the model falls off?

A historical example: when volatility rises, trading activity increases to a point. But when volatility in the crude oil market got to about 200% at the start of the Gulf War, most of the floor traders gave up and walked away.

Another example: in normal times, changes in yield between the various countries’ sovereign bonds are highly correlated due to arbitrageurs and such. But when a particular country has a crisis, its bond yields skyrocket, and yields in safe haven countries go down. What would happen if the US defaulted? It has not since 1931, so there is not really relevant data.

I recommend that validators spend some time keeping up with the literature. I read the arXiv.org Quantitative Finance (http://arxiv.org/archive/q-fin) abstracts every day while I eat breakfast. SSRN and NEP also have free or mostly free daily digests of new stuff. There are also many journals, but your firm won’t pay for subscriptions to all of them (or maybe it will, and then fire anyone who reads all of them instead of getting work done).

Slide45
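
The "gradually turn on the stress" check described above can be sketched as a simple ramp test. The code steps a single input from its base value to its fully stressed value and flags any step where the output moves far more than a typical step. The function name `find_cliffs`, the jump threshold, and the toy model (loosely inspired by the crude oil example, with an invented cutoff and payoff) are all hypothetical.

```python
def find_cliffs(model, base_input, stressed_input, steps=100, jump_factor=10.0):
    """Ramp one input from base to full stress and report suspected cliffs.

    Returns (fraction_before, fraction_after) pairs for each step where the
    output moved more than jump_factor times the median step move.
    """
    fractions = [i / steps for i in range(steps + 1)]
    outputs = [model(base_input + f * (stressed_input - base_input))
               for f in fractions]
    moves = [abs(b - a) for a, b in zip(outputs, outputs[1:])]
    typical = sorted(moves)[len(moves) // 2]  # median step size
    threshold = jump_factor * max(typical, 1e-12)
    return [(fractions[i], fractions[i + 1])
            for i, move in enumerate(moves) if move > threshold]


def toy_model(volatility):
    """Hypothetical trading-activity model: rises with volatility, then stops."""
    # Assumed cutoff: activity collapses once volatility reaches 200%.
    return 0.0 if volatility >= 2.0 else 100.0 * volatility


cliffs = find_cliffs(toy_model, base_input=0.3, stressed_input=2.5)
for lo, hi in cliffs:
    print(f"suspected cliff between {lo:.0%} and {hi:.0%} of the full stress")
```

A real model has many inputs that move together under a scenario, so in practice the ramp would scale the whole scenario rather than one number, but the idea of graphing output against percent of stress is the same.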

More for the Validation Toolbox for Models

Validators are not academics. Let the academics invent new stuff and then the validator can use it.

Almost any weird new technique or algorithm that the academics invent will have an R package that does it and that you can download.

Most will be in a Matlab toolbox, and some will have a Mathematica or Python implementation.

C++ and SAS are not as likely to have a package to plug and play.

Conferences like this one often have some new ideas for developers and validators to learn. Let the validators know about them.

Try to learn enough that the rate of learning is faster than the rate of forgetting.

Don’t be scared of wavelets, agent-based models, artificial intelligence / neural nets, or other techniques from outside Finance.

Slide46

Rabin’s Rules

(Mike Rabin was my boss in 1991.)

Curiously, an electrician who installed an outlet in my basement had these same 3 rules for his work.

1. Pay Attention

What are the features you are trying to model? Did you use the right day-count conventions? What did the client actually ask for?

2. Think About What You Are Doing

You are going to dinner at Nobu in an hour, and the TV in the kitchenette is broadcasting your favorite team’s tie-breaking game. Neither of these should affect the nesting of parentheses on your if statement.

3. Double-Check Your Work

Limiting cases and paper trading simulations

Benchmarking against other models

Compiler warning messages, rereading the term sheet, etc.

A second set of eyes (independent validation)

Slide47

Outline

Validation Frameworks

Slide48

Validation Scope and Frameworks

The point of a validation is to have an independent second opinion that the stress modeling is done well enough.

The key question to answer for the developer is “Why are you comfortable that this is a suitable method, with sufficient documentation and testing?”

The key question for the validator is the same. Note the quantitative machinery is ultimately to answer a qualitative question.

The framework sets the ground rules for what the validator has to do, and should include all the regulatory guidance and MRA remediations from past exams. The validator following the framework will be less likely to omit important steps.

We all have deadlines, and the framework can help to show when it’s time to stop and move on to the next project.

Slide49