Thursday, April 17, 2014

Estimating Probability of Extreme Events

Figure 1: Distribution for Mixture Distribution
1.0 Introduction

What is the probability that the Dow Jones Industrial Average (DJIA) will rise by at least 5% tomorrow? By 10%? Very few samples can be found in the data for a large enough rise, and, eventually, you will be asking about a rise beyond all historical experience. Some have argued that Extreme Value Theory can be applied to financial data to extrapolate these sorts of tail probabilities. In this post, I attempt to explain this theory. For purposes of exposition, I here disregard the possibility of such rises as being associated with states that might be impossible to foresee from the past history of the data-generating process.

2.0 A Random Sample from a Mixture Distribution

This exposition includes an example. I need a probability distribution in which tails differ from the portion of the distribution clustered around the center, in some sense. Consider a random variable X which can take on any real number. The probability distribution for this random variable is defined by the Cumulative Distribution Function (CDF). The CDF specifies the probability that a realization of the random variable is less than or equal to a given value:

F(x) = Pr(X ≤ x)


  • x is the argument at which the CDF is evaluated.
  • F is the CDF.
  • F(x) is the indicated probability, that is, the value of the CDF evaluated for the argument.

(Conventionally, uppercase letters toward the end of the alphabet denote random variables. The corresponding lowercase letter denotes a realization of that random variable resulting from the outcome of conducting the underlying experiment.)

To obtain a distribution with heavy tails, I consider a mixture distribution. (Mixture distributions are often used in the theory of robust statistics. I would appreciate a reference arguing that robust statistics and Extreme Value Theory are complementary, in some sense.) Suppose F1 and F2 are CDFs for Gaussian (also known as normal or bell-shaped) distributions with possibly different means and standard deviations, and let p be a real number between zero and one. F is the CDF for a mixture distribution if it is defined as follows:

F(x) = p F1(x) + (1 - p) F2(x)

For definiteness, let the parameters for this distribution be as in Table 1. The two Gaussian distributions have equal means. The distribution with the 90% weight also has the smaller standard deviation. In other words, the distribution that is selected less frequently will have realizations that tend towards the tails of the overall mixture distribution.

Table 1: Parameters for a Mixture Distribution
Probability Variate from First Distribution    90%
First Gaussian Distribution
  Standard Deviation                           1.0
Second Gaussian Distribution
  Standard Deviation                           3.0
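
As a minimal sketch, the mixture CDF with the Table 1 parameters can be written in Python as follows. The function names are illustrative, not taken from the original program, and both means are assumed to be zero, which is the common mean used for Figure 1.

```python
import math

def gaussian_cdf(x, mean=0.0, sd=1.0):
    """CDF of a Gaussian distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, p=0.9, sd1=1.0, sd2=3.0):
    """F(x) = p F1(x) + (1 - p) F2(x); both components are assumed to have mean zero."""
    return p * gaussian_cdf(x, 0.0, sd1) + (1.0 - p) * gaussian_cdf(x, 0.0, sd2)

# Upper tail probabilities, Pr(X > x) = 1 - F(x), for the mixture and for F1 alone.
for x in (2.0, 3.0, 4.0):
    print(x, 1.0 - mixture_cdf(x), 1.0 - gaussian_cdf(x))
```

The printed columns show how much more probability the mixture places far from the center than the first Gaussian distribution does on its own.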

2.1 A Random Sample

Suppose X1, X2, ..., Xn are mutually stochastically independent random variables, each of which has the probability distribution with CDF F. Under these conditions, these random variables comprise a random sample. I wrote a computer program to generate a realization of such a random sample of size n. Table 2 shows some statistics for this realization. I use this realization of a random sample to illustrate the application of various statistical techniques below.

Table 2: Statistics for Synthesized Variates
Sample Size                           500
Realizations from 1st Distribution    443
Realizations from 2nd Distribution    57
Sample Mean                           -0.0086
Standard Deviation                    1.3358
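
A sample of this kind could be generated along the following lines. This is a sketch, not the program used for Table 2: the function name, the seed, and the use of Python's random module are assumptions, so the numbers it prints will differ from those in the table.

```python
import random
import statistics

def draw_mixture_sample(n, p=0.9, sd1=1.0, sd2=3.0, seed=None):
    """Draw n independent variates from the mixture; also record each variate's component."""
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        if rng.random() < p:
            values.append(rng.gauss(0.0, sd1))   # first Gaussian, chosen with probability p
            labels.append(1)
        else:
            values.append(rng.gauss(0.0, sd2))   # second Gaussian, with the larger spread
            labels.append(2)
    return values, labels

values, labels = draw_mixture_sample(500, seed=2014)
print("from 1st:", labels.count(1), "from 2nd:", labels.count(2))
print("mean:", statistics.mean(values), "sd:", statistics.stdev(values))
```

For comparison, the variance of this mixture is p(1.0)^2 + (1 - p)(3.0)^2 = 1.8, so its theoretical standard deviation is about 1.342, close to the sample value in Table 2.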

2.2 Goodness of Fit

It is difficult to determine that a realization of the random sample is not from the distribution F1. In other words, the existence of sample values, often in the tails, from F2 is not readily apparent from a straightforward statistical test for the goodness-of-fit. Consider the order statistics found by sorting the random sample:

X(1) ≤ X(2) ≤ ... ≤ X(n)

(By convention, a subscript without parentheses denotes a random variable from a random sample. A subscript in parentheses denotes an order statistic.)

An empirical CDF can be constructed from the order statistics. The probability that a random variable from the distribution generating the random sample is less than or equal to x(i) is estimated as i/n, the proportion of the sample less than or equal to the given order statistic. Figure 1, above, shows the empirical CDF for my realization of the random sample, as well as the CDFs for the two Gaussian distributions in the mixture distribution. Both Gaussian CDFs have a value of 1/2 for an argument of zero, since that is their mean. The Gaussian distribution with the smaller standard deviation has a CDF with a steeper slope around the mean, since more of its probability is clustered around zero. The empirical CDF, estimated from the data, is a step function, with equal size steps occurring at each realization of a random variable in the sample. One needs to sort the data to calculate the empirical CDF.
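
The construction of the empirical CDF can be sketched in a few lines of Python. The function name is illustrative, and the four-element sample is made up purely to show the steps.

```python
import bisect

def empirical_cdf(sample):
    """Return a function x -> proportion of the sample less than or equal to x."""
    ordered = sorted(sample)          # the order statistics x(1) <= ... <= x(n)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many order statistics are <= x.
        return bisect.bisect_right(ordered, x) / n

    return cdf

f_hat = empirical_cdf([0.3, -1.2, 2.5, 0.0])
print(f_hat(-2.0), f_hat(0.0), f_hat(0.3), f_hat(3.0))   # 0.0 0.5 0.75 1.0
```

Each step has height 1/n, and evaluating the function at the i-th order statistic returns i/n when there are no ties.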

The maximum vertical distance between the CDF of a theoretical distribution and an empirical CDF is known as the Kolmogorov-Smirnov statistic. Under the null hypothesis that the random sample is drawn from the theoretical distribution, the Kolmogorov-Smirnov statistic will tend to be a small positive number. Table 3 shows the Kolmogorov-Smirnov statistics for the data. This statistic is not statistically significant for the first Gaussian distribution. If the data were drawn from the second Gaussian distribution, the probability of observing so large a value of the statistic would be less than 1%. Thus, one could conclude that these data were not generated from the second distribution, but (incorrectly) conclude that they were generated from the first.

Table 3: Goodness of Fit
                                1st Gaussian    2nd Gaussian
Kolmogorov-Smirnov Statistic    0.0354          0.228
p Value                         54.59%          0.00%
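
The test can be sketched as follows. The names are illustrative, the sample is freshly synthesized rather than the one behind Table 3, and the p value uses the asymptotic Kolmogorov series, which is an assumption about method: it need not agree to every digit with whatever routine produced the table.

```python
import math
import random

def gaussian_cdf(x, sd=1.0):
    """CDF of a zero-mean Gaussian distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Maximum vertical distance between the empirical CDF and a theoretical CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x(i); check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_p_value(d, n, terms=100):
    """Asymptotic p value: 2 * sum over k of (-1)^(k-1) exp(-2 k^2 n d^2)."""
    t = n * d * d
    if t < 0.05:
        return 1.0    # the series converges slowly here, and the p value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t) for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]   # stand-in data, not the post's sample
for sd in (1.0, 3.0):
    d = ks_statistic(sample, lambda x: gaussian_cdf(x, sd))
    print(sd, d, ks_p_value(d, len(sample)))
```

Applied to the statistics reported in Table 3 with n = 500, the series gives a p value a little above one half for the first Gaussian distribution and one indistinguishable from zero for the second.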

3.0 Distribution for the Tail

With the description of the data out of the way, tail probabilities can now be defined. I concentrate on the upper tail.

3.1 Definition of a Tail

The upper tail is defined in terms of the lower bound u for the tail and the tail probability q. These parameters are related like so:

q = Pr(X > u) = 1 - F(u)

The upper tail is defined as those values of the random variable such that the probability of exceeding such a value is less than the given parameter:

{x | Pr(X > x) < q}

In other words, the tail consists of values of the random variable that lie above the lower bound on the tail. It is sometimes convenient to define a new random variable, Y, for outcomes that lie in the tail:

Y = X - u

This new random variable is the distance from the lower bound of the tail, given that a realization of X lies in the tail. One could give a symmetrical definition of the lower tail and a corresponding random variable. Table 4 shows how many variates in my realization of the random sample, in each tail and in the center, happen to come from the Gaussian distribution with the larger standard deviation, where the parameter q is taken to be 10%.

Table 4: Variates in Tails and Center
Number in Lower Tail                                18
Number in Center                                    23
Number in Upper Tail                                16
Percentage of Lower Tail from 2nd Distribution      36.7%
Percentage of Center from 2nd Distribution          5.7%
Percentage of Upper Tail from 2nd Distribution      32.7%

For y > 0, the conditional probability that X exceeds any given value in the tail, given that X lies in the tail, is:

Pr(X > y + u | X > u) = Pr[(X > y + u) and (X > u)]/Pr(X > u)

The above formula simply follows from the definition of conditional probability. The second clause in the "and" expression is redundant. So the above can be rewritten as:

Pr(X > y + u | X > u) = Pr(X > y + u)/Pr(X > u), y > 0

Let G(y) be the CDF for the distribution for the random variable Y. One can then rewrite the above formula as follows:

1 - G(y) = [1 - F(y + u)]/[1 - F(u)], y > 0

Substituting for the definition of the parameter q, one obtains:

F(y + u) = (1 - q) + q G(y), y > 0


F(x) = (1 - q) + q G(x - u), x > u

The above two expressions relate the CDFs for the distributions of the random variables X and Y.
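
The substitution can be spelled out step by step. Since q = 1 - F(u), the formula for 1 - G(y) rearranges as follows (written here in LaTeX notation):

```latex
\begin{aligned}
1 - G(y) &= \frac{1 - F(y + u)}{q} \\
1 - F(y + u) &= q\,[1 - G(y)] \\
F(y + u) &= (1 - q) + q\,G(y), \qquad y > 0
\end{aligned}
```

Setting x = y + u in the last line gives the expression in terms of x.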

3.2 Generalized Pareto Distribution

A theorem states that, for a broad class of continuous random variables X and a sufficiently high lower bound u, the distribution of the tail is well approximated by a Generalized Pareto Distribution with the following CDF:

G(y) = 1 - [1 + (c/a) y]^(-1/c)

The parameter a is called the scale parameter, and it must be positive. The parameter c is the shape parameter. It can take on any real number. When the shape parameter is zero, the Generalized Pareto Distribution reduces, by a limit theorem, to the exponential distribution.

Below, I will need the following expression for the Probability Density Function (PDF) for the Generalized Pareto Distribution:

g(y, a, c) = (1/a) [1 + (c/a) y]^(-(1 + c)/c)

The PDF is the derivative of the CDF. For any set A to which a probability can be assigned, the probability that Y lies in A is the integral, over A, of the PDF for Y.
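
The two formulas translate directly into code. The sketch below uses illustrative function names and parameter values; the handling of a shape parameter near zero (the exponential limit) and of arguments beyond the upper endpoint when c is negative are choices made here, not details taken from the original program.

```python
import math

def gpd_cdf(y, a, c):
    """G(y) = 1 - [1 + (c/a) y]^(-1/c) for y >= 0, with scale a > 0 and shape c."""
    if abs(c) < 1e-12:
        return 1.0 - math.exp(-y / a)       # exponential limit as c -> 0
    base = 1.0 + (c / a) * y
    if base <= 0.0:
        return 1.0                          # beyond the upper endpoint when c < 0
    return 1.0 - base ** (-1.0 / c)

def gpd_pdf(y, a, c):
    """g(y, a, c) = (1/a) [1 + (c/a) y]^(-(1 + c)/c), the derivative of gpd_cdf."""
    if abs(c) < 1e-12:
        return math.exp(-y / a) / a
    base = 1.0 + (c / a) * y
    if base <= 0.0:
        return 0.0
    return base ** (-(1.0 + c) / c) / a

# Check numerically that the PDF is the derivative of the CDF at one point.
a, c, y, h = 1.0, 0.2, 1.5, 1e-6
slope = (gpd_cdf(y + h, a, c) - gpd_cdf(y - h, a, c)) / (2.0 * h)
print(slope, gpd_pdf(y, a, c))
```

The two printed numbers should agree to several decimal places, which is a quick guard against a sign or exponent slip in either formula.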

3.3 Parameter Estimates

The parameters defining the upper tail are easily estimated. Let r be an exogenously specified number of variates in the tail. The lower bound on the upper tail is estimated as:

u_estimate = X(n - r)

The corresponding tail probability is estimated as:

q_estimate = r/n
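
In code, these two estimates amount to sorting and indexing. The sketch below uses a hypothetical helper name and a made-up ten-element sample; it also returns the excesses y = x - u for the r variates above the bound, which are the inputs to the tail fit.

```python
def tail_threshold(sample, r):
    """Return (u_estimate, q_estimate, excesses) for the r largest values in the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    if not 0 < r < n:
        raise ValueError("r must satisfy 0 < r < n")
    u = ordered[n - r - 1]                        # X(n - r), counting order statistics from 1
    q = r / n
    excesses = [x - u for x in ordered[n - r:]]   # y = x - u for X(n - r + 1), ..., X(n)
    return u, q, excesses

u, q, ys = tail_threshold([5.0, 1.0, 4.0, 2.0, 3.0, 0.0, 9.0, 7.0, 8.0, 6.0], r=3)
print(u, q, ys)   # 6.0 0.3 [1.0, 2.0, 3.0]
```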

Several methods exist for estimating the scale and shape parameters for the Generalized Pareto Distribution. I chose to apply the method of maximum likelihood. Since the random variables in a random sample are stochastically independent, their joint PDF is merely the product of their individual PDFs. The log-likelihood function is the natural logarithm of the joint PDF, considered as a function of the parameters of the PDF.

ln g(a, c) = ln g(y1, a, c) + ... + ln g(yr, a, c)

Maximum likelihood estimates are the values of the parameters that maximize the log-likelihood function for the observed realization of the random sample. I found these estimates by applying the Nelder-Mead algorithm to the additive inverse of the log-likelihood function. Table 5 shows estimates for the example.

Table 5: Estimates for Upper Tail Distribution
Tail Probability (q)       10%
Lower Bound on Tail (u)    1.368
Scale Parameter (a)        0.7332
Shape Parameter (c)        0.2144
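
The estimation step can be sketched as below. This is not the original program: the simplex search is a bare-bones Nelder-Mead variant written for illustration, the starting point is a guess, and the excesses are synthesized by inverting the Generalized Pareto CDF with the Table 5 estimates treated as if they were the true parameters.

```python
import math
import random

def neg_log_likelihood(params, ys):
    """Additive inverse of the log-likelihood; infinite outside the parameter space."""
    a, c = params
    if a <= 0.0:
        return math.inf
    total = 0.0
    for y in ys:
        if abs(c) < 1e-9:
            total += math.log(a) + y / a                  # exponential limit
        else:
            base = 1.0 + (c / a) * y
            if base <= 0.0:
                return math.inf
            total += math.log(a) + ((1.0 + c) / c) * math.log(base)
    return total

def nelder_mead(f, start, step=0.1, tol=1e-8, max_iter=500):
    """A simplified Nelder-Mead simplex search; returns the best point found."""
    n = len(start)
    points = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        points.append(p)
    simplex = [(f(p), p) for p in points]
    for _ in range(max_iter):
        simplex.sort(key=lambda pair: pair[0])
        f_best, f_second_worst = simplex[0][0], simplex[-2][0]
        f_worst, worst = simplex[-1]
        if f_worst - f_best < tol:
            break
        centroid = [sum(p[i] for _fv, p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # centroid + t * (centroid - worst): t = 1 reflects, 2 expands, -0.5 contracts
            return [centroid[i] + t * (centroid[i] - worst[i]) for i in range(n)]

        reflected = along(1.0)
        f_reflected = f(reflected)
        if f_reflected < f_best:
            expanded = along(2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1] = (f_expanded, expanded)
            else:
                simplex[-1] = (f_reflected, reflected)
        elif f_reflected < f_second_worst:
            simplex[-1] = (f_reflected, reflected)
        else:
            contracted = along(-0.5)
            f_contracted = f(contracted)
            if f_contracted < f_worst:
                simplex[-1] = (f_contracted, contracted)
            else:
                # Shrink every other vertex halfway toward the best one.
                best = simplex[0][1]
                shrunk = [[best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                          for _fv, p in simplex[1:]]
                simplex = [simplex[0]] + [(f(p), p) for p in shrunk]
    return min(simplex, key=lambda pair: pair[0])[1]

rng = random.Random(17)
a_true, c_true = 0.7332, 0.2144     # Table 5 estimates, used here as "true" values
ys = [(a_true / c_true) * ((1.0 - rng.random()) ** (-c_true) - 1.0) for _ in range(2000)]
start = [sum(ys) / len(ys), 0.1]    # assumed starting guess: sample mean and a small shape
a_hat, c_hat = nelder_mead(lambda params: neg_log_likelihood(params, ys), start)
print(a_hat, c_hat)
```

Returning infinity for inadmissible parameters keeps the search inside the region where the likelihood is defined. With only 50 excesses, as in the example of this post, the estimates would be considerably noisier than with the 2000 synthesized here.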

The above has described how to estimate parameters for a distribution characterizing a tail of a continuous distribution. Given these estimates, one can calculate the conditional probability that Y lies above any value in the tail. Figure 2 plots this probability for the example. Notice that, for much of the tail, this probability is appreciably higher for the mixture distribution than the probability found from the Gaussian distribution with the smaller standard deviation in the mixture. Yet the Kolmogorov-Smirnov goodness-of-fit test would not have led one to reject the first Gaussian distribution. The estimates from Extreme Value Theory are closer to the higher (and correct) probabilities from the true theoretical distribution.

Figure 2: Tail Probabilities
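
The comparison can be reproduced approximately from the numbers in Tables 1 and 5. The sketch below computes the unconditional tail probability Pr(X > x) = q [1 - G(x - u)] for x > u, which follows from the relation between F and G in Section 3.1; the function names and the evaluation points are illustrative.

```python
import math

Q, U, A, C = 0.10, 1.368, 0.7332, 0.2144    # estimates from Table 5

def gaussian_tail(x, sd=1.0):
    """Pr(X > x) for a zero-mean Gaussian with standard deviation sd."""
    return 0.5 * math.erfc(x / (sd * math.sqrt(2.0)))

def evt_tail(x):
    """Pr(X > x) = q [1 - G(x - u)] for x > u, with G the fitted Generalized Pareto CDF."""
    return Q * (1.0 + (C / A) * (x - U)) ** (-1.0 / C)

def mixture_tail(x, p=0.9):
    """Pr(X > x) under the true mixture of Table 1 (both means taken to be zero)."""
    return p * gaussian_tail(x, 1.0) + (1.0 - p) * gaussian_tail(x, 3.0)

# Columns: x, first Gaussian alone, Extreme Value Theory estimate, true mixture.
for x in (2.0, 3.0, 4.0, 5.0):
    print(x, gaussian_tail(x), evt_tail(x), mixture_tail(x))
```

At each of these points the Extreme Value Theory estimate lies much closer to the true mixture probability than the first Gaussian distribution does, and the gap widens further out in the tail.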

4.0 Conclusion

This post has illustrated:

  • A probability distribution in which the central part of the distribution's support tends to behave differently from the tails.
  • The difficulty in rejecting the hypothesis that data is drawn from the distribution characterizing the central tendency of the data, with no account being taken of heavy tails.
  • A method, applicable to any continuous random variable, for estimating a tail distribution.
  • Such estimation yielding an appreciably larger estimate for a tail probability than the distribution characterizing the central tendency.

References

  • J. B. Broadwater and R. Chellappa (2010). Adaptive Threshold Estimation via Extreme Value Theory, IEEE Transactions on Signal Processing, V. 58, No. 2 (Feb.): pp. 490-500.
  • Damon Levine (2009). Modeling Tail Behavior with Extreme Value Theory, Risk Management, Issue 17.
  • R. V. Hogg and A. T. Craig (1978). Introduction to Mathematical Statistics, Fourth edition, Macmillan.
  • A. Ozturk, P. R. Chakravarthi, and D. D. Weiner (). On Determining the Radar Threshold for Non-Gaussian Process from Experimental Data, IEEE Transactions on Information Theory, V. 42, No. 4 (July): pp. 1310-1316.
  • James Pickands III (1975). Statistical Inference Using Extreme Order Statistics, Annals of Statistics, V. 3, No. 1: pp. 119-131.


Lord Keynes said...

Hi Robert,

Off-topic, but do you have any thoughts on this diagram of Post Keynesian economists I have been working on (based on Mark Lavoie's categories)?:

Also, would you classify yourself as a Sraffian or Kaleckian?

Robert Vienneau said...

I find your diagram a source for a comprehensive list of economists.

I'm definitely Sraffian for criticism. I find interesting economists that try to synthesize traditions. For example, Heinrich Bortis, Edward Nell, Luigi Pasinetti draw on both Sraffa and institutionalism.