
Figure 1: Distribution for Mixture Distribution 
1.0 Introduction
What is the probability that the Dow Jones Industrial Average (DJIA) will rise by at least 5% tomorrow? By 10%? Very few rises that large can be found in the data, and, eventually, you will be asking about a rise beyond all historical experience. Some have argued that Extreme Value Theory can be applied to financial data to extrapolate these sorts of tail probabilities. In this post, I attempt to explain this theory. For purposes of exposition, I here disregard the possibility that such rises are associated with states that could not be foreseen from the past history of the data-generating process.
2.0 A Random Sample from a Mixture Distribution
This exposition includes an example. I need a probability distribution in which tails differ from the portion of the distribution clustered around the center, in some sense. Consider a random variable X which can take on any real number. The probability distribution for this random variable is defined by the Cumulative Distribution Function (CDF). The CDF specifies the probability that a realization of the random variable is less than or equal to a given value:
F(x) = Pr(X ≤ x)
where:
 x is the argument at which the CDF is evaluated.
 F is the CDF.
 F(x) is the indicated probability, that is, the value of the CDF evaluated for the argument.
(Conventionally, uppercase letters toward the end of the alphabet denote random variables. The corresponding lowercase letter denotes a realization of that random variable resulting from the outcome of conducting the underlying experiment.)
To obtain a distribution with heavy tails, I consider a mixture distribution. (Mixture distributions are often used in the theory for robust statistics. I would appreciate a reference arguing that robust statistics and Extreme Value Theory are complementary, in some sense.) Suppose F_{1} and F_{2} are CDFs for Gaussian (also known as normal or bell-shaped) distributions with possibly different means and standard deviations. And let p be a real number between zero and one. F is the CDF for a mixture distribution if it is defined as follows:
F(x) = p F_{1}(x) + (1 - p) F_{2}(x)
For definiteness, let the parameters for this distribution be as in Table 1. The two Gaussian distributions have equal means. The distribution with the 90% weight also has the smaller standard deviation. In other words, the distribution that is selected less frequently will have realizations that tend towards the tails of the overall mixture distribution.
Table 1: Parameters for a Mixture Distribution
Parameter  Value 
Probability Variate from First Distribution  90% 
First Gaussian Distribution 
Mean  0.0 
Standard Deviation  1.0 
Second Gaussian Distribution 
Mean  0.0 
Standard Deviation  3.0 
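A minimal Python sketch of this mixture CDF, with the Table 1 parameters as defaults, might look as follows. The function names gaussian_cdf and mixture_cdf are hypothetical, chosen only for illustration, and this is not the program used to produce the figures in this post.

```python
import math

def gaussian_cdf(x, mean=0.0, sd=1.0):
    """CDF of a Gaussian distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, p=0.9, mean1=0.0, sd1=1.0, mean2=0.0, sd2=3.0):
    """F(x) = p F_1(x) + (1 - p) F_2(x); defaults are taken from Table 1."""
    return p * gaussian_cdf(x, mean1, sd1) + (1.0 - p) * gaussian_cdf(x, mean2, sd2)

# Probability of exceeding 3.0 under the first Gaussian alone and under the mixture.
tail_first = 1.0 - gaussian_cdf(3.0)
tail_mixture = 1.0 - mixture_cdf(3.0)
print(tail_first, tail_mixture)
```

Even with only a 10% weight, the wider Gaussian dominates the probability of exceeding a value this far from the mean.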
2.1 A Random Sample
Suppose X_{1}, X_{2}, ..., X_{n} are mutually stochastically independent random variables, each of which has the probability distribution with CDF F. Under these conditions, these random variables comprise a random sample. I wrote a computer program to generate a realization of such a random sample of size n. Table 2 shows some statistics for this realization. I use this realization of a random sample to illustrate the application of various statistical techniques below.
Table 2: Statistics for Synthesized Variates
 Value 
Sample Size  500 
Realizations from 1st Distribution  443 
Realizations from 2nd Distribution  57 

Sample Mean  0.0086 
Standard Deviation  1.3358 

Minimum  -6.0125 
Median  0.0034 
Maximum  9.932 
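One way such a random sample could be synthesized is sketched below in Python. This is an illustration under stated assumptions, not the original program: the function name draw_mixture_sample and the seed are hypothetical, so the resulting statistics will not reproduce Table 2.

```python
import random
import statistics

def draw_mixture_sample(n, p=0.9, mean=0.0, sd1=1.0, sd2=3.0, seed=0):
    """Draw n independent variates from the two-Gaussian mixture.

    Each variate comes from the first Gaussian with probability p and
    from the second otherwise. Returns the variates and their labels.
    """
    rng = random.Random(seed)  # arbitrary seed, for repeatability only
    sample, labels = [], []
    for _ in range(n):
        from_first = rng.random() < p
        sample.append(rng.gauss(mean, sd1 if from_first else sd2))
        labels.append(1 if from_first else 2)
    return sample, labels

sample, labels = draw_mixture_sample(500)
summary = {
    "from_first": labels.count(1),
    "from_second": labels.count(2),
    "mean": statistics.mean(sample),
    "stdev": statistics.stdev(sample),
    "minimum": min(sample),
    "median": statistics.median(sample),
    "maximum": max(sample),
}
for name, value in summary.items():
    print(name, value)
```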
2.2 Goodness of Fit
It is difficult to determine that a realization of the random sample is not from the distribution F_{1}. In other words, the existence of sample values, often in the tails, from F_{2} is not readily apparent from a straightforward statistical test for goodness-of-fit. Consider the order statistics found by sorting the random sample:
X_{(1)} ≤ X_{(2)} ≤ ... ≤ X_{(n)}
(By convention, a subscript for a random variable without parentheses denotes a random variable from a random sample. Parentheses denote an order statistic.)
An empirical CDF can be constructed from the order statistics. The probability that a random variable from the distribution generating the random sample is less than or equal to x_{(i)} is estimated as i/n, the proportion of the sample less than or equal to the given order statistic. Figure 1, above, shows the empirical CDF for my realization of the random sample, as well as the CDFs for the two Gaussian distributions in the mixture distribution. Both Gaussian CDFs have a value of 1/2 for an argument of zero, since that is their mean. The Gaussian distribution with the smaller standard deviation has a CDF with a steeper slope around the mean, since more of its probability is clustered around zero. The empirical CDF, estimated from the data, is a step function, with equal size steps occurring at each realization of a random variable in the sample. One needs to sort the data to calculate the empirical CDF.
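The construction of the empirical CDF from sorted data can be sketched as follows. The helper name empirical_cdf and the four-point sample are hypothetical, used only to show the step-function behavior.

```python
import bisect

def empirical_cdf(sample):
    """Return a function giving the proportion of the sample <= x."""
    ordered = sorted(sample)  # the order statistics
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many order statistics are <= x.
        return bisect.bisect_right(ordered, x) / n

    return cdf

ecdf = empirical_cdf([0.3, -1.2, 2.5, 0.0])
for x in (-2.0, -1.2, 0.0, 0.3, 2.5):
    print(x, ecdf(x))
```

Evaluated at the i-th order statistic, the function returns i/n, and it rises by 1/n at each sample value.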
The maximum vertical distance between a theoretical CDF and an empirical CDF is known as the Kolmogorov-Smirnov statistic. Under the null hypothesis that the random sample is drawn from the theoretical distribution, the Kolmogorov-Smirnov statistic will tend to be a small positive number. Table 3 shows the Kolmogorov-Smirnov statistics for the data. This statistic is not statistically significant for the first Gaussian distribution. The probability that one would observe such a large value for the Kolmogorov-Smirnov statistic for the second Gaussian distribution is less than 1%. Thus, one could conclude that this data was not generated from the second distribution, but (incorrectly) conclude that it was generated from the first.
Table 3: Goodness of Fit
 1st Gaussian Distribution  2nd Gaussian Distribution 
Kolmogorov-Smirnov Statistic  0.0354  0.228 
p Value  54.59%  0.00% 
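A sketch of how a test of this kind might be computed is given below, using only the Python standard library. The p value here relies on the large-sample (asymptotic) Kolmogorov series, which is an assumption of this sketch and may differ slightly from the method behind Table 3. The function names and the seed are hypothetical, and the sample is freshly synthesized, so the numbers will not match the table.

```python
import math
import random

def gaussian_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Maximum vertical distance between the empirical CDF and cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance

def ks_p_value_asymptotic(distance, n, terms=100):
    """Large-sample approximation to the p value of the statistic."""
    lam = math.sqrt(n) * distance
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

rng = random.Random(0)  # arbitrary seed
sample = [rng.gauss(0.0, 1.0 if rng.random() < 0.9 else 3.0) for _ in range(500)]

d_first = ks_statistic(sample, lambda x: gaussian_cdf(x, 0.0, 1.0))
d_second = ks_statistic(sample, lambda x: gaussian_cdf(x, 0.0, 3.0))
p_first = ks_p_value_asymptotic(d_first, len(sample))
p_second = ks_p_value_asymptotic(d_second, len(sample))
print(d_first, p_first)
print(d_second, p_second)
```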
3.0 Distribution for the Tail
With the description of the data out of the way, tail probabilities can now be defined. I concentrate on the upper tail.
3.1 Definition of a Tail
The upper tail is defined in terms of the lower bound u for the tail and the tail probability q. These parameters are related like so:
q = Pr(X > u) = 1 - F(u)
The upper tail is defined as those values of the random variable such that the probability of exceeding such a value is less than the given parameter:
{x | Pr(X > x) < q}
In other words, the tail consists of values of the random variable that lie above the lower bound on the tail. It is sometimes convenient to define a new random variable, Y, for outcomes that lie in the tail:
Y = X - u
This new random variable is the distance from the lower bound of the tail, given that a realization of X lies in the tail. One could give a symmetrical definition of the lower tail and a corresponding random variable. Table 4 shows how many variates in my realization of the random sample, in each tail and in the center, happen to come from the Gaussian distribution with the larger standard deviation, where the parameter q is taken to be 10%.
Table 4: Variates in Tails and Center
 Value 
Number in Lower Tail from 2nd Distribution  18 
Number in Center from 2nd Distribution  23 
Number in Upper Tail from 2nd Distribution  16 

Percentage of Lower Tail from 2nd Distribution  36.7% 
Percentage of Center from 2nd Distribution  5.7% 
Percentage of Upper Tail from 2nd Distribution  32.7% 
For y > 0, the conditional probability that X exceeds any given value in the tail, given that X lies in the tail, is:
Pr(X > y + u | X > u) = Pr[(X > y + u) and (X > u)]/Pr(X > u)
The above formula simply follows from the definition of conditional probability. The second clause in the "and" expression is redundant. So the above can be rewritten as:
Pr(X > y + u | X > u) = Pr(X > y + u)/Pr(X > u), y > 0
Let G(y) be the CDF for the distribution for the random variable Y. One can then rewrite the above formula as follows:
1 - G(y) = [1 - F(y + u)]/[1 - F(u)], y > 0
Substituting for the definition of the parameter q, one obtains:
F(y + u) = (1 - q) + q G(y), y > 0
Or:
F(x) = (1 - q) + q G(x - u), x > u
The above two expressions relate the CDFs for the distributions of the random variables X and Y.
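The relation between the two CDFs can be checked numerically for the mixture distribution of Table 1. The sketch below is illustrative only: it finds the lower bound u for q = 10% by bisection (a choice made for this sketch), builds G from its definition, and compares the two sides of the last expression. All function names are hypothetical.

```python
import math

def gaussian_cdf(x, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, p=0.9, sd1=1.0, sd2=3.0):
    return p * gaussian_cdf(x, 0.0, sd1) + (1.0 - p) * gaussian_cdf(x, 0.0, sd2)

def solve_lower_bound(q, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for the value u at which 1 - F(u) = q."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - mixture_cdf(mid) > q:
            lo = mid  # still too much probability above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = 0.10
u = solve_lower_bound(q)

def excess_cdf(y):
    """G(y) = 1 - [1 - F(y + u)]/[1 - F(u)], for y > 0."""
    return 1.0 - (1.0 - mixture_cdf(y + u)) / (1.0 - mixture_cdf(u))

x = 3.0  # any value above u
lhs = mixture_cdf(x)
rhs = (1.0 - q) + q * excess_cdf(x - u)
print(u, lhs, rhs)
```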
3.2 Generalized Pareto Distribution
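A theorem (see Pickands 1975 in the references) states that, for a wide class of continuous random variables X, the distribution of Y is well approximated by a Generalized Pareto Distribution when the lower bound u is sufficiently high. The Generalized Pareto Distribution has the following CDF: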
G(y) = 1 - [1 + (c/a)y]^{-1/c}
The parameter a is called the scale parameter, and it must be positive. The parameter c is the shape parameter. It can take on any real number. When the shape parameter is zero, the Generalized Pareto Distribution reduces, by a limit theorem, to the exponential distribution.
Below, I will need the following expression for the Probability Density Function (PDF) for the Generalized Pareto Distribution:
g(y, a, c) = (1/a)[1 + (c/a)y]^{-(1 + c)/c}
The PDF is the derivative of the CDF. For any set A to which a probability can be assigned, the probability that Y lies in A is the integral, over A, of the PDF for Y.
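The CDF and PDF of the Generalized Pareto Distribution can be sketched as plain Python functions. The names gpd_cdf and gpd_pdf are hypothetical. The treatment of a zero shape parameter as the exponential limit, and of points beyond the upper end of the support when c is negative, are choices made for this sketch. Both functions assume y is nonnegative.

```python
import math

def gpd_cdf(y, a, c):
    """G(y) = 1 - [1 + (c/a) y]^(-1/c), for y >= 0 and a > 0."""
    if c == 0.0:
        return 1.0 - math.exp(-y / a)  # exponential limit
    base = 1.0 + (c / a) * y
    if base <= 0.0:
        return 1.0  # only possible when c < 0: y is beyond the upper endpoint
    return 1.0 - base ** (-1.0 / c)

def gpd_pdf(y, a, c):
    """g(y, a, c) = (1/a) [1 + (c/a) y]^(-(1 + c)/c), for y >= 0 and a > 0."""
    if c == 0.0:
        return math.exp(-y / a) / a  # exponential limit
    base = 1.0 + (c / a) * y
    if base <= 0.0:
        return 0.0  # outside the support
    return (1.0 / a) * base ** (-(1.0 + c) / c)

# Check, by a central difference, that the PDF is the derivative of the CDF.
a, c, y, h = 1.0, 0.2, 1.5, 1e-6
numeric_slope = (gpd_cdf(y + h, a, c) - gpd_cdf(y - h, a, c)) / (2.0 * h)
print(numeric_slope, gpd_pdf(y, a, c))
```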
3.3 Parameter Estimates
The parameters defining the upper tail are easily estimated. Let r be an exogenously specified number of variates in the tail. The lower bound on the upper tail is estimated as:
u_{estimate} = X_{(n - r)}
The corresponding tail probability is estimated as:
q_{estimate} = r/n
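These two estimates, together with the realizations of Y that feed the next step, can be sketched as follows. The function name tail_estimates is hypothetical, and the ten-point sample is a toy example chosen so the result is easy to check by hand.

```python
def tail_estimates(sample, r):
    """Estimate the lower bound u and tail probability q from the top r variates.

    Returns u = X_(n - r), q = r/n, and the r excesses X_(i) - u for i > n - r.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if not 0 < r < n:
        raise ValueError("r must lie strictly between 0 and the sample size")
    u = ordered[n - r - 1]  # X_(n - r) in zero-based indexing
    q = r / n
    excesses = [x - u for x in ordered[n - r:]]
    return u, q, excesses

u, q, excesses = tail_estimates([5, 1, 9, 3, 7, 2, 10, 4, 8, 6], r=3)
print(u, q, excesses)
```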
Several methods exist for estimating the scale and shape parameters for the Generalized Pareto Distribution. I chose to apply the method of maximum likelihood. Since the random variables in a random sample are stochastically independent, their joint PDF is merely the product of their individual PDFs. The log-likelihood function is the natural logarithm of the joint PDF, considered as a function of the parameters of the PDF.
ln g(a, c) = ln g(y_{1}, a, c) + ... + ln g(y_{r}, a, c)
Maximum likelihood estimates are the values of the parameters that maximize the log-likelihood function for the observed realization of the random sample. I found these estimates by applying the Nelder-Mead algorithm to the additive inverse of the log-likelihood function. Table 5 shows estimates for the example.
Table 5: Estimates for Upper Tail Distribution
Parameter  Estimate 
Tail Probability (q)  10% 
Lower Bound on Tail (u)  1.368 
Scale Parameter (a)  0.7332 
Shape Parameter (c)  0.2144 
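A maximum likelihood fit of this kind might be sketched with NumPy and SciPy's Nelder-Mead implementation, as below. This is not the original program. The function names, the starting guess, the seed, and the synthetic excesses (drawn by inverting the Generalized Pareto CDF with assumed parameters a = 1.0 and c = 0.2) are all assumptions made for illustration, so the estimates will not match Table 5.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, y):
    """Additive inverse of the GPD log-likelihood for the excesses y."""
    a, c = params
    if a <= 0.0:
        return np.inf  # the scale parameter must be positive
    if abs(c) < 1e-8:
        # Exponential limit of the Generalized Pareto Distribution.
        return y.size * np.log(a) + y.sum() / a
    z = 1.0 + (c / a) * y
    if np.any(z <= 0.0):
        return np.inf  # some excess lies outside the support
    return y.size * np.log(a) + ((1.0 + c) / c) * np.log(z).sum()

def fit_gpd(y):
    """Return maximum likelihood estimates (a, c) found by Nelder-Mead."""
    y = np.asarray(y, dtype=float)
    start = np.array([y.mean(), 0.1])  # hypothetical starting guess
    result = minimize(neg_log_likelihood, start, args=(y,), method="Nelder-Mead")
    return result.x[0], result.x[1]

# Synthetic excesses: solve G(y) = U for y, with U uniform on (0, 1).
rng = np.random.default_rng(0)
true_a, true_c = 1.0, 0.2
uniforms = rng.uniform(size=2000)
synthetic = (true_a / true_c) * ((1.0 - uniforms) ** (-true_c) - 1.0)

a_hat, c_hat = fit_gpd(synthetic)
print(a_hat, c_hat)
```

Returning infinity for inadmissible parameters keeps the simplex search away from regions where the likelihood is undefined, without needing explicit constraints.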
The above has described how to estimate parameters for a distribution characterizing a tail of any continuous distribution. Given these estimates, one can calculate the conditional probability that Y lies above any value in the tail. Figure 2 plots this probability for the example. This probability is noticeably higher, for much of the tail, for the mixture distribution than for the Gaussian distribution with the smaller standard deviation in the mixture. Yet the Kolmogorov-Smirnov goodness-of-fit test would not have led one to reject the first Gaussian distribution. The estimates from Extreme Value Theory are closer to the higher (and correct) probabilities from the true theoretical distribution.

Figure 2: Tail Probabilities 
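A comparison of this kind can be sketched in Python. The sketch uses the unconditional form Pr(X > x) = q [1 + (c/a)(x - u)]^{-1/c}, which follows from the relation between F and G in Section 3.1, and takes the Table 5 estimates exactly as printed (including the sign of the shape parameter, which is an assumption). The function names and evaluation points are hypothetical, and the code does not reproduce Figure 2.

```python
import math

def gaussian_tail(x, sd=1.0):
    """Pr(X > x) for a zero-mean Gaussian with standard deviation sd."""
    return 0.5 * math.erfc(x / (sd * math.sqrt(2.0)))

def mixture_tail(x, p=0.9, sd1=1.0, sd2=3.0):
    """Pr(X > x) for the true mixture distribution of Table 1."""
    return p * gaussian_tail(x, sd1) + (1.0 - p) * gaussian_tail(x, sd2)

def gpd_tail(x, u=1.368, q=0.10, a=0.7332, c=0.2144):
    """Pr(X > x) = q [1 + (c/a)(x - u)]^(-1/c) for x > u, Table 5 values as printed."""
    return q * (1.0 + (c / a) * (x - u)) ** (-1.0 / c)

for x in (2.0, 3.0, 4.0, 5.0):
    print(f"x={x}: first Gaussian={gaussian_tail(x):.3e}, "
          f"GPD estimate={gpd_tail(x):.3e}, mixture={mixture_tail(x):.3e}")
```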
4.0 Conclusion
This post has illustrated:
 A probability distribution in which the central part of the distribution's support tends to behave differently from the tails.
 The difficulty in rejecting the hypothesis that data is drawn from the distribution characterizing the central tendency of the data, with no account being taken of heavy tails.
 A method, applicable to any continuous random variable, for estimating a tail distribution.
 Such estimation yielding an appreciably larger estimate for a tail probability than the distribution characterizing the central tendency.
References
 J. B. Broadwater and R. Chellappa (2010). Adaptive Threshold Estimation via Extreme Value Theory, IEEE Transactions on Signal Processing, V. 58, No. 2 (Feb.): pp. 490-500.
 Damon Levine (2009). Modeling Tail Behavior with Extreme Value Theory, Risk Management, Issue 17.
 R. V. Hogg and A. T. Craig (1978). Introduction to Mathematical Statistics, Fourth edition, Macmillan.
 A. Ozturk, P. R. Chakravarthi, and D. D. Weiner (). On Determining the Radar Threshold for Non-Gaussian Process from Experimental Data, IEEE Transactions on Information Theory, V. 42, No. 4 (July): pp. 1310-1316.
 James Pickands III (1975). Statistical Inference Using Extreme Order Statistics, Annals of Statistics, V. 3, No. 1: pp. 119-131.