Monday, April 28, 2014

On r > g

I have not even started to read Thomas Piketty's Capital in the Twenty-First Century, but I have heard Piketty compared to Marx.

Whatever else Marx was, he was a very learned man. He read the works of virtually all political economists who came before him. And in his lengthy tomes, he would comment on them, not always fairly. He did not confine himself to ones that were politically influential among the elite. For example, consider Marx on the Ricardian socialists.

So if Piketty is like Marx, can I expect to find comments on the Cambridge equation, r = g/s_c? Can I expect to find something about the models of growth and distribution put forward by Richard Kahn, Nicholas Kaldor, Luigi Pasinetti, and Joan Robinson? (Joshua Gans has also noticed a parallelism between the work of Piketty and the Post Keynesian theory of distribution.) Or maybe the analogy is not complete.

Thursday, April 24, 2014

Size of Government in USA

I thought that Krugman had a post about Paul Ryan stating, incorrectly, that Obama had increased the size of the government. And he wondered why conservatives make factual statements that can be easily shown to be wrong. But I cannot find such a post. I can find this one on Rand Paul making a different incorrect factual claim. I am fairly sure I am thinking of something more recent than this post about Rand Paul being confused in 2012.

(By the way, Paul Krugman is wrong about what heterodox economists believe about marginal productivity theory. If he reads this, though, I would rather read his comments about the empirical correlation between increased government size and increased equality.)

Thursday, April 17, 2014

Estimating Probability of Extreme Events

Figure 1: Distribution for Mixture Distribution
1.0 Introduction

What is the probability that the Dow Jones Industrial Average (DJIA) will rise by at least 5% tomorrow? By 10%? Very few samples can be found in the data for a large enough rise, and, eventually, you will be asking about a rise beyond all historical experience. Some have argued that Extreme Value Theory can be applied to financial data to extrapolate these sorts of tail probabilities. In this post, I attempt to explain this theory. For purposes of exposition, I here disregard the possibility of such rises as being associated with states that might be impossible to foresee from the past history of the data-generating process.

2.0 A Random Sample from a Mixture Distribution

This exposition includes an example. I need a probability distribution in which tails differ from the portion of the distribution clustered around the center, in some sense. Consider a random variable X which can take on any real number. The probability distribution for this random variable is defined by the Cumulative Distribution Function (CDF). The CDF specifies the probability that a realization of the random variable is less than or equal to a given value:

F(x) = Pr(X ≤ x)


  • x is the argument at which the CDF is evaluated.
  • F is the CDF.
  • F(x) is the indicated probability, that is, the value of the CDF evaluated for the argument.

(Conventionally, uppercase letters toward the end of the alphabet denote random variables. The corresponding lowercase letter denotes a realization of that random variable resulting from the outcome of conducting the underlying experiment.)

To obtain a distribution with heavy tails, I consider a mixture distribution. (Mixture distributions are often used in the theory for robust statistics. I would appreciate a reference arguing that robust statistics and Extreme Value Theory are complementary, in some sense.) Suppose F1 and F2 are CDFs for Gaussian (also known as normal or bell shaped) distributions with possibly different means and standard deviations. And let p be a real number between zero and one. F is the CDF for a mixture distribution if it is defined as follows:

F(x) = p F1(x) + (1 - p) F2(x)

For definiteness, let the parameters for this distribution be as in Table 1. The two Gaussian distributions have equal means. The distribution with the 90% weight also has the smaller standard deviation. In other words, the distribution that is selected less frequently will have realizations that tend towards the tails of the overall mixture distribution.

Table 1: Parameters for a Mixture Distribution
Probability Variate from First Distribution: 90%
First Gaussian Distribution:
  Mean: 0.0
  Standard Deviation: 1.0
Second Gaussian Distribution:
  Mean: 0.0
  Standard Deviation: 3.0

2.1 A Random Sample

Suppose X1, X2, ..., Xn are mutually stochastically independent random variables, each of which has the probability distribution with CDF F. Under these conditions, these random variables comprise a random sample. I wrote a computer program to generate a realization of such a random sample of size n. Table 2 shows some statistics for this realization. I use this realization of a random sample to illustrate the application of various statistical techniques below.

Table 2: Statistics for Synthesized Variates
Sample Size: 500
Realizations from 1st Distribution: 443
Realizations from 2nd Distribution: 57
Sample Mean: -0.0086
Standard Deviation: 1.3358
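The synthesis step can be sketched in Python with numpy. This is a minimal sketch, not my original program; the seed is arbitrary, so the counts and statistics it prints will not reproduce Table 2 exactly.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

p = 0.9                  # probability of drawing from the first Gaussian
n = 500                  # sample size
sigma1, sigma2 = 1.0, 3.0  # standard deviations (Table 1); both means are zero

# Draw from the mixture: first select a component for each variate,
# then draw from the selected Gaussian.
from_first = rng.random(n) < p
x = np.where(from_first,
             rng.normal(0.0, sigma1, n),
             rng.normal(0.0, sigma2, n))

print("Realizations from 1st distribution:", from_first.sum())
print("Sample mean:", x.mean())
print("Sample standard deviation:", x.std(ddof=1))
```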

2.2 Goodness of Fit

It is difficult to determine that a realization of the random sample is not from the distribution F1. In other words, the existence of sample values, often in the tails, from F2 is not readily apparent from a straightforward statistical test for the goodness-of-fit. Consider the order statistics found by sorting the random sample:

X(1) ≤ X(2) ≤ ... ≤ X(n)

(By convention, a subscript without parentheses denotes a random variable from a random sample; a parenthesized subscript denotes an order statistic.)

An empirical CDF can be constructed from the order statistics. The probability that a random variable from the distribution generating the random sample is less than or equal to x(i) is estimated as i/n, the proportion of the sample less than or equal to the given order statistic. Figure 1, above, shows the empirical CDF for my realization of the random sample, as well as the CDFs for the two Gaussian distributions in the mixture distribution. Both Gaussian CDFs have a value of 1/2 for an argument of zero, since that is their mean. The Gaussian distribution with the smaller standard deviation has a CDF with a steeper slope around the mean, since more of its probability is clustered around zero. The empirical CDF, estimated from the data, is a step function, with equal size steps occurring at each realization of a random variable in the sample. One needs to sort the data to calculate the empirical CDF.

The maximum vertical distance between a theoretical distribution and an empirical CDF is known as the Kolmogorov-Smirnov statistic. Under the null hypothesis that the random sample is drawn from the theoretical distribution, the Kolmogorov-Smirnov statistic will be a small positive number. Table 3 shows the Kolmogorov-Smirnov statistics for the data. This statistic is not statistically significant for the first Gaussian distribution. The probability that one would observe such a large value for the Kolmogorov-Smirnov statistic for the second Gaussian distribution is less than 1%. Thus, one could conclude that this data was not generated from the second distribution, but (incorrectly) conclude that it was generated from the first.

Table 3: Goodness of Fit
                                1st Gaussian    2nd Gaussian
Kolmogorov-Smirnov Statistic    0.0354          0.228
p Value                         54.59%          0.00%
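The Kolmogorov-Smirnov comparison can be sketched with scipy's kstest. This sketch synthesizes a fresh sample (with an arbitrary seed), so the statistics will differ somewhat from Table 3, but the qualitative result is the same: the first Gaussian is not rejected, the second is.

```python
import numpy as np
from scipy import stats

# Regenerate a sample from the mixture distribution (parameters as in Table 1).
rng = np.random.default_rng(42)
p, n, sigma1, sigma2 = 0.9, 500, 1.0, 3.0
from_first = rng.random(n) < p
x = np.where(from_first, rng.normal(0, sigma1, n), rng.normal(0, sigma2, n))

# Test the sample against each component Gaussian separately.
d1, pval1 = stats.kstest(x, "norm", args=(0.0, sigma1))
d2, pval2 = stats.kstest(x, "norm", args=(0.0, sigma2))
print(f"vs 1st Gaussian: D = {d1:.4f}, p = {pval1:.2%}")
print(f"vs 2nd Gaussian: D = {d2:.4f}, p = {pval2:.2%}")
```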

3.0 Distribution for the Tail

With the description of the data out of the way, tail probabilities can now be defined. I concentrate on the upper tail.

3.1 Definition of a Tail

The upper tail is defined in terms of the lower bound u for the tail and the tail probability q. These parameters are related like so:

q = Pr(X > u) = 1 - F(u)

The upper tail is defined as those values of the random variable such that the probability of exceeding such a value is less than the given parameter:

{x | Pr(X > x) < q}

In other words, the tail consists of values of the random variable that lie above the lower bound on the tail. It is sometimes convenient to define a new random variable, Y, for outcomes that lie in the tail:

Y = X - u

This new random variable is the distance from the lower bound of the tail, given that a realization of X lies in the tail. One could give a symmetrical definition of the lower tail and a corresponding random variable. Table 4 shows how many samples in my realization of the random sample, defined above, happen to come from the Gaussian distribution with the larger standard deviation, where the parameter q is taken to be 10%.

Table 4: Variates in Tails and Center
Number in Lower Tail from 2nd Distribution: 18
Number in Center from 2nd Distribution: 23
Number in Upper Tail from 2nd Distribution: 16
Percentage of Lower Tail from 2nd Distribution: 36.7%
Percentage of Center from 2nd Distribution: 5.7%
Percentage of Upper Tail from 2nd Distribution: 32.7%
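Splitting a sample into tails and center by order statistics can be sketched as follows, taking q = 10% for each tail. The sample here is synthesized with an arbitrary seed, so the counts will differ from Table 4, but the second distribution should again be heavily over-represented in the tails.

```python
import numpy as np

# Regenerate a sample from the mixture distribution (parameters as in Table 1).
rng = np.random.default_rng(42)
p, n, q = 0.9, 500, 0.10
from_first = rng.random(n) < p
x = np.where(from_first, rng.normal(0, 1.0, n), rng.normal(0, 3.0, n))

# Sort to obtain order statistics; the lowest r and highest r variates
# form the empirical lower and upper tails, respectively.
order = np.argsort(x)
r = int(q * n)
lower = order[:r]
upper = order[-r:]
center = order[r:-r]

for name, idx in [("lower tail", lower), ("center", center), ("upper tail", upper)]:
    count = (~from_first[idx]).sum()
    print(f"{name}: {count} from 2nd distribution ({count / len(idx):.1%})")
```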

For y > 0, the conditional probability that X exceeds any given value in the tail, given that X lies in the tail is:

Pr(X > y + u | X > u) = Pr[(X > y + u) and (X > u)]/Pr(X > u)

The above formula simply follows from the definition of conditional probability. The second clause in the "and" expression is redundant. So the above can be rewritten as:

Pr(X > y + u | X > u) = Pr(X > y + u)/Pr(X > u), y > 0

Let G(y) be the CDF for the distribution for the random variable Y. One can then rewrite the above formula as follows:

1 - G(y) = [1 - F(y + u)]/[1 - F(u)], y > 0

Substituting for the definition of the parameter q, one obtains:

F(y + u) = (1 - q) + q G(y), y > 0

Equivalently, in terms of x = y + u:

F(x) = (1 - q) + q G(x - u), x > u

The above two expressions relate the CDFs for the distributions of the random variables X and Y.

3.2 Generalized Pareto Distribution

A limit theorem, due to Pickands (1975), states that if X is a continuous random variable, the distribution of the tail is approximately a Generalized Pareto Distribution, with the following CDF:

G(y) = 1 - [1 + (c/a)y]^(-1/c)

The parameter a is called the scale parameter, and it must be positive. The parameter c is the shape parameter. It can take on any real number. When the shape parameter is zero, the Generalized Pareto Distribution reduces, by a limit theorem, to the exponential distribution.

Below, I will need the following expression for the Probability Density Function (PDF) for the Generalized Pareto Distribution:

g(y, a, c) = (1/a)[1 + (c/a)y]^(-(1 + c)/c)

The PDF is the derivative of the CDF. For any set A to which a probability can be assigned, the probability that Y lies in A is the integral, over A, of the PDF for Y.
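The two formulas above can be transcribed directly. This is a minimal sketch; the c → 0 branch implements the exponential limit noted above.

```python
import numpy as np

def gpd_cdf(y, a, c):
    """CDF of the Generalized Pareto Distribution: G(y) = 1 - [1 + (c/a)y]^(-1/c).

    For c near zero, reduces to the exponential CDF 1 - exp(-y/a).
    """
    y = np.asarray(y, dtype=float)
    if abs(c) < 1e-12:
        return 1.0 - np.exp(-y / a)
    return 1.0 - np.power(1.0 + (c / a) * y, -1.0 / c)

def gpd_pdf(y, a, c):
    """PDF, the derivative of the CDF: (1/a)[1 + (c/a)y]^(-(1 + c)/c)."""
    y = np.asarray(y, dtype=float)
    if abs(c) < 1e-12:
        return np.exp(-y / a) / a
    return np.power(1.0 + (c / a) * y, -(1.0 + c) / c) / a
```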

3.3 Parameter Estimates

The parameters defining the upper tail are easily estimated. Let r be an exogenously specified number of variates in the tail. The lower bound on the upper tail is estimated as:

u_estimate = X(n - r)

The corresponding tail probability is estimated as:

q_estimate = r/n

Several methods exist for estimating the scale and shape parameters for the Generalized Pareto Distribution. I chose to apply the method of maximum likelihood. Since the random variables in a random sample are stochastically independent, their joint PDF is merely the product of their individual PDFs. The log-likelihood function is the natural logarithm of the joint PDF, considered as a function of the parameters of the PDF:

ln L(a, c) = ln g(y1, a, c) + ... + ln g(yr, a, c)

Maximum likelihood estimates are the values of the parameters that maximize the log-likelihood function for the observed realization of the random sample. I found these estimates by applying the Nelder-Mead algorithm to the additive inverse of the log-likelihood function. Table 5 shows estimates for the example.

Table 5: Estimates for Upper Tail Distribution
Tail Probability (q): 10%
Lower Bound on Tail (u): 1.368
Scale Parameter (a): 0.7332
Shape Parameter (c): 0.2144
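The estimation procedure can be sketched with scipy's minimize, applying the Nelder-Mead method to the additive inverse of the log-likelihood. The sample is synthesized with an arbitrary seed, so the estimates will differ from Table 5; this is an illustration of the method, not a reproduction of my program.

```python
import numpy as np
from scipy.optimize import minimize

# Regenerate a sample from the mixture distribution (parameters as in Table 1).
rng = np.random.default_rng(42)
p, n, q = 0.9, 500, 0.10
from_first = rng.random(n) < p
x = np.where(from_first, rng.normal(0, 1.0, n), rng.normal(0, 3.0, n))

# Upper-tail excesses: u is the (n - r)-th order statistic (1-indexed),
# and y holds the r excesses over u.
r = int(q * n)
x_sorted = np.sort(x)
u = x_sorted[n - r - 1]
y = x_sorted[n - r:] - u

def neg_log_likelihood(params):
    a, c = params
    if a <= 0:
        return np.inf                         # scale must be positive
    if abs(c) < 1e-12:
        return np.sum(np.log(a) + y / a)      # exponential limit as c -> 0
    z = 1.0 + (c / a) * y
    if np.any(z <= 0):
        return np.inf                         # outside the GPD's support
    return np.sum(np.log(a) + (1.0 + 1.0 / c) * np.log(z))

result = minimize(neg_log_likelihood, x0=[1.0, 0.1], method="Nelder-Mead")
a_hat, c_hat = result.x
print(f"u = {u:.4f}, a = {a_hat:.4f}, c = {c_hat:.4f}")
```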

The above has described how to estimate parameters for a distribution characterizing a tail of any continuous distribution. Given these estimates, one can calculate the conditional probability that Y lies above any value in the tail. Figure 2 plots this probability for the example. Notice that this probability is noticeably higher, for much of the tail, for the mixture distribution, as compared to the probability found from the Gaussian distribution with the smaller standard deviation in the mixture. And the Kolmogorov-Smirnov goodness-of-fit test would not have led one to reject estimates from the first Gaussian distribution. But the estimates from Extreme Value Theory are closer to the higher (and correct) probabilities from the true theoretical distribution.

Figure 2: Tail Probabilities
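The comparison plotted in Figure 2 can be spot-checked numerically, using the estimates in Table 5 and the true mixture parameters. This is a sketch with scipy.stats; the three y values are arbitrary points in the tail.

```python
import numpy as np
from scipy import stats

# Fitted parameters (Table 5) and the mixture's true parameters (Table 1).
u, a, c = 1.368, 0.7332, 0.2144
p, s1, s2 = 0.9, 1.0, 3.0

def mixture_sf(x):
    """Survival function Pr(X > x) for the mixture distribution."""
    return p * stats.norm.sf(x, scale=s1) + (1 - p) * stats.norm.sf(x, scale=s2)

for y in [0.5, 1.0, 2.0]:
    gpd = (1.0 + (c / a) * y) ** (-1.0 / c)       # 1 - G(y) under fitted GPD
    exact = mixture_sf(u + y) / mixture_sf(u)     # true conditional probability
    g1 = stats.norm.sf(u + y, scale=s1) / stats.norm.sf(u, scale=s1)
    print(f"y = {y}: GPD {gpd:.4f}, true {exact:.4f}, 1st Gaussian {g1:.4f}")
```

The fitted GPD tracks the true conditional tail probability, while the first Gaussian alone understates it badly deep in the tail.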

4.0 Conclusion

This post has illustrated:

  • A probability distribution in which the central part of the distribution's support tends to behave differently from the tails.
  • The difficulty in rejecting the hypothesis that data is drawn from the distribution characterizing the central tendency of the data, with no account being taken of heavy tails.
  • A method, applicable to any continuous random variable, for estimating a tail distribution.
  • Such estimation yielding an appreciably larger estimate for a tail probability than the distribution characterizing the central tendency.

  • J. B. Broadwater and R. Chellappa (2010). Adaptive Threshold Estimation via Extreme Value Theory, IEEE Transactions on Signal Processing, V. 58, No. 2 (Feb.): pp. 490-500.
  • Damon Levine (2009). Modeling Tail Behavior with Extreme Value Theory, Risk Management, Issue 17.
  • R. V. Hogg and A. T. Craig (1978). Introduction to Mathematical Statistics, Fourth edition, Macmillan.
  • A. Ozturk, P. R. Chakravarthi, and D. D. Weiner (). On Determining the Radar Threshold for Non-Gaussian Process from Experimental Data, IEEE Transactions on Information Theory, V. 42, No. 4 (July): pp. 1310-1316.
  • James Pickands III (1975). Statistical Inference Using Extreme Order Statistics, Annals of Statistics, V. 3, No. 1: pp. 119-131.

Wednesday, April 09, 2014

Illusions Generated By Markets Like Those Created By Language On Holiday

I have been reading a book, edited by Gavin Kitching and Nigel Pleasants, comparing and contrasting Ludwig Wittgenstein and Karl Marx. This is the later Wittgenstein of the Philosophical Investigations, not of the Tractatus. The authors of the papers from the conference generating this work do not seem too concerned with arguments about the differences between the young Marx and the mature Marx, albeit many quote a passage from the German Ideology about language. (I think this post is more disorganized than many others here.)

Anyways, I want to first consider a reading of Capital, consonant with the approach of Friedrich Engels and the Second International, but at variance with an analogy to Wittgenstein's later philosophy. One might think of the labor theory of value as a scientific approach revealing hidden forces and structures that are at a deeper level than observed empirical reality. Think about how, for example, physicists have an atomic theory that explains why tables are hard and water is wet. Even though a table may seem solid, we know, if we accept science, that it is mostly empty space. Somewhere Bertrand Russell writes something like, "Naive realism leads to physics, and physics shows naive realism is wrong. Hence naive realism is false". Similarly, you may think purchases and sales on markets under capitalism are made between equals, freely contracting. But the science of Marxism reveals an underlying reality in which the source of profits is the exploitation of the workers.

Wittgenstein, in rejecting his early approach to language, rejects the idea of a decontextualized analysis of the sentences of our languages into an ultimate underlying uniform atomic structure which explains their meaning. Rather, in his later philosophy, he gathers together descriptions of the use of language, to dispel and dissolve the illusions characteristic of traditional philosophy. He is hostile to the ideal of an ultimate essence for meaning, and points out the multifarious uses to which language is put. Some of his famous aphorisms include, "Nothing is hidden" and his explanation of the point of his philosophical investigations as "To show the fly the way out of the fly bottle". Some of his descriptions are not from actually existing societies, but from imagined primitive societies. Some of these imagined societies are described near the beginning of the Philosophical Investigations, much as in the first chapter of Piero Sraffa's Production of Commodities by Means of Commodities.

Can Marx be read in an analogous manner, as attempting to dispel illusions, while claiming that no hidden essence or foundation underlies capitalist economies? Such a reading, I think, will emphasize Marx's remarks on commodity fetishism and "real illusions" that come with non-reflective participation in a market economy. It also makes sense of Marx's literary style. Both Marx and Wittgenstein are attempting to encourage a fundamental change so that our form of life will not generate these illusions.

Perhaps such a reading is in tension with the view of Marx's account of exploitation as descriptive, not normative. What about Wittgenstein's saying that philosophy "leaves everything as it is"? How can one read Wittgenstein and Marx as pursuing complementary projects when Marx writes, "Philosophers have hitherto only interpreted the world in various ways; the point is to change it"? Various essays in this book address these issues. I guess what concerns me more is Marx's Hegelian style, quite different from Wittgenstein. (I rely on English translations.)

This book also alerted me to some issues in Wittgenstein interpretation. When Wittgenstein writes of a form of life, is he writing of human life in general (in contrast, say, to the form of life of a lion)? Or would different human cultures and societies have different forms of life? Does Wittgenstein encourage a political quietism since he does not provide an external standpoint outside of language to criticize rules? (I think the last objection draws lines more firm than is compatible with Wittgenstein's comments on family resemblances.)

I also have two new books to look up, Gellner (1959) and Winch (1963). Gellner sounds like an unscholarly polemic that yet was influential in turning philosophy away from the linguistic philosophy of the later Wittgenstein, J. L. Austin, and Gilbert Ryle. Winch seems to argue that those studying society must use the terms that members of a culture use, and with the same understanding. So perhaps this is a Wittgensteinian argument that social science is not possible, or at least must lower its aims. But I have not read it yet.

  • Ernest Gellner (1959). Words and Things: A Critical Account of Linguistic Philosophy and a Study in Ideology, London: Gollancz.
  • Gavin Kitching and Nigel Pleasants (editors) (2002). Marx and Wittgenstein: Knowledge, Morality and Politics, London: Routledge.
  • Peter Winch (1963). The Idea of a Social Science, London: Routledge and Kegan Paul.