Showing posts with label Markov Process. Show all posts

Friday, January 23, 2015

Approximating a Continuous Time Markov Process

Figure 1: Rate of Transitions Between States in a Three-State Markov Chain
1.0 Introduction

This post, about Markov processes, does not have much to do with economics. I here define how to approximate a continuous time Markov chain with a discrete time Markov chain. This mathematics is useful for one way of implementing computer simulations involving Markov chains. That is, I want to consider how to start with a continuous time model and synthesize a realization with a small, constant time step.

2.0 Continuous Time Markov Chains

Consider a stochastic process that, at any non-negative time t, is in one of N states. Assume this process satisfies the Markov property: the future history of the process after time t depends only on the state of the process at time t, independently of how the process arrived at that state. I consider here only processes with stationary probability distributions for state transitions and for times between transitions. A continuous time Markov chain is specified by a state transition matrix. In this section, I define such a matrix, as well as specifying two additional assumptions.

Formally, let Pi, j denote the conditional probability that the next transition will be into state j, given that the process is in state i at time zero. (As seen below, in the notation adopted here, it matters that these conditional probabilities are not a function of time.) Assume that, for each state, the next transition when the process is in that state is into a different state:

Pi, i = 0; i = 0, 1, ..., N - 1

Further, assume that for each state, the time to the next transition is from an exponential distribution with the rate of occurrence of state transitions dependent only on the initial state:

Fi, j(t) = 1 - e^(-λi t); i, j = 0, 1, ..., N - 1;

where Fi, j(t) is the conditional probability that the next transition will be before time t, given that the chain is in state i at time zero and that the next transition will be into state j. In other words, Fi, j(t) is the Cumulative Distribution Function (CDF) for the specified random variable. Under the above definitions, the stochastic process is a continuous time, finite state Markov chain.
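The exponential holding-time assumption is easy to exercise in code. The sketch below (Python; the rate λ = 2.0, the evaluation time, and the sample count are arbitrary assumed values, not from the post) samples times to the next transition with random.expovariate and compares the empirical frequency of early transitions with the CDF above:

```python
import math
import random

random.seed(0)

lam = 2.0    # assumed transition rate for some state i (hypothetical value)
t = 0.5      # evaluate the CDF at this time
n = 100_000  # number of sampled holding times

# Draw times to the next transition from the exponential distribution.
samples = [random.expovariate(lam) for _ in range(n)]

# Empirical probability that the next transition occurs before time t.
empirical = sum(1 for s in samples if s < t) / n

# Theoretical CDF: F(t) = 1 - exp(-lam * t).
theoretical = 1.0 - math.exp(-lam * t)

print(empirical, theoretical)
```

The two printed values should agree to roughly two decimal places at this sample size.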

Let Pi, j(t) be the conditional probability that the chain is in state j at time t, given that the chain is in state i at time zero. These conditional probabilities satisfy Kolmogorov's forward equation:

dP(t)/dt = P(t) Q,

where the transition rate matrix Q is defined to be the matrix with elements:

qi, i = -λi; qi, j = λi Pi, j; i ≠ j

The elements in each row of the transition rate matrix sum to zero. Kolmogorov's forward equation can be expressed in scalar form:

dPi, j(t)/dt = Σk Pi, k(t) qk, j; i, j = 0, 1, ..., N - 1

The above equation applies to continuous time Markov chains with a countably infinite number of states only under certain special conditions.

Steady state probabilities of this Markov chain satisfy:

p Q = 0,

where p is a row vector in which each element is the steady-state probability that the chain is in the corresponding state.
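As a minimal sketch of how these steady-state probabilities might be computed, the following pure-Python code solves p Q = 0 together with the normalization Σi pi = 1, for a hypothetical three-state rate matrix (the rates are assumed values, not from the post). One column of p Q = 0 is redundant, so it is replaced by the normalization equation:

```python
# Steady-state probabilities of a continuous time Markov chain: solve p Q = 0
# together with sum(p) = 1. The 3-state transition rate matrix below is
# hypothetical (rates chosen arbitrarily); each row sums to zero.
Q = [
    [-3.0,  1.0,  2.0],
    [ 3.0, -4.0,  1.0],
    [ 2.0,  2.0, -4.0],
]
n = len(Q)

# Build the linear system M x = b: the first n - 1 equations are columns of
# p Q = 0 (one column is redundant), the last is sum(p) = 1.
M = [[Q[i][j] for i in range(n)] for j in range(n - 1)]
M.append([1.0] * n)
b = [0.0] * (n - 1) + [1.0]

# Gaussian elimination with partial pivoting.
for col in range(n):
    pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
    M[col], M[pivot] = M[pivot], M[col]
    b[col], b[pivot] = b[pivot], b[col]
    for r in range(col + 1, n):
        f = M[r][col] / M[col][col]
        for c in range(col, n):
            M[r][c] -= f * M[col][c]
        b[r] -= f * b[col]

# Back substitution.
p = [0.0] * n
for r in range(n - 1, -1, -1):
    s = b[r] - sum(M[r][c] * p[c] for c in range(r + 1, n))
    p[r] = s / M[r][r]

print(p)
```

For this particular Q, the solution works out to p = (14/31, 8/31, 9/31).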

3.0 Discrete Time Approximation

A discrete time Markov chain is specified by a state transition matrix A, where ai, j is the probability that the chain will transition in a time step from state i to state j, given that the chain is in state i at the start of the time step. Steady state probabilities for a discrete time Markov chain satisfy:

p A = p

The above equation contrasts with the corresponding condition, p Q = 0, relating steady state probabilities to the transition rate matrix of a continuous time Markov chain.

Let the time step h be small enough that the probability of the continuous time Markov chain undergoing two or more transitions in a single time step is negligible. In other words, the following probability, calculated from a Poisson distribution, is close to unity for all states i:

P(0 or 1 transitions in time h | Chain in state i at time 0) =
(1 + λi h) e^(-λi h)

The probability that the chain remains in a given state for a time step is the probability that no transitions occur during that time step, given the state of the chain at the start of the time step. This probability is also found from a Poisson distribution:

ai, i = e^(-λi h) = e^(qi, i h); i = 0, 1, ..., N - 1

The probability that the chain transitions to state j, given the chain is in state i at the start of the time step, is the product of:

  • The probability that a transition occurs during that time step, and
  • The conditional probability that the next transition will be into state j, given the chain is in state i at the start of the time step.

The following equation specifies this probability:

ai, j = (1 - ai, i) Pi, j = (1 - ai, i) qi, j/(-qi, i); i ≠ j

These equations allow one to write a computer program to synthesize a realization from a finite state Markov chain, given the parameters of a continuous time, finite state Markov chain. Such a program will be based on a discrete time approximation.
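The equations above might be turned into such a program as follows. This is a sketch only: the rate matrix, time step, starting state, and number of steps are assumed for illustration.

```python
import math
import random

def discretize(Q, h):
    """Build the one-step transition matrix A of the discrete time
    approximation from the transition rate matrix Q and time step h."""
    n = len(Q)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = math.exp(Q[i][i] * h)  # P(no transition in a step of h)
        for j in range(n):
            if j != i:
                # P(one transition) times P(next transition is into j).
                A[i][j] = (1.0 - A[i][i]) * Q[i][j] / (-Q[i][i])
    return A

def synthesize(A, start, steps, rng):
    """Generate a realization of the discrete time chain."""
    path = [start]
    for _ in range(steps):
        u = rng.random()
        cum = 0.0
        nxt = len(A) - 1  # fallback guards against rounding in cum
        for j, a in enumerate(A[path[-1]]):
            cum += a
            if u < cum:
                nxt = j
                break
        path.append(nxt)
    return path

# Hypothetical rate matrix and time step.
Q = [[-3.0, 1.0, 2.0], [3.0, -4.0, 1.0], [2.0, 2.0, -4.0]]
h = 0.01
A = discretize(Q, h)
path = synthesize(A, 0, 1000, random.Random(42))
print(path[:10])
```

Each row of A sums to unity by construction, since the off-diagonal rates in a row of Q sum to -qi, i.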

4.0 An Example

Consider a three-state, continuous time Markov chain. Figure 1 shows the rates λi, j of transitions between the various states. The transition rate matrix is:

Q =
[ -(λ0, 1 + λ0, 2)     λ0, 1                 λ0, 2              ]
[ λ1, 0                -(λ1, 0 + λ1, 2)      λ1, 2              ]
[ λ2, 0                λ2, 1                 -(λ2, 0 + λ2, 1)   ]

To discretize time, choose a small time step h such that, for all states i, the following probabilities are approximately unity:

P(0 or 1 transitions in time h | Chain in state 0 at time 0) =
[1 + (λ0, 1 + λ0, 2)h] e^(-(λ0, 1 + λ0, 2)h)
P(0 or 1 transitions in time h | Chain in state 1 at time 0) =
[1 + (λ1, 0 + λ1, 2)h] e^(-(λ1, 0 + λ1, 2)h)
P(0 or 1 transitions in time h | Chain in state 2 at time 0) =
[1 + (λ2, 0 + λ2, 1)h] e^(-(λ2, 0 + λ2, 1)h)

The state transition matrix A for the discrete time Markov chain has elements:

ai, i = e^(-(Σj ≠ i λi, j) h); ai, j = (1 - ai, i) λi, j/(Σk ≠ i λi, k), for i ≠ j

I have not tested the above with concrete values for a continuous time Markov chain.
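As a check a reader might run, the following sketch picks a time step h for assumed rate values (the rates and the 0.999 threshold are arbitrary choices, not from the post) by halving h until the Poisson probability of at most one transition per step exceeds the threshold for every state:

```python
import math

# Hypothetical rates lam[i][j] from state i to state j; diagonal unused.
lam = [
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 1.0],
    [2.0, 2.0, 0.0],
]
total = [sum(row) for row in lam]  # total transition rate out of each state

def p_at_most_one(rate, h):
    # Poisson probability of 0 or 1 events in an interval of length h.
    return (1.0 + rate * h) * math.exp(-rate * h)

# Halve h until the approximation is acceptable for every state.
h = 1.0
while min(p_at_most_one(r, h) for r in total) < 0.999:
    h /= 2.0

print(h)
```

For these rates the loop stops at h = 1/128, at which point the probability of two or more transitions in a step is below one part in two thousand for every state.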

Reference
  • S. M. Ross (1970). Applied Probability Models with Optimization Applications. San Francisco: Holden-Day

Sunday, July 27, 2008

Two Roads Diverged In A Yellow Wood, And Sorry I Could Not Travel Both And Be One Traveler

1.0 Introduction
Brian Arthur and Paul David, two teachers at Stanford about a decade ago, have attracted a certain amount of popular attention with the concept of path dependence. Arthur, for example, has had a certain amount of influence on policy. This post is an attempt to explain the concept, primarily as it applies to stochastic processes. Path dependence is one way of formalizing the idea that history matters.

2.0 A Stochastic Process
Path dependence relates to events economists choose to model as random. This modeling choice does not imply that economists think such events are necessarily the result of the modeled agents acting capriciously, irrationally, or mistakenly. Consider such childhood games as Odds and Evens or Paper-Rock-Scissors. The optimal strategy for each player is to choose their move randomly. The winner of such games will vary randomly. Notice that apart from the players' choices, these games are deterministic. No dice are being rolled or cards shuffled.

A stochastic process is merely an indexed set of random variables:
{ X( 1 ), X( 2 ), X( 3 ), ... }
The index often represents time. The value of a given one of these random variables is frequently referred to as the state of the process at that time.

I consider an example of a path-dependent stochastic process that does not exhibit certain other properties. This stochastic process can be in one of eight states, {Start, B, C, D, E, F, G, H}. It begins in the Start state. State transition probabilities are indicated by the fractions in Figure 1. For example, consider the probability distribution for the state of the process at the second time step, X( 2 ). Figure 1 shows that if the process is in the Start state, the probability that it will transition to state B is 1/2. Likewise, given that it is in the Start state, the probability that it will transition to state C is 1/2. Hence the probability distribution of X( 2 ) is:
Pr[ X( 2 ) = B ] = 1/2
Pr[ X( 2 ) = C ] = 1/2

Notice that the probabilities leading out from each state total unity. It is left as an exercise for the reader to confirm that the probability distribution of X( 3 ) is as follows:
Pr[ X( 3 ) = Start ] = 1/3
Pr[ X( 3 ) = B ] = 1/6
Pr[ X( 3 ) = C ] = 1/6
Pr[ X( 3 ) = D ] = 1/6
Pr[ X( 3 ) = G ] = 1/6
I deliberately created this example to exhibit a certain symmetry for the transient states (defined below).
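Figure 1 is not reproduced here, so the transition probabilities below are a hypothetical reconstruction, chosen only to be consistent with the stated distributions of X( 2 ) and X( 3 ). With that caveat, the reader's exercise can be checked by propagating the distribution two steps:

```python
from fractions import Fraction as F

# Hypothetical reconstruction of part of Figure 1: only the rows needed for
# the first two time steps are included. Exact arithmetic via Fraction.
P = {
    "Start": {"B": F(1, 2), "C": F(1, 2)},
    "B": {"Start": F(1, 3), "B": F(1, 3), "D": F(1, 3)},
    "C": {"Start": F(1, 3), "C": F(1, 3), "G": F(1, 3)},
}

def step(dist):
    """Propagate a probability distribution one time step."""
    out = {}
    for state, prob in dist.items():
        for nxt, p in P[state].items():
            out[nxt] = out.get(nxt, F(0)) + prob * p
    return out

x2 = step({"Start": F(1)})  # distribution of X(2)
x3 = step(x2)               # distribution of X(3)
print(x3)
```

Under this reconstruction, x3 comes out as Start: 1/3, B: 1/6, C: 1/6, D: 1/6, G: 1/6, matching the distribution stated in the text.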

Figure 1: Markov Process State Space

This process exhibits certain properties that are particularly simple, as well as some properties that complicate analysis. Notice that the state transition probabilities are invariant across time. Given that the process is in the Start state, the probability that it will transition to state B is 1/2, no matter at what time the process may be in the Start state. It does not matter whether we are considering the initial time step or some later time when the process happens to have returned to the Start state.

Furthermore, the process is memoryless. State transition probabilities depend only on the current state, not the history with which the process reached the current state. This property of memorylessness is known as the Markov property. This example is a Markov process.

The Markov property and the assumption of time-invariant state transition probabilities are simplifying assumptions. One might think relaxation of these assumptions would be one way of showing that "history matters." Since, as will be explained, this example exhibits path dependence, violations of these assumptions are clearly not necessary for path dependence. Thus, Margolis and Liebowitz are incorrect:
"In probability theory, a stochastic process is path dependent if the probability distribution for period t+1 is conditioned on more than the value of the system in period t. ... path independence means that it doesn't matter how you get to a particular point, only that you got there." -- Stephen E. Margolis and S. J. Liebowitz (1988)

An interesting classification of sets of states is available for Markov processes: that into transient and absorbing states. Consider the states {Start, B, C}. By assumption, the process starts in a state within this set. But eventually the process will lie in a state outside this set. Once this happens, the process will never return to this set. States Start, B, and C are known as transient states. On the other hand, consider the states {D, E, F}. Once the process is in a state in this set, the process will never depart from a state in the set. Furthermore, if the process is in a state in this set, it will eventually visit all other states in the set. {D, E, F} is a set of absorbing states. This is not the unique set of absorbing states for this process; {G, H} is also a set of absorbing states.

Consider the problem of estimating the probability distribution over the states after some large number of time steps, say X(10,000). The probability that the process is then in a transient state is negligible. One might be tempted to estimate the distribution of X(10,000) by the proportion of time steps that a single realization of the process spends in each state. A realization might be:
Start, B, D, E, F, D, E, F, E, F, D
The column for the first realization in Table 1 shows the proportion of time spent in each state in this realization as the number of time steps in the realization increases without bound. (Although transient states are observed in a realization, the proportion of time spent in transient states in such an infinite sequence is zero.)
Table 1: Limiting State Probabilities

State   From One      From Another   Over All
        Realization   Realization    Realizations
Start   0             0              0
B       0             0              0
C       0             0              0
D       1/3           0              1/6
E       1/3           0              1/6
F       1/3           0              1/6
G       0             1/2            1/4
H       0             1/2            1/4

If the process were ergodic, the limiting distribution in the first column of Table 1 would be a good estimate of the probability distribution for the state of the process at some time after transient behavior was likely to be completed. In this case, though, another realization might yield the limiting distribution shown for the second realization in Table 1. In fact, the probability distribution at a given time, as that time increases without bound, assigns positive probability to every non-transient state. The last column in Table 1 shows this limiting probability distribution.

In general, estimates of parameters of the underlying probability distributions can be made across multiple realizations of the process or from a single realization. In a nonergodic process, such estimates will not converge as the number of realizations or the length of the single realization increases.
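This divergence between time averages and ensemble averages can be illustrated with an even smaller nonergodic chain than the example above (the chain, seed, and run counts below are assumed for illustration): state 0 moves once, with equal probability, into one of two absorbing states.

```python
import random

# A minimal nonergodic chain (not the one in Figure 1): from state 0 the
# process moves to absorbing state 1 or 2 with probability 1/2 each, and
# then stays there forever.
def realize(rng, steps=1000):
    state = 0
    visits = [0, 0, 0]
    for _ in range(steps):
        if state == 0:
            state = 1 if rng.random() < 0.5 else 2
        visits[state] += 1
    return [v / steps for v in visits]

rng = random.Random(7)
runs = [realize(rng) for _ in range(2000)]

# Within any single realization the time average concentrates entirely on
# one absorbing state; across realizations the averages split between them.
single = runs[0]
ensemble = [sum(r[s] for r in runs) / len(runs) for s in range(3)]
print(single, ensemble)
```

The proportions from a single realization do not converge to the ensemble proportions no matter how long the realization runs, which is exactly the failure of ergodicity described above.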

Another definition of ergodicity involves what states can eventually be reached from each state. In an ergodic process, each state can eventually be reached from every other state with positive probability. In the example, all states can be reached from the transient states Start, B, and C. But states Start, B, C, G, and H cannot be reached from states D, E, and F. Suppose an external occurrence changes the state transition probabilities. If the process were previously ergodic, this change could not possibly result in states arising that were previously unobserved.

To reiterate, a nonergodic process does not have a unique limiting probability distribution. The applicable limiting distribution of any realization of the process depends on the history of that particular realization. Thus, the process exhibits path dependence.

Another branch of mathematics deals with deterministic dynamical systems. Such systems are typically defined by systems of differential or difference equations. Sometimes the solutions of such systems can be such that trajectories in phase space diverge, no matter how close they start. This is known as sensitive dependence on initial conditions, popularly known as the "butterfly effect." (The irritating mathematician character in Jurassic Park mumbles about this stuff.) Notice that path dependence is defined above without drawing on this branch of mathematics. Here, too, Margolis and Liebowitz are mistaken:
"Path dependence is an idea that spilled over to economics from intellectual movements that arose elsewhere. In physics and mathematics the related ideas come from chaos theory. One potential of the non-linear models of chaos theory is sensitive dependence on initial conditions: Determination, and perhaps lock-in, by small, insignificant events." -- Stephen E. Margolis and S. J. Liebowitz (1988)

3.0 Conclusion
Why should economists care about the mathematics of path dependence? First, models of path dependence suggest how to construct economic models that overcome frequently criticized characteristics of neoclassical economics. Neoclassical models often depict equilibria. These equilibria are end states of path independent processes. Ever since Veblen, some economists have objected to such models as being teleological and acausal. Models in which path dependence can arise are causal and show neoclassical economics to be a special case.

This claim that neoclassical economics is merely a special case, dependent on a special case assumption of ergodicity, may lead one to wonder about connections with a theory claimed to be the "General Theory." As a matter of fact, Paul Davidson claims that a consideration of nonergodicity is useful in explicating the economics of Keynes. So a second reason economists should be concerned with nonergodicity and path dependence is to further understand possible approaches to macroeconomics.

Third, some economists, e.g. Brian Arthur, have developed specific models of technological change that exhibit nonergodicity. These models, including those based on Pólya urns, show how increasing returns can act as positive feedback and lead to path dependence. Inasmuch as these models cast light on economic history, path dependence can be useful for empirical work.

The theory of path dependence raises an empirical question. Are nonergodic stochastic processes useful for modeling any, or any important, economic time series? The evolution of the QWERTY keyboard seems to be an example of a path-dependent process. Apparently there were several 19th century typewriter keyboard layouts. QWERTY became the dominant one, seemingly even after the jamming problem that was the rationale for its introduction had been overcome. The historical evidence suggests that, with different early choices, one of the other arrangements could have become dominant. The chance that one or another of these early arrangements could now become dominant seems quite negligible.

This discussion has been carried out without mentioning efficiency.

References
  • W. Brian Arthur (1989) "Competing Technologies, Increasing Returns, and Lock-In by Historical Events", Economic Journal, V. 99: 116-131.
  • W. Brian Arthur (1990) "Positive Feedbacks in the Economy", Scientific American, 262 (February): 92-99.
  • W. Brian Arthur (1996) "Increasing Returns and the New World of Business", Harvard Business Review.
  • Paul A. David (1985) "Clio and the Economics of QWERTY", American Economic Review 75, 2 (May)
  • Paul A. David "Path Dependence, Its Critics and the Quest for 'Historical Economics'"
  • Paul Davidson (1982-1983) "Rational Expectations: A Fallacious Foundation for Studying Crucial Decision-Making Processes", Journal of Post Keynesian Economics, V. 5 (Winter): 182-197.
  • Stephen E. Margolis and S. J. Liebowitz (1988) "Path Dependence", The New Palgrave Dictionary of Economics and Law, MacMillan.