1.0 IntroductionBrian Arthur and Paul David, two teachers at Stanford about a decade ago, have attracted a certain amount of popular attention with the concept of
path dependence. Arthur, for example, has had a certain amount of influence on policy. This post is an attempt to explain the concept, primarily as it applies to stochastic processes. Path dependence is one way of formalizing the idea that history matters.
2.0 A Stochastic ProcessPath dependence relates to events economists choose to model as random. This modeling choice does not imply that economists think such events are necessarily the result of the modeled agents acting capriciously, irrationally, or mistakenly. Consider such childhood games as Odds and Evens or PaperRockScissors. The optimal strategy for each player is to choose their move randomly. The winner of such games will vary randomly. Notice that apart from the players' choices, these games are deterministic. No dice are being rolled or cards shuffled.
A stochastic process is merely an indexed set of random variables:
{ X( 1 ), X( 2 ), X( 3 ), ... }
The index often represents time. The value of a given one of these random variables is frequently referred to as the state of the process at that time.
I consider an example of a pathdependent stochastic process that does not exhibit certain other properties. This stochastic process can be in one of eight states, {
Start,
B,
C,
D,
E,
F,
G,
H}. It begins in the
Start state. State transition probabilities are indicated by
the fractions in Figure 1. For example, consider the probability distribution for the state of the process at the second time step,
X( 2 ). Figure 1 shows that if the process is in the
Start state, the probability that it will transition to state
B is 1/2. Likewise, given that it is in the
Start state, the probability that it will transition to state
C is 1/2. Hence the probability distribution of
X( 2 ) is:
Pr[ X( 2 ) = B ] = 1/2
Pr[ X( 2 ) = C ] = 1/2
Notice that the probabilities leading out from each state total unity. It is left as an exercise for the reader to confirm that the proability distribution of
X( 3 ) is as follows:
Pr[ X( 3 ) = Start ] = 1/3
Pr[ X( 3 ) = B ] = 1/6
Pr[ X( 3 ) = C ] = 1/6
Pr[ X( 3 ) = D ] = 1/6
Pr[ X( 3 ) = G ] = 1/6
I deliberately created this example to exhibit a certain symmetry for the transient states (defined below).

Figure 1: Markov Process State Space 
This process exhibits certain properties that are particularly simple, as well as some properties that complicate analysis. Notice that the state transition properties are invariant across time. Given that the process is in the
Start state, the probability that it will transition to state
B is 1/2, no matter at what time the process may be in the
Start state. It does not matter whether we are considering the initial time step or some later time when the process happens to have returned to the
Start state.
Furthermore, the process is
memoryless. State transition probabilities depend only on the current state, not the history with which the process reached the current state. This property of memorylessness is known as the Markov property. This example is a Markov process.
The Markov property and the assumption of timeinavariant state transition probabilities are simplifying assumptions. One might think relaxation of these assumptions might be one way of showing that "history matters." Since, as will be explained, this example exhibits path dependence, violations of these assumptions are clearly not necessary for path dependence. And Margolis and Liebowitz are incorrect:
"In probability theory, a stochastic process is path dependent if the probability distribution for period t+1 is conditioned on more than the value of the system in period t. ... path independence means that it doesn't matter how you get to a particular point, only that you got there."  Stephen E. Margolis and S. J. Liebowitz (1988)
An interesting classification of sets of states is available for Markov processes, that of transient and absorbing states. Consider the states {
Start,
B,
C}. By assumption, the process starts in a state within this set. But eventually the process will lie in a state outside this set. Once this happens, the process will never return to this set. States
Start,
B, and
C are known as transient states. On the other hand, consider the states {
D,
E,
F}. Once the process is in a state in this set, the process will never depart from a state in the set. Furthermore, if the process is in a state in this set, it will eventually visit all other states in the set. {
D,
E,
F} is a set of asorbing states. This is not the unique set of absorbing states for this process. {
G,
H} is also a set of absorbing states.
Consider the problem of estimating the probability distribution over the states at some large number of time step, say
X(10,000), after the start. The probability that the process is in a transient state is negligible. One might be tempted to estimate
X(10,000) by the proportion of time steps that a single realization of the process spends in each state. A realization might be:
Start, B, D, E, F, D, E, F, E, F, D
The column for the first realization in Table 1 shows the proportion of time spent in each state in this realization as the number of time steps in the realization increases without bound. (Although transient states are observed in a realization, the proportion of time spent in transient states in such an infinite sequence is zero.)
Table 1: Limiting State ProbabilitiesState  From One Realization  From Another Realization  Over All Realizations 
Start  0  0  0 
B  0  0  0 
C  0  0  0 
D  1/3  0  1/6 
E  1/3  0  1/6 
F  1/3  0  1/6 
G  0  1/2  1/4 
H  0  1/2  1/4 
If the process were ergodic, the limiting distribution in the first column in Table 1 would be a good estimate of the probability distribution for the state of the process at some time after transient behavior was likely to be completed. In this case, though, another realization might yield the limiting distribution shown for the second realization in Table 1. The probability distribution, in fact, at a given time, as that time increases without bound would have positive probabilities for all nontransient states. The last column in Table 1 shows this limiting probability distribution.
In general, estimates of parameters of the underlying probability distributions can be made across multiple realizations of the process or from a single realization. In a
nonergodic process, such estimates will not converge as the number of realizations or the length of the single realization increases.
Another definition of ergodicity involves what states can be eventually reached from each state. In an ergodic process, each state can be eventually reached, with positive probability. In the example, all states can be reached from the transient states
Start,
B, and
C. But states
Start,
B,
C,
G and
H cannot be reached from states
D,
E, and
F. Suppose an external occurrence changes the state transition probabilities. If the process were previously ergodic, this change could not possibly result in states arising that were previously unobserved.
To reiterate, a nonergodic process does not have an unique limiting probability distribution. The applicable limiting distribution of any realization of the process depends on the history of that particular realization. Thus, the process exhibits
path dependence.
Another branch of mathematics deals with deterministic dynamical systems. Such systems are typically defined by systems of differential or difference equations. Sometimes the solutions of such systems can be such that trajectories in phase space diverge, no matter how close they start. This is known as sensitive dependence on initial conditions, popularly known as the "butterfly effect." (The irritating mathematican character in
Jurassic Park mumbles about this stuff.) Notice that path dependence is defined above without drawing on this branch of mathematics. Here, too, Margolis and Liebowitz are mistaken:
”Path dependence is an idea that spilled over to economics from intellectual movements that arose elsewhere. In physics and mathematics the related ideas come from chaos theory. One potential of the nonlinear models of chaos theory is sensitive dependence on initial conditions: Determination, and perhaps lockin, by small, insignificant events."  Stephen E. Margolis and S. J. Liebowitz (1988)
3.0 ConclusionWhy should economists care about the mathematics of path dependence? First, models of path dependence suggest how to construct economic models that overcome frequently criticized characteristics of neoclassical economics. Neoclassical models often depict equilibria. These equilibria are end states of path independent processes. Ever since Veblen, some economists have objected to such models as being teleological and acausal. Models in which path dependence can arise are causal and show neoclassical economics to be a special case.
This claim that neoclassical economics is merely a special case, dependent on a special case assumption of ergodicity, may lead one to wonder about connections with a theory claimed to be the "General Theory." As a matter of fact, Paul Davidson claims that a consideration of nonergodicity is useful in explicating the economics of Keynes. So a second reason economists should be concerned with nonergodicity and path dependence is to further understand possible approaches to macroeconomics.
Third, some economists, e.g. Brian Arthur, have developed specific models of technological change that exhibit nonergodicity. These models, including those of a Polya urns, show how increasing returns can act as positive feedback and lead to path dependence. Inasmuch as these models cast light on economic history, path dependence can be useful for empirical work.
The theory of path dependence raises an empirical question. Are nonergodic stochastic processes useful for modeling any, or any important, economic time series? The evolution of the QWERTY keyboard seems to be an example of a path dependent process. Apparently there were several 19th century typewriter keyboard layouts. QWERTY became the dominant one, seemingly even after the jamming problem that was the rationale for its introduction had been overcome. The historical evidence suggests that with different early choices, one of the other arrangements could have become dominant. The chances that some one or other of these early arrangements can now become dominant seems quite negligible.
This discussion has been carried out without mentioning efficiency.
References W. Brian Arthur (1989) "Competing Technologies, Increasing Returns, and LockIn by Historical Events", Economic Journal, V. 99: 116131.
 W. Brian Arthur (1990) "Positive Feedbacks in the Economy", Scientific American, 262 (February): 9299.
 W. Brian Arthur (1996) "Increasing Returns and the New World of Business", Harvard Business Review.
 Paul A. David (1985) "Clio and the Economics of QWERTY", American Economic Review 75, 2 (May)
 Paul A. David "Path Dependence, Its Critics and the Quest for 'Historical Economics'"
 Paul Davidson (19821983) "Rational Expectations: A Fallacious Foundation for Studying Crucial DecisionMaking Processes", Journal of Post Keynesian Economics, V. 5 (Winter): 182197.
 Stephen E. Margolis and S. J. Liebowitz (1988) "Path Dependence", The New Palgrave Dictionary of Economics and Law, MacMillan.