Dynamic programming

A branch of mathematics studying the theory and the methods of solution of multi-step problems of optimal control.

In dynamic programming of controlled processes the objective is to find, among all possible controls, a control that gives the extremal (maximal or minimal) value of the objective function, some numerical characteristic of the process. A multi-stage process is one whose structure consists of several steps, or one in which the control is subdivided into a sequence of successive stages (steps), usually corresponding to different moments in time. In a number of problems the multi-stage nature is dictated by the process itself (for example, in determining the optimal dimensions of the stages of a multi-stage rocket or in finding the most economical way of flying an aircraft), but it may also be introduced artificially so that the methods of dynamic programming can be applied. Thus, the "programming" in dynamic programming refers to decision making and planning, while the word "dynamic" indicates the importance of time and of the order in which the operations of the process are executed.

The methods of dynamic programming are a part of the methods of operations research, and are utilized in optimal planning problems (e.g. in problems of the optimal distribution of available resources, in the theory of inventory management, replacement of equipment, etc.), and in solving numerous technical problems (e.g. control of successive chemical processes, optimal design of structural highway layers, etc.).

The basic idea will now be illustrated. Let the process of controlling a given system $ X $ consist of $ m $ stages; at the $ i $-th step let the control $ y _ {i} $ convert the system from the state $ x _ {i-1} $ attained during the $ ( i-1) $-th step to the new state $ x _ {i} $. This transition process is realized by a given function $ f _ {i} ( x , y ) $, and the new state is determined by the values $ x _ {i-1} $, $ y _ {i} $:

$$ x _ {i} = f _ {i} ( x _ {i-1} , y _ {i} ) . $$

Thus, the control operations $ y _ {1} , \dots, y _ {m} $ convert the system from its initial state $ x _ {0} \in X _ {0} $ into the final state $ x _ {m} \in X _ {m} $, where $ X _ {0} $ and $ X _ {m} $ are the sets of feasible initial and final states of $ X $. A possible way of stating the extremal problem is given below. The initial state $ x _ {0} $ is known. The task is to select controls $ y _ {1} , \dots, y _ {m} $ so that the system $ X $ is converted to a feasible final state and, at the same time, the objective function $ F ( x _ {0} , y _ {1} , \dots, x _ {m-1} , y _ {m} , x _ {m} ) $ attains its maximal value $ F ^ { * } $, i.e.

$$ F ^ { * } = \max _ {y _ {1} , \dots, y _ {m} } F ( x _ {0} , y _ {1} , \dots, x _ {m-1} , y _ {m} , x _ {m} ) . $$

An important feature of dynamic programming is that it is applicable to additive objective functions only. In the example above, this means that

$$ F = \sum _ {i = 1 } ^ { m } \phi _ {i} ( x _ {i-1} , y _ {i} ) . $$
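As a concrete illustration of this problem statement, the following minimal Python sketch enumerates all control sequences directly; the number of steps, the control set, the transition function and the stage rewards are hypothetical toy choices, not taken from the article.

```python
from itertools import product

# Hypothetical toy data: m steps, a finite set of feasible controls at every
# step, and a known initial state (none of these are specified in the article).
m = 3
controls = [0, 1, 2]
x0 = 0

def f(i, x, y):
    # Transition function x_i = f_i(x_{i-1}, y_i).
    return x + i * y

def phi(i, x, y):
    # Stage reward phi_i(x_{i-1}, y_i) of the additive objective F.
    return y - 0.1 * x * x

def F(y_seq):
    # Evaluate F = sum_i phi_i(x_{i-1}, y_i) along the generated trajectory.
    x, total = x0, 0.0
    for i, y in enumerate(y_seq, start=1):
        total += phi(i, x, y)
        x = f(i, x, y)
    return total

# F* = max over all control sequences, found here by exhaustive search; the
# number of sequences grows as len(controls)**m, which dynamic programming avoids.
F_star, best_controls = max((F(ys), ys) for ys in product(controls, repeat=m))
print(F_star, best_controls)
```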

Another requirement of the method is the absence of "after-effects" in the problem, i.e. the solutions (controls) selected for a given step may only affect the state $ x _ {i} $ of the system at the moment $ i $. In other words, processes of the type

$$ x _ {i} = f _ {i} ( x _ {i-1} , y _ {i} , \dots, y _ {1} , x _ {0} ) , $$

are not considered. Both these restrictions may be weakened, but only by unduly complicating the method.

Conventional methods of mathematical analysis are either altogether inapplicable to problems of dynamic programming or else involve very laborious computations. The method of dynamic programming is based on the optimality principle formulated by R. Bellman: Assume that, in controlling a discrete system $ X $, a certain control $ y _ {1} , \dots, y _ {k} $, and hence the trajectory of states $ x _ {0} , \dots, x _ {k} $, have already been selected, and suppose it is required to terminate the process, i.e. to select $ y _ {k+1} , \dots, y _ {m} $ (and hence $ x _ {k+1} , \dots, x _ {m} $); then, if the terminal part of the process is not optimal, i.e. if the maximum of the function

$$ F _ {k} = \sum _ {i = k + 1 } ^ { m } \phi _ {i} ( x _ {i-1} , y _ {i} ) , $$

is not attained by it, the process as a whole will not be optimal.

By using the optimality principle, the basic functional relationship is readily obtained. One determines the sequence of functions of the variable $ x $:

$$ \omega _ {m} ( x) = 0 , \quad \omega _ {k-1} ( x) = \max _ { y } [ \phi _ {k} ( x , y ) + \omega _ {k} ( f _ {k} ( x , y ) ) ] , $$

$ k = 1 , \dots, m $. Here, the maximum is taken over all the control operations feasible at step $ k $. The relation defining the dependence of $ \omega _ {k-1} $ on $ \omega _ {k} $ is known as the Bellman equation. The meaning of the functions $ \omega _ {k-1} ( x) $ is clear: If, at step $ k - 1 $, the system is in state $ x $, then $ \omega _ {k-1} ( x) $ is the maximum value of the remaining part $ \sum _ {i=k} ^ {m} \phi _ {i} ( x _ {i-1} , y _ {i} ) $ of the objective function that can still be attained. Simultaneously with the construction of the functions $ \omega _ {k-1} ( x) $ one also finds the conditional optimal controls $ y _ {k} ( x) $ at each step (i.e. the values of the optimal controls under the assumption that the system is in the state $ x $ at step $ k - 1 $). The final optimal controls are found by successively computing the values

$$ \omega _ {0} ( x _ {0} ) = F ^ { * } , \quad y _ {1} , x _ {1} , \dots, y _ {m} , x _ {m} . $$
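The backward recursion and the subsequent forward pass can be summarized in a short Python sketch; as in the earlier sketch, the finite state grid, control set, transition function and stage rewards are hypothetical illustrations, not part of the article.

```python
# Hypothetical toy data: finite state grid, finite control set, m steps.
m = 3
states = range(-10, 11)
controls = [0, 1, 2]
x0 = 0

def f(i, x, y):
    # Transition x_i = f_i(x_{i-1}, y_i), clipped so it stays on the state grid.
    return max(-10, min(10, x + i * y))

def phi(i, x, y):
    # Stage reward phi_i(x_{i-1}, y_i).
    return y - 0.1 * x * x

# Backward pass: omega_m(x) = 0 and
# omega_{k-1}(x) = max_y [ phi_k(x, y) + omega_k(f_k(x, y)) ].
omega = [dict() for _ in range(m + 1)]
policy = [dict() for _ in range(m + 1)]   # conditional optimal controls y_k(x)
omega[m] = {x: 0.0 for x in states}
for k in range(m, 0, -1):
    for x in states:
        best_y, best_val = max(
            ((y, phi(k, x, y) + omega[k][f(k, x, y)]) for y in controls),
            key=lambda pair: pair[1],
        )
        omega[k - 1][x] = best_val
        policy[k][x] = best_y

# Forward pass: omega_0(x_0) = F*, and the optimal controls and states are read
# off the conditional policy starting from the known initial state x_0.
F_star, x, trajectory = omega[0][x0], x0, []
for k in range(1, m + 1):
    y = policy[k][x]
    x = f(k, x, y)
    trajectory.append((y, x))
print(F_star, trajectory)
```

The backward pass computes $ \omega _ {0} ( x ) $ for every state $ x $ on the grid, which illustrates the property noted below: the method solves the problem for all initial states at once.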

The following property of the method of dynamic programming is evident from the above: it solves not merely one specific problem for a given $ x _ {0} $, but all problems of this type, for all initial states. Since the numerical realization of the method of dynamic programming is laborious, i.e. it requires a large amount of computer memory, it is most expedient to employ it whenever typical problems have to be solved a large number of times (e.g. determining the optimum flight conditions for an aircraft under changing weather conditions). Even though dynamic programming problems are formulated for discrete processes, the method may often be successfully employed in solving problems with continuous parameters.

Dynamic programming furnished a novel approach to many problems of variational calculus.

An important branch of dynamic programming is constituted by stochastic problems, in which the state of the system and the objective function are affected by random factors. Such problems include, for example, optimal inventory control when replenishment of the inventory is random. Controlled Markov processes are the most natural domain of application of dynamic programming in such cases.
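As a hedged illustration of the stochastic case, the sketch below performs the same backward recursion for a toy finite-horizon inventory problem in which the demand is random with a known distribution, so that each maximization is taken over the expected value of the stage reward plus the value-to-go; all data are hypothetical.

```python
# Hypothetical toy inventory problem: state = stock level, control = order size,
# demand w is random with a known distribution (all numbers are illustrative).
m = 3
states = range(0, 6)
controls = [0, 1, 2]
demand = [(0, 0.3), (1, 0.4), (2, 0.3)]   # (value, probability)

def f(x, y, w):
    # Next stock level: current stock plus order minus sales, clipped to the grid.
    return max(0, min(5, x + y - min(x + y, w)))

def phi(x, y, w):
    # Stage reward: revenue from sales minus ordering cost (toy numbers).
    return 2.0 * min(x + y, w) - 1.0 * y

# Stochastic Bellman recursion:
# omega_{k-1}(x) = max_y E[ phi(x, y, w) + omega_k(f(x, y, w)) ].
omega = [{x: 0.0 for x in states} for _ in range(m + 1)]
policy = [dict() for _ in range(m + 1)]
for k in range(m, 0, -1):
    for x in states:
        def expected_value(y):
            return sum(p * (phi(x, y, w) + omega[k][f(x, y, w)]) for w, p in demand)
        best_y = max(controls, key=expected_value)
        omega[k - 1][x] = expected_value(best_y)
        policy[k][x] = best_y

print(omega[0][2], policy[1])   # expected optimal value from initial stock 2
```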

The method of dynamic programming was first proposed by Bellman. Rigorous foundations of the method were laid by L.S. Pontryagin and his school, who studied the mathematical theory of control processes (cf. Optimal control, mathematical theory of).

Even though the method of dynamic programming considerably simplifies the initial problems, its explicit utilization is usually very laborious. Attempts are made to overcome this difficulty by developing approximation methods.

References

[1] R. Bellman, "Dynamic programming", Princeton Univ. Press (1957)
[2] V.G. Boltyanskii, "Mathematical methods of optimal control", Holt, Rinehart & Winston (1971) (Translated from Russian)
[3] G.F. Hadley, "Nonlinear and dynamic programming", Addison-Wesley (1964)
[4] G. Hadley, T. Whitin, "Analysis of inventory systems", Prentice-Hall (1963)
[5] R.A. Howard, "Dynamic programming and Markov processes", M.I.T. (1960)

Comments

If the process to be controlled is described by a differential equation instead of a difference equation as considered above, a "continuous" version of the Bellman equation exists, but it is usually referred to as the Hamilton–Jacobi–Bellman equation (the HJB-equation). The HJB-equation is a partial differential equation. Applying the so-called "method of characteristics" to this partial differential equation leads to the same differential equations as are involved in the Pontryagin maximum principle. Derivation of this principle via the HJB-equation is only valid under severe restrictions. Another derivation exists, using variational principles, which is valid under weaker conditions. For a comparison of dynamic programming approaches versus the Pontryagin principle see [a1], [a2].
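For concreteness, a standard form of the HJB-equation (written here for a fixed terminal time $ T $, dynamics $ \dot{x} = f ( x , y ) $ and a running reward $ \phi ( x , y ) $ analogous to the additive objective above; this particular statement is an illustration, not taken from the article) is

$$ \frac{\partial \omega }{\partial t } ( t , x ) + \max _ { y } \left [ \phi ( x , y ) + \frac{\partial \omega }{\partial x } ( t , x ) \cdot f ( x , y ) \right ] = 0 , \qquad \omega ( T , x ) = 0 , $$

where $ \omega ( t , x ) $ is the maximal value obtainable from state $ x $ at time $ t $; it is the continuous analogue of the discrete recursion $ \omega _ {k-1} ( x) = \max _ { y } [ \phi _ {k} ( x , y ) + \omega _ {k} ( f _ {k} ( x , y ) ) ] $ above.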

In the terminology of automatic control theory one could say that the dynamic programming approach leads to closed-loop (feedback) solutions of the optimal control problem, while Pontryagin's principle leads to open-loop solutions.

For some discussion of stochastic dynamic programming cf. the editorial comments to Controlled stochastic process.

References

[a1] A.E. Bryson, Y.-C. Ho, "Applied optimal control", Ginn (1969)
[a2] W.H. Fleming, R.W. Rishel, "Deterministic and stochastic optimal control", Springer (1975)