Optimal synthesis control
A solution of a problem in the mathematical theory of optimal control (cf. [[Optimal control, mathematical theory of|Optimal control, mathematical theory of]]), consisting of a synthesis of an optimal control (a feedback synthesis) in the form of a control strategy (a feedback principle), as a function of the current state (position) of a process (see [[#References|[1]]]–[[#References|[3]]]). The value of the control is defined not only by the current time, but also by the admissible values of the current parameters. In this way the introduction of a positional strategy makes possible an a posteriori realization of a control $ u $, corrected on the basis of supplementary information obtained during the process.
The simplest synthesis problem, for example, for a system
$$ \tag{1 }
\dot{x} = f( t, x, u),\ \
t _ {0} \leq t \leq t _ {1} ,\ \
x \in \mathbf R ^ {n} ,\ u \in \mathbf R ^ {p} ,
$$
with constraints
$$ \tag{2 }
u \in U \subseteq \mathbf R ^ {p} \ \textrm{ or } \ \
\psi ( u) \leq 0,\ \psi : \mathbf R ^ {p} \rightarrow \mathbf R ^ {k} ,
$$

and a given "terminal" criterion

$$
I( x( \cdot ), u( \cdot )) = \phi ( t _ {1} , x( t _ {1} )),\ \
\phi : \mathbf R ^ {n+1} \rightarrow \mathbf R ^ {1} ,
$$
assumes that a solution $ u ^ {0} $ is being sought to minimize the functional $ I( x( \cdot ), u( \cdot )) $ among the functions of the form $ u( t, x) $ for an arbitrary initial position $ \{ \tau , x \} $. The natural course is to find for every pair $ \{ \tau , x \} $ a solution of the corresponding problem of constructing an [[Optimal programming control|optimal programming control]]
$$
u ^ {0} [ t \mid \tau , x],\ \
x = x( \tau ),\ \
\tau \leq t \leq t _ {1} ,
$$
realizing the minimum of the same functional $ I( x( \cdot ), u( \cdot )) $ under the same constraints. It is further supposed that
$$
u ^ {0} ( t, x) = u ^ {0} [ t \mid t, x] ;
$$
if the function $ u ^ {0} ( t, x) $ is correctly defined and the equation
$$ \tag{3 }
\dot{x} = f( t, x, u ^ {0} ( t, x)),\ \
x( \tau ) = x,\ \
\tau \leq t \leq t _ {1} ,
$$
has a unique solution, then the synthesis problem can be solved; moreover, the optimal values of $ I $ found in the classes of programming and synthesis controls coincide (in general one needs conditions ensuring the existence, in an appropriate sense, of solutions of equation (3), as well as conditions guaranteeing the optimality of all trajectories of this equation).
The synthesized function $ u ^ {0} ( t, x) $, being an optimal synthesis control, leads to an optimal solution for the minimum of the functional $ I $ in the problem of optimal control for any initial position $ \{ \tau , x \} $. This is in contrast to an optimal programming control, which in general depends on the fixed starting point $ \{ t _ {0} , x ^ {0} \} $ of the process. The solution of an optimal control problem in the form of an optimal synthesis control has many applications, especially in practical procedures for implementing the optimal control in the presence of limited information or perturbations in the dynamics. In these situations a synthesis control is preferable to a programming control.
The search for $ u ^ {0} ( t, x) $ in the form of a function of the current state is immediately linked to [[Dynamic programming|dynamic programming]] (see [[#References|[2]]]). The return function (Bellman function, value function) $ V( \tau , x) $, being introduced as a minimum (maximum) of a quantity to be optimized (for example, the functional
$$ \tag{4 }
J( x( \cdot ), u( \cdot )) = \
\int\limits _ \tau ^ { t _ {1} } f ^ { 0 } ( t, x, u) dt + \phi ( t _ {1} , x( t _ {1} ))
$$
for the system (1) if $ x( \tau ) = x $, $ t \in [ \tau , t _ {1} ] $), must satisfy the [[Bellman equation|Bellman equation]] with boundary conditions depending on the aim of the control and $ J $. For the system (1), (2) and (4), this equation takes the form
$$ \tag{5 }
\frac{\partial V }{\partial t } + H \left ( t, x, \frac{\partial V }{\partial x } \right ) = 0,\ \
V( t _ {1} , x) = \phi ( t _ {1} , x),
$$
where
$$ \tag{6 }
H \left ( t, x, \frac{\partial V }{\partial x } \right ) = \
\min \left \{ \left ( \frac{\partial V }{\partial x } , f( t, x, u) \right ) + f ^ { 0 } ( t, x, u) : u \in U \right \}
$$
is the Hamilton function. This equation is connected with the equations figuring in the conditions of the [[Pontryagin maximum principle|Pontryagin maximum principle]], in the same way as the Hamilton–Jacobi equation for a return function is linked in analytical mechanics to the ordinary Hamiltonian differential equations (see [[Variational principles of classical mechanics|Variational principles of classical mechanics]]).
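The Bellman equation (5)–(6) rarely admits a closed-form solution; a standard numerical route is to discretize time and state and perform the minimization over $ u \in U $ on a grid. A minimal sketch, with toy dynamics, costs and grid of our own choosing:

<pre>
import numpy as np

# Toy data (not from the article): x' = u with U = {-1, 0, 1},
# running cost f0 = x^2 + 0.1 u^2, terminal cost phi = x^2.
# The value function V is marched backward from t1, replacing the
# Bellman equation (5)-(6) by its discrete-time counterpart.

xs = np.linspace(-2.0, 2.0, 81)
dt, t1 = 0.01, 1.0
U = np.array([-1.0, 0.0, 1.0])

V = xs**2                            # boundary condition V(t1, x) = phi(t1, x)
for _ in range(int(t1 / dt)):
    cand = [np.interp(xs + u * dt, xs, V) + (xs**2 + 0.1 * u**2) * dt
            for u in U]              # cost-to-go for each control value
    V = np.min(cand, axis=0)         # minimization over u in U, as in (6)

print("V(0, 0.5) ~", V[np.searchsorted(xs, 0.5)])
</pre>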
The derivation of equations (5) for the synthesis problem relies on the optimality principle asserting that a section of an optimal trajectory is also an optimal trajectory (see [[#References|[2]]]). The viability of this approach depends on the correct definition of the informational properties of the process, particularly on the concept of position (current state, see [[#References|[5]]]).
In the time-optimal control problem — regarding the minimal time $ T( x) $ for a trajectory of an autonomous system (1) to hit a set $ M $, starting from a position $ x $ — the function $ V( \tau , x) = V( x) $ can be considered as a particular kind of potential $ V( x) = T( x) $ with respect to $ M $. The choice of an optimal control $ u ^ {0} ( t, x) $ from conditions (5), (6) now has the form
$$
\min \left \{ \left ( \frac{\partial V }{\partial x } , f( x, u) \right ) : u \in U \right \} = - 1,
$$

$$
V( x) = 0 \ \textrm{ when } \ x \in M,
$$
which means that $ u ^ {0} ( t, x) = u ^ {0} ( x) $ realizes the descent of the optimal trajectory $ x ^ {0} ( t) $ relative to the level surfaces of the function $ V( x) $ by the fastest method permitted by the condition $ u \in U $.
The use of the method of dynamic programming (as a sufficient condition of optimality) will be rigorous if the function $ V( \tau , x) $ satisfies certain smoothness conditions everywhere (for example, in problems (3)–(6) the function $ V( \tau , x) $ must be continuously differentiable) or if smoothness conditions are satisfied everywhere with the exception of a "special" set $ N $. When certain special "conditions of regular synthesis" are fulfilled, the method of dynamic programming is equivalent to Pontryagin's principle, which is then seen to be a necessary and sufficient condition for optimality (see [[#References|[8]]]). Difficulties connected with the a priori verification of the applicability of the method of dynamic programming and with the need to solve the Bellman equation complicate the use of this method. The method of dynamic programming has been extended to problems of optimal synthesis control for discrete (multi-stage) systems, where the corresponding Bellman equation is a finite-difference equation (see [[#References|[2]]], [[#References|[9]]]).
In problems of optimal control with differential constraints, the method of dynamic programming gives an effective solution of the synthesis problem in closed form for a class of problems which embraces linear systems with quadratic performance criterion (4) (the functions $ f ^ { 0 } $, $ \phi $ are positive-definite quadratic forms in $ x, u $ and in $ x $, respectively). This problem is related to the analytic construction of an optimal regulator; if $ \phi ( t, x) \equiv 0 $, $ t _ {1} = \infty $, it becomes a problem of optimal stabilization of the system (the property of asymptotic stability of the equilibrium position of a synthesized system follows directly from the existence of an admissible control) (see [[#References|[10]]], [[#References|[4]]]). The existence of a solution in the given instance is ensured by the property of stabilizability of the system (see [[#References|[4]]]). For linear stationary and periodic systems it is equivalent to the property of controllability of the unstable models of the system (see [[Optimal programming control|Optimal programming control]]).
The solution of the problem of optimal stabilization has shown that the corresponding Bellman function is at the same time the "optimal" Lyapunov function for the initial system with an obtained optimal control. Under these circumstances effective conditions of controllability have been obtained and a complete analogue of Lyapunov's theory of stability (in a first approximation and in critical cases) for problems of stabilization has been created which embraces ordinary quasi-linear and periodic systems and also delay-systems. In the latter case, the role of the Bellman function is played by "optimal" Lyapunov–Krasovskii functionals, given on the sections of the trajectory that correspond to the value of the delay in the system (see [[#References|[4]]], [[#References|[5]]]). The theory of linear-quadratic problems of optimal control is also well-developed for partial differential equations (see [[#References|[11]]]).
In applied problems of optimal synthesis control it is not always possible to measure all phase coordinates of the system. The following problem of observation therefore arises, permitting numerous generalizations: Knowing the realization $ y[ t] \in \mathbf R ^ {m} $, for an interval $ \sigma \leq t \leq \theta $, of an accessible measurement of the function $ y = g( t, x) $ in the coordinates of the system (1) (if $ u( t) $ is known, for example, if $ u( t) \equiv 0 $ and $ m \leq n $), find the vector $ x( \theta ) \in \mathbf R ^ {n} $ at the given moment $ \theta $. Systems which, through a unique realization $ y[ t] $, allow one to establish $ x( \theta ) $, whatever its value, are called completely observable.
The property of complete observability, as well as the construction of the corresponding analytic operations that distinguish $ x( \theta ) $ and the optimization of these operations, have been well studied for linear systems. Here, a duality principle is known: For every problem of observation, a corresponding equivalent two-point boundary value control problem for the dual system can be established. A consequence of this is that the property of complete observability of a linear system coincides with the property of complete controllability of the dual system with control. Moreover, it turns out that the corresponding dual boundary value problems of optimal observation and optimal control can also be composed in such a way that their solutions coincide (see [[#References|[3]]]). The properties of controllability and observability of linear systems have many generalizations to linear infinite-dimensional systems (equations in a Banach space, systems with deviating argument, partial differential equations). There is also a number of results characterizing the corresponding properties. For non-linear systems, only some local theorems on observability are known. Solutions of the problem of observation have found numerous applications in synthesis problems with incomplete information on coordinates, among them problems of optimal stabilization (see [[#References|[3]]]–[[#References|[5]]], [[#References|[14]]], [[#References|[15]]]).
The problem of synthesis control becomes especially interesting when information on the controls of the controllable process, the initial conditions and the current parameters is subject to uncertainty (perturbations). If the description of this uncertainty has a statistical character, the problems of optimal control are examined within the framework of the theory of stochastic optimal control. This theory, arising from the solution of stochastic problems [[#References|[16]]], has been, to a very large degree, developed for systems of the form
$$ \tag{7 }
\dot{x} = f( t, x, u) + g( t, x, u) \eta ,\ \
x( t _ {0} ) = x ^ {0} ,
$$
with random perturbations $ \eta ( t) $ described by Gaussian diffusion processes or by more general classes of Markov processes (the initial vector is also usually taken random). In these circumstances, as a rule, it is assumed that certain probability characteristics of the variable $ \eta $ are given (for example, information on the moments of the corresponding distributions or on the parameters of the stochastic equations describing the evolution of the process $ \eta ( t) $).
Generally, the use of programming and synthesis controls gives essentially different values of the optimal performance indices $ J $ (the roles of these indices can be played, for instance, by average estimates of non-negative functionals defined on trajectories of the process). The synthesis of a stochastic optimal control now has clear advantages, since the continuous measurement of coordinates of the system enables one to correct the movement with regard to the real course of the random process, which could not be predicted in advance. The method of dynamic programming, combined with the theory of generating operators for the Markov semi-groups associated with stochastic processes, has led to sufficient conditions for optimality. This has led to the solution of a number of problems of stochastic optimal control on finite or infinite time intervals, including those with complete and incomplete information on current coordinates, of stochastic problems of pursuit, etc. It is essential that, for the principle of optimality to apply, the control $ u $ exists at every moment of time $ t $ as a function of "sufficient coordinates" $ z $ of the process which are known to have the Markov property (see [[#References|[5]]], [[#References|[6]]], [[#References|[17]]], [[#References|[18]]]).
It is in this way, in particular, that the theory of optimal stochastic stabilization has been developed, in conjunction with the corresponding Lyapunov theory of stability, for stochastic systems [[#References|[19]]].
For the formulation of an optimal synthesis controller as well as for other aims of control, it is usual to evaluate the state of a stochastic system by means of measurements. The theory of stochastic filtering addresses this question under the condition that the measurement process is disturbed by probabilistic "noises". The most complete solutions known here are for linear systems with quadratic optimality criteria (the so-called Kalman–Bucy filter, see [[#References|[13]]]). In applying this theory to the problem of stochastic optimal synthesis control, conditions have been developed which ensure the validity of the separation principle, allowing the problem of control to be solved independently of the problem of evaluating current positions on the basis of sufficient coordinates of the process (see [[#References|[20]]]; [[#References|[18]]] and [[#References|[21]]] are dedicated to more general procedures of stochastic filtering, as well as to problems of stochastic optimal control when the control itself is selected from the class of Markov diffusion processes).
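For orientation, here is one predict–correct step of the discrete-time Kalman filter, the standard sampled-data analogue of the Kalman–Bucy filter; the matrices in the usage example are illustrative assumptions, not data from [13].

<pre>
import numpy as np

def kalman_step(x_est, P, y, A, C, Qw, Rv):
    # Predict through the linear dynamics x_{k+1} = A x_k + w_k.
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Qw
    # Correct with the measurement y_k = C x_k + v_k.
    S = C @ P_pred @ C.T + Rv
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_est)) - K @ C) @ P_pred
    return x_new, P_new

# Toy usage with assumed matrices.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
x_est, P = np.zeros(2), np.eye(2)
x_est, P = kalman_step(x_est, P, np.array([0.3]), A, C,
                       1e-3 * np.eye(2), np.array([[1e-2]]))
print("state estimate:", x_est)
</pre>

Under the separation principle mentioned above, such an estimate can then be fed into the feedback law obtained from the control problem alone.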
A strictly formalized solution of the problem of stochastic optimal control is invariably coupled with the problem of a correct foundation for the existence questions of solutions for the corresponding stochastic differential equations. The latter circumstance generated specific difficulties in the solution of problems of stochastic optimal control when non-classical constraints are applied.
An interesting process of dynamic optimization arises in problems of optimal synthesis control under conditions of uncertainty (see [[Optimal programming control|Optimal programming control]]). Synthesis solutions generally permit improvement of the quality of the criteria of the process, as compared to programming solutions, which are none the less the result of statistical optimization (carried out, admittedly, in a space of dynamical systems and control functions). The concepts and methods of game theory are now used to obtain a solution of these problems.
Let there be given a system
$$ \tag{8 }
\dot{x} = f( t , x , u , w),\ \
t _ {0} \leq t \leq t _ {1} ,
$$
with constraints
$$
x ^ {0} = x( t _ {0} ) \in X ^ {0} \subseteq \mathbf R ^ {n} ,\ \
u \in U \subseteq \mathbf R ^ {p} ,\ \
w \in W \subseteq \mathbf R ^ {q} ,
$$
on the initial vector $ x ^ {0} $, the control $ u $ and the disturbances $ w $. Unlike the coalition of players represented by the control $ u $, which is to be determined, the disturbances $ w $ are here treated as controls of an opponent player, and one is allowed to examine any strategies $ w $ formed from any admissible information. Moreover, the aims of control can be formulated from the point of view of each one of the players separately. If the stated aims are contradictory, then the problem of conflicting control arises. Research into problems of synthesis control under conditions of conflict or uncertainty is the subject of the theory of [[Differential games|differential games]].
The process of forming an optimal synthesis control under conditions of uncertainty can also be complicated by incomplete information on the current state. Thus, in the system (8), only the results of indirect measurements of the phase vector $ x $, namely the realization $ y[ t] $ of the function
$$ \tag{9 }
y( t) = g( t, x, \xi )
$$
can be accessible; here the indefinite parameters $ \xi $ are restricted by an a priori known constraint, $ \xi \in {\mathsf E} $. The values $ y[ t] $, $ t _ {0} \leq t \leq \theta $ (for a given $ u( t) $), make it possible to construct a region of information $ X( \theta , y( \cdot )) $ of states of the system (8) in the phase space, compatible with the realization $ y[ t] $, equation (9) and the restrictions on $ w, \xi $. Among the elements of $ X( \theta , y( \cdot )) = X( \theta , \cdot ) $ there will also be the unknown true state of the system (8), which can be estimated by choosing a point $ x ^ \star ( \theta , \cdot ) $ from $ X( \theta , \cdot ) $ (for example, the "centre of gravity" or the "Chebyshev centre" of $ X( \theta , \cdot ) $). The study of the evolution of the regions $ X( \theta , \cdot ) $ and the dynamics of the vectors $ x ^ \star ( \theta , \cdot ) $ is the purpose of the theory of minimax filtering. The most complete solutions are known for linear systems and convex constraints (see [[#References|[22]]]).
In general, the choice of a synthesis strategy of optimal control under conditions of uncertainty (for example, in the form of a functional $ u = u( t, X( t, \cdot )) $) must aim at control of the evolution of the domains $ X( \theta , \cdot ) $ (i.e. the alteration of their configuration and their displacement in space), in accordance with prescribed criteria. For this problem a number of general qualitative results is known, as well as constructive solutions in the class of special linear, convex problems (see [[#References|[7]]], [[#References|[22]]]). Furthermore, the information contained in measurements (for example, of the functions $ y[ t] $ in the systems (8) and (9)) permits an a posteriori re-evaluation, during the process, of the domain of admissible values of the indefinite parameters, in the direction of narrowing it. In this way the problem of identification of a mathematical model of a process (for example, of the parameters $ w $ in equation (8)) is solved at the same time. All that has been said enables one to treat the solutions of the problem of optimal synthesis control under conditions of uncertainty as a procedure of adaptive optimal control, in which refinement of the model of the process is interwoven with the choice of the controls. The questions of identification of models of dynamical processes and the problem of adaptive optimal control are studied in detail under the assumption of a probabilistic description of the indefinite parameters (see [[#References|[23]]], [[#References|[24]]]).
If in problems of optimal synthesis control under conditions of uncertainty the parameters $ w $, $ \xi $ are treated as "controls" of a fictitious opponent player, then the aims of the controls $ u $ and $ \{ w, \xi \} $ can be different. The latter instance leads to a non-scalar quality criterion of the process. A consequence of this is that the corresponding problems can be considered within the framework of the concepts of equilibrium situations peculiar to the multi-criterion problems of the theory of cooperative games and their generalizations.
====References====
<table>
<TR><TD valign="top">[1]</TD> <TD valign="top"> L.S. Pontryagin, V.G. Boltyanskii, R.V. Gamkrelidze, E.F. Mishchenko, "The mathematical theory of optimal processes" , Wiley (1967) (Translated from Russian) {{MR|0836572}} {{MR|0719372}} {{MR|0186436}} {{MR|1532457}} {{MR|0166037}} {{ZBL|0882.01027}} {{ZBL|0516.49001}} {{ZBL|0117.31702}} {{ZBL|0102.32001}}</TD></TR>
<TR><TD valign="top">[2]</TD> <TD valign="top"> R. Bellman, "Dynamic programming" , Princeton Univ. Press (1957) {{MR|0093619}} {{MR|0091224}} {{MR|0091222}} {{MR|0090499}} {{MR|0090477}} {{MR|0090469}} {{MR|0089099}} {{MR|0088666}} {{MR|0085149}} {{MR|0083675}} {{ZBL|0081.14402}} {{ZBL|0079.36202}} {{ZBL|0079.35202}} {{ZBL|0079.34301}} {{ZBL|0078.09502}} {{ZBL|0077.32402}} {{ZBL|0077.13605}} {{ZBL|0995.90618}}</TD></TR>
<TR><TD valign="top">[3]</TD> <TD valign="top"> N.N. Krasovskii, "Theory of control by motion" , Moscow (1968) (In Russian)</TD></TR>
<TR><TD valign="top">[4]</TD> <TD valign="top"> N.N. Krasovskii, "On the stabilization of dynamic systems by supplementary forces" ''Diff. Eq.'' , '''1''' : 1 (1965) pp. 1–9 ''Differentsial'nye Uravneniya'' , '''1''' : 1 (1963) pp. 5–16 {{MR|192934}}</TD></TR>
<TR><TD valign="top">[5]</TD> <TD valign="top"> N.N. Krasovskii, "Theory of optimal control systems" , ''Mechanics in the USSR during 50 years'' , '''1''' , Moscow (1968) pp. 179–244 (In Russian) {{MR|0238147}}</TD></TR>
<TR><TD valign="top">[6]</TD> <TD valign="top"> N.N. Krasovskii, "On mean-square optimum stabilization at damped random perturbations" ''J. Appl. Math. Mech.'' , '''25''' (1961) pp. 1212–1227 ''Prikl. Mat. Mekh.'' , '''25''' : 5 (1961) pp. 806–817 {{MR|0136838}}</TD></TR>
<TR><TD valign="top">[7]</TD> <TD valign="top"> N.N. Krasovskii, A.I. Subbotin, "Game-theoretical control problems" , Springer (1988) (Translated from Russian) {{MR|918771}}</TD></TR>
<TR><TD valign="top">[8]</TD> <TD valign="top"> V.G. Boltyanskii, "Mathematical methods of optimal control" , Holt, Rinehart & Winston (1971) (Translated from Russian) {{MR|0353081}}</TD></TR>
<TR><TD valign="top">[9]</TD> <TD valign="top"> V.G. Boltyanskii, "Optimal control of discrete systems" , Wiley (1978) (Translated from Russian) {{MR|0528636}} {{MR|0514558}}</TD></TR>
<TR><TD valign="top">[10]</TD> <TD valign="top"> A.M. Letov, "Mathematical theory of control processes" , Moscow (1981) (In Russian) {{MR|0615390}}</TD></TR>
<TR><TD valign="top">[11]</TD> <TD valign="top"> J.-L. Lions, "Optimal control of systems governed by partial differential equations" , Springer (1971) (Translated from French) {{MR|0271512}} {{ZBL|0203.09001}}</TD></TR>
<TR><TD valign="top">[12]</TD> <TD valign="top"> R.E. Kalman, "On the general theory of control systems" , ''Proc. 1-st Internat. Congress Internat. Fed. Autom. Control'' , '''2''' , Moscow (1960) pp. 521–547</TD></TR>
<TR><TD valign="top">[13]</TD> <TD valign="top"> R. Kalman, R. Bucy, "New results in linear filtering and prediction theory" ''Proc. Amer. Soc. Mech. Engineers Ser. 1.D'' , '''83''' (1961) pp. 95–108 {{MR|0234760}}</TD></TR>
<TR><TD valign="top">[14]</TD> <TD valign="top"> E.B. Lee, L. Markus, "Foundations of optimal control theory" , Wiley (1967) {{MR|0220537}} {{ZBL|0159.13201}}</TD></TR>
<TR><TD valign="top">[15]</TD> <TD valign="top"> A.G. Butkovskii, "Structural theory of distributed systems" , Horwood (1983) (Translated from Russian)</TD></TR>
<TR><TD valign="top">[16]</TD> <TD valign="top"> A.N. Kolmogorov, E.F. Mishchenko, L.S. Pontryagin, "A probability problem of optimal control" ''Soviet Math. Dokl.'' , '''3''' : 4 (1962) pp. 1143–1145 ''Dokl. Akad. Nauk SSSR'' , '''145''' : 5 (1962) pp. 993–995 {{MR|0183574}} {{ZBL|0124.33803}}</TD></TR>
<TR><TD valign="top">[17]</TD> <TD valign="top"> R.S. Liptser, A.N. Shiryaev, "Statistics of random processes" , '''1–2''' , Springer (1977–1978) (Translated from Russian) {{MR|1800858}} {{MR|1800857}} {{MR|0608221}} {{MR|0488267}} {{MR|0474486}} {{ZBL|1008.62073}} {{ZBL|1008.62072}} {{ZBL|0556.60003}} {{ZBL|0369.60001}} {{ZBL|0364.60004}}</TD></TR>
<TR><TD valign="top">[18]</TD> <TD valign="top"> K.J. Åström, "Introduction to stochastic control theory" , Acad. Press (1970) {{MR|0270799}} {{ZBL|0226.93027}}</TD></TR>
<TR><TD valign="top">[19]</TD> <TD valign="top"> I.Ya. Kats, N.N. Krasovskii, "On the stability of systems with random parameters" ''J. Appl. Math. Mech.'' , '''24''' (1960) pp. 1225–1246 ''Prikl. Mat. Mekh.'' , '''24''' : 5 (1960) pp. 809–823 {{ZBL|0103.36403}}</TD></TR>
<TR><TD valign="top">[20]</TD> <TD valign="top"> W.M. Wonham, "On the separation theorem of stochastic control" ''SIAM J. Control'' , '''6''' (1968) pp. 312–326 {{MR|0237219}} {{ZBL|0164.19101}}</TD></TR>
<TR><TD valign="top">[21]</TD> <TD valign="top"> N.V. Krylov, "Controlled diffusion processes" , Springer (1980) (Translated from Russian) {{MR|0601776}} {{ZBL|0459.93002}} {{ZBL|0436.93055}}</TD></TR>
<TR><TD valign="top">[22]</TD> <TD valign="top"> A.B. Kurzhanskii, "Control and observability under conditions of uncertainty" , Moscow (1977) (In Russian)</TD></TR>
<TR><TD valign="top">[23]</TD> <TD valign="top"> Ya.Z. Tsypkin, "Foundations of the theory of learning systems" , Acad. Press (1973) (Translated from Russian) {{MR|0434545}} {{ZBL|0258.93019}}</TD></TR>
<TR><TD valign="top">[24]</TD> <TD valign="top"> P. Eykhoff, "Basics of identification of control systems" , Moscow (1975) (In Russian; translated from English)</TD></TR>
</table>
− | |||
− | |||
====Comments====
For a detailed discussion of when optimal open-loop controls can be used to find optimal closed-loop controls see [[#References|[a11]]], [[#References|[a12]]].
In the formulation of an optimal control problem one distinguishes problems with a terminal index, with an integral index, or with a combination of both, as expressed by equation (4). Instead of "index", and depending on the particular application, one also speaks about "cost function" (to be minimized) or "performance index" (usually to be maximized).
In general, analytic solutions to optimal control problems do not exist. A notable exception is the case where the system is described by a linear equation:
$$
\dot{x} = A x + B u ,\ \ t _ {0} \leq t \leq t _ {f} ,\ \
x ( t _ {0} ) = x _ {0} ;
$$
and the cost function by a quadratic expression:
$$
J = \frac{1}{2} x ^ \prime ( t _ {f} ) Q _ {f} x ( t _ {f} ) +
\frac{1}{2} \int\limits _ { t _ {0} } ^ { t _ {f} } ( x ^ \prime Q x + u ^ \prime R u ) d t .
$$
Here $ x \in \mathbf R ^ {n} $, $ u \in \mathbf R ^ {p} $, and $ A $, $ B $, $ Q _ {f} $, $ Q $, $ R $ are matrices of appropriate sizes. Moreover, $ Q _ {f} \geq 0 $, $ Q \geq 0 $, $ R > 0 $, and the transpose is denoted by $ {} ^ \prime $. The final time $ t _ {f} $ is supposed to be fixed here. The solution to this optimal control problem is
$$
u ^ {0} ( x , t ) = - R ^ {-1} B ^ \prime P ( t) x ,
$$
where the $ ( n \times n) $-matrix $ P ( t) $ satisfies the so-called Riccati equation:
$$
\dot{P} = - A ^ \prime P - P A + P B R ^ {-1} B ^ \prime P - Q ,\ \
P ( t _ {f} ) = Q _ {f} .
$$
If the pair $ ( A , B ) $ is controllable and the pair $ ( A , C ) $ is observable, where the $ ( n \times n) $-matrix $ C $ is defined by $ C ^ \prime C = Q $, then $ \lim\limits _ {t _ {f} \rightarrow \infty } P ( t _ {0} ) $ exists; it will be denoted by $ \overline{P} $. The $ x = 0 $ solution of $ \dot{x} = ( A - B R ^ {-1} B ^ \prime \overline{P} ) x $ is asymptotically stable. The conditions on controllability and observability can be replaced by the weaker conditions on stabilizability and detectability, respectively; see [[#References|[a13]]]. The notions of controllability, observability, etc. are properties of the system and as such belong to the field of mathematical system theory.
Another class of problems for which the features of the optimal control function $ u ^ {0} ( x , t ) $ are well understood is the class of linear time-optimal control problems. The notion of reachability set helps in visualizing the optimal solution; see [[#References|[a14]]].
====References====
<table>
<TR><TD valign="top">[a1]</TD> <TD valign="top"> W.H. Fleming, R.W. Rishel, "Deterministic and stochastic control" , Springer (1975) {{MR|454768}} {{ZBL|0323.49001}}</TD></TR>
<TR><TD valign="top">[a2]</TD> <TD valign="top"> D.P. Bertsekas, S.E. Shreve, "Stochastic optimal control: the discrete-time case" , Acad. Press (1978) {{ZBL|0471.93002}}</TD></TR>
<TR><TD valign="top">[a3]</TD> <TD valign="top"> D.P. Bertsekas, "Dynamic programming and stochastic control" , Acad. Press (1976) {{MR|0688509}} {{ZBL|0549.93064}}</TD></TR>
<TR><TD valign="top">[a4]</TD> <TD valign="top"> M.H.A. Davis, "Martingale methods in stochastic control" , ''Stochastic Control and Stochastic Differential Systems'' , ''Lect. notes in control and inform. sci.'' , '''16''' , Springer (1979) pp. 85–117 {{MR|0547467}} {{ZBL|0409.93052}}</TD></TR>
<TR><TD valign="top">[a5]</TD> <TD valign="top"> L. Cesari, "Optimization - Theory and applications" , Springer (1983) {{MR|0688142}} {{ZBL|0506.49001}}</TD></TR>
<TR><TD valign="top">[a6]</TD> <TD valign="top"> L.W. Neustadt, "Optimization, a theory of necessary conditions" , Princeton Univ. Press (1976) {{ZBL|0353.49003}}</TD></TR>
<TR><TD valign="top">[a7]</TD> <TD valign="top"> V. Barbu, G. Da Prato, "Hamilton–Jacobi equations in Hilbert spaces" , Pitman (1983) {{MR|0704182}} {{ZBL|0508.34001}} {{ZBL|0471.49026}}</TD></TR>
<TR><TD valign="top">[a8]</TD> <TD valign="top"> H.J. Kushner, "Introduction to stochastic control" , Holt (1971) {{MR|0280248}} {{ZBL|0293.93018}}</TD></TR>
<TR><TD valign="top">[a9]</TD> <TD valign="top"> P.R. Kumar, P. Varaiya, "Stochastic systems: estimation, identification and adaptive control" , Prentice-Hall (1986) {{ZBL|0706.93057}}</TD></TR>
<TR><TD valign="top">[a10]</TD> <TD valign="top"> L. Ljung, "System identification theory for the user" , Prentice-Hall (1987) {{MR|1157156}} {{ZBL|0615.93004}}</TD></TR>
<TR><TD valign="top">[a11]</TD> <TD valign="top"> P. Brunovsky, "On the structure of optimal feedback systems" , ''Proc. Internat. Congress Mathematicians (Helsinki, 1978)'' , '''2''' , Acad. Sci. Fennicae (1980) pp. 841–846 {{MR|0562697}} {{ZBL|0425.49019}}</TD></TR>
<TR><TD valign="top">[a12]</TD> <TD valign="top"> H.J. Sussmann, "Analytic stratifications and control theory" , ''Proc. Internat. Congress Mathematicians (Helsinki, 1978)'' , '''2''' , Acad. Sci. Fennicae (1980) pp. 865–871 {{MR|0562701}} {{ZBL|0499.93023}}</TD></TR>
<TR><TD valign="top">[a13]</TD> <TD valign="top"> H. Kwakernaak, R. Sivan, "Linear optimal control systems" , Wiley (1972) {{MR|406607}}</TD></TR>
<TR><TD valign="top">[a14]</TD> <TD valign="top"> H. Hermes, J.P. Lasalle, "Functional analysis and time optimal control" , Acad. Press (1969) {{MR|0420366}} {{ZBL|0203.47504}}</TD></TR>
<TR><TD valign="top">[a15]</TD> <TD valign="top"> A.E. Bryson, Y.-C. Ho, "Applied optimal control" , Ginn (1969) {{MR|0446628}}</TD></TR>
</table>
Latest revision as of 08:04, 6 June 2020
A solution of a problem in the mathematical theory of optimal control (cf. Optimal control, mathematical theory of), consisting of a synthesis of an optimal control (a feedback synthesis) in the form of a control strategy (a feedback principle), as a function of the current state (position) of a process (see [1]–[3]). The value of the control is defined not only by the current time, but also by the admissible values of the current parameters. In this way the introduction of a positional strategy makes possible an a posteriori realization of a control $ u $,
corrected on the basis of supplementary information obtained during the process.
The simplest synthesis problem, for example, for a system
$$ \tag{1 } \dot{x} = f( t, x, u),\ \ t _ {0} \leq t \leq t _ {1} ,\ \ x \in \mathbf R ^ {n} ,\ u \in \mathbf R ^ {p} , $$
with constraints
$$ \tag{2 } u \in U \subseteq \mathbf R ^ {p} \ \textrm{ or } \ \ \psi ( u) \leq 0,\ \psi : \mathbf R ^ {p} \rightarrow \mathbf R ^ {k} , $$
and a given "terminal" criterion
$$ I( x( \cdot ), u( \cdot )) = \phi ( t _ {1} , x( t _ {1} )),\ \ \phi : \mathbf R ^ {n+} 1 \rightarrow \mathbf R ^ {1} , $$
assumes that a solution $ u ^ {0} $ is being sought to minimize the functional $ I( x( \cdot ), u( \cdot )) $ among the functions of the form $ u( t, x) $ for an arbitrary initial position $ \{ \tau , x \} $. The natural course is to find for every pair $ \{ \tau , x \} $ a solution of the corresponding problem of constructing an optimal programming control
$$ u ^ {0} [ t \mid \tau , x],\ \ x = x( \tau ),\ \ \tau \leq t \leq t _ {1} , $$
as a minimum of that same functional $ I( x( \cdot ), u( \cdot )) $ and with those same constraints. It is further supposed that
$$ u ^ {0} ( t, x) = u ^ {0} [ t \mid t, x] ; $$
if the function $ u ^ {0} ( t, x) $ is correctly defined, while the equation
$$ \tag{3 } \dot{x} = f( t, x, u ^ {0} ( t, x)),\ \ x( \tau ) = x,\ \ \tau \leq t \leq t _ {1} , $$
has a unique solution, then the synthesis problem can be solved; moreover, the optimal values of $ I $ found in the classes of programming and synthesis controls coincide (in general, conditions prevail which ensure the existence in a specific sense of the solutions of equation (3), and conditions also prevail which guarantee the optimality of all the trajectories of this equation).
The synthesized function $ u ^ {0} ( t, x) $, being an optimal synthesis control, leads to an optimal solution for the minimum of the functional $ I $ in the problem of optimal control for any initial position $ \{ \tau , x \} $. This is in contrast to an optimal programming control, which in general depends on the fixed starting point $ \{ t _ {0} , x ^ {0} \} $ of the process. The solution of an optimal control problem in the form of an optimal synthesis control has many applications, especially in practical procedures for implementing the optimal control in the presence of limited information or perturbations in the dynamics. In these situations a synthesis control is preferable to a programming control.
The search for $ u ^ {0} ( t, x) $ in the form of a function of the current state is immediately linked to dynamic programming (see [2]). The return function (Bellman function, value function) $ V( \tau , x) $, being introduced as a minimum (maximum) of a quantity to be optimized (for example, the functional
$$ \tag{4 } J( x( \cdot ), u( \cdot )) = \ \int\limits _ \tau ^ { {t _ 1} } f ^ { 0 } ( t, x, u) dt + \phi ( t _ {1} , x( t _ {1} )) $$
for the system (1) if $ x( \tau ) = x, t \in [ \tau , t _ {1} ] $), must satisfy the Bellman equation with boundary conditions depending on the aim of the control and $ J $. For the system (1), (2) and (4), this equation takes the form
$$ \tag{5 } \frac{\partial V }{\partial t } + H \left ( t, x, \frac{\partial V }{\partial x } \right ) = 0,\ \ V( t _ {1} , x) = \phi ( t _ {1} , x), $$
where
$$ \tag{6 } H \left ( t, x, \frac{\partial V }{\partial x } \right ) = $$
$$ = \ \min \left \{ \left ( \frac{\partial V }{\partial x } , f( t, x, u) \right ) + f ^ { 0 } ( t, x, u) : u \in U \right \} $$
is the Hamilton function. This equation is connected with the equations figuring in the conditions of the Pontryagin maximum principle, in the same way as the Hamilton–Jacobi equation for a return function is linked in analytical mechanics to the ordinary Hamiltonian differential equations (see Variational principles of classical mechanics).
The derivation of equations (5) for the synthesis problem relies on the optimality principle asserting that a section of an optimal trajectory is also an optimal trajectory (see [2]). The viability of this approach depends on the correct definition of the informational properties of the process, particularly on the concept of position (current state, see [5]).
In the time-optimal control problem — regarding the minimal time $ T( x) $ for a trajectory of an autonomous system (1) to hit a set $ M $, starting from a position $ x $— the function $ V( \tau , x) = V( x) $ can be considered as a particular kind of potential $ V( x) = T( x) $ with respect to $ M $. The choice of an optimal control $ u ^ {0} ( t, x) $ from conditions (5), (6) now has the form
$$ \min \left \{ {\left ( \frac{\partial V }{\partial x } , f( x, u) \right ) } : {u \in U } \right \} = - 1, $$
$$ V( x) = 0 \ \textrm{ when } x \in M, $$
which means that $ u ^ {0} ( t, x) = u ^ {0} ( x) $ realizes the descent of the optimal trajectory $ x ^ {0} ( t) $ relative to the level surfaces of the function $ V( x) $ by the fastest method permitted by the condition $ u \in U $.
The use of the method of dynamic programming (as a sufficient condition of optimality) will be rigorous if the function $ V( \tau , x) $ satisfies certain smoothness conditions everywhere (for example, in problems (3)–(6) the function $ V( \tau , x) $ must be continuously differentiable) or if smoothness conditions are satisfied everywhere with the exception of a "special" set $ N $. When certain special "conditions of regular synthesis" are fulfilled, the method of dynamic programming is equivalent to Pontryagin's principle, which is then seen to be a necessary and sufficient condition for optimality (see [8]). Difficulties connected with the a priori verification of the applicability of the method of dynamic programming and with the need to solve the Bellman equation complicate the use of this method. The method of dynamic programming has been extended to problems of optimal synthesis control for discrete (multi-stage) systems, where the corresponding Bellman equation is a finite-difference equation (see [2], [9]).
In problems of optimal control with differential constraints, the method of dynamic programming gives an effective solution of the synthesis problem in closed form for a class of problems which embraces linear systems with quadratic performance criterion (4) (the functions $ f ^ { 0 } $, $ \phi $ are positive-definite quadratic forms in $ x, u $ and in $ x $, respectively). This problem, related to the analytic construction of an optimal regulator if $ \phi ( t, x) \equiv 0 $, $ t _ {1} = \infty $, becomes a problem of optimal stabilization of the system (the property of asymptotic stability of the equilibrium position of a synthesized system follows directly from the existence of an admissible control) (see [10], [4]). The existence of a solution in the given instance is ensured by the property of stabilizability of the system (see [4]). For linear stationary and periodic systems it is equivalent to the property of controllability of the unstable models of the system (see Optimal programming control).
The solution of the problem of optimal stabilization has shown that the corresponding Bellman function is at the same time the "optimal" Lyapunov function for the initial system with an obtained optimal control. Under these circumstances effective conditions of controllability have been obtained and a complete analogue of Lyapunov's theory of stability (in a first approximation and in critical cases) for problems of stabilization has been created which embraces ordinary quasi-linear and periodic systems and also delay-systems. In the latter case, the role of the Bellman function is played by "optimal" Lyapunov–Krasovskii functionals, given on the sections of the trajectory that correspond to the value of the delay in the system (see [4], [5]). The theory of linear, quadratic problems of optimal control is also well-developed for partial differential equations (see [11]).
In applied problems of optimal synthesis control it is not always possible to measure all phase coordinates of the system. The following problem of observation therefore arises, permitting numerous generalizations: Knowing the realization $ y[ t] \in \mathbf R ^ {m} $ for an interval $ \sigma \leq t \leq \theta $ of an accessible measurement of the function $ y = g( t, x) $ in the coordinates of the system (1) (if $ u( t) $ is known, for example, if $ u( t) \equiv 0 $ and $ m \leq n $), find the vector $ x( \theta ) \in \mathbf R ^ {n} $ at the given moment $ \theta $. Systems which, through a unique realization $ y[ t] $, allow one to establish $ x( \theta ) $, whatever its value, are called completely observable.
The property of complete observability, as well as the construction of the corresponding analytic operations that distinguish $ x( \theta ) $ and the optimization of these operations, have been well studied for linear systems. Here, a duality principle is known: For every problem of observation, a corresponding equivalent two-point boundary value control problem for the dual system can be established. A consequence of this is that the property of complete observability of a linear system coincides with the property of complete controllability of the dual system with control. Moreover, it turns out that the corresponding dual boundary value problems of optimal observation and optimal control can also be composed in such a way that their solutions coincide (see [3]). The properties of controllability and observability of linear systems have many generalizations to linear infinite-dimensional systems (equations in a Banach space, systems with deviating argument, partial differential equations). There is also a number of results characterizing the corresponding properties. For non-linear systems, only some local theorems on observability are known. Solutions of the problem of observation have found numerous applications in synthesis problems with incomplete information on coordinates, among them problems of optimal stabilization (see [3]–[5], [14], [15]).
The problem of synthesis control becomes especially interesting when information on the controls of the controllable process, the initial conditions and the current parameters is subject to uncertainty (perturbations). If the description of this uncertainty has a statistical character, the problems of optimal control are examined within the framework of the theory of stochastic optimal control. This theory, arising from the solution of stochastic problems [16], has been, to a very large degree, developed for systems of the form
$$ \tag{7 } \dot{x} = f( t, x, u) + g( t, x, u) \eta ,\ \ x( t _ {0} ) = x ^ {0} , $$
with random perturbations $ \eta ( t) $ described by Gaussian diffusion processes or by more general classes of Markov processes (the initial vector is also usually taken random). In these circumstances, as a rule, it is assumed that certain probability characteristics of the variable $ \eta $ are given (for example, information on the moments of the corresponding distributions or on the parameters of the stochastic equations describing the evolution of the process $ \eta ( t) $).
Generally, the use of programming and synthesis controls gives essentially different values of the optimal performance indices $ J $( the roles of these indices can be played, for instance, by average estimates of non-negative functionals defined on trajectories of the process). The problem of synthesis of a stochastic optimal synthesis control now has clear advantages, since the continuous measurement of coordinates of the system enables one to correct the movement with regard to the real course of the random process, not predicted earlier. The method of dynamic programming combined with the theory of generating operators for the Markov semi-groups associated with stochastic processes, has led to sufficient conditions for optimality. This has led to the solution of a number of problems of stochastic optimal control on finite or infinite time intervals, including those with complete and incomplete information on current coordinates, of stochastic problems of pursuit, etc. It is essential that, for the principle of optimality to apply, the control $ u $ exists at every moment of time $ t $ as a function of "sufficient coordinates" $ z $ of the process which are known to have the Markov property (see [5], [6], [17], [18]).
It is in this way, in particular, that the theory of optimal stochastic stabilization has been developed, in conjunction with the corresponding Lyapunov theory of stability, for stochastic systems [19].
For the formulation of an optimal synthesis controller as well as for other aims of control, it is usual to evaluate the state of a stochastic system by means of measurements. The theory of stochastic filtering is about the solution of this question, given the condition that the measurement process is disturbed by probabilistic "noises" . The most complete solutions known here are for linear systems with quadratic optimality criteria (the so-called Kalman–Bucy filter, see [13]). In applying this theory to the problem of stochastic optimal synthesis control, conditions have been developed which ensure the validity of the separation principle, allowing the problem of control to be solved independently of the problem of evaluating current positions on the basis of sufficient coordinates of the process (see [20]; [18] and [21] are dedicated to more general procedures of stochastic filtering, as well as to problems of a stochastic optimal control when the control itself is selected from the class of Markov diffusion processes).
A strictly formalized solution of the problem of stochastic optimal control is invariably coupled with the problem of a correct foundation for the existence questions of solutions for the corresponding stochastic differential equations. The latter circumstance generated specific difficulties in the solution of problems of stochastic optimal control when non-classical constraints are applied.
An interesting process of dynamic optimization arises in problems of optimal synthesis control under conditions of uncertainty (see Optimal programming control). Synthesis solutions generally permit improvement of the quality of the criteria of the process, as compared to programming solutions, which are none the less the result of statistical optimization (carried out, admittedly, in a space of dynamical systems and control functions). The concepts and methods of game theory are now used to obtain a solution of these problems.
Let there be given a system
$$ \tag{8 } \dot{x} = f( t , x , u , w),\ \ t _ {0} \leq t \leq t _ {1} , $$
with constraints
$$ x ^ {0} = x( t _ {0} ) \in X ^ {0} \subseteq \mathbf R ^ {n} ,\ \ u \in U \subseteq \mathbf R ^ {p} ,\ \ w \in W \subseteq \mathbf R ^ {q} , $$
on the initial vector $ x ^ {0} $, the control $ u $ and the disturbances $ w $. Unlike the case of a coalition of players, represented by the initial control $ u $ subject to definition, the disturbances $ w $ are in this case treated as controls of an opponent player, and one is allowed to examine any strategies $ w $ formed from any admissible information. Moreover, the aims of control can be formulated from the point of view of each one of the players separately. If the stated aims are contradictory, then the problem of conflicting control arises. Research into problems of synthesis control under conditions of conflict or uncertainty is the subject of the theory of differential games.
The process of forming an optimal synthesis control under conditions of uncertainty can also be complicated by incomplete information on the current state. So, in the system (8), only the results of indirect measurements of the phase vector $ x $ and the realization $ y[ t] $ of the function
$$ \tag{9 } y( t) = g( t, x, \xi ) $$
can be accessible; here the indefinite parameters $ \xi $ are restricted by an a priori known constraint, $ \xi \in {\mathsf E} $. The values $ y[ t] $, $ t _ {0} \leq t \leq \theta $( for a given $ u( t) $), make it possible to construct a region of information $ X( \theta , y( \cdot )) $ of states of the system (8) in the phase space, along with the realization $ y[ t] $, equation (9) and the restriction on $ w, \xi $. Among the elements of $ X( \theta , y( \cdot )) = X( \theta , \cdot ) $ there will be also an unknown true state of the system (8), which can be estimated by choosing a point $ x ^ \star ( \theta , \cdot ) $ from $ X( \theta , \cdot ) $( for example, the "centre of gravity" or the "Chebyshev centre" of $ X( \theta , \cdot ) $). The study of the evolution of the regions $ X( \theta , \cdot ) $ and the dynamics of the vectors $ x ^ \star ( \theta , \cdot ) $ is the purpose of the theory of minimax filtering. Most complete solutions are known for linear systems and convex constraints (see [22]).
In general, the choice of a synthesis strategy of optimal control under conditions of uncertainty (for example, in the form of a functional $ u = u( t, X( t, \cdot )) $) must aim at controlling the evolution of the domains $ X( \theta , \cdot ) $ (i.e. the alteration of their configuration and their displacement in space) in accordance with prescribed criteria. For this problem a number of general qualitative results are known, as well as constructive solutions in the class of special linear, convex problems (see [7], [22]). Furthermore, the information contained in the measurements (for example, of the functions $ y[ t] $ in the systems (8) and (9)) permits an a posteriori re-evaluation, during the process, of the domain of admissible values of the indefinite parameters, in the direction of narrowing it. In this way the problem of identification of a mathematical model of a process (for example, of the parameters $ w $ in equation (8)) is solved at the same time. All this makes it possible to treat the solution of the problem of optimal synthesis control under conditions of uncertainty as a procedure of adaptive optimal control, in which the refinement of the properties of the model of the process is combined with the choice of the controls. The questions of identification of models of dynamical processes and the problem of adaptive optimal control have been studied in detail under the assumption that a probabilistic description of the indefinite parameters exists (see [23], [24]).
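A minimal sketch of the a posteriori narrowing of the admissible parameter set just described (set-membership identification); the true parameter, the error bound and the noise sequence are illustrative assumptions. Each measurement $ y _ {k} = w + e _ {k} $ with $ | e _ {k} | \leq \epsilon $ confines $ w $ to an interval, and intersecting these intervals refines the a priori bound.

```python
# Set-membership identification sketch: intersect the intervals
# [y_k - eps, y_k + eps] implied by measurements y_k = w + e_k, |e_k| <= eps.
# The true w, eps and the noise sequence are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
w_true, eps = 0.7, 0.2
lo, hi = -1.0, 1.0                       # a priori bound: w in [-1, 1]
for _ in range(20):
    y = w_true + rng.uniform(-eps, eps)  # one noisy measurement
    lo, hi = max(lo, y - eps), min(hi, y + eps)
print(f"refined bound for w: [{lo:.3f}, {hi:.3f}]  (true w = {w_true})")
```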
If in problems of optimal synthesis control under conditions of uncertainty the parameters $ w $, $ \xi $ are treated as "controls" of a fictitious opponent player, then the aims of the controls $ u $ and $ \{ w, \xi \} $ can be different. The latter situation leads to a non-scalar quality criterion of the process. As a consequence, the corresponding problems can be considered within the framework of the concepts of equilibrium situations peculiar to multi-criteria problems in the theory of cooperative games and their generalizations.
References
[1] | L.S. Pontryagin, V.G. Boltyanskii, R.V. Gamkrelidze, E.F. Mishchenko, "The mathematical theory of optimal processes" , Wiley (1967) (Translated from Russian) MR0836572 MR0719372 MR0186436 MR1532457 MR0166037 Zbl 0882.01027 Zbl 0516.49001 Zbl 0117.31702 Zbl 0102.32001 |
[2] | R. Bellman, "Dynamic programming" , Princeton Univ. Press (1957) MR0093619 MR0091224 MR0091222 MR0090499 MR0090477 MR0090469 MR0089099 MR0088666 MR0085149 MR0083675 Zbl 0081.14402 Zbl 0079.36202 Zbl 0079.35202 Zbl 0079.34301 Zbl 0078.09502 Zbl 0077.32402 Zbl 0077.13605 Zbl 0995.90618 |
[3] | N.N. Krasovskii, "Theory of control by motion" , Moscow (1968) (In Russian) |
[4] | N.N. Krasovskii, "On the stabilization of dynamic systems by supplementary forces" Diff. Eq. , 1 : 1 (1965) pp. 1–9 Differentsial'nye Uravneniya , 1 : 1 (1965) pp. 5–16 MR192934 |
[5] | N.N. Krasovskii, "Theory of optimal control systems" , Mechanics in the USSR during 50 years , 1 , Moscow (1968) pp. 179–244 (In Russian) MR0238147 |
[6] | N.N. Krasovskii, "On mean-square optimum stabilization at damped random perturbations" J. Appl. Math. Mech. , 25 (1961) pp. 1212–1227 Prikl. Mat. Mekh. , 25 : 5 (1961) pp. 806–817 MR0136838 |
[7] | N.N. Krasovskii, A.I. Subbotin, "Game-theoretical control problems" , Springer (1988) (Translated from Russian) MR918771 |
[8] | V.G. Boltyanskii, "Mathematical methods of optimal control" , Holt, Rinehart & Winston (1971) (Translated from Russian) MR0353081 |
[9] | V.G. Boltyanskii, "Optimal control of discrete systems" , Wiley (1978) (Translated from Russian) MR0528636 MR0514558 |
[10] | A.M. Letov, "Mathematical theory of control processes" , Moscow (1981) (In Russian) MR0615390 |
[11] | J.-L. Lions, "Optimal control of systems governed by partial differential equations" , Springer (1971) (Translated from French) MR0271512 Zbl 0203.09001 |
[12] | R.E. Kalman, "On the general theory of control systems" , Proc. 1-st Internat. Congress Internat. Fed. Autom. Control , 2 , Moscow (1960) pp. 521–547 |
[13] | R. Kalman, R. Bucy, "New results in linear filtering and prediction theory" Trans. Amer. Soc. Mech. Engineers Ser. D. J. Basic Engineering , 83 (1961) pp. 95–108 MR0234760 |
[14] | E.B. Lee, L. Markus, "Foundations of optimal control theory" , Wiley (1967) MR0220537 Zbl 0159.13201 |
[15] | A.G. Butkovskii, "Structural theory of distributed systems" , Horwood (1983) (Translated from Russian) |
[16] | A.N. Kolmogorov, E.F. Mishchenko, L.S. Pontryagin, "A probability problem of optimal control" Soviet Math. Dokl. , 3 : 4 (1962) pp. 1143–1145 Dokl. Akad. Nauk SSSR , 145 : 5 (1962) pp. 993–995 MR0183574 Zbl 0124.33803 |
[17] | R.S. Liptser, A.N. Shiryaev, "Statistics of random processes" , 1–2 , Springer (1977–1978) (Translated from Russian) MR1800858 MR1800857 MR0608221 MR0488267 MR0474486 Zbl 1008.62073 Zbl 1008.62072 Zbl 0556.60003 Zbl 0369.60001 Zbl 0364.60004 |
[18] | K.J. Åström, "Introduction to stochastic control theory" , Acad. Press (1970) MR0270799 Zbl 0226.93027 |
[19] | I.Ya. Kats, N.N. Krasovskii, "On the stability of systems with random parameters" J. Appl. Math. Mech. , 24 (1960) pp. 1225–1246 Prikl. Mat. Mekh. , 24 : 5 (1960) pp. 809–823 Zbl 0103.36403 |
[20] | W.M. Wonham, "On the separation theorem of stochastic control" SIAM J. Control , 6 (1968) pp. 312–326 MR0237219 Zbl 0164.19101 |
[21] | N.V. Krylov, "Controlled diffusion processes" , Springer (1980) (Translated from Russian) MR0601776 Zbl 0459.93002 Zbl 0436.93055 |
[22] | A.B. Kurzhanskii, "Control and observability under conditions of uncertainty" , Moscow (1977) (In Russian) |
[23] | Ya.Z. Tsypkin, "Foundations of the theory of learning systems" , Acad. Press (1973) (Translated from Russian) MR0434545 Zbl 0258.93019 |
[24] | P. Eykhoff, "Basics of identification of control systems" , Moscow (1975) (In Russian; translated from English) |
Comments
An optimal synthesis control is usually called an optimal closed-loop control or an optimal feedback control in the Western literature, while an optimal programming control is usually called an optimal open-loop control. See also [[Optimal control, mathematical theory of|Optimal control, mathematical theory of]].
For a detailed discussion of when optimal open-loop controls can be used to find optimal closed-loop controls see [a11], [a12].
In the formulation of an optimal control problem one distinguishes problems with a terminal index, with an integral index, or with a combination of both, as expressed by equation (4). Instead of "index", and depending on the particular application, one also speaks of a "cost function" (to be minimized) or a "performance index" (usually to be maximized).
In general, analytic solutions to optimal control problems do not exist. A notable exception is the case where the system is described by a linear equation:
$$ \dot{x} = A x + B u ,\ t _ {0} \leq t \leq t _ {f} ,\ \ x ( t _ {0} ) = x _ {0} ; $$
and the cost function by a quadratic expression:
$$ J = \frac{1}{2} x ^ \prime ( t _ {f} ) Q _ {f} x ( t _ {f} ) + \frac{1}{2} \int\limits _ { t _ {0} } ^ { t _ {f} } ( x ^ \prime Q x + u ^ \prime R u ) d t . $$
Here $ x \in \mathbf R ^ {n} $, $ u \in \mathbf R ^ {p} $, $ A $, $ B $, $ Q _ {f} $, $ Q $, and $ R $ are matrices of appropriate sizes. Moreover, $ Q _ {f} \geq 0 $, $ Q \geq 0 $, $ R > 0 $, and the transpose is denoted by $ {} ^ \prime $. The final time is supposed to be fixed here. The solution to this optimal control problem is
$$ u ^ {0} ( x , t ) = - R ^ {-1} B ^ \prime P ( t) x , $$
where the $ ( n \times n) $-matrix $ P ( t) $ satisfies the so-called Riccati equation:
$$ \dot{P} = - A ^ \prime P - P A + P B R ^ {-1} B ^ \prime P - Q ,\ \ P ( t _ {f} ) = Q _ {f} . $$
If the pair $ ( A , B ) $ is controllable and the pair $ ( A , C ) $ is observable, where the $ ( n \times n) $-matrix $ C $ is defined by $ C ^ \prime C = Q $, then $ \lim\limits _ {t _ {f} \rightarrow \infty } P ( t _ {0} ) $ exists; it will be denoted by $ \overline{P}\; $. The $ x = 0 $ solution of $ \dot{x} = ( A - B R ^ {-1} B ^ \prime \overline{P}\; ) x $ is asymptotically stable. The conditions of controllability and observability can be replaced by the weaker conditions of stabilizability and detectability, respectively; see [a13]. The notions of controllability, observability, etc. are properties of the system and as such belong to the field of mathematical system theory.
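A numerical sketch of the infinite-horizon case (the double-integrator data are an assumed example): scipy.linalg.solve_continuous_are solves the algebraic Riccati equation $ A ^ \prime P + P A - P B R ^ {-1} B ^ \prime P + Q = 0 $ for $ \overline{P}\; $, after which the optimal feedback is $ u = - R ^ {-1} B ^ \prime \overline{P}\; x $.

```python
# Infinite-horizon LQR sketch: solve the algebraic Riccati equation and check
# that the closed loop A - B K is stable.  The matrices (a double integrator
# with unit weights) are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_bar = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P_bar       # optimal gain: u = -K x

print("gain K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```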
Another class of problems for which the features of the optimal control function $ u ^ {0} ( x , t ) $ are well understood is the class of linear time-optimal control problems. The notion of a reachability set helps in visualizing the optimal solution; see [a14].
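For the double integrator the time-optimal synthesis is known in closed form: the control is bang-bang and switches on the curve $ x _ {1} + \frac{1}{2} x _ {2} | x _ {2} | = 0 $. The following sketch simulates this feedback; the initial state and the crude Euler integration are assumptions for illustration.

```python
# Time-optimal feedback for the double integrator x1' = x2, x2' = u, |u| <= 1:
# bang-bang control with switching curve x1 + 0.5*x2*|x2| = 0.  The initial
# state and the Euler simulation are illustrative assumptions.
import numpy as np

def u_time_optimal(x1, x2):
    s = x1 + 0.5 * x2 * abs(x2)      # switching function
    if abs(s) > 1e-9:
        return -np.sign(s)
    return -np.sign(x2)              # already on the switching curve

x1, x2, t, dt = 1.0, 0.0, 0.0, 1e-3
while x1 * x1 + x2 * x2 > 1e-4 and t < 10.0:
    u = u_time_optimal(x1, x2)
    x1, x2 = x1 + x2 * dt, x2 + u * dt
    t += dt
print(f"neighbourhood of the origin reached at t = {t:.3f} (exact optimum: 2.0)")
```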
References
[a1] | W.H. Fleming, R.W. Rishel, "Deterministic and stochastic control" , Springer (1975) MR454768 Zbl 0323.49001 |
[a2] | D.P. Bertsekas, S.E. Shreve, "Stochastic optimal control: the discrete-time case" , Acad. Press (1978) Zbl 0471.93002 |
[a3] | D.P. Bertsekas, "Dynamic programming and stochastic control" , Acad. Press (1976) MR0688509 Zbl 0549.93064 |
[a4] | M.H.A. Davis, "Martingale methods in stochastic control" , Stochastic Control and Stochastic Differential Systems , Lect. notes in control and inform. sci. , 16 , Springer (1979) pp. 85–117 MR0547467 Zbl 0409.93052 |
[a5] | L. Cesari, "Optimization - Theory and applications" , Springer (1983) MR0688142 Zbl 0506.49001 |
[a6] | L.W. Neustadt, "Optimization, a theory of necessary conditions" , Princeton Univ. Press (1976) Zbl 0353.49003 |
[a7] | V. Barbu, G. Da Prato, "Hamilton–Jacobi equations in Hilbert spaces" , Pitman (1983) MR0704182 Zbl 0508.34001 Zbl 0471.49026 |
[a8] | H.J. Kushner, "Introduction to stochastic control" , Holt (1971) MR0280248 Zbl 0293.93018 |
[a9] | P.R. Kumar, P. Varaiya, "Stochastic systems: estimation, identification and adaptive control" , Prentice-Hall (1986) Zbl 0706.93057 |
[a10] | L. Ljung, "System identification theory for the user" , Prentice-Hall (1987) MR1157156 Zbl 0615.93004 |
[a11] | P. Brunovsky, "On the structure of optimal feedback systems" , Proc. Internat. Congress Mathematicians (Helsinki, 1978) , 2 , Acad. Sci. Fennicae (1980) pp. 841–846 MR0562697 Zbl 0425.49019 |
[a12] | H.J. Sussmann, "Analytic stratifications and control theory" , Proc. Internat. Congress Mathematicians (Helsinki, 1978) , 2 , Acad. Sci. Fennicae (1980) pp. 865–871 MR0562701 Zbl 0499.93023 |
[a13] | H. Kwakernaak, R. Sivan, "Linear optimal control systems" , Wiley (1972) MR406607 |
[a14] | H. Hermes, J.P. Lasalle, "Functional analysis and time optimal control" , Acad. Press (1969) MR0420366 Zbl 0203.47504 |
[a15] | A.E. Bryson, Y.-C. Ho, "Applied optimal control" , Ginn (1969) MR0446628 |