Namespaces
Variants
Actions

Difference between revisions of "Analysis of programs"

From Encyclopedia of Mathematics
Jump to: navigation, search
(Importing text file)
 
m (tex done)
 
Line 1: Line 1:
 +
{{TEX|done}}
 +
 
''program analysis''
 
''program analysis''
  
Line 4: Line 6:
  
 
==Flow-based approaches to program analysis.==
 
==Flow-based approaches to program analysis.==
These have mainly been developed for imperative, functional and object-oriented programs. For a given [[Program|program]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105801.png" /> the task is to find a flow function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105802.png" /> that maps the program points in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105803.png" /> to approximate descriptions of the data that may reach those points. This may be formalized by a relation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105804.png" /> defined by a number of clauses depending on the syntax of the program <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105805.png" />. As a very simple example, consider the program <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105806.png" /> that takes an input, executes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105807.png" /> on it, then executes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105808.png" /> on the result obtained from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a1105809.png" />, and finally produces the result from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058010.png" /> as the overall result. The defining clause for this program may be written:''''''<table border="0" cellpadding="0" cellspacing="0" style="background-color:black;"> <tr><td> <table border="0" cellspacing="1" cellpadding="4" style="background-color:black;"> <tbody> <tr> <td colname="1" style="background-color:white;" colspan="1"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058011.png" /></td> <td colname="2" style="background-color:white;" colspan="1">if and only if</td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058012.png" /></td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058013.png" /></td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058014.png" /></td> </tr> </tbody> </table>
+
These have mainly been developed for imperative, functional and object-oriented programs. For a given [[Program|program]] $  p $
 +
the task is to find a flow function $  F \in { \mathop{\rm Inf}\nolimits} ^ { { \mathop{\rm Pts}\nolimits}} $
 +
that maps the program points in $  p $
 +
to approximate descriptions of the data that may reach those points. This may be formalized by a relation $  \vDash p:F $
 +
defined by a number of clauses depending on the syntax of the program $  p $.  
 +
As a very simple example, consider the program $  p = p _{1} ;p _{2} $
 +
that takes an input, executes $  p _{1} $
 +
on it, then executes $  p _{2} $
 +
on the result obtained from $  p _{1} $,  
 +
and finally produces the result from $  p _{2} $
 +
as the overall result. The defining clause for this program may be written:<table border="0" cellpadding="0" cellspacing="0" style="background-color:black;"> <tr><td> <table border="0" cellspacing="1" cellpadding="4" style="background-color:black;"> <tbody> <tr> <td colname="1" style="background-color:white;" colspan="1"> $  \vDash ( p _{1} ;p _{2} ) : F $
 +
</td> <td colname="2" style="background-color:white;" colspan="1">if and only if</td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"> $  \vDash p _{1} : F \  \land $
 +
</td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"> $  \vDash p _{2} : F \  \land $
 +
</td> </tr> <tr> <td colname="1" style="background-color:white;" colspan="1"></td> <td colname="2" style="background-color:white;" colspan="1"> $  F ( { \mathop{\rm end}\nolimits} ( p _{1} ) ) \subseteq F ( { \mathop{\rm begin}\nolimits} ( p _{2} ) ) $
 +
</td> </tr> </tbody> </table>
  
 
</td></tr> </table>
 
</td></tr> </table>
  
In the presence of recursive programs, the predicate cannot be defined by structural induction on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058015.png" /> and instead one appeals to co-induction; this merely amounts to taking the greatest fixed point of a function mapping sets of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058016.png" /> into sets of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058017.png" /> as described by the right-hand sides of the clauses.
+
In the presence of recursive programs, the predicate cannot be defined by structural induction on $  p $
 +
and instead one appeals to co-induction; this merely amounts to taking the greatest fixed point of a function mapping sets of pairs $  ( p,F ) $
 +
into sets of pairs $  ( p,F ) $
 +
as described by the right-hand sides of the clauses.
  
It is essential that the analysis information is semantically correct. Given a semantics <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058018.png" /> for describing when the program <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058019.png" /> maps the input <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058020.png" /> to the output <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058021.png" />, this amounts in its simplest form to establishing a subject reduction result:
+
It is essential that the analysis information is semantically correct. Given a semantics $  v _{1} \rightarrow _{p} v _{2} $
 +
for describing when the program $  p $
 +
maps the input $  v _{1} $
 +
to the output $  v _{2} $,  
 +
this amounts in its simplest form to establishing a subject reduction result:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058022.png" /></td> </tr></table>
+
$$
 +
\vDash p: F \land v _{1} \in F ( { \mathop{\rm begin}\nolimits} ( p ) ) \land v _{1} \rightarrow _{p} v _{2}  $$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058023.png" /></td> </tr></table>
+
$$
 +
\Downarrow
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058024.png" /></td> </tr></table>
+
$$
 +
v _{2} \in F ( { \mathop{\rm end}\nolimits} ( p ) )
 +
$$
  
Finally, realization of the analysis amounts to finding the minimal <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058025.png" /> (with respect to the pointwise partial order on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058026.png" />) for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058027.png" />; this is related to the lattice-theoretic notion of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058028.png" /> being a Moore family. Often, this is done by obtaining a more implementation-oriented specification <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058029.png" /> for generating a set of constraints <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058030.png" />, deriving a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058031.png" /> mapping flow functions to flow functions, and ensuring that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058032.png" /> if and only if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058033.png" />.
+
Finally, realization of the analysis amounts to finding the minimal $  F $(
 +
with respect to the pointwise partial order on $  { \mathop{\rm Inf}\nolimits} ^ { { \mathop{\rm Pts}\nolimits}} $)  
 +
for which $  \vDash p : F $;  
 +
this is related to the lattice-theoretic notion of $  \{ F : {\vDash p : F} \} $
 +
being a Moore family. Often, this is done by obtaining a more implementation-oriented specification $  \vdash p : C $
 +
for generating a set of constraints $  C $,  
 +
deriving a function $  {\widehat{C} } $
 +
mapping flow functions to flow functions, and ensuring that $  {\widehat{C} } ( F ) = F $
 +
if and only if $  \vDash p : F $.
  
 
==Inference-based approaches to program analysis.==
 
==Inference-based approaches to program analysis.==
These have mainly been developed for functional, imperative, and concurrent programs. For a given [[Program|program]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058034.png" />, the task is to find an  "annotated type"  <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058035.png" /> describing the overall effect of executing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058036.png" />: if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058037.png" /> describes the input data, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058038.png" /> describes the output data, and properties of the dynamic executions (e.g., whether or not errors can occur) are described by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058039.png" />. This may be formalized by a relation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058040.png" /> defined by a number of inference rules (cf. [[Permissible law (inference)|Permissible law (inference)]]); some of these are defined compositionally in the structure of the program as in:
+
These have mainly been developed for functional, imperative, and concurrent programs. For a given [[Program|program]] $  p $,  
 +
the task is to find an  "annotated type"   $  q _{1} { \stackrel \sigma  \rightarrow } q _{2} $
 +
describing the overall effect of executing $  p $:  
 +
if $  q _{1} $
 +
describes the input data, then $  q _{2} $
 +
describes the output data, and properties of the dynamic executions (e.g., whether or not errors can occur) are described by $  \sigma $.  
 +
This may be formalized by a relation $  \vdash p: ( q _{1} { \stackrel \sigma  \rightarrow  } q _{2} ) $
 +
defined by a number of inference rules (cf. [[Permissible law (inference)|Permissible law (inference)]]); some of these are defined compositionally in the structure of the program as in:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058041.png" /></td> </tr></table>
+
$$
 +
{
 +
\frac{ {\vdash p _{1} : ( q _{0} { \stackrel{ {\sigma _ 1}} \rightarrow  } q _{1} )} \quad {\vdash p _{2} : ( q _{1} { \stackrel{ {\sigma _ 2}} \rightarrow  } q _{2} )}}{ {\vdash ( p _{1} ; p _{2} ) : q _{0} { \stackrel{ {\sigma _{1} \uplus \sigma _ 2}} \rightarrow  } q _ 2}}
 +
}
 +
$$
  
Here, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058042.png" /> combines the effects <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058043.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058044.png" /> into an overall effect; in the simplest cases, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058045.png" /> simply equals <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058046.png" />, but in more complex cases the causal structure (i.e. which effect comes first) must be retained. Successful analysis of recursive programs demands that invariants be guessed in order to obtain a finite inference tree. (This form of program analysis is closely related to program verification: see [[Theoretical programming|Theoretical programming]], the section on  "The logical theory of programs" .)
+
Here, $  \sigma _{1} \uplus \sigma _{2} $
 +
combines the effects $  \sigma _{1} $
 +
and $  \sigma _{2} $
 +
into an overall effect; in the simplest cases, $  \sigma _{1} \uplus \sigma _{2} $
 +
simply equals $  \sigma _{1} \cup \sigma _{2} $,  
 +
but in more complex cases the causal structure (i.e. which effect comes first) must be retained. Successful analysis of recursive programs demands that invariants be guessed in order to obtain a finite inference tree. (This form of program analysis is closely related to program verification: see [[Theoretical programming|Theoretical programming]], the section on  "The logical theory of programs" .)
  
Semantic correctness necessitates relations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058047.png" /> between values and properties, e.g. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058048.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058049.png" /> between the effects of the dynamic semantics <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058050.png" /> and the analysis, e.g. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058051.png" />. The correctness statement then amounts to a subject reduction result:
+
Semantic correctness necessitates relations $  \vDash v:q $
 +
between values and properties, e.g. $  ( q = \textrm{ integer } ) \Rightarrow ( v \in \{ \dots , - 1, 0,1, \dots \} ) $,
 +
and $  \vDash d \sim \sigma $
 +
between the effects of the dynamic semantics $  ( v _{1} {\rightarrow ^ d} _{p} v _{2} ) $
 +
and the analysis, e.g. $  d = \sigma $.  
 +
The correctness statement then amounts to a subject reduction result:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058052.png" /></td> </tr></table>
+
$$
 +
( \vDash v _{1} : q _{1} ) \land ( v _{1} {\rightarrow ^ d} _{p} v _{2} ) \land \vdash p : ( q _{1} { \stackrel \sigma  \rightarrow  } q _{2} )
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058053.png" /></td> </tr></table>
+
$$
 +
\Downarrow
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058054.png" /></td> </tr></table>
+
$$
 +
( \vDash v _{2} : q _{2} ) \land ( \vDash d \sim \sigma )
 +
$$
  
Implementation of the analysis amounts to defining an [[Algorithm|algorithm]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058055.png" /> that is syntactically sound, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058056.png" />, and syntactically complete, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058057.png" /> implies that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058058.png" /> is an  "instance"  of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058059.png" />; the precise definition of  "instance"  depends on the nature of the properties and may have some complexity to it. Often, the algorithm works using constraints somewhat similar to those used in the flow-based approach.
+
Implementation of the analysis amounts to defining an [[Algorithm|algorithm]] $  {\mathcal Q} [\![p]\!] = q $
 +
that is syntactically sound, i.e. $  \vdash p : q $,  
 +
and syntactically complete, i.e. $  \vdash p : ( q _{1} { \stackrel \sigma  \rightarrow  } q _{2} ) $
 +
implies that $  q _{1} { \stackrel \sigma  \rightarrow  } q _{2} $
 +
is an  "instance"  of $  q $;  
 +
the precise definition of  "instance"  depends on the nature of the properties and may have some complexity to it. Often, the algorithm works using constraints somewhat similar to those used in the flow-based approach.
  
 
==Semantics-based approaches to program analysis.==
 
==Semantics-based approaches to program analysis.==
These attempt to define the analysis in much the same way that the semantics itself is defined. These developments have mainly been based on denotational semantics (where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058060.png" /> is written <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058061.png" />) for imperative and functional programs, although work has started on similar developments based on (so-called small-step) structural operational semantics and (so-called big-step) natural semantics (cf. also [[Theoretical programming|Theoretical programming]]). The denotational approach may be formulated as an analysis function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058062.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058063.png" /> is a mathematical entity describing the input to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058064.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058065.png" /> is a mathematical entity describing the output from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058066.png" />. The denotational approach calls for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058067.png" /> to be defined compositionally in terms of the syntactic constituents of the program <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058068.png" />; an example of this is the functional composition
+
These attempt to define the analysis in much the same way that the semantics itself is defined. These developments have mainly been based on denotational semantics (where $  v _{1} \rightarrow _{p} v _{2} $
 +
is written $  {\mathcal S} [\![p]\!] ( v _{1} ) = v _{2} $)  
 +
for imperative and functional programs, although work has started on similar developments based on (so-called small-step) structural operational semantics and (so-called big-step) natural semantics (cf. also [[Theoretical programming|Theoretical programming]]). The denotational approach may be formulated as an analysis function $  {\mathcal A} [\![p]\!] ( l _{1} ) = l _{2} $,  
 +
where $  l _{1} $
 +
is a mathematical entity describing the input to $  p $
 +
and $  l _{2} $
 +
is a mathematical entity describing the output from $  p $.  
 +
The denotational approach calls for $  {\mathcal A} [\![p]\!] $
 +
to be defined compositionally in terms of the syntactic constituents of the program $  p $;  
 +
an example of this is the functional composition
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058069.png" /></td> </tr></table>
+
$$
 +
{\mathcal A} [\![ {p _{1} ; p _ 2} ]\!] = {\mathcal A}
 +
[\![ {p _ 2} ]\!] \circ {\mathcal A} [\![ {p _ 1} ]\!] ,
 +
$$
  
i.e., <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058070.png" />. In case of recursive programs, a least fixed point is taken of a suitable functional.
+
i.e., $  {\mathcal A} [\![ {p _{1} ; p _ 2} ]\!] ( l ) = {\mathcal A} [\![ {p _ 2} ]\!] ( {\mathcal A} [\![ {p _ 1} ]\!] ( l ) ) $.  
 +
In case of recursive programs, a least fixed point is taken of a suitable functional.
  
To express the correctness of the analysis one needs to define correctness relations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058071.png" /> on the data, e.g. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058072.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058073.png" /> on the computations, e.g. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058074.png" />; the latter definition is usually called a logical relation. Semantic soundness then amounts to establishing
+
To express the correctness of the analysis one needs to define correctness relations $  v R _{t} l $
 +
on the data, e.g. $  v \in l $,  
 +
and $  f R _{ {t _{1} \rightarrow t _ 2}} h $
 +
on the computations, e.g. $  \forall v, l : ( v R _{ {t _ 1}} l ) \Rightarrow ( f ( v ) R _{ {t _ 2}} h ( l ) ) $;  
 +
the latter definition is usually called a logical relation. Semantic soundness then amounts to establishing
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058075.png" /></td> </tr></table>
+
$$
 +
{\mathcal S} [\![ p]\!] R _{ {t _{1} \rightarrow t _ 2}} {\mathcal A} [\![ p]\!]
 +
$$
  
for all programs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058076.png" /> of type <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058077.png" />. Efficient implementation of the analyses often demands that they be presented in a form less dependent on the program syntax, e.g. using graphs or constraints, and then the solution procedure must be shown sound and complete.
+
for all programs $  p $
 +
of type $  t _{1} \rightarrow t _{2} $.  
 +
Efficient implementation of the analyses often demands that they be presented in a form less dependent on the program syntax, e.g. using graphs or constraints, and then the solution procedure must be shown sound and complete.
  
 
==Abstract interpretation.==
 
==Abstract interpretation.==
This is a general framework for obtaining cheaper and more approximative solutions to program analysis problems. In its most general form it attempts to be independent of the semantic base, but usually it is formulated in either a structural operational or a denotational setting. Given the semantics <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058078.png" /> of a program <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058079.png" />, the first step is often to define the uncomputable collecting semantics <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058080.png" />. The intention is that
+
This is a general framework for obtaining cheaper and more approximative solutions to program analysis problems. In its most general form it attempts to be independent of the semantic base, but usually it is formulated in either a structural operational or a denotational setting. Given the semantics $  v _{1} \rightarrow _{p} v _{2} $
 +
of a program $  p $,  
 +
the first step is often to define the uncomputable collecting semantics $  s _{1} \Rightarrow _{p} s _{2} $.  
 +
The intention is that
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058081.png" /></td> </tr></table>
+
$$
 +
s _{1} \Rightarrow _{p} s _{2}  $$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058082.png" /></td> </tr></table>
+
$$
 +
\Updownarrow
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058083.png" /></td> </tr></table>
+
$$
 +
s _{2} = \left \{ {v _ 2} : {v _{1} \rightarrow _{p} v _{2} \land v _{1} \in s _ 1} \right \}
 +
$$
  
 
and this may be regarded as a soundness as well as a completeness statement.
 
and this may be regarded as a soundness as well as a completeness statement.
  
To obtain a more approximate and eventually computable analysis, the most commonly used approach is to replace the sets of values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058084.png" /> by some more approximate properties <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058085.png" />. Both <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058086.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058087.png" /> are partially ordered in such a way that one generally wants the least correct element and where the greatest element is uninformative. (By the lattice duality principle, one could instead have desired the greatest correct element, and part of the literature on data flow analysis uses this vocabulary.) The relation between <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058088.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058089.png" /> is often formalized by a pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058090.png" /> of functions:
+
To obtain a more approximate and eventually computable analysis, the most commonly used approach is to replace the sets of values $  s \in S $
 +
by some more approximate properties $  l \in L $.  
 +
Both $  S $
 +
and $  L $
 +
are partially ordered in such a way that one generally wants the least correct element and where the greatest element is uninformative. (By the lattice duality principle, one could instead have desired the greatest correct element, and part of the literature on data flow analysis uses this vocabulary.) The relation between $  S $
 +
and $  L $
 +
is often formalized by a pair $  ( \alpha, \gamma ) $
 +
of functions:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058091.png" /></td> </tr></table>
+
$$
 +
S \mathop{\rightleftarrows} ^  \alpha  _ \gamma L,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058092.png" /> is the concretization function giving the  "meaning"  of properties in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058093.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058094.png" /> is the abstraction function. It is customary to demand that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058095.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058096.png" /> are complete lattices (cf. [[Complete lattice|Complete lattice]]). Furthermore, it is customary to demand that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058097.png" /> is a Galois connection, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058098.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a11058099.png" /> are monotone and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580100.png" /> is extensive <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580101.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580102.png" /> is reductive <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580103.png" />, or, equivalently, that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580104.png" /> is an adjunction, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580105.png" />. This ensures that a set of values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580106.png" /> always has a best description <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580107.png" />; in fact, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580108.png" />.
+
where $  \gamma : L \rightarrow S $
 +
is the concretization function giving the  "meaning"  of properties in $  L $,  
 +
and $  \alpha : S \rightarrow L $
 +
is the abstraction function. It is customary to demand that $  S $
 +
and $  L $
 +
are complete lattices (cf. [[Complete lattice|Complete lattice]]). Furthermore, it is customary to demand that $  ( \alpha, \gamma ) $
 +
is a Galois connection, i.e. $  \alpha $
 +
and $  \gamma $
 +
are monotone and $  \gamma \circ \alpha $
 +
is extensive $  ( \forall s: \gamma ( \alpha ( s ) ) \supset s ) $
 +
and $  \alpha \circ \gamma $
 +
is reductive $  ( \forall l: \alpha ( \gamma ( l ) ) \subset  l ) $,  
 +
or, equivalently, that $  ( \alpha, \gamma ) $
 +
is an adjunction, i.e. $  \forall s, l: \alpha ( s ) \subset  l \iff s \subset  \gamma ( l ) $.  
 +
This ensures that a set of values $  s $
 +
always has a best description $  l $;  
 +
in fact, $  l = \alpha ( s ) $.
  
Abstract interpretation then amounts to define an analysis <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580109.png" /> on the properties of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580110.png" /> such that the following correctness statement holds:
+
Abstract interpretation then amounts to define an analysis $  l _{1} \Rightarrow _ p^ \prime  l _{2} $
 +
on the properties of $  L $
 +
such that the following correctness statement holds:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580111.png" /></td> </tr></table>
+
$$
 +
l _{1} \Rightarrow _ p^ \prime  l _{2}  $$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580112.png" /></td> </tr></table>
+
$$
 +
\Downarrow
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580113.png" /></td> </tr></table>
+
$$
 +
\exists s : \gamma ( l _{1} ) \Rightarrow _{p} s \land s \subseteq \gamma ( l _{2} )
 +
$$
  
 
This may be illustrated by the diagram
 
This may be illustrated by the diagram
Line 83: Line 212:
 
It expresses that a certain diagram semi-commutes.
 
It expresses that a certain diagram semi-commutes.
  
Example analyses that can be obtained in this way include intervals as approximations of sets of integers, polygons in the plane as approximations of sets of pairs of numbers, representations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580114.png" /> of arithmetical congruences as approximations of periodic sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580115.png" /> as well as combinations of these.
+
Example analyses that can be obtained in this way include intervals as approximations of sets of integers, polygons in the plane as approximations of sets of pairs of numbers, representations $  ( k _{1} , k _{2} ) $
 +
of arithmetical congruences as approximations of periodic sets $  \{ {k _{1} + n \cdot k _ 2} : {n \in \{ \dots , - 1, 0, 1, \dots \}} \} $
 +
as well as combinations of these.
  
 
==Fixed-point approximation.==
 
==Fixed-point approximation.==
This is essential for program analysis because most of the approaches can be reformulated as finding the least fixed point (cf. also [[Fixed point|Fixed point]]) <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580116.png" /> of a monotone function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580117.png" /> on a [[Complete lattice|complete lattice]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580118.png" /> of properties; often, each element <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580119.png" /> describes a set of values. There are two characterizations of the least fixed point that are of relevance here. Tarski's theorem ensures that
+
This is essential for program analysis because most of the approaches can be reformulated as finding the least fixed point (cf. also [[Fixed point|Fixed point]]) $  { \mathop{\rm lfp}\nolimits} ( f ) $
 +
of a monotone function $  f : L \rightarrow L $
 +
on a [[Complete lattice|complete lattice]] $  L = ( L, \subset  ) $
 +
of properties; often, each element $  l \in L $
 +
describes a set of values. There are two characterizations of the least fixed point that are of relevance here. Tarski's theorem ensures that
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580120.png" /></td> </tr></table>
+
$$
 +
{ \mathop{\rm lfp}\nolimits} ( f ) = \bigcap { \mathop{\rm Red}\nolimits} ( f ) ,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580121.png" /></td> </tr></table>
+
$$
 +
{ \mathop{\rm Red}\nolimits} ( f ) = \left \{ {l \in L} : {f ( l ) \subset  l} \right \} ,
 +
$$
  
satisfies <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580122.png" /> and hence <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580123.png" /> is the least fixed point of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580124.png" />. [[Transfinite induction|Transfinite induction]] allows one to define
+
satisfies $  f ( { \mathop{\rm lfp}\nolimits} ( f ) ) = { \mathop{\rm lfp}\nolimits} ( f ) $
 +
and hence $  { \mathop{\rm lfp}\nolimits} ( f ) \in { \mathop{\rm Red}\nolimits} ( f ) $
 +
is the least fixed point of $  f $.  
 +
[[Transfinite induction|Transfinite induction]] allows one to define
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580125.png" /></td> </tr></table>
+
$$
 +
f ^ {[ \kappa + 1 ]} = f ( f ^ {[ \kappa ]} ) \  \textrm{ for } \  \kappa+1 \  \textrm{ a   successor  ordinal } ,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580126.png" /></td> </tr></table>
+
$$
 +
f ^ {[ \kappa ]} = \bigcup _{ {\kappa^ \prime } < \kappa} f ^ {[ \kappa^ \prime  ]} \  \textrm{ for } \  \kappa \  \textrm{ a   limit  ordinal } ,
 +
$$
  
 
and then the least fixed point is given by
 
and then the least fixed point is given by
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580127.png" /></td> </tr></table>
+
$$
 +
{ \mathop{\rm lfp}\nolimits} ( f ) = f ^ {[ \lambda ]}
 +
$$
  
for some cardinal number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580128.png" />; it suffices to let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580129.png" /> be the cardinality of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580130.png" />. In simple cases, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580131.png" /> has finite height (all strictly increasing sequences are finite) and then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580132.png" /> may be taken to be a natural number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580133.png" />; this then suggests a computable procedure for calculating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580134.png" />: simply do the finite number of calculations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580135.png" />.
+
for some cardinal number $  \lambda $;  
 +
it suffices to let $  \lambda $
 +
be the cardinality of $  2^{L} $.  
 +
In simple cases, $  L $
 +
has finite height (all strictly increasing sequences are finite) and then $  \lambda $
 +
may be taken to be a natural number $  n _{0} $;  
 +
this then suggests a computable procedure for calculating $  { \mathop{\rm lfp}\nolimits} ( f ) $:  
 +
simply do the finite number of calculations $  f ^ {[ 0 ]} \dots f ^ {[ n _{0} ]} = f ^ {n _ 0} ( f ^ {[ 0 ]} ) = { \mathop{\rm lfp}\nolimits} ( f ) $.
  
In general, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580136.png" /> cannot simply be taken to be a natural number and then one has to be content with a computable procedure for finding some <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580137.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580138.png" />; for pragmatic reasons <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580139.png" /> should be  "as close to"  <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580140.png" /> as possible. An upper bound operator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580141.png" /> is a not necessarily monotone function such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580142.png" /> for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580143.png" />. Given a not necessarily increasing sequence <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580144.png" />, one can then define
+
In general, $  \lambda $
 +
cannot simply be taken to be a natural number and then one has to be content with a computable procedure for finding some $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) \in L $
 +
such that $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) \supset { \mathop{\rm lfp}\nolimits} ( f ) $;  
 +
for pragmatic reasons $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) $
 +
should be  "as close to"   $ { \mathop{\rm lfp}\nolimits} ( f ) $
 +
as possible. An upper bound operator $  \nabla : {L \times L} \rightarrow L $
 +
is a not necessarily monotone function such that $  l _{1} \nabla l _{2} \supset l _{1} \bigcup l _{2} $
 +
for all $  l _{1} , l _{2} \in L $.  
 +
Given a not necessarily increasing sequence $  ( l _{n} ) _{n} $,  
 +
one can then define
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580145.png" /></td> </tr></table>
+
$$
 +
l _ 0^ \nabla  = l _{0} ,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580146.png" /></td> </tr></table>
+
$$
 +
l _{ {n + 1}^ \nabla } = l _ n^ \nabla  \nabla l _{ {n + 1}} ,
 +
$$
  
for all natural numbers <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580147.png" />. It is easy to show that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580148.png" /> is in fact an increasing sequence. A widening operator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580149.png" /> is an upper bound operator such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580150.png" /> eventually stabilizes (i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580151.png" />) for all increasing sequences <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580152.png" />. (A trivial and uninteresting example of a widening operator is given by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580153.png" />.) One can now define
+
for all natural numbers $  n $.  
 +
It is easy to show that $  ( l _ n^ \nabla  ) _{n} $
 +
is in fact an increasing sequence. A widening operator $  \nabla : {L \times L} \rightarrow L $
 +
is an upper bound operator such that $  ( l _ n^ \nabla  ) _{n} $
 +
eventually stabilizes (i.e. $  \exists n _{0} \forall n \geq n _{0} :l _ n^ \nabla  = l _{ {n _ 0}^ \nabla } $)  
 +
for all increasing sequences $  ( l _{n} ) _{n} $.  
 +
(A trivial and uninteresting example of a widening operator is given by $  l _{1} \nabla l _{2} = \bigcup L $.)  
 +
One can now define
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580154.png" /></td> </tr></table>
+
$$
 +
f _ \nabla  ^ {\left \langle  0 \right \rangle} = \bigcap L \  \textrm{ (the  least  element  of } \  L \textrm{ ) } ,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580155.png" /></td> </tr></table>
+
$$
 +
f _ \nabla  ^ {\left \langle  {n + 1} \right \rangle} = \left \{
 +
{ {f _ \nabla  ^ {\left \langle  n \right \rangle} \ \  \textrm{ if } \  f ( f _ \nabla  ^ {\left \langle  n \right \rangle} ) \subset  f _ \nabla  ^ {\left \langle  n \right \rangle} ,} \atop {f _ \nabla  ^ {\left \langle  n \right \rangle}\nabla f ( f _ \nabla  ^ {\left \langle  n \right \rangle} ) \ \  \textrm{ otherwise } ,}} \right .
 +
$$
  
for all natural numbers <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580156.png" />. One can show that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580157.png" /> eventually stabilizes and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580158.png" /> equals <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580159.png" /> for some <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580160.png" /> and that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580161.png" />. Hence, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580162.png" /> and, furthermore, there is a computable procedure for calculating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580163.png" />: simply do the finite number of calculations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110580/a110580164.png" />. Examples of  "good"  widening operators can be found in the literature.
+
for all natural numbers $  n $.  
 +
One can show that $  ( f _ \nabla  ^ {\langle  n \rangle} ) _{n} $
 +
eventually stabilizes and that $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) = \bigcup _{n} f _ \nabla  ^ {\langle  n \rangle} $
 +
equals $  f _ \nabla  ^ {\langle  {n _ 0} \rangle} $
 +
for some $  n _{0} $
 +
and that $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) \in { \mathop{\rm Red}\nolimits} ( f ) $.  
 +
Hence, $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) \supset { \mathop{\rm lfp}\nolimits} ( f ) $
 +
and, furthermore, there is a computable procedure for calculating $  { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) $:  
 +
simply do the finite number of calculations $  f _ \nabla  ^ {\langle  0 \rangle} \dots f _ \nabla  ^ {\langle  {n _ 0} \rangle} = { \mathop{\rm fix}\nolimits} _ \nabla  ( f ) $.  
 +
Examples of  "good"  widening operators can be found in the literature.
  
 
====References====
 
====References====
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  P. Cousot,  "Semantic foundations of program analysis"  S.S. Muchnick (ed.)  N.D. Jones (ed.) , ''Program Flow Analysis: Theory and Applications'' , Prentice-Hall  (1981)  pp. 303–342</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  S. Jagannathan,  S. Weeks,  "A unified treatment of flow analysis in higher-order languages" , ''Proc. POPL '95'' , ACM Press  (1995)  pp. 393–407</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  N.D. Jones,  F. Nielson,  "Abstract interpretation: a semantics-based tool for program analysis"  S. Abramsky (ed.)  D.M. Gabbay (ed.)  T.S.E. Maibaum (ed.) , ''Handbook of Logic in Computer Science'' , '''4''' , Oxford Univ. Press  (1995)  pp. 527–636</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  J.P. Talpin,  P. Jouvelot,  "The type and effect discipline" , ''Information and Computation'' , '''111'''  (1994)</TD></TR></table>
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  P. Cousot,  "Semantic foundations of program analysis"  S.S. Muchnick (ed.)  N.D. Jones (ed.) , ''Program Flow Analysis: Theory and Applications'' , Prentice-Hall  (1981)  pp. 303–342</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  S. Jagannathan,  S. Weeks,  "A unified treatment of flow analysis in higher-order languages" , ''Proc. POPL '95'' , ACM Press  (1995)  pp. 393–407</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  N.D. Jones,  F. Nielson,  "Abstract interpretation: a semantics-based tool for program analysis"  S. Abramsky (ed.)  D.M. Gabbay (ed.)  T.S.E. Maibaum (ed.) , ''Handbook of Logic in Computer Science'' , '''4''' , Oxford Univ. Press  (1995)  pp. 527–636</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  J.P. Talpin,  P. Jouvelot,  "The type and effect discipline" , ''Information and Computation'' , '''111'''  (1994)</TD></TR></table>

Latest revision as of 18:23, 5 February 2020


program analysis

Program analysis offers static (i.e. compile-time) techniques for predicting safe and computable approximations to the set of values or behaviours arising dynamically (i.e. at run-time) when executing a program upon a computer. A main application is to allow compilers to generate more efficient code by avoiding to generate code to handle errors that can be shown not to arise, and to generate more specialized and faster code by taking advantage of the availability of data in registers and caches. Among the newer applications is the validation of software (possibly developed by subcontractors) to reduce the likelihood of malicious behaviour. To illustrate the breadth of the approaches, five types of analyses will be discussed: i) flow-based analysis; ii) inference-based analysis; iii) semantics-based analysis; iv) abstract interpretation; and v) fixed-point approximation techniques. These approaches borrow mathematical techniques from lattice theory and logical systems.

Flow-based approaches to program analysis.

These have mainly been developed for imperative, functional and object-oriented programs. For a given program $ p $ the task is to find a flow function $ F \in { \mathop{\rm Inf}\nolimits} ^ { { \mathop{\rm Pts}\nolimits}} $ that maps the program points in $ p $ to approximate descriptions of the data that may reach those points. This may be formalized by a relation $ \vDash p:F $ defined by a number of clauses depending on the syntax of the program $ p $. As a very simple example, consider the program $ p = p _{1} ;p _{2} $ that takes an input, executes $ p _{1} $ on it, then executes $ p _{2} $ on the result obtained from $ p _{1} $, and finally produces the result from $ p _{2} $

as the overall result. The defining clause for this program may be written:

<tbody> </tbody>
$ \vDash ( p _{1} ;p _{2} ) : F $ if and only if
$ \vDash p _{1} : F \ \land $
$ \vDash p _{2} : F \ \land $
$ F ( { \mathop{\rm end}\nolimits} ( p _{1} ) ) \subseteq F ( { \mathop{\rm begin}\nolimits} ( p _{2} ) ) $

In the presence of recursive programs, the predicate cannot be defined by structural induction on $ p $ and instead one appeals to co-induction; this merely amounts to taking the greatest fixed point of a function mapping sets of pairs $ ( p,F ) $ into sets of pairs $ ( p,F ) $ as described by the right-hand sides of the clauses.

It is essential that the analysis information is semantically correct. Given a semantics $ v _{1} \rightarrow _{p} v _{2} $ for describing when the program $ p $ maps the input $ v _{1} $ to the output $ v _{2} $, this amounts in its simplest form to establishing a subject reduction result:

$$ \vDash p: F \land v _{1} \in F ( { \mathop{\rm begin}\nolimits} ( p ) ) \land v _{1} \rightarrow _{p} v _{2} $$

$$ \Downarrow $$

$$ v _{2} \in F ( { \mathop{\rm end}\nolimits} ( p ) ) $$

Finally, realization of the analysis amounts to finding the minimal $ F $( with respect to the pointwise partial order on $ { \mathop{\rm Inf}\nolimits} ^ { { \mathop{\rm Pts}\nolimits}} $) for which $ \vDash p : F $; this is related to the lattice-theoretic notion of $ \{ F : {\vDash p : F} \} $ being a Moore family. Often, this is done by obtaining a more implementation-oriented specification $ \vdash p : C $ for generating a set of constraints $ C $, deriving a function $ {\widehat{C} } $ mapping flow functions to flow functions, and ensuring that $ {\widehat{C} } ( F ) = F $ if and only if $ \vDash p : F $.

Inference-based approaches to program analysis.

These have mainly been developed for functional, imperative, and concurrent programs. For a given program $ p $, the task is to find an "annotated type" $ q _{1} { \stackrel \sigma \rightarrow } q _{2} $ describing the overall effect of executing $ p $: if $ q _{1} $ describes the input data, then $ q _{2} $ describes the output data, and properties of the dynamic executions (e.g., whether or not errors can occur) are described by $ \sigma $. This may be formalized by a relation $ \vdash p: ( q _{1} { \stackrel \sigma \rightarrow } q _{2} ) $ defined by a number of inference rules (cf. Permissible law (inference)); some of these are defined compositionally in the structure of the program as in:

$$ { \frac{ {\vdash p _{1} : ( q _{0} { \stackrel{ {\sigma _ 1}} \rightarrow } q _{1} )} \quad {\vdash p _{2} : ( q _{1} { \stackrel{ {\sigma _ 2}} \rightarrow } q _{2} )}}{ {\vdash ( p _{1} ; p _{2} ) : q _{0} { \stackrel{ {\sigma _{1} \uplus \sigma _ 2}} \rightarrow } q _ 2}} } $$

Here, $ \sigma _{1} \uplus \sigma _{2} $ combines the effects $ \sigma _{1} $ and $ \sigma _{2} $ into an overall effect; in the simplest cases, $ \sigma _{1} \uplus \sigma _{2} $ simply equals $ \sigma _{1} \cup \sigma _{2} $, but in more complex cases the causal structure (i.e. which effect comes first) must be retained. Successful analysis of recursive programs demands that invariants be guessed in order to obtain a finite inference tree. (This form of program analysis is closely related to program verification: see Theoretical programming, the section on "The logical theory of programs" .)

Semantic correctness necessitates relations $ \vDash v:q $ between values and properties, e.g. $ ( q = \textrm{ integer } ) \Rightarrow ( v \in \{ \dots , - 1, 0,1, \dots \} ) $, and $ \vDash d \sim \sigma $ between the effects of the dynamic semantics $ ( v _{1} {\rightarrow ^ d} _{p} v _{2} ) $ and the analysis, e.g. $ d = \sigma $. The correctness statement then amounts to a subject reduction result:

$$ ( \vDash v _{1} : q _{1} ) \land ( v _{1} {\rightarrow ^ d} _{p} v _{2} ) \land \vdash p : ( q _{1} { \stackrel \sigma \rightarrow } q _{2} ) $$

$$ \Downarrow $$

$$ ( \vDash v _{2} : q _{2} ) \land ( \vDash d \sim \sigma ) $$

Implementation of the analysis amounts to defining an algorithm $ {\mathcal Q} [\![p]\!] = q $ that is syntactically sound, i.e. $ \vdash p : q $, and syntactically complete, i.e. $ \vdash p : ( q _{1} { \stackrel \sigma \rightarrow } q _{2} ) $ implies that $ q _{1} { \stackrel \sigma \rightarrow } q _{2} $ is an "instance" of $ q $; the precise definition of "instance" depends on the nature of the properties and may have some complexity to it. Often, the algorithm works using constraints somewhat similar to those used in the flow-based approach.

Semantics-based approaches to program analysis.

These attempt to define the analysis in much the same way that the semantics itself is defined. These developments have mainly been based on denotational semantics (where $ v _{1} \rightarrow _{p} v _{2} $ is written $ {\mathcal S} [\![p]\!] ( v _{1} ) = v _{2} $) for imperative and functional programs, although work has started on similar developments based on (so-called small-step) structural operational semantics and (so-called big-step) natural semantics (cf. also Theoretical programming). The denotational approach may be formulated as an analysis function $ {\mathcal A} [\![p]\!] ( l _{1} ) = l _{2} $, where $ l _{1} $ is a mathematical entity describing the input to $ p $ and $ l _{2} $ is a mathematical entity describing the output from $ p $. The denotational approach calls for $ {\mathcal A} [\![p]\!] $ to be defined compositionally in terms of the syntactic constituents of the program $ p $; an example of this is the functional composition

$$ {\mathcal A} [\![ {p _{1} ; p _ 2} ]\!] = {\mathcal A} [\![ {p _ 2} ]\!] \circ {\mathcal A} [\![ {p _ 1} ]\!] , $$

i.e., $ {\mathcal A} [\![ {p _{1} ; p _ 2} ]\!] ( l ) = {\mathcal A} [\![ {p _ 2} ]\!] ( {\mathcal A} [\![ {p _ 1} ]\!] ( l ) ) $. In case of recursive programs, a least fixed point is taken of a suitable functional.

To express the correctness of the analysis one needs to define correctness relations $ v R _{t} l $ on the data, e.g. $ v \in l $, and $ f R _{ {t _{1} \rightarrow t _ 2}} h $ on the computations, e.g. $ \forall v, l : ( v R _{ {t _ 1}} l ) \Rightarrow ( f ( v ) R _{ {t _ 2}} h ( l ) ) $; the latter definition is usually called a logical relation. Semantic soundness then amounts to establishing

$$ {\mathcal S} [\![ p]\!] R _{ {t _{1} \rightarrow t _ 2}} {\mathcal A} [\![ p]\!] $$

for all programs $ p $ of type $ t _{1} \rightarrow t _{2} $. Efficient implementation of the analyses often demands that they be presented in a form less dependent on the program syntax, e.g. using graphs or constraints, and then the solution procedure must be shown sound and complete.

Abstract interpretation.

This is a general framework for obtaining cheaper and more approximative solutions to program analysis problems. In its most general form it attempts to be independent of the semantic base, but usually it is formulated in either a structural operational or a denotational setting. Given the semantics $ v _{1} \rightarrow _{p} v _{2} $ of a program $ p $, the first step is often to define the uncomputable collecting semantics $ s _{1} \Rightarrow _{p} s _{2} $. The intention is that

$$ s _{1} \Rightarrow _{p} s _{2} $$

$$ \Updownarrow $$

$$ s _{2} = \left \{ {v _ 2} : {v _{1} \rightarrow _{p} v _{2} \land v _{1} \in s _ 1} \right \} $$

and this may be regarded as a soundness as well as a completeness statement.

To obtain a more approximate and eventually computable analysis, the most commonly used approach is to replace the sets of values $ s \in S $ by some more approximate properties $ l \in L $. Both $ S $ and $ L $ are partially ordered in such a way that one generally wants the least correct element and where the greatest element is uninformative. (By the lattice duality principle, one could instead have desired the greatest correct element, and part of the literature on data flow analysis uses this vocabulary.) The relation between $ S $ and $ L $ is often formalized by a pair $ ( \alpha, \gamma ) $ of functions:

$$ S \mathop{\rightleftarrows} ^ \alpha _ \gamma L, $$

where $ \gamma : L \rightarrow S $ is the concretization function giving the "meaning" of properties in $ L $, and $ \alpha : S \rightarrow L $ is the abstraction function. It is customary to demand that $ S $ and $ L $ are complete lattices (cf. Complete lattice). Furthermore, it is customary to demand that $ ( \alpha, \gamma ) $ is a Galois connection, i.e. $ \alpha $ and $ \gamma $ are monotone and $ \gamma \circ \alpha $ is extensive $ ( \forall s: \gamma ( \alpha ( s ) ) \supset s ) $ and $ \alpha \circ \gamma $ is reductive $ ( \forall l: \alpha ( \gamma ( l ) ) \subset l ) $, or, equivalently, that $ ( \alpha, \gamma ) $ is an adjunction, i.e. $ \forall s, l: \alpha ( s ) \subset l \iff s \subset \gamma ( l ) $. This ensures that a set of values $ s $ always has a best description $ l $; in fact, $ l = \alpha ( s ) $.

Abstract interpretation then amounts to define an analysis $ l _{1} \Rightarrow _ p^ \prime l _{2} $ on the properties of $ L $ such that the following correctness statement holds:

$$ l _{1} \Rightarrow _ p^ \prime l _{2} $$

$$ \Downarrow $$

$$ \exists s : \gamma ( l _{1} ) \Rightarrow _{p} s \land s \subseteq \gamma ( l _{2} ) $$

This may be illustrated by the diagram

Figure: a110580a

It expresses that a certain diagram semi-commutes.

Example analyses that can be obtained in this way include intervals as approximations of sets of integers, polygons in the plane as approximations of sets of pairs of numbers, representations $ ( k _{1} , k _{2} ) $ of arithmetical congruences as approximations of periodic sets $ \{ {k _{1} + n \cdot k _ 2} : {n \in \{ \dots , - 1, 0, 1, \dots \}} \} $ as well as combinations of these.

Fixed-point approximation.

This is essential for program analysis because most of the approaches can be reformulated as finding the least fixed point (cf. also Fixed point) $ { \mathop{\rm lfp}\nolimits} ( f ) $ of a monotone function $ f : L \rightarrow L $ on a complete lattice $ L = ( L, \subset ) $ of properties; often, each element $ l \in L $ describes a set of values. There are two characterizations of the least fixed point that are of relevance here. Tarski's theorem ensures that

$$ { \mathop{\rm lfp}\nolimits} ( f ) = \bigcap { \mathop{\rm Red}\nolimits} ( f ) , $$

$$ { \mathop{\rm Red}\nolimits} ( f ) = \left \{ {l \in L} : {f ( l ) \subset l} \right \} , $$

satisfies $ f ( { \mathop{\rm lfp}\nolimits} ( f ) ) = { \mathop{\rm lfp}\nolimits} ( f ) $ and hence $ { \mathop{\rm lfp}\nolimits} ( f ) \in { \mathop{\rm Red}\nolimits} ( f ) $ is the least fixed point of $ f $. Transfinite induction allows one to define

$$ f ^ {[ \kappa + 1 ]} = f ( f ^ {[ \kappa ]} ) \ \textrm{ for } \ \kappa+1 \ \textrm{ a successor ordinal } , $$

$$ f ^ {[ \kappa ]} = \bigcup _{ {\kappa^ \prime } < \kappa} f ^ {[ \kappa^ \prime ]} \ \textrm{ for } \ \kappa \ \textrm{ a limit ordinal } , $$

and then the least fixed point is given by

$$ { \mathop{\rm lfp}\nolimits} ( f ) = f ^ {[ \lambda ]} $$

for some cardinal number $ \lambda $; it suffices to let $ \lambda $ be the cardinality of $ 2^{L} $. In simple cases, $ L $ has finite height (all strictly increasing sequences are finite) and then $ \lambda $ may be taken to be a natural number $ n _{0} $; this then suggests a computable procedure for calculating $ { \mathop{\rm lfp}\nolimits} ( f ) $: simply do the finite number of calculations $ f ^ {[ 0 ]} \dots f ^ {[ n _{0} ]} = f ^ {n _ 0} ( f ^ {[ 0 ]} ) = { \mathop{\rm lfp}\nolimits} ( f ) $.

In general, $ \lambda $ cannot simply be taken to be a natural number and then one has to be content with a computable procedure for finding some $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) \in L $ such that $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) \supset { \mathop{\rm lfp}\nolimits} ( f ) $; for pragmatic reasons $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) $ should be "as close to" $ { \mathop{\rm lfp}\nolimits} ( f ) $ as possible. An upper bound operator $ \nabla : {L \times L} \rightarrow L $ is a not necessarily monotone function such that $ l _{1} \nabla l _{2} \supset l _{1} \bigcup l _{2} $ for all $ l _{1} , l _{2} \in L $. Given a not necessarily increasing sequence $ ( l _{n} ) _{n} $, one can then define

$$ l _ 0^ \nabla = l _{0} , $$

$$ l _{ {n + 1}^ \nabla } = l _ n^ \nabla \nabla l _{ {n + 1}} , $$

for all natural numbers $ n $. It is easy to show that $ ( l _ n^ \nabla ) _{n} $ is in fact an increasing sequence. A widening operator $ \nabla : {L \times L} \rightarrow L $ is an upper bound operator such that $ ( l _ n^ \nabla ) _{n} $ eventually stabilizes (i.e. $ \exists n _{0} \forall n \geq n _{0} :l _ n^ \nabla = l _{ {n _ 0}^ \nabla } $) for all increasing sequences $ ( l _{n} ) _{n} $. (A trivial and uninteresting example of a widening operator is given by $ l _{1} \nabla l _{2} = \bigcup L $.) One can now define

$$ f _ \nabla ^ {\left \langle 0 \right \rangle} = \bigcap L \ \textrm{ (the least element of } \ L \textrm{ ) } , $$

$$ f _ \nabla ^ {\left \langle {n + 1} \right \rangle} = \left \{ { {f _ \nabla ^ {\left \langle n \right \rangle} \ \ \textrm{ if } \ f ( f _ \nabla ^ {\left \langle n \right \rangle} ) \subset f _ \nabla ^ {\left \langle n \right \rangle} ,} \atop {f _ \nabla ^ {\left \langle n \right \rangle}\nabla f ( f _ \nabla ^ {\left \langle n \right \rangle} ) \ \ \textrm{ otherwise } ,}} \right . $$

for all natural numbers $ n $. One can show that $ ( f _ \nabla ^ {\langle n \rangle} ) _{n} $ eventually stabilizes and that $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) = \bigcup _{n} f _ \nabla ^ {\langle n \rangle} $ equals $ f _ \nabla ^ {\langle {n _ 0} \rangle} $ for some $ n _{0} $ and that $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) \in { \mathop{\rm Red}\nolimits} ( f ) $. Hence, $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) \supset { \mathop{\rm lfp}\nolimits} ( f ) $ and, furthermore, there is a computable procedure for calculating $ { \mathop{\rm fix}\nolimits} _ \nabla ( f ) $: simply do the finite number of calculations $ f _ \nabla ^ {\langle 0 \rangle} \dots f _ \nabla ^ {\langle {n _ 0} \rangle} = { \mathop{\rm fix}\nolimits} _ \nabla ( f ) $. Examples of "good" widening operators can be found in the literature.

References

[a1] P. Cousot, "Semantic foundations of program analysis" S.S. Muchnick (ed.) N.D. Jones (ed.) , Program Flow Analysis: Theory and Applications , Prentice-Hall (1981) pp. 303–342
[a2] S. Jagannathan, S. Weeks, "A unified treatment of flow analysis in higher-order languages" , Proc. POPL '95 , ACM Press (1995) pp. 393–407
[a3] N.D. Jones, F. Nielson, "Abstract interpretation: a semantics-based tool for program analysis" S. Abramsky (ed.) D.M. Gabbay (ed.) T.S.E. Maibaum (ed.) , Handbook of Logic in Computer Science , 4 , Oxford Univ. Press (1995) pp. 527–636
[a4] J.P. Talpin, P. Jouvelot, "The type and effect discipline" , Information and Computation , 111 (1994)
How to Cite This Entry:
Analysis of programs. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Analysis_of_programs&oldid=44381
This article was adapted from an original article by H.R. NielsonF. Nielson (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article