SAFT: Static Analysis with Finite Tree Automata

The SAFT project is a research project (1.1.2007 - 31.12 2010) funded by the Danish Natural Science Research Council.

Project objectives

Finite Tree Automata (FTAs) are mathematical "machines" that define sets of tree-structured terms; they provide a foundation for describing a variety of computational structures such as data, computation states and computation traces. The field of finite tree automata attracts growing attention from researchers in automatic program analysis and verification; there have been promising applications using FTAs in program specialisation, data flow analysis for a variety of languages, shape analysis of pointer-based data structures, polymorphic type inference, binding time analysis, termination analysis, infinite state model checking and cryptographic protocol analysis. The study of FTAs originated in formal language theory and logic, but unlike string automata, which have been widely applied in computer science, especially in compiler theory, tree automata have not yet entered the mainstream of computing theory.

Automatic program analysis methods (in contrast to traditional formal verification of programs) rely on finitely expressible, safe approximations of computations. Thus the study of automatic program analysis is driven to a large degree by the trade-offs between precision of the approximations and the complexity of the analysis algorithms; the more expressive and precise the approximation, the more complex the analysis tends to be. The interest of FTAs lies firstly in their expressiveness; the set of of sets of terms definable by FTAs is surprisingly rich and many computation structures and properties are naturally expressible as FTAs. Algorithms for manipulating FTAs are well-known and extensions of tree automata with various features such as constraints have been also studied.

The main question for investigation is how to exploit this expressiveness and incorporate systematically the known theory of tree automata into practical program analysis algorithms. Two main formal areas of computing underlie the current project proposal; tree automata and abstract interpretation. Abstract interpretation is a theory of abstract program semantics, using Galois connections to relate the concrete and abstract semantics and it provides a general mathematical framework for designing program analysis tools. The first goal of the project is to define a systematic approach to defining abstract interpretations based on tree automata extending previous work by Cousot and Cousot and others. This work will lay out the general algebraic structure and algorithmic properties of various kinds of FTA-based abstract domains and explore their relations to other static analyses, model checking and type systems. The extension of FTA-based abstract domains with constraints, in particular linear arithmetic constraints, will also be studied.

The following more algorithmic topics will also be investigated: improvement of the efficiency of FTA-based analyses by exploiting operations on FTAs such as determinisation and minimisation; optimisations of fixpoint computations (which are fundamental in program analysis) exploiting operations of FTAs such as checking inclusion of one FTA in another; development of new representations of tree automata more suited to large-scale program analyses, such as representations based on ROBDDs (reduced ordered binary decision diagrams) and use of high-performance Boolean satisfiability (SAT) algorithms to perform operations on FTAs; and the use of widening techniques for trading precision for efficiency in abstract interpretations.