Intelligent Java build  

Notes by Niels Jørgensen.

A Java build tool

A Java build tool shall do safe, incremental re-builds of Java programs. "Incremental" in the sense of re-using build-results created previously, and "safe" in the sense of producing a build-result equivalent with the result of a complete build-from-scratch. The tool shall be a supplement to the widely used make program which reads rules written in makefiles and invokes re-compilation based on the time stamps which show when a file was last written to.

This text defines terminology, requirements, and rebuild rules expressed in a high-level manner as well as in makefile syntax.

Motivation
The fact that there is not already an appropriate Java build tool available may suggest that such a tool is not needed.

The prevailing view in the community associated with jikes (see below) seems to be that a large Java program is best compiled by passing all files directly to the jikes compiler, rather than using make to control the compilation. This relies on using the built-in features of jikes for incremental compilation to avoid redundant re-builds. This approach also avoids the overhead associated with make-based recompilation which is due to repeated compiler-invokations, each invokation serving to compile only a single file or a small set of files. Another reason that make is less suited for Java compilation is that the Java language does not allow for splitting source files into implementation and interface (header) files as in C/C++. With C/C++, make can be used as follows: Recompile a file either if the file itself changes, or an interface file on which it depends changes - but not merely if there is a change in the implementation of what is declared in the interface. In contrast, with Java the coarse-grained approach of make, which is based entirely on comparing the time-last-modified stamps on files, implies that certain files are redundantly recompiled - because a dependent-on file, say a file that defines a super-class, has been changed even though the change involves only implementation of a method and not the method's signature. (Btw., is this related to green vs. red builds?)

Make-based incremental recompilation has a potential for being much faster than ordinary, brute-force Java compilation. Make-based incremental recompilation, will, however be slower than what can be attained by using a built-in feature such as that of the jikes compiler, since this built-in feature can distinguish between changes in implementation vs. changes in interface/signature.

While this improvement is lost by placing the "recompilation intelligence" in a separate tool, there are important advantages to be attained by modularization, i.e., by separating the build-feature from the compiler. These advantages include:

  • Make-based compilation of a Java project is useful if there are chains of dependencies due to compilation in multiple steps, say by using the Compiler-Compiler JavaCC, analogously to the use of yacc with C/C++.
  • Separation from the compiler is also useful if one wants to associate additional commands to the command that invokes compilation of a file, say in a build system where one wants to write configuration information to a log file.
  • Finally, make-based compilation provides the flexibility that allows for tayloring by hand the dependencies that guide the compilation, say to accomodate dynamic dependencies (cf. the method forname() of Java's class java.lang.Class).
Related work
Some make tools have been developed by the Java community:
  • The open source Java dependency analyzer JavaDeps developed by Steven Robbins outputs makefile rules that are similar to those described here. The present work complements JavaDeps by describing and motivating these rules. JavaDeps does not support inner classes. (See example.) JavaDeps also leads to redundant compilation if a single file defines two classes, A and B, where A depends on B, and A is defined before B; the redundancy is due to the rule "A.class: B.class", which may fire because the first compilation creates A.class just before B.class.
  • Jikes, an open source Java compiler currently developed at IBM, has several features that support make-style re-building:
    • The ability to do so-called full dependency checking (option +F). This checks dependencies recursively, starting from the source file(s) passed to an invokation of the compiler. There appears to be no documentation available as to how this feature is specified, but it does appear to safe. (See the jikes documentation)).
    • The full dependency checking can be utilized with the compiler running in a so-called incremental mode (option ++). This keeps the compiler running online where it can be requested to do a "make all".
    • The ability to output the source dependencies recognized by the compiler (options +M and +DR=File). This would be useful as input for a Java build tool as described in these notes. Specifically, option +M creates a file MYFILE.u for each java source file MYFILE.java compiled, and lists therein all the files that MYFILE.class depends on, including indirect dependencies and excluding system classes. Option +DR=FILE creates the file and lists therein, for each class file created by an invokation of jikes, the files that the class file depends on, excluding indirect dependencies and including system classes. Neither option lists information about inner classes.
  • The open source mmake tool written by Jan-Henrik Haukeland and distributed with Debian Linux. This tool generates makefiles for Java automatically. It can read files output by the jikes compiler (.u files), cf. option +M described above. However, since neither jikes nor mmake does any analysis of the dependencies, the case of circular dependencies (see below) will not be treated correctly. For example, if gnumake (gmake) is used as the underlying make program, and detects a circular dependency in a makefile generated by mmake, it will remove the circularity by disregarding one of the dependencies.
For C/C++ a number of tools for use with make are available, including
  • The unix utility program makedepend and option +M of the gcc gnu C/C++ compiler parses the implementation source files passed to them, detects their dependency on header files (as indicated by the presence of #include directives) and writes makefile rules that expresses these dependencies.
  • The unix utility program mkmf has the ability to generate full makefiles including rules for building archives and programs from object files.
The tools for C/C++ set a basic standard for terminology and show that a de facto standard for writing makefiles can be established.

The new problems posed by Java are: a more difficult problem of circular dependency (more difficult because Java does not have the distinction between implementation/header files) and new language constructs such as inner classes.

The build-from-scratch
A notion of a "brute-force" build or build-from-scratch is necessary as a starting point to
  • guide the design of a safe incremental build
  • establish a reference against which to prove safeness of the incremental build (this text does not contain such a proof)
An invokation of a Java compiler takes Java source code as input in the form of .java files, and produces so-called bytecode as output in the form of .class files. In a further step, the class files may be stored in a Java archive (.jar) file or in a zip-file, which is a simple step not considered here.

A single .java file is called a compilation unit and is the smallest unit that can be passed to a Java compiler. A .java file may define one or more classes and one or more interfaces, possibly nested within classes, and for each class or interface a separate .class file is produced as output.

By the definition below of the reference Java build, circularly dependent source files are passed to the same compiler invokation. This is based on the following definition of dependency.

Dependency and the dependency graph
Dependency is defined as a relation between classes. In mathematical terms the dependency relation is a subset of the set of all pairs of all classes.
X depends on Y if the source for X refers to an entity defined in Y.java
X depends on X (reflexivity)
For example, given three Java classes that define a LIFO (Last-In, First-Out) data structure, Element.java, Stack.java, and LifoApp.java, the dependency is as follows:

Dependency Element Stack LifoApp
Element + + -
Stack + + -
LifoApp + + +

-where the first row says that Element depends on Element and Stack, and not on LifoApp.

(Alternatively, dependency can be defined as a relation between .java files. Defining the relation in a single domain, rather than involving both .java and .class files, keeps the treatment of circular dependency simpler).

The nodes of the dependency graph are the classes. There is an edge from class Y to class X if X depends on Y.

Paths are defined in the usual way as sequences of connected edges. Two nodes are circularly dependent if there is a path from one to the other and vice versa.

Example. Element and Stack are circularly dependent.

Compilation components
A compilation component is a strongly connected component of the dependency graph.

The graph theoretical notion of strongly connected components can be explained as follows: A strongly connected component is a set of nodes that are circularly dependent. Strongly connected components are maximal, which means that there is no node Z outside a component such that there are paths in both directions between Z and a node of the component. It is always the case that each node belongs to exactly one component.

Example. In the LIFO application there are two strongly connected components:

  • [Stack and Element]
  • [LifoApp]

A Java build-from-scratch is defined as follows:

  • A build step is the invokation of the Java compiler where the input files are all the source files of some compilation component.
  • A brute-force Java build is a topologically ordered sequence of build steps, comprising all compilation components.
Topological ordering can be explained as follows. Let the notion of derived graph be defined as follows: The nodes of the derived graph are the strongly connected components of the dependency graph. In the derived graph there is an edge from component A to component B if for some class X of A there is an edge in the dependency graph from X to a class Y in B. By definition of strongly connected components, such a graph can have no circles, and so it allows for a topological ordering. This means they can be ordered so that if node A appears before node B, then there is no path from B to A.

Passing all source files of a component to the same compiler invokation is already required by Sun's JDK. (Failing to do so entails an error message, except in the case where the compiler can find the bytecode files, in which case they must have been produced previously in a common invokation anyway).

Compiling in topological order means that dependent-on components are always compiled before depending-on components.

Example.The strongly connected components of the LIFO application can be topologically sorted in exactly one way, namely:

[Stack and Element], [LifoApp]

Therefore in a brute-force build they shall be compiled in the following sequence (javac is used to denote some Java compiler, not necessaryly JDK's).

  • javac Stack.java Element.java
  • javac LifoApp.java
A safe incremental build
An incremental build performs a subset of all the steps of a brute-force build. Requirements and design is defined in the following sections.
Requirements
The Java build tool shall take a list of source files as input, and produce a makefile as output. The makefile shall define the following targets:
  • make depend re-creates the makefile, by invoking the Java build tool.
  • make all brings all class files up-to-date (equivalently: make, by the convention that all is the default target).
  • make X.class brings file X.class up-to-date.
The semantics of the build make targets shall be as follows.
  • make X.class
    • compiles Y.java for each class Y on a path in the dependency graph from an edited file to X.
    • these files are compiled component-wise and in topological order
  • make all - has the same effect as make X.class Y.class .. etc. as defined above. The effect is that a class is compiled if there is a path from an edited class to the class.
Thus, whenever a file is re-compiled, so are all the other files of the particular component.

Additional features of a Java build tool would be:

  • The build tool creates a makefile which is able to detect automatically whether a re-creation of the makefile is required.
  • The time spent parsing Java source files is reduced by passing to the build tool the source files known to have changed; based on this information and the dependencies recorded in the makefile the build tool infers for which files the dependencies cannot have changed, and skips parsing these files.
Rebuild rules expressed in "high-level" or human language
The semantics outlined above can be expressed in the following way in terms of recursive rules. These rules are, in turn, easy to implement as makefile rules.

For each compilation component there are the following rules:

  • One edit rule: A rule saying that all classes of the component shall be rebuilt if one or more of the source files of the classes have been edited.
  • Zero, one, or more recursive rules: For each parent of the component, a rule saying that if the parent has been re-built, the component itself shall be re-built.
Example.The LIFO application is captured by the following three rules:
  • Component [Element and Stack]:
    • Edit rule: Rebuild if source file Element.java and/or Stack.java is edited.
  • Component [LifoApp]:
    • Edit rule: Rebuild if source file LifoApp.java is edited.
    • Recursive rule: Rebuild if component [Element and Stack] is rebuilt.
Implementation as makefile rules
The Java build tool shall auto-generate the following makefile rules:
Stack.class Element.class : Stack.java Element.java
javac Stack.java Element.java
LifoApp.class : LifoApp.java
javac LifoApp.java
LifoApp.class : Stack.class Element.class