Intelligent Java build
Notes by Niels Jørgensen.
A Java build tool
A Java build tool shall do safe, incremental re-builds of Java programs.
"Incremental" in the sense of re-using build-results created previously,
and "safe" in the sense of producing a
build-result equivalent with the result of a complete build-from-scratch.
The tool shall be a supplement to the widely used make program
which reads rules written in makefiles
and invokes re-compilation based on the time stamps
which show when a file was last written to.
This text defines terminology, requirements,
and rebuild rules expressed in a high-level manner as well
as in makefile syntax.
Motivation
The fact that there is not already an appropriate
Java build tool available may suggest that
such a tool is not needed.
The prevailing view in the community associated with
jikes (see below)
seems to be that a large Java program is best compiled
by passing all files directly to the jikes compiler,
rather than using make to control the compilation.
This relies on using the built-in features of jikes for incremental
compilation to avoid redundant re-builds.
This approach also avoids the overhead associated
with make-based recompilation which is due to
repeated compiler-invokations,
each invokation serving to compile only a single file
or a small set of files.
Another reason that make is less
suited for Java compilation is that
the Java language does not
allow for splitting source files into
implementation and interface (header) files
as in C/C++.
With C/C++, make can be used as follows:
Recompile a file either if the file itself changes,
or an interface file on which it depends changes -
but not merely if there is a change in the
implementation of what is declared in the interface.
In contrast, with Java the coarse-grained approach
of make,
which is based entirely on comparing
the time-last-modified stamps on files,
implies that certain files are redundantly recompiled
-
because a dependent-on file, say a file that defines a super-class,
has been changed even though the change
involves only implementation of a method and not the method's signature.
(Btw., is this related to green vs. red builds?)
Make-based incremental recompilation
has a potential for being much faster
than ordinary, brute-force Java compilation.
Make-based incremental recompilation, will,
however be slower than what can be attained
by using a built-in feature such as that
of the jikes compiler,
since this built-in feature
can distinguish between changes in
implementation vs. changes in interface/signature.
While this improvement is lost
by placing the "recompilation intelligence"
in a separate tool,
there are important advantages
to be attained by modularization,
i.e., by separating the build-feature
from the compiler.
These advantages include:
-
Make-based compilation of a Java project is useful
if there are chains of dependencies due to
compilation in multiple steps,
say by using the Compiler-Compiler JavaCC,
analogously to the use of yacc with C/C++.
-
Separation from the compiler is also useful if
one wants to
associate additional commands to the command
that invokes compilation of a file,
say in a build system where one wants to
write configuration information to a log file.
-
Finally, make-based compilation
provides the flexibility that
allows for tayloring by hand the dependencies
that guide the compilation,
say to accomodate dynamic dependencies
(cf. the method forname() of Java's class
java.lang.Class).
Related work
Some make tools have been developed by
the Java community:
-
The open source Java dependency analyzer
JavaDeps
developed by Steven Robbins
outputs
makefile rules that are similar to those described here. The present work
complements JavaDeps by describing and motivating these rules.
JavaDeps does not support inner classes.
(See example.)
JavaDeps also leads to redundant compilation if
a single file defines two classes, A and B,
where A depends on B, and A is defined before B;
the redundancy is due to the rule "A.class: B.class",
which may fire because the first compilation
creates A.class just before B.class.
- Jikes,
an open source Java compiler currently developed at IBM,
has several features that support make-style re-building:
-
The ability to do so-called full dependency checking (option +F).
This checks dependencies recursively, starting from the
source file(s) passed to an invokation of the compiler.
There appears to be no documentation available
as to how this feature is specified,
but it does appear to safe.
(See the
jikes documentation)).
-
The full dependency checking can be utilized with the compiler running in a so-called
incremental mode (option ++). This keeps the compiler running online where
it can be requested to do
a "make all".
-
The
ability to output the source dependencies recognized by the compiler
(options +M and +DR=File).
This would be useful
as input for a
Java build tool
as described in these notes.
Specifically, option +M creates a file
MYFILE.u for each java source file MYFILE.java
compiled, and lists therein all the files that
MYFILE.class depends on,
including indirect dependencies and excluding system classes.
Option +DR=FILE creates the file
and lists therein, for each class file created by an
invokation of jikes, the files that the class file depends on,
excluding indirect dependencies and including system classes.
Neither option lists information about inner classes.
-
The
open source
mmake
tool written by Jan-Henrik Haukeland and distributed with Debian Linux.
This tool generates makefiles for Java
automatically. It can read files output
by the jikes compiler (.u files),
cf. option +M described above.
However, since neither jikes nor mmake
does any analysis of the dependencies,
the case of
circular dependencies (see below) will not be treated correctly.
For example, if gnumake (gmake) is used as the underlying make program,
and detects a circular dependency in a makefile generated by mmake,
it will
remove the circularity
by disregarding one of the dependencies.
For C/C++ a number of tools for use with make are available, including
-
The unix utility program makedepend and option +M of the gcc gnu C/C++ compiler
parses the implementation source files passed to them,
detects
their dependency on header files (as indicated by the presence of #include directives)
and writes makefile rules that expresses these dependencies.
-
The unix utility program mkmf has the ability to generate full makefiles including rules
for building archives and programs from object files.
The tools for C/C++ set a basic standard for terminology and show that a
de facto standard for writing makefiles can be established.
The new problems posed by Java are: a more difficult problem of circular
dependency (more difficult because Java does not have the distinction
between implementation/header files) and new language constructs such
as inner classes.
The build-from-scratch
A notion of a "brute-force" build or build-from-scratch is necessary
as a starting point to
-
guide the design of a safe incremental build
-
establish a reference against which to
prove safeness of the incremental build
(this text does not contain such a proof)
An invokation of a Java compiler takes Java source code as input in the form of
.java files, and produces so-called bytecode as output in the form of
.class files. In a further step, the class files may be
stored in a Java archive (.jar) file or in a zip-file,
which is a simple step not considered here.
A single .java file is called a compilation unit and is the smallest unit
that can be passed to a Java compiler. A .java file may define one or
more classes and one or more interfaces, possibly nested within classes,
and for each class or interface
a separate .class file is produced as output.
By the definition below of the reference Java build,
circularly dependent source files
are passed
to the same compiler invokation.
This is based on the following definition of dependency.
Dependency and the dependency graph
Dependency is defined as a relation between classes.
In mathematical terms the dependency relation is a subset of the set
of all pairs of all classes.
X
|
depends on
|
Y
|
if the source for X refers to an entity defined in Y.java
|
X
|
depends on
|
X
|
(reflexivity)
|
For example, given three Java classes that define a LIFO (Last-In, First-Out) data structure,
Element.java,
Stack.java, and
LifoApp.java,
the dependency is as follows:
Dependency
|
Element
|
Stack
|
LifoApp
|
Element
|
+
|
+
|
-
|
Stack
|
+
|
+
|
-
|
LifoApp
|
+
|
+
|
+
|
-where the first row says that Element depends on Element and Stack,
and not on LifoApp.
(Alternatively, dependency can be defined as a relation between
.java files. Defining the relation in a single domain,
rather than involving both .java and .class files,
keeps the treatment of circular dependency simpler).
The nodes of the dependency graph are the classes.
There is an edge from class Y to class X if X depends on Y.
Paths are defined in the usual way as sequences of connected edges.
Two nodes are circularly dependent if there is a path from
one to the other and vice versa.
Example.
Element and Stack are circularly dependent.
Compilation components
A compilation component is a strongly connected component of the
dependency graph.
The graph theoretical notion of strongly connected components
can be explained as follows:
A strongly connected component
is a set of nodes that are
circularly dependent.
Strongly connected components are maximal,
which means that
there is no node Z outside a component such that there are paths
in both directions between Z
and a node
of the component.
It is always the case that each node belongs to exactly one component.
Example.
In the LIFO application there are two strongly connected components:
-
[Stack and Element]
-
[LifoApp]
A Java build-from-scratch is defined as follows:
-
A build step is the invokation of the Java compiler
where the input files are all the source files
of some compilation component.
-
A brute-force Java build is a
topologically ordered sequence
of build steps,
comprising all compilation components.
Topological ordering can be explained as follows.
Let the notion of derived graph be defined as follows:
The nodes of the derived graph are the
strongly connected components of the dependency graph.
In the derived graph there is an edge from component A to
component B if for some class X of A there is an edge in the dependency graph
from X to a class Y in B.
By definition of strongly connected components,
such a graph can have no circles, and so it
allows for a topological ordering. This means they can be ordered so
that if node A appears before node B, then there is no
path from B to A.
Passing all source files of a component
to the same compiler invokation is
already required by Sun's JDK.
(Failing to do so entails an
error message,
except in the case where
the compiler can find the bytecode files,
in which case they must have been produced previously
in a common invokation anyway).
Compiling in
topological order means that dependent-on components
are always compiled
before depending-on components.
Example.The strongly connected components of the LIFO application
can be topologically sorted in exactly one way, namely:
[Stack and Element], [LifoApp]
Therefore in a brute-force build they shall be compiled
in the following sequence
(javac is used to denote some Java compiler,
not necessaryly JDK's).
-
javac Stack.java Element.java
-
javac LifoApp.java
A safe incremental build
An incremental build performs a subset of all the steps
of a brute-force build.
Requirements and design is defined in the following sections.
Requirements
The Java build tool
shall take a list of source
files as input,
and produce a makefile as output.
The makefile shall
define the following targets:
-
make depend
re-creates the makefile,
by invoking the Java build tool.
-
make all
brings all class files up-to-date
(equivalently: make, by the convention that all is the default target).
-
make X.class
brings file X.class up-to-date.
The semantics of the build make targets shall be as follows.
-
make X.class
-
compiles Y.java
for each class Y on a path in the dependency graph
from an edited file
to X.
-
these files are compiled
component-wise and in
topological order
-
make all
- has the same effect as make X.class Y.class .. etc. as defined above. The effect is that
a class is compiled if there is a path from an edited class to the class.
Thus, whenever a file is re-compiled,
so are all the other files of the particular component.
Additional features of a Java build tool
would be:
-
The build tool creates a makefile
which is able to detect automatically
whether a re-creation of the makefile is required.
-
The time spent parsing Java source files
is reduced by
passing to the build tool
the source files known to have changed;
based on this information and
the dependencies recorded in the makefile
the build tool
infers for which files
the dependencies cannot
have changed,
and skips parsing these files.
Rebuild rules expressed in "high-level" or human language
The semantics outlined above can be expressed in the following way in terms
of recursive rules. These rules are, in turn, easy to implement
as makefile rules.
For each compilation component there are the following rules:
-
One
edit
rule:
A rule saying that all classes of the component shall be
rebuilt if one or more of the source files
of the classes have been edited.
-
Zero, one, or more
recursive
rules:
For each parent of the component, a rule saying that
if the parent has been re-built, the component itself shall be re-built.
Example.The LIFO application is captured by the following three rules:
-
Component [Element and Stack]:
-
Edit rule: Rebuild if source file Element.java and/or Stack.java is edited.
-
Component [LifoApp]:
-
Edit rule: Rebuild if source file LifoApp.java is edited.
-
Recursive rule: Rebuild if component [Element and Stack] is rebuilt.
Implementation as makefile rules
The Java build tool shall auto-generate the following makefile rules:
Stack.class Element.class :
|
Stack.java Element.java
|
|
javac Stack.java Element.java
|
LifoApp.class :
|
LifoApp.java
|
|
javac LifoApp.java
|
LifoApp.class :
|
Stack.class Element.class
|
|