Henning Christiansen
Research group PLIS: Programming, Logic and Intelligent Systems
Department of People and Technology,
Roskilde University, Denmark
© Henning Christiansen 2002, 2010, 2011, 2016
Version 1.2 adapted to changes in SWI Prolog 7; expected to run with earlier versions of SWI.
It has not been tested with SICStus revently.
Version 1.1, small updates July 19, 2011.
Version 1.0, released December 3, 2010.
This document explains how to use CHRG which is a grammar notation implemented on top of the language of Constraint Handling Rules, CHR. You need to have a recent version of SICSTUS Prolog 4 or SWI Prolog to run the CHRG system.
To read this users' guide, you need to be familiar with Prolog and some knowledge of CHR may also be useful.
Please report any bugs, unclear points, etc. to the author.
CHRG has become a surprisingly powerful grammar formalism and language analysis system with the following main features. CHRGs work bottom-up and present the following features:
The following assumes that you have installed and have started a recent version of SICSTUS Prolog 4 or SWI Prolog.
You will need to compile the CHRG source files to use the CHRG system. The simplest way to do this is to do to do that from your own source file, as shown in the following file included with the system with the name simpleCHRG.
:- compile(chrg). :- chrg_symbols np/0, verb/0, sentence/0. np, verb, np ::> sentence. [peter] ::> np. [mary] ::> np. [likes] ::> verb. end_of_CHRG_source. |
Assuming that you have a running Prolog session, you can load this file as follows.
?- compile(simpleCHRG). |
When the file is loaded, these declarations and rules are first translated into CHR dittos and in turn compiled as any other file containing a mix of Prolog and CHR. (Notice that the file chrg loads the CHR library and the libraries for lists and terms.)
You can also load CHRG from the Prolog prompt as follows, if you prefer, before loading your own CHRG files.
?- compile(chrg). |
The source file shown above declares three grammar symbols, for example, np/0 declares a grammar symbol of arity 0 that may be used in the rules that follows. Grammar symbols may have attributes, therefore the arity; the full syntax and more examples is given later.
The first rule says that a sequence np,verb,np is understood as a sentence. Operationally, it means that if a sequence of adjacent np,verb,np is observed within the underlying constraint store, then a representation of sentence is added. The three subsequent rules apply a notation for terminal symbols using square brackets similarly as in DCG. The lefthand side - referred to as the head of the rule - may consist of arbitrary sequences of terminal and nonterminal symbols. Notice that the purposes of lefthand and righthand sides of a rule are opposite to DCG and most other grammar notations; this convention is chosen as to keep syntax analogous to CHR and also to indicate the bottom-up nature of CHRG. In order to execute the parser, call the predicate parse with a list of Prolog atoms in the following way.
?- parse([peter,likes,mary]). |
A listing of the text is printed out with a numbering of word boundaries (optional; can be switched off).
<0> peter <1> likes <2> mary <3> |
The result from the parsing process is given as a set of constraints.
all(0,3), np(0,1), verb(1,2), np(2,3), sentence(0,3), token(0,1,peter), token(1,2,likes), token(2,3,mary) ? |
The two numerical arguments of each constraint refer to word boundaries, and the constraint sentence(0,3) should be understood as "A sentence has been recognized between word boundaries 0 and 3". Token constraints such as token(0,1,peter) represent terminal symbols; the square brackets around terminal symbols in the CHRG rules are nothing but syntactic sugar for token symbols.
The all(0,3) is not interesting in itself; it holds the start and end marker for the entire string and is used for implementing the special all grammar symbol (section 3.3) and the accept and acceptc predicates (section 5).
It may be instructive to take a look at the CHR declarations plus rules produced temporarily by CHRG when a source file is read in:
:- chr_constraint np/2, verb/2, sentence/2, token/3. np(N0,N1), verb(N1,N2), np(N2,N3) ==> sentence(N0,N3). token(peter,N0,N1) ==> np(N0,N1). token(mary, N0,N1) ==> np(N0,N1). token(likes,N0,N1) ==> verb(N0,N1). |
So what the CHRG system basically does is to add two extra arguments to grammar symbols and replace the arrow symbol with its CHR equivalent. You can have the system to print out these gererated rules during compilation; see section 4.
We describe syntax by means of BNF-like notation and occationally by example. The semantics is described informally and in terms of examples; a formal (and less redable) description can be found in the scientific paper referred to in the beginning of this document.
Typewriter font is used for text to be literally part of a source text whereas italic is used for parts to be replaced by actual text defined elsewhere, i.e., the two fonts characterize terminals, resp., nonterminals in the (meta-) grammar for CHR grammars. Square brackets in roman font [ ] (hopefully distinguishable from square brackets in typewriter font [ ]) denotes optional component.
3.1 Syntax of CHRG source files
3.2 Syntax of CHRG rules
3.3 Semantics of CHRG rules
:- chrg_symbols spec, .., spec.
[:- chrg_abducibles spec, .., spec.] rule . .. rule . end_of_CHRG_source. |
Each spec is of the form Prolog-atom/arity similarly to specs in Prolog or CHR; notice that the mode and type information is not available for declarations of chr symbols and abducibles. The declarations of chrg_symbols and chrg_abducibles define the vocabulary to be used in CHRG rules; recall that you can intermix with arbitrary CHR and Prolog declarations and definitions. You may also write rules in CHR about declared abducibles and even about declared grammar symbols, when you have found out how they are translated into CHR constraints.
For enhanced readability, the CHRG syntax includes but singular and plural synonymes for these declarations, i.e., :-chrg_symbol, :-chrg_symbols, :-chrg_abducible, :-chrg_abducibles and even :-chr_constraint, :-chr_constraints.
When CHRG rules are read in, the translation into CHR rules pays different attention to grammar symbols and the other two categories. Grammar symbols are extended with two extra arguments referring to word boundaries and the other two are applied as they are. Constraints behave as in CHR and you can also write CHR rules about them (and you can even use grammar symbols in CHR rule provided that you add manually two extra arguments). Abducibles behave basically like constraints except that a few extra facilities are added; this is explained in section 6.
As indicated, only the grammar_symbols declaration is required, and the three different sorts of declarations can be given in any order.
The following grammar symbols are implicitly declared by the system and must not appear in declarations provided by the user.
Notice that the The purpose of the end marker end_of_CHRG_source is essential; if you leave it out, the system may not observe this and the compilation of your source file will be a strange half-baked thing.
To list of contents To start of section 3
[ left-context -\ ] core-of-head [ /- right-context ] arrow [ guard | ] body |
All that comes before the arrow is called the head of the rule. A guard left out is considered equivalent to true. The guard may be any test that satisfied the requirements as a gourad in a CHR rule; see documentation for SICSTUS Prolog 4 or SWI Prolog.
The core of the head is defined by the following grammar, where grammar-symbol and constraint refer to terms built from declared symbols and with a number of attributes as indicated by the declaration (the roman font vertical bar | denotes alternatives):
core::= grammar-symbol | core:(Prolog-var, Prolog-var) | optional( grammar-symbol ) | optional( grammar-symbol , substitution ) | { any-callable-Prolog-term } | [ Prolog-term [ , .., Prolog-term ] ] | core , core | core $$ core |
First alternative is a usual grammar symbol;
the colon notation in the second alternative makes it possible to unify the word boundaries
normally hidden for the user
(see section 3.3)
to Prolog variables; a variable may be used here as well (will be bound to a comma-pair
during execution).
The two variants with "optional" are used for grammar
symbols that are optional elements in the core-of-head;
a possible substitution can be used for assigning variables
occurring in the optional part; only activated when the rule applies with no
match for the optional core part; see about substitution with the
"where" notation below.
The next borrows notation from DCG in that nongrammatical symbols (here
constraints only) are sourrounded by curly brackets.
Next, square brackets for sequences of terminal symbols;
comma, the usual sequencing operator; and finally $$
for parallel matching (described below).
NB: The core of a rule must contain at least one grammar symbol.
The colon operator should only be applied to instances of
A core must contain at least one non-optional grammar symbol. Optional parts are implementated by compiling different versions of the rule, one with and one without the optional part; the assigment is executed at compile time similar to the "where" notation (below). Optional parts may not be applied in context parts.
Example:
optional(a(X),X=no), optional(b), c ::> d(X)is an abbreviation for the following four rules: a(X), b, c ::> d(X) b, c ::> d(no) a(X), c ::> d(X) c ::> d(no) |
Parantheses can be used as in Prolog in order to group things. The $$ operator has precedence number slightly higher that comma which means that (a,b$$c,d) is understood as ((a,b)$$(c,d)).
Notice that grammar symbols include gaps, "..." and "n...m" for integers 0≤n<m. It is possible to have values of n and m to be determined dynamically referring attributes of other grammar symbols in the head but it is not checked that the actual values are feasible so lower-level error messages may occationally occur; the CHRG system will issue a warning when a rule with dynamic gap bounds are read in.
There is a restriction on the use of gaps in the core of a head
so that the core must be bounded defined in the following way.
This ensures that the core matches a specific interval of word boundaries
when applied (and thus meaningful boundaries for whatever the body might
add).
NB; The colon operator should only be applied to instances of
|
The left and right context is either of the same
shape as a core or
of the form ( core ; .. ; core )
(no restrictions to gaps apply).
The use of semicolon in left and right context in a rule is syntactic
sugar for the set of rules that results from an unfolding of
the semicolon structure.
Example:
(a ; b) -\ c /- (d ; e) ::> fis an abbreviation for the following four rules: a -\ c /- d ::> f b -\ c /- d ::> f a -\ c /- e ::> f b -\ c /- e ::> f |
A body is in principle any callable Prolog term, but we apply some syntactic conventions as to keep a correspondence with the syntax used in the head.
body::= grammar-symbol | body:(Prolog-var, Prolog-var) | ( body , .., body ) | ( body ; ..; body ) | ( test ->body ; ..; [ test -> ] body ) | { any-callable-Prolog-term } | true | false | fail | term=term | dif(term,term) test ::= anything that Prolog's conditional notation accepts as a test. |
Gaps and the parallel match operator are not allowed in the body.
Notice that when the colon operator is applied in the head of a rule, it typically reads some boundaries from the constraint store, whereas when used in the body, they typically set boundaries (over-riding the default; to be explained); a variable may be used here as bould should be bound to a comma-pair of boundaries during execution (otherwise a failure may occur).
Warning: SICStus Prolog has a syntactic peculiarity which makes it read rules having disjunctions
(semicolon) or conditionals in a wrong way.
For example, SICStus reads the rule
a(X), b(Y) <:> X > Y -> ab(X) ; ba(Y).as if it was written a(X), b(Y) <:> X > Y -> ab(X) | ba(Y).which gives no sense. To get things right in SICStus, you must add a trivial guard as follows. a(X), b(Y) <:> true | X > Y -> ab(X) ; ba(Y).SWI get things right. |
As it appears, a few standard built-in Prolog predicates can be written without curly brackets; these predicates are rarely confused with grammar symbols and it looks nicer without brackets when such a single goal comprise the body. Special operators to be used in the body are available for Assumptions Grammars, see section 7.
The arrow is one of ::> in which case the rule is called a propagation rule and <:> in which case the rule is either a simplification or a simpagation rule. The latter two are distinguished by the simpagation rule having one or more symbols prefixed by an exclamation mark "!" (examples and explanation will follow).
Constraints can be used in both head and body of a rule, as in the following example:
:- chr_constraint h/1. [a] ::> {h(a)}, aa. [b], {h(X)} ::> bb(X). |
According to the semantics described below, the first rule when applied will produce a hypothesis h(a) together with nonterminal symbol aa. This hypothesis makes it possible for the second rule to apply for terminal [b] thus producing nonterminal symbol bb(a).
Finally, it is possible to attach some extra components to a rule, some of which are inherited from CHR, so that where we indicated a rule in the grammar for a source file, the full syntax is as follows:
[ name @@ ] rule [ gpragma pragma ] [where substitution] |
The optional name and
pragma parts are exactly as in CHR with the same syntax and
meaning and not described here; grammar symbols and constraints in the
head can be followed by "# Prolog-variable" as in CHR;
see documentation of
SICSTUS Prolog 4
or SWI Prolog.
Notice that CHR's symbols @ and
pragma have been replaced in the CHRG syntax
by @@ and
gpragma
due to subtle implementation problems.
A "where substitution'' part serves to simplify the writing of
complicated rules, e.g., with complicated attribute expressions or
other parts of the rule that could be separated out.
Example:
a(A) -\ B /- ..., q(X,Y) ::> {C}, sentence_in_funny_context(A,Z) where A = ugly(st(r,u,c(t,u,r(e)))), B = (np, verb, np), C = (append(X,Y,Z), write(Z)) |
The meaning is that any occurrence in the rule of A, B, and C is replaced by the indicated term. The transformation is purely syntactical and executed when CHRG reads the rule from a source file.
The where notation can also be applied for Prolog and CHR rules in a CHRG source file.
To list of contents To start of section 3
A propagation rules such as
a, b ::> c |
means that adjacent occurrences of grammar symbols "a" and "b" will give rise to a an occurrence of "c" spanning over the combined boundaries of "a" and "b", i.e., the start boundary of "c" is the same as the start boundary of "a", and its end boundary is the same as the end boundary of "b". As the rule is a propagation rule, the occurrences of "a" and "b" stays in the store.
A simplification rule
a, b <:> c |
works in a similar way, except that the occurrences of a" and "b", to which the rule is applied, are remioved from the constraint store.
A simpagation rule is similar to a simplification rule, except that one or more grammar symbols in its head are prefixed by a "!" symbol, meaning that such a grammar symbol is not removed, but stays in the store. For example:
a, !b <:> c |
This, when this rule is applied, "a" will be removed, "b" stays, and a "c" is added. The head of a rule may also include constraints which, then serve as conditions for the rule to apply. For example:
{con}, a, b ::> c |
When con is in the store (without any relation to boundaries), the rules applies as a,b::>c; without con, the rule cannot apply. Boundaries for "a", "b", and "c" are calculated as if {con} was not there.
Grammar rules may also contain guards, which has the seme meaning as in CHR. For example:
a(X), b(Y) ::> X is Y+Z, Z > 17 | c(Z) |
The rule can only apply when the actual occurrences of "a" and "b" have attribute values whose sum is greater that 17.
When a rule is applied, the contents of the body of the rule (its right hand side) is executed by adding and grammar symbols and executiong the code within curly bracket. For example:
a, b <:> c, {write(hello)}. |
Grammar symbols "a" and "b" are replaced by "c" as above, and the Prolog goal write(hello) is executed.
A grammar rule may contain more that one grammar symbol, for example:
a, b <:> c, {write(hello)}, d. |
It works similarly as the previous rule, with both "c" and "d" added having that same boundaries, i.e., both span the same substring as "a,b". Grammar symbols "a" and "b" are replaced by "c" as above, and the Prolog goal write(hello) is executed.
Terminal symbols are treated as any other grammar symbols. Consider, for example, the following two rules.
[the], n <:> np. [the,chinese,dragon], blabla <:> chinese_dragon_statement. |
These rules works as expected; notice that the subsequence [the,chinese,dragon] will be treated as three different, adjacent grammar symbols.
(1) a ::> b (2) c1 -\ a ::> b (3) a /- c2 ::> b (4) c1 -\ a /- c2 ::> b |
Each one of them may, from an occurrence of an "a" produce a "b" spanning over the same boundaries. For rule (2), however, this can only happen when the is a c1 immediately before the "a"; for rule (3), a c2 is required immediately after the "a"; for rule (4), both are required.
In the case of simplification (or simpagation) rule, the context parts are not removed from the store, only the core in between the special markers.
Together with the gaps described in the following, these context parts are quite flexible; we show more examples below, combining the two.
a, ..., b <:> ab. |
This rule can apply when an "a" and a "b" occurs, with the "a" being zero or more positions before the "b". The boundaries for the new ab are found from "a" and "b". In case of a simplification rule as shown in the example, only the "a" and the "b" are removed from the store. Whatever grammar symbols may be in between them, and perhaps overlapping with them, are not affected.
There are three different kinds of gaps. The one shown above, "...", representing a distance of any size, "...n" represents a distance of exactly n, and "n...m" any distance of sized from (and including) n to (and including) m. Distances are counted such that two adjacent grammar symbols have distance zero. The following rule shows a combination of all.
a, ..., b, ...7, c, 2...1000, d <:> abcd. |
The following examples show gaps applied with context parts.
(1) a, ... -\ b <:> c (2) a, ... -\ b /- ...5, d <:> c |
Rule (1) means that a "b" can change into a "c" provided that there is an "a" somewhere before the "b"; rule (2) requires, furthermore, that the "b" is followed by a "d" exactly 5 positions to the right. Rules like this may be useful to formalize conditions that apply in descriptions of biologocal sequences data such as DNA, RNA and proteins.
a,...,b $$ c ::> d. |
When an "a" is followed by a "b" in a certain distance, and the same boundaries also covers a "c", we produce a "d". In case of simplification rules, all alternative grammar symbols matched in the head will be removed. The following is an example of a simpagation rule with parallel matching.
![what], ... $$ sentence <:> question. |
Thus, a sentence starting with [what] is converted into a question without removing the [what].
Notice that parallel matching, gaps and left-right contexts can be combined in arbitrary ways. However, a rule can only have one left and one right context part; these cannot be separately specified for parallel alternatives (which anyhow would not add any new expressive power).
(1) a:(0,_) <:> b. (2) a:(X,Y) <:> {X10 is X+10, Y10 is Y+10}, b:(X10,Y10). (3) a:Range <:> {auxiliary_predicate(Range,RangeNew)}, b:RangeNew. |
The first rule will apply to an occurence of a if it appears at the beginning of the string only. Rule (2) will replace any occurrence of a moved 10 positions to the right (which will be a weird thing to do). Rule (3) unifies the variable of Range with a comma structure giving the boundaries and use an addition predicate, expected to be programmed in Prolog, to set new boundaries for of b.
The following example shows a useful application using also parallel matching and gaps.
a:R1, ... $$ ...,a:R2 <:> overlap(R1,R2) | a. overlap((A1,B1),(A2,B2)):- A1 < B2, A2 < B1. |
Using the auxiliary predicate overlap defined as shown in the guard, the grammar rule will join any two overlapping occurrences of a into a new a. By recursion, the rule reduces the occurrences of as to those that correspond to maximal substrings that can be reached by jumping between overlaps of the original as.
(1) a(X), b(Y) <:> true | X > Y -> ab(X) ; ba(Y). (2) a(X), b(Y) <:> true | ab(X) ; ba(Y). |
As noticed above, the trivial guard in nedded in SICStus Prolog, but can be avoided in SWI. Rule (1) will deterministically determin, based on the values of X and Y which of ab(X) and ba(Y) that will replace a(X),b(Y); rule (2) sets of a choice point, so that firstly a(X),b(Y) is replaced by ab(X), and under backtracking, this may be changed into ba(Y).
To list of contents To start of section 3
Directive | Default | Purpose | Restrictions |
---|---|---|---|
:- chrg_option(show_text,{on,off}). | on | Print out tokenlist given to the parse predicate
with numbered word boundaries. Note: Each token is printed when actually called, so it may be repeated under backtracking. |
Can be switched on and off any time. |
:- chrg_option(show_rules,{on,off}). | off | Whenever a CHRG rule is read from a source file,
print its translation into CHR.
Useful for debugging and for understanding CHRG. Note: You will also see a lot of rules generated automatically by the system that serves al sorts of internal purposes. |
Can be switched on and off any time. |
:- chrg_option(lr_optimize,{on,off}). | off | Optimize translation into CHR by adding passive pragmas to all but leftmost symbols of core and possible right context. NB: For grammars of propagation rules with only right contexts, the semantics is not changed. When left and right context or simplification or simpagation rules are used, there are subtle cases where a rule is not applied. | Can be switched on and off any time; in source files, will
be in action for rules following. NB: Works at CHRG compile time, so if you want to change already compiled rules, compile your file once again. |
:- chrg_option(compact_abduction,{on,off}). | off | When on, abductive CHRGs tend to produce abductive solutions with fewer elements before others by continually trying to unify new abducible with old ones, however with all solutions generated on backtracking. See below for details, | Can be switched on and off any time; in source files, will
be in action for rules following. NB: Works at CHRG compile time, so if you want to change allready compiled rules, compile your file once again. |
:- chrg_option(verbose_AG,{on,off}). | off | When on, answers including unsatisfied expectations
are allowed; all auxiliary constraints and unused/unsatisfied
assumptions are given as part of the answer. When off, unsatisfied expectations are not allowed in final state and all auxiliaries are removed before answer is printed. If optionCGHR(show_text,on), backtracking needed to get rid of answers with unsatisfied expectations can be observed. |
Can be switched on and off any time. |
The CHRG system does not reset the options when a file is recompiled, which may occasionally lead to other results than you expect. If you are switching options on and off in your source files and recompile, it may be useful to include the following directive in the start of the file.
:- set_default_chrg_options. |
It may also be called (without the ":-") from the command line.
Predicate | Meaning | Remarks |
---|---|---|
parse(list-of-tokens) | Converts a list of tokens into a sequence of call of token constraints. | Its use described in
section 2. See also option show_text in section 4. |
accept(grammar-symbol) | If possible, unifies grammar-symbol with a grammar symbol in the constraint store that covers the entire string; otherwise it fails. | Intended to be called following a call to parse. |
acceptc(grammar-symbol) | Similar to accept but it also removes all constraints in the constraint store. | Useful to avoid having the constraint store printed out as part of the answer. |
Abducible predicates works more or less the same way as CHR constraints in that they can be used within heads and bodies of CHR rules as well as CHRG rules; in the latter, withing curly brackets. However, using an abducibles' declaration exemplified as follows provides some additional facilities.
abducibles h1/1, h2/1. |
This makes predicates h1, h1_, h2, and h2_ available as constraints. Those ending with an underline are negated version so that h1_(7) represents an approximation to the logical negation of h1(7). It is not allowed to declare an abducible predicate whose name ends with a underscore.
The system generates automatically CHR rules that prevent, say, h1_(fido) and h1(fido) to exist in a constraint store at the same time. However, the usual axiom for logical negation, stating that we must have either h1_(fido) or h1(fido) is not supported.
CAUTION: When doing abduction with CHRG, you should ensure that the grammar is syntactically unambiguous - otherwise abducibles for different interpretations of the text are cluttered up in the constraint store. |
Integrity constraints are rules that describe general properties of abducibles. They are conveniently written. Assume for example that h1 and h2 describe two incompatible properties; this can be defined by a CHR rule as follows:
h1(X), h2(X) ==> fail. |
This integrity constraint will effectively prevent incompatible pairs of hypotheses to be create so that other choices are tried instead.
The body of an integrity constraint may contain other abducibles and built-in Prolog predicates, quite often a unification as in the following which is relevant when family relations are learned:
father_of(X,Y), father(X,Z) ==> X=Z |
I.e., any given value for X can have at most one father.
If the set of abducible atoms is finite for a given abducible predicate, say {h(a),h(b)} for predicate h, this can be expressed by the following integrity constraint.
h(X) ==> (X=a ; X=b). |
Notice: In the methodology we have devised for abduction, abducibles have no place in the head of a grammar rule. However, we do not prohibit this in case someone could find new ways to extend our methods by somehow merging grammar rules and integrity constraints. |
Abducibles declared in this way can also be used in
Prolog predicates defined in your CHRG soiuce files.
Example:
p(X):- h1(X), q(X,Y), h2(Y). p(X):- h2_(X). |
Predicate p can now be used in grammar rule for hypotheses that must be satisfied, and the combined semantics of CHRG, CHR, and Prolog will ensure that the two different ways to derive p(X) are tried out on backtracking.
For an example of abduction in CHRG, see the file abductiveGarfield.
Important notice:
You can use the CHRG system for abductive logic programming
with integrity constraints. Declare abducible predicates, add integrity
constraints as CHR rules and predicate definitions in Prolog (keeping the
above mentioned restriction in mind),
and forget everything about
the
grammar rule notation. (But you still need to end your file with end_of_CHRG_source.) |
All different. A=B, C different. A=C, B different. B=C, C different. A=B=C. |
Thus H may contain solutions such as {h(a), h(b), h(c)} and {h(d)} that both may be minimal, but in some cases there may be reasons to prefer the one with the fewest elements.
The CHRG system provides an option
:- chrg_option(compact_abduction,on). |
which dynamically tries to unifiy as many abducibles as possible, trying out alternatives on backtracking. Alternatively, instead of using this option, you may use the predicate compact_abducibles in order to have the smaller solutions produced, but as a post-parsing phase, as follows:
?- parse([a,neat,little,story]), compact_abducibles. |
(The combined designer and implementer of CHRG has forgotten why he added this facility.)
Most earlier approaches to abductive logic programming are inherently based on the principle that we called compaction above. In our own publications on abduction, we have argued strongly against taking this principle as default, as we find it intuitively wrong. Furthermore, it may often lead to combinatorial explosions. - We have included compaction for compatibility and comparison only.
Dahl, V., Tarau, P., Li, R., Assumption grammars for processing natural language. Proc. Fourteenth Int'l Conf. on Logic Programming. pp. 256-270, MIT Press, 1997. |
The present version of Assumption Grammars is described in more detail in the journal paper referenced in the beginning of this document..
The CHRG notation is extended with the prefix operators +, *, -, =+, =*, =- which can be used in the body of CHRG rules; these operators should not be placed inside curly brackets but be placed at the same level as grammar symbols. ((This for implementation reasons: The CHRG compiler needs to manipulate some of these operators and by convention, it does not analyze what is inside curly brackets.))
The operators can prefix arbitrary hypotheses that need not be seperately declared. (In fact if a constraint, grammar symbol, or abducible h/1 is declared, it will be unrelated to h/1 used with the Assumption Grammar operators). Their meaning is the following where h is some arbitrary function symbol; arity can be arbirary.
+h(a) | Assert linear assumption h(a) for
available for subsequent text Linear means "can be used once". |
*h(a) | Assert intuitionistic assumption for
subsequent text Intuitionistic means "can be used any number of times" |
-h(X) | So-called expectation: consume/apply an assumption and unify its argument with X. |
=+h(a), =*h(X), =-h(X) | as above but textual order of assertion and application/consumption can be arbitrary. |
In many examples, the argument to operators with a star or plus are ground but that need not be the case. Operators including "=" are called time-less, the others sequential.
An expectation is satisfied whenever it has been matched with a corresponding assumption given by by a suitable one of +, *, =+, or =*. For a parse to succeed, all expectations must be satisfied. To ensure this, the system may backtrack if it reaches a final state that includes unsatisfied expectations; if you have :- chrg_option(show_text,on) (which is default) you will see this backtracking.
You can decide to accept final states with unsatisfied expectations using optionCGHR(verbose_AG,on); see section 4.
CAUTION: When using Assumption Grammars in CHRG, you must ensure that the grammar is syntactically unambiguous - otherwise assumptions for different interpretations of the text are cluttered up in the constraint store. |
For an example of an Assumption Grammar in CHRG, see the file simpleAG.
For locally unambigous grammars (no node ever involved in more than one syntax tree apart from subtree relations) that do not use extra constraints or abducibles, complexity is linear.
This may also be obtained by using simplification rules only and-or using :- chrg_option(lr_optimize,on).
For arbitrary grammar without attributes: Cubic.
For grammars with high degree of local ambiguity and attributes, the complexity can be very high. The following grammar is a worst case:
[a] ::> a(0) a(T1), a(T2) ::> a(t(T1,T2)) |
This is due to the explosive number of different parsetrees that can be constructed - so give it a string of more that 30 a's and you'll never get the result.
Finally, it should be mentioned that grammars with an extensive use abduction with compaction, or assumptions very often can run into combinatorial explosions. If your application depends heavily on assumptions, and you experience scale-up problems, you want to know that there implementation of assumptions are a bit coarse and that there is lot of room for improvements.
General advice: If CHRG stops reading your file with an incomprehensive error message, try to set :- show_rules(on). and try again. This may indicate how far you got in the file before things went wrong. |
Another general advice: In case CHRG, or the underlying CHR system, stops compilation due to some error, the system is not good at cleaning up, so if you experience strange things when trying to compile you source file again, close down Prolog and the CHRG system and start it again. |
The following things are not checked and may give confusing behaviour:
It should not take long time to adapt a program written for version 0 to version 1, and we have added gaps with a fixed length.
A bug has been fixed, some subtle cases of rules with gaps could not be compiled correctly in version 0.