|
Difference Lists
|
 
|
A difference list is the conventional
Prolog programmer terminology for
list representations that have TWO components,
a list and some tail of that list.
We have seen difference lists primarily
in DCGs, as the two arguments added
in the compiled rule procedure:
np --> det nom
compiles as:
np(In,Out) :- det(In,Mid), nom(Mid,Out).
In and Out are respectively the input string and the
output string. For example, with the right grammar,
we might call:
np([the,book,is, thick],_).
and Prolog might return:
np([the,book,is, thick],[is,thick]).
The In argument is bound to a list,
and the Out argument to some tail.
|
Three ways
|
 
|
There are 3 ways of looking at difference lists,
not really mutually exclusive, and all Prolog native.
We will briefly mention all three and discuss
the third a bit, because it
bears on some grammar writing issues later
in the course:
- Threaded argument: In, Out:
Many prolog predicates have distinguished argument
positions for some datastructure that is to be modified,
where the modified data structure
and the input data structure are NOT
UNIFIABLE. These are often called threaded arguments,
where the variable in the input (instantiated
when the predicate is called) is conventionally called
XIn, and the output in the output (uninstantiated when the
predicate is called) is conventionally called XOut.
For example when the in and out data structures are
Prolog lists, we would have ListIn and List Out.
For example:
%%%%%%%%%%%%%%%%%%%% count_off/3 %%%%%%%%%%%%%%%%%%%%%%%%
%%% Returns the nth tail of a list,
%%% For: count_off(0,ListIn,ListOut)
%%% ListIn=ListOut
%%% For: count_off(1,ListIn,ListOut)
%%% ListIn=[_|Tail1]
%%% ListOut=Tail1
%%% For: count_off(2,ListIn,ListOut)
%%% ListIn=[_,_|Tail2]
%%% ListOut=Tail2
%%%%%%%%%%%%%%%%%%%% and so on %%%%%%%%%%%%%%%%%%%%%%%%
%%% + + -
%%% count_off(N, List, NthTail)
count_off(0,Tail,Tail):- !.
count_off(N,[_|Rest],Tail):-
M is N-1,
count_off(M,Rest,Tail).
Obviously np/2 above in its compiled form is
another threaded-argument predicate:
np([the,book,is,thick],[is,thick]).
- There is a natural way of representing
positions in strings as integer indices:
|
0
|
 
|
1
|
 
|
2
|
 
|
3
|
 
|
4
|
|
 
|
the
|
 
|
book
|
 
|
is
|
 
|
thick
|
 
|
Each index represents a position "between words".
Difference lists can also be thought of as
an alternative way
of representing positions
in input strings, with the following kind of "translation"
convention:
|
Index
|
List
|
|
0
|
[the,book,is,thick]
|
|
1
|
[book,is,thick]
|
|
2
|
[is,thick]
|
|
3
|
[thick]
|
|
4
|
[]
|
Note that the tails representing each index
are exactly what count_off/3 returns for
the given index, given the entire input
as ListIn.
Thus a fact representing that we have
discovered an np from position 0 to position to
might be represented in one of two equivalent ways:
np(0,2)
np([the,book,is,thick],[])
- We can also view difference lists
as the two chunks of information required
to represent a particular sublist of a given
list. This is the diea that leads to representing
difference lists as a single terms, such as:
[the,book,is,thick]-[the,book]
which is a repsentation of the sublist:
[the,book]
the sublist spanned by the np in our example.
More on this below.
|
|
Representations
|
 
|
The same datum or piece of information or mathematical
object can be represented in a variety of ways.
Consider the number "2":
- 2: arabic numeral
- II: Roman numeral
- 11: "unary" numbers(3 = 111, 4 = 1111, and so on)
- 5-3: difference
- 6-4: difference
The particular use we have in mind
for the representation will determine which representation
is most useful.
For example, suppose we have a list of 6 things and
we wish to take enough away to leave 2 things, then
the representation of 2 as:
6-4
is useful because we can read the number of things we have
to take away right off the representation. In other
words this representation saves us a subtraction computation.
Frequently the usefulness of a representation
is that it makes some computation unnecessary or trivial to perform.
|
A new
representation
of word strings
|
 
|
We're going to represent subsequences of words as differences.
Representation of the boy in a larger sequence:
[the, boy, in, the, park]-[in, the, park]
Take the string "in the park" away from
"the boy in the park"
and you have "the boy".
[the, boy] = [the, boy, in, the, park]-[in, the, park]
"The boy" can also be represented as:
- [the, boy, on, the, street]-[on, the, street]
- [the, boy, walked, through, the, lobby]-[walked, through, the, lobby]
and an infinite number of other "differences", just as
a number like 2 can be represented with an
infinite number of
distinct subtractions.
Notice that this idea applies to any
list, not just one representing a substring.
Any subsequence of the list of data items
can be represented as a difference list.
|
Difference
Lists
|
 
|
The right hand side of the equality
[the, boy] = [the, boy, in, the, park]-[in, the, park]
is called the difference list representation.
|
Application:
Partially
Instantiated
Lists (Queues)
|
 
|
We can implement queues in Prolog by
using a list that is partially instantiated in the following sense.
We know the first few members but it might be continued
in any number of ways. Here's one representation:
WE now have a list that can become longer by
unifying
something with the tail,
which is what we needs for updateable queues.
The trouble with this is that it not easy to
add an element at the end without
unwinding our way to through the whole list:
add_an_element(Elem, List):-
nonvar(List),
List=[_|Tail], !,
add_an_element(Elem,Tail).
add_an_element(Elem, List):-
var(List),
List=[Elem|_].
The longer the list the longer it takes to get to the end.
add_an_element(a,[b,c,d,e|Tail])
add_an_element(a,[c,d,e|Tail])
add_an_element(a,[d,e|Tail])
add_an_element(a,[e|Tail])
Suppose however we use a difference list repesentation of
our queue:
With this representation, we can add an element at the end much more
easily:
%% + + -
%% add_an_element(Elem, DiffList, NewDiffList)
add_an_element(Elem, (List-[Elem|NewTail]), (List-NewTail)).
Call:
add_an_element(a,([b,c,d,e|Tail]-Tail),NewDiffList)
This returns with:
add_an_element(a,([b,c,d,e|Tail]-Tail),NewDiffList)
NewDiffList=([a,b,c,d,e,a|NewTail]-NewTail)
Tail=[a|NewTail]
Note that NewDiffLists queue component still
unifies with DiffList's queue component. All
that has
changed is the Tail component.
|
Application:
Parsing
|
 
|
np on indexing representation:
np(0,2)
np on diff list representation:
np([the,boy,walks],[walks])
Comparing diff lists and indices
- The consequences
of this representation for parsing can be seen
by noting first that the rule definitions for DCGs
using diff lists:
s(In,Out):- np(In,Mid),vp(Mid,Out).
would also work if indices were being used.
An s from 0 to 3 can be assembled for
example, by finding an np from 0 to 2 and a vp
from 2 to 3.
- The difference difference lists
make comes with lex entries.
Then lexical rules will be as follows
- Indexed representation:
n(N,M):- word(Word,N,M),M is N+1,n(Word).
Note the need to call word/3 and to do arithmetic.
The index rep also needs a predicate word/3
for retrieving actual words from an input string
stored elsewehere. Call:
word(Word,2,3)
Rturn:
word(the,2,3)
We also assume predicates like:
- det(the).
- n(boy).
- v(walks).
- Diff list representation:
For the lexical entry
we use the difference-list representation
of a partially instantiated list:
And we make this a noun with the fact:
n([boy|T],T)
This of course is a representation
of "[boy]" like all our
other diff list reps.
But this representation of "[boy]" subsumes all our
previous difference lists representations of the string:
- [boy|T]-T subsumes [boy, in, the, park]-[in, the, park]
- [boy|T]-T subsumes [boy, on, the, street]-[on, the, street]
- [boy|T]-T subsumes [boy, walked, through, the, lobby]-[walked, through, the, lobby]
It is the most general difference list
representation of that string, and thus arguably the correct string
representation for our lexicon,
Multi-words (words with spaces) are naturally represented
in the same format:
- n([shut,in|T],T).
- np([the, white, house|T]).
And these lexical entries can
replace the entire lexical rule needed
in the caseof indices:
n(N,M):- word(Word,N,M),M is N+1,n(Word).
- The fundamental advantage of the diff
is that
we can do mathing against a subpart of the input
with pure unification.
|
Count off
reconsidered
|
 
|
%%% + + -
%%% count_off(N, List, NthTail)
count_off(N,[_|Rest],Tail):-
count_off(M,Rest,Tail),
N is M+1.
count_off(0,Tail,Tail).
Claim: count-off peels off the
first N members of List
and returns their difference list
representation.
count_off(4, [a,b,c,d,e,f], NewTail)
returns
count_off(4, [a,b,c,d,e,f], NewTail)
NewTail=[e,f]
|