Difference Lists

.
Difference Lists  

A difference list is the conventional Prolog programmer terminology for list representations that have TWO components, a list and some tail of that list.

We have seen difference lists primarily in DCGs, as the two arguments added in the compiled rule procedure:

np --> det nom
compiles as:
np(In,Out) :- det(In,Mid), nom(Mid,Out).
In and Out are respectively the input string and the output string. For example, with the right grammar, we might call:
np([the,book,is, thick],_).
and Prolog might return:
np([the,book,is, thick],[is,thick]).
The In argument is bound to a list, and the Out argument to some tail.
Three ways
 

There are 3 ways of looking at difference lists, not really mutually exclusive, and all Prolog native. We will briefly mention all three and discuss the third a bit, because it bears on some grammar writing issues later in the course:

  1. Threaded argument: In, Out:
      Many prolog predicates have distinguished argument positions for some datastructure that is to be modified, where the modified data structure and the input data structure are NOT UNIFIABLE. These are often called threaded arguments, where the variable in the input (instantiated when the predicate is called) is conventionally called XIn, and the output in the output (uninstantiated when the predicate is called) is conventionally called XOut. For example when the in and out data structures are Prolog lists, we would have ListIn and List Out. For example:
      %%%%%%%%%%%%%%%%%%%% count_off/3 %%%%%%%%%%%%%%%%%%%%%%%%
      %%% Returns the nth tail of a list,
      %%%  For: count_off(0,ListIn,ListOut)
      %%%       ListIn=ListOut
      %%%  For: count_off(1,ListIn,ListOut)
      %%%       ListIn=[_|Tail1]
      %%%       ListOut=Tail1
      %%%  For: count_off(2,ListIn,ListOut)
      %%%       ListIn=[_,_|Tail2]
      %%%       ListOut=Tail2
      %%%%%%%%%%%%%%%%%%%%  and so on %%%%%%%%%%%%%%%%%%%%%%%%
      %%%           +   +      -
      %%% count_off(N, List, NthTail)
      count_off(0,Tail,Tail):- !.
      count_off(N,[_|Rest],Tail):- 
        M is N-1,
        count_off(M,Rest,Tail).
      
      Obviously np/2 above in its compiled form is another threaded-argument predicate:
      np([the,book,is,thick],[is,thick]).
      
  2. There is a natural way of representing positions in strings as integer indices:
      0   1   2   3   4
        the   book   is   thick  
    Each index represents a position "between words".

    Difference lists can also be thought of as an alternative way of representing positions in input strings, with the following kind of "translation" convention:

      Index List
      0 [the,book,is,thick]
      1 [book,is,thick]
      2 [is,thick]
      3 [thick]
      4 []
    Note that the tails representing each index are exactly what count_off/3 returns for the given index, given the entire input as ListIn.

    Thus a fact representing that we have discovered an np from position 0 to position to might be represented in one of two equivalent ways:

    np(0,2)
    np([the,book,is,thick],[])
    
  3. We can also view difference lists as the two chunks of information required to represent a particular sublist of a given list. This is the diea that leads to representing difference lists as a single terms, such as:
    [the,book,is,thick]-[the,book]
    
    which is a repsentation of the sublist:
    [the,book]
    
    the sublist spanned by the np in our example. More on this below.
Representations  

The same datum or piece of information or mathematical object can be represented in a variety of ways.

Consider the number "2":

  • 2: arabic numeral
  • II: Roman numeral
  • 11: "unary" numbers(3 = 111, 4 = 1111, and so on)
  • 5-3: difference
  • 6-4: difference
The particular use we have in mind for the representation will determine which representation is most useful.

For example, suppose we have a list of 6 things and we wish to take enough away to leave 2 things, then the representation of 2 as:

    6-4
is useful because we can read the number of things we have to take away right off the representation. In other words this representation saves us a subtraction computation.

Frequently the usefulness of a representation is that it makes some computation unnecessary or trivial to perform.

A new
representation
of word strings
 

We're going to represent subsequences of words as differences.

Representation of the boy in a larger sequence:

    [the, boy, in, the, park]-[in, the, park]
Take the string "in the park" away from "the boy in the park" and you have "the boy".
    [the, boy] = [the, boy, in, the, park]-[in, the, park]

"The boy" can also be represented as:

  • [the, boy, on, the, street]-[on, the, street]
  • [the, boy, walked, through, the, lobby]-[walked, through, the, lobby]
and an infinite number of other "differences", just as a number like 2 can be represented with an infinite number of distinct subtractions.

Notice that this idea applies to any list, not just one representing a substring. Any subsequence of the list of data items can be represented as a difference list.

Difference
Lists
 

The right hand side of the equality

    [the, boy] = [the, boy, in, the, park]-[in, the, park]
is called the difference list representation.
Application:
Partially
Instantiated
Lists (Queues)
 

We can implement queues in Prolog by using a list that is partially instantiated in the following sense.

We know the first few members but it might be continued in any number of ways. Here's one representation:

    [the,boy|Tail]
WE now have a list that can become longer by unifying something with the tail, which is what we needs for updateable queues.

The trouble with this is that it not easy to add an element at the end without unwinding our way to through the whole list:

add_an_element(Elem, List):-
   nonvar(List), 
   List=[_|Tail], !,
   add_an_element(Elem,Tail).
add_an_element(Elem, List):-
   var(List), 
   List=[Elem|_].
The longer the list the longer it takes to get to the end.
    add_an_element(a,[b,c,d,e|Tail])
      add_an_element(a,[c,d,e|Tail])
        add_an_element(a,[d,e|Tail])
          add_an_element(a,[e|Tail])
            add_an_element(a,Tail)
              Tail=[Elem|_]

Suppose however we use a difference list repesentation of our queue:

    [the,boy|Tail] - Tail
With this representation, we can add an element at the end much more easily:
%%                  +      +          -
%% add_an_element(Elem, DiffList, NewDiffList)
add_an_element(Elem, (List-[Elem|NewTail]), (List-NewTail)).

Call:

    add_an_element(a,([b,c,d,e|Tail]-Tail),NewDiffList)
This returns with:
    add_an_element(a,([b,c,d,e|Tail]-Tail),NewDiffList)
      NewDiffList=([a,b,c,d,e,a|NewTail]-NewTail)
      Tail=[a|NewTail]
    Note that NewDiffLists queue component still unifies with DiffList's queue component. All that has changed is the Tail component.
Application:
Parsing
 

np on indexing representation:

    np(0,2)

np on diff list representation:

    np([the,boy,walks],[walks])

Comparing diff lists and indices

  1. The consequences of this representation for parsing can be seen by noting first that the rule definitions for DCGs using diff lists:
      s(In,Out):- np(In,Mid),vp(Mid,Out).
    would also work if indices were being used. An s from 0 to 3 can be assembled for example, by finding an np from 0 to 2 and a vp from 2 to 3.
  2. The difference difference lists make comes with lex entries. Then lexical rules will be as follows
    1. Indexed representation:
      n(N,M):- word(Word,N,M),M is N+1,n(Word).
      
      Note the need to call word/3 and to do arithmetic. The index rep also needs a predicate word/3 for retrieving actual words from an input string stored elsewehere. Call:
      word(Word,2,3)
      
      Rturn:
      word(the,2,3)
      

      We also assume predicates like:

      1. det(the).
      2. n(boy).
      3. v(walks).
    2. Diff list representation:

      For the lexical entry we use the difference-list representation of a partially instantiated list:

        [boy|T]-T
      And we make this a noun with the fact:
      n([boy|T],T)
      
      This of course is a representation of "[boy]" like all our other diff list reps. But this representation of "[boy]" subsumes all our previous difference lists representations of the string:
      • [boy|T]-T subsumes [boy, in, the, park]-[in, the, park]
      • [boy|T]-T subsumes [boy, on, the, street]-[on, the, street]
      • [boy|T]-T subsumes [boy, walked, through, the, lobby]-[walked, through, the, lobby]
      It is the most general difference list representation of that string, and thus arguably the correct string representation for our lexicon,

    Multi-words (words with spaces) are naturally represented in the same format:

    1. n([shut,in|T],T).
    2. np([the, white, house|T]).

    And these lexical entries can replace the entire lexical rule needed in the caseof indices:

    n(N,M):- word(Word,N,M),M is N+1,n(Word).
    
  3. The fundamental advantage of the diff is that we can do mathing against a subpart of the input with pure unification.
Count off
reconsidered
 

%%%           +   +      -
%%% count_off(N, List, NthTail)
count_off(N,[_|Rest],Tail):- 
  count_off(M,Rest,Tail),
  N is M+1.
count_off(0,Tail,Tail).
Claim: count-off peels off the first N members of List and returns their difference list representation.

    count_off(4, [a,b,c,d,e,f], NewTail)
returns
    count_off(4, [a,b,c,d,e,f], NewTail)
    NewTail=[e,f]