IDSC III: Lazy Seqs
Lazy evaluation is an important concept in functional programming. Running on the JVM, Clojure does not support general laziness, but it has a data-structure abstraction called lazy sequence that provides for many of the benefits of the more general strategy.
This post is a slight detour in the course of the immutable data-structure canon. In the previous installation, I described how some operations on lists return lazy sequences to retain immutability and performance characteristics. As this behavior is not specific to lists, and indeed fundamental to all Clojure data-structures, I decided that lazy seqs deserve a closer look.
The Rationale of Lazy Evaluation
In many cases it might not be necessary to use all the values in a data-structure. But it might be hard to determine what is required when the structure is defined, as the creation and the consumption could be in different parts of the program. Laziness allows you to write a general definition and later use only what you need without incurring the cost for needless computations.
Laziness also allows for an interesting special case: infinite sequences. You can specify the computation of elements for a series (e.g. Fibonacci numbers) and use parts of that series without problems, as long as you do not try to traverse the entire sequence.
Postponing Calculations
(c) iStockphoto.com
In a language with general support for lazy evaluation like Haskell, the compiler takes care that nothing gets evaluated before it is needed. In Clojure, however, eager evaluation is the norm. That means that when you call a function with the result of another function-call as a parameter, that second function gets evaluated immediately. In a lazy language, the second function would only be evaluated when the first function needs the respective parameter.
For this reason, lazy-seq (which as we saw last time is used to define lazy sequences), cannot be implemented as a function – the body of what we pass to it must not be evaluated at the time of definition. The solution is a construct often cited as the most powerful feature in Lisp: macros.
Macros are the extension mechanism of Clojure (and Lisp in general); they allow you as a programmer to add features to the language. This makes it possible to have a very small core language, but still provide for pleasant programming. Unlike functions, macros do not evaluate their arguments immediately. When Clojure comes across code that uses a macro it first expands the macro and replaces the code with the result before the compilation proceeds normally.
The definition of the lazy-seq macro in core.clj is surprisingly short:
(defmacro lazy-seq
[& body]
(list 'new 'clojure.lang.LazySeq
(list* '^{:once true} fn* [] body)))
As macro expansion and the corresponding escaping rules are beyond the scope here, let it suffice to say that (lazy-seq BODY) would expand to
(new clojure.lang.LazySeq (fn* [] BODY))
The body that defines the seq is wrapped into an anonymous function and passed to the constructor of the LazySeq class. (new is a special form for Java interoperability that allows us to construct a Java object.) The main part of the implementation of lazy seqs is in Java, more specifically in the aforementioned class clojure.lang.LazySeq.
LazySeq has three fields:
fn: a function object to store the definition body,sv: a generic object to store the computed value, ands: a representation of the current view of the sequence.
When we construct a LazySeq instance, only the function object is assigned a value.

For performance reasons, LazySeq does not simply call the function object and return its result, when the contents are accessed. Instead the other two fields are used to do some caching.
Caching
To understand how the access works, let us look at the first method in LazySeq.java:
public Object first(){
seq();
if(s == null)
return null;
return s.first();
}
The seq method that is called in line 2, is the key to how access to lazy seqs works. It is also called from all the other access methods (count, more, cons, etc.).
final synchronized public ISeq seq(){
sval();
if(sv != null)
{
Object ls = sv;
sv = null;
while(ls instanceof LazySeq)
{
ls = ((LazySeq)ls).sval();
}
s = RT.seq(ls);
}
return s;
}
The sval method called from line 2 executes the function object and stores the result in the sv field. sval also takes care that the function is not executed more than once. In line 5 the calculated value is assigned to the temporary variable ls, and the field for the calculated value is set to null in line 6. Lines 7-10 resolve nested lazy seqs until we get a value of a different type assigned to the temporary variable ls.
The return value is computed and assigned to the field s in line 11. RT is the runtime class that provides the fundamental Clojure functions for the Java code implemented as static methods. The seq method in RT turns its parameter into a sequence if it implements the Iterable Java interface.
This computation happens only once. In subsequent calls to seq the cached value in field s is returned, as sv is set to null (line 6) so that the condition in line 3 is false and the block in lines 4-12 is not executed. The sval method does not set sv again in later invokations either.
The concat Example
To clarify what’s happening in seq, let us go through an example. We’ll use an abbreviated version of concat.
(defn concat
[x y]
(lazy-seq
(let [s (seq x)]
(if s
(cons (first s) (concat (rest s) y)))
y))))
What happens when we take the seq resulting from a concatenation and call first on it? Via the first method we get to seq which in turn calls sval, so that the body gets executed. The body of the lazy-seq use in concat is the call to cons (line 6). The result of cons is an instance of clojure.lang.Cons – an abstraction with list semantics, i.e. it has a head and a tail. The Cons is assigned to the sv field and – in the seq method – to the temporary variable ls (line 5). The loop for nested lazy seqs (lines 7-10) is skipped, because a Cons is not an instance of LazySeq.
In line 11, the Cons gets assigned to the field s (RT.seq does not do anything to it). Back to first. s is not null, so we call first on the Cons and return the result. The first element of the Cons is the first element of the concatenated seq, just as expected.
Accessing the tail of the seq, happens analogously; the tail of the Cons is another lazy seq due to the recursive definition of concat which we discussed last time.
Summary
- Lazy sequences allow computations to be postponed. Even infinite sequences are possible
- Lazy seqs are created using the
lazy-seqmacro - Using a macro avoids immediately evaluating the definition of the seq
- The implementation of lazy seqs is in the Java class
clojure.lang.LazySeq - Lazy seqs cache the results of the computation
