<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Ibis and Baboon</title>
 <link href="http://jgre.org/atom.xml" rel="self"/>
 <link href="http://jgre.org/"/>
 <updated>2011-06-03T14:33:09+02:00</updated>
 <id>http://jgre.org/</id>
 <author>
   <name>Janico Greifenberg</name>
   <email>jgre@jgre.org</email>
 </author>
 
 
 <entry>
   <title>Chomsky, Norvig and Practical Machine Learning</title>
   <link href="http://jgre.org/2011/06/03/chomsky-norvig-ml"/>
   <updated>2011-06-03T00:00:00+02:00</updated>
   <id>http://jgre.org/2011/06/03/chomsky-norvig-ml</id>
   <content type="html">&lt;p&gt;Is it more important for science to describe &lt;em&gt;how&lt;/em&gt; things behave or to explain &lt;em&gt;why&lt;/em&gt; things behave the way they do? This seems to be the question behind statements Noam Chomsky made at a &lt;a href='http://mit150.mit.edu/symposia/brains-minds-machines'&gt;symposium&lt;/a&gt; regarding machine learning and an &lt;a href='http://norvig.com/chomsky.html'&gt;article&lt;/a&gt; by Peter Norvig discussing Chomsky&amp;#8217;s opinions.&lt;/p&gt;

&lt;p&gt;Chomsky is highly critical of the commonly used statistical models for machine learning that focus on the &amp;#8220;how&amp;#8221;-part. He discounts the practical success of these models as unimportant for the advancement of science. His main goal &amp;#8211; as far as I understand it &amp;#8211; is to find the principles on which language is based.&lt;/p&gt;

&lt;p&gt;Norvig argues in favour of statistical models and purely descriptive research as something well worth pursuing and not as rare in the history of science as Chomsky claims it is. He cites several examples from Chomsky&amp;#8217;s publications that show a lack of knowledge about the capabilities of machine learning algorithms. Norvig&amp;#8217;s most important argument &amp;#8211; to my understanding &amp;#8211; is that our current statistical models perform better in describing the reality of language than our current explanatory models.&lt;/p&gt;

&lt;p&gt;Regardless of the advantages of statistical models, explanatory models definitely are easier for humans to talk and think about. And however our brains really learn languages, rules are how we teach them. Beyond the scientific considerations Norvig&amp;#8217;s article and Chomsky&amp;#8217;s statements focus on, I find this aspect relevant for the practical use of machine learning as well. Many applications or services that rely on machine learning suffer in usability, because you cannot really understand why they do what they do.&lt;/p&gt;

&lt;p&gt;Spam filters are a good example of this problem. Originally, programs like &lt;a href='http://spamassassin.apache.org/'&gt;SpamAssassin&lt;/a&gt; only used explicit rules (e.g. does the text contain the word &amp;#8220;viagra&amp;#8221;) to determine whether a mail was spam. For each matching rule, a certain number of points is added to the mail&amp;#8217;s spam score. The program usually also adds a header that lists all matching rules and the points each one contributed (e.g. &lt;code&gt;DRUGS_ERECTILE=0.282&lt;/code&gt;). These scores are helpful when investigating why some mails were not classified correctly.&lt;/p&gt;
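
&lt;p&gt;The rule-based scoring described above can be sketched in a few lines of Clojure. This is only an illustration under stated assumptions: the &lt;code&gt;MANY_EXCLAMATIONS&lt;/code&gt; rule, its points, both patterns, and all function names are invented here, and none of it is SpamAssassin&amp;#8217;s actual rule syntax or code.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;(require &amp;#39;[clojure.string :as string])

;; Hypothetical rules: a name, the points it adds, and a pattern to look for.
(def rules
  [{:name &amp;quot;DRUGS_ERECTILE&amp;quot;    :points 0.282 :pattern #&amp;quot;(?i)viagra&amp;quot;}
   {:name &amp;quot;MANY_EXCLAMATIONS&amp;quot; :points 1.5   :pattern #&amp;quot;!!!&amp;quot;}])

(defn matching-rules
  &amp;quot;Returns the rules whose pattern occurs somewhere in the mail text.&amp;quot;
  [text]
  (filter (fn [rule] (re-find (:pattern rule) text)) rules))

(defn spam-score
  &amp;quot;Sums the points of all matching rules.&amp;quot;
  [text]
  (reduce + 0 (map :points (matching-rules text))))

(defn report
  &amp;quot;Lists each matching rule with its points in NAME=points style.&amp;quot;
  [text]
  (string/join &amp;quot;, &amp;quot; (map (fn [rule] (str (:name rule) &amp;quot;=&amp;quot; (:points rule)))
                         (matching-rules text))))

(report &amp;quot;Cheap VIAGRA!!!&amp;quot;)
;=&amp;gt; &amp;quot;DRUGS_ERECTILE=0.282, MANY_EXCLAMATIONS=1.5&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Because every point in the total can be traced back to a named rule, a misclassified mail can be explained just by reading the report.&lt;/p&gt;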

&lt;p&gt;Spam detection was, however, greatly improved by the &lt;a href='http://www.paulgraham.com/spam.html'&gt;introduction of Bayesian filters&lt;/a&gt;. These probabilistic filters are trained with corpora of mails marked as spam or not-spam and calculate the probability of new mails being spam. In SpamAssassin, this results in a single, rather opaque score such as &lt;code&gt;BAYES_99=3.5&lt;/code&gt;. The other descriptive scores are still there, but from a brief look through my recent spam, the Bayesian classifier contributes the most significant numbers for the mails that were correctly filtered.&lt;/p&gt;
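
&lt;p&gt;To give a feeling for where such a number comes from, here is a deliberately simplified sketch of a per-word estimate using Bayes&amp;#8217; rule. The training mails and all names are made up, equal priors for spam and not-spam are assumed, and real filters additionally smooth the counts and combine the evidence of many words; this is not the formula used by SpamAssassin or in the linked article.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;;; Toy training corpora, invented for this sketch.
(def spam-mails [&amp;quot;cheap pills now&amp;quot; &amp;quot;cheap watches&amp;quot;])
(def ham-mails  [&amp;quot;meeting notes&amp;quot; &amp;quot;cheap flights for the meeting&amp;quot;])

(defn words
  &amp;quot;Splits a mail into lower-case words.&amp;quot;
  [text]
  (re-seq #&amp;quot;\w+&amp;quot; (.toLowerCase text)))

(defn containing
  &amp;quot;Counts the mails that contain the given word.&amp;quot;
  [mails word]
  (count (filter (fn [mail] (some #{word} (words mail))) mails)))

(defn word-spam-probability
  &amp;quot;P(spam | word) by Bayes&amp;#39; rule, assuming equal priors for spam and ham.
  Words that occur in neither corpus yield NaN in this sketch.&amp;quot;
  [word]
  (let [p-word-spam (/ (double (containing spam-mails word)) (count spam-mails))
        p-word-ham  (/ (double (containing ham-mails word)) (count ham-mails))]
    (/ p-word-spam (+ p-word-spam p-word-ham))))

(word-spam-probability &amp;quot;pills&amp;quot;)   ;=&amp;gt; 1.0
(word-spam-probability &amp;quot;meeting&amp;quot;) ;=&amp;gt; 0.0
(word-spam-probability &amp;quot;cheap&amp;quot;)   ; roughly 0.67
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The result is a probability derived from counts over the whole corpus, which is exactly why it is harder to explain than a named rule that either matched or did not.&lt;/p&gt;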

&lt;p&gt;The good news is that statistical models don&amp;#8217;t always mean that you won&amp;#8217;t get a good explanation. Amazon, for example, shows you why you get a certain recommendation. Below each recommended item, there is a link to &amp;#8220;fix this recommendation&amp;#8221;.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/norvig-chomsky-ml/amazon_fix_recommendation.jpg' alt='Fix recommendations on Amazon.com' /&gt;&lt;/p&gt;

&lt;p&gt;The page you get shows items you bought or looked at that the &lt;a href='http://en.wikipedia.org/wiki/Recommender'&gt;recommender&lt;/a&gt; thought were similar to the one you now get as a recommendation. The page also gives you ways to tell Amazon not to use these items for you in the future.&lt;/p&gt;

&lt;p&gt;In applications with a machine learning component, giving some explanation or reasoning goes a long way towards improving usability.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>On the News</title>
   <link href="http://jgre.org/2011/04/13/on-the-news"/>
   <updated>2011-04-13T00:00:00+02:00</updated>
   <id>http://jgre.org/2011/04/13/on-the-news</id>
   <content type="html">&lt;p&gt;Mandy Brown wrote a really good &lt;a href='http://aworkinglibrary.com/library/archives/on_the_news/'&gt;post&lt;/a&gt; about the evolution of her news-reading behaviour. She started out reading &lt;em&gt;one&lt;/em&gt; newspaper and then transitioned to reading &lt;em&gt;many different&lt;/em&gt; online news sources. I can absolutely relate to that. I also share the transition in the behavioural pattern of reading news: with the newspaper I read it once in the morning during breakfast; now I&amp;#8217;m checking the news several times a day.&lt;/p&gt;

&lt;p&gt;And then Mandy talks about what she expects from the news she reads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I want a reading experience that defends the news from the circus that online advertising creates. I want good storytelling and analysis, not naked facts. I want news that admits and defends its point of view (and acknowledges that there is a truth to be uncovered), not news that parrots the party line while making claims to objectivity. I want long essays on the events at Fukushima and the consequences for nuclear power going forward, not shrieking dispatches of each new fire or setback. I want a history of American engagement in Libya, putting the events of the past few weeks in context. I want twenty thousand words on the recession and its effects on the middle class, not another lone statistic about the unemployment rate. I want thoughtful, investigative journalism that exposes the ways in which our government is failing us, so that we can make it better.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I can say &amp;#8216;yes&amp;#8217; to each of the points, except for the second item. I do want good storytelling and analysis, but I also want links to the naked facts. When I read about &lt;a href='http://www.zeit.de/auto/2011-04/diesel-energiesteuer'&gt;a planned increase in taxes for Diesel fuel&lt;/a&gt; where one quoted expert claims that preferring Diesel is bad from an environmental point of view, while car manufacturers claim that Diesel is good for the environment, I want links to studies about the different properties of Diesel vs. gasoline with respect to environmental issues. When I read an op-ed about health care, where the author claims that the largest part of the health care costs for a person is incurred in the final year of their life, no matter how long the person lived, I want a link to a statistic supporting the claim. Bonus points for additional links to stats that do not agree and an explanation of why they are less credible.&lt;/p&gt;

&lt;p&gt;I have another addition to this wish list while we&amp;#8217;re at it: I do want news that admits and defends its point of view, and I also want pointers to articles with different points of view.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Link: Punk Rock Languages</title>
   <link href="http://jgre.org/2011/03/07/punk-rock-languages"/>
   <updated>2011-03-07T00:00:00+01:00</updated>
   <id>http://jgre.org/2011/03/07/punk-rock-languages</id>
   <content type="html">&lt;p&gt;Chris Adamson wrote a &lt;a href='http://www.pragprog.com/magazines/2011-03/punk-rock-languages'&gt;polemic&lt;/a&gt; about programming languages for the &lt;a href='http://www.pragprog.com/magazines/download/21.HTML'&gt;March 2011 issue&lt;/a&gt; of the &lt;a href='http://www.pragprog.com/magazines'&gt;PragPub magazine&lt;/a&gt; that is both entertaining and thought-provoking.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The natural appeal of the language is to write software with it, not to mess with the language itself—Solve your users’ problems rather than indulging your own programming fetishes.&lt;/p&gt;
&lt;/blockquote&gt;</content>
 </entry>
 
 <entry>
   <title>Link: Presenting Like a Hacker</title>
   <link href="http://jgre.org/2011/03/03/presenting-like-a-hacker"/>
   <updated>2011-03-03T00:00:00+01:00</updated>
   <id>http://jgre.org/2011/03/03/presenting-like-a-hacker</id>
   <content type="html">&lt;p&gt;After blogging like a hacker with &lt;a href='https://github.com/mojombo/jekyll'&gt;Jekyll&lt;/a&gt; by Tom Preston-Werner, I recently came across a similar thing for presentations: &lt;a href='https://github.com/schacon/showoff'&gt;Showoff&lt;/a&gt; by Scott Chacon. With it you can create presentations in Markdown and show them in a browser. This is particularly useful when you have code in your presentations, which is a real pain in other applications.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Link: Full Text RSS Feed</title>
   <link href="http://jgre.org/2011/03/02/full-text-rss-feed"/>
   <updated>2011-03-02T00:00:00+01:00</updated>
   <id>http://jgre.org/2011/03/02/full-text-rss-feed</id>
   <content type="html">&lt;p&gt;&lt;a href='http://fulltextrssfeed.com/'&gt;Full Text RSS Feed&lt;/a&gt; is a cool, simple web service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Love the ease of RSS, but hate when feeds don&amp;#8217;t display the whole article, forcing you to click through just to read it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I just hope it won&amp;#8217;t get taken down by those who use the money from annoying ads to pay lawyers.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Literate Programming</title>
   <link href="http://jgre.org/2011/01/09/literate-programming"/>
   <updated>2011-01-09T00:00:00+01:00</updated>
   <id>http://jgre.org/2011/01/09/literate-programming</id>
   <content type="html">&lt;blockquote&gt;
&lt;p&gt;Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.&lt;/p&gt;

&lt;p&gt;&amp;#8212; Donald Knuth. &amp;#8220;Literate Programming (1984)&amp;#8221; in Literate Programming. CSLI, 1992, pg. 99.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href='http://www.literateprogramming.com/'&gt;Literate Programming&lt;/a&gt; is hardly a new idea and it was never popular enough that &lt;a href='http://www.wired.com/magazine/2010/08/ff_webrip/all/1'&gt;Wired&lt;/a&gt; or &lt;a href='http://www.techcrunchit.com/2009/05/05/rest-in-peace-rss/'&gt;TechCrunch&lt;/a&gt; would have pronounced it dead at any given time. Recently, however, a few tools have appeared that implement some of the ideas from LP.&lt;/p&gt;

&lt;p&gt;The first of these is &lt;a href='http://jashkenas.github.com/docco/'&gt;Docco&lt;/a&gt;, which generates HTML that shows the documentation alongside the code. The input is the commented source code, so no tools are needed to convert it before it can be used as a program. With Knuth&amp;#8217;s original LP, you wrote in a special kind of language &amp;#8211; mixing your programming language with TeX &amp;#8211; from which the source code could be extracted.&lt;/p&gt;

&lt;p&gt;Inspired by Docco, Michael Fogus wrote &lt;a href='http://fogus.me/fun/marginalia/'&gt;Marginalia&lt;/a&gt; to work with Clojure code, extracting the text from both comments and function documentation.&lt;/p&gt;

&lt;p&gt;These tools, however, do not have all the features needed to be considered literate programming in the original sense. Most notably missing is the ability to organize the source strictly by the flow of the text and not by the compiler&amp;#8217;s need to have things declared before they are used.&lt;/p&gt;

&lt;p&gt;I tried out Marginalia and the literate-programming-lite-style for my clj-bookmarks library (it&amp;#8217;s a client implementation for the &lt;a href='http://www.delicious.com'&gt;Delicious&lt;/a&gt; and &lt;a href='http://pinboard.in'&gt;Pinboard&lt;/a&gt; APIs). The result is &lt;a href='/projects/2011/clj-bookmarks/'&gt;here&lt;/a&gt; and the code is on &lt;a href='https://github.com/jgre/clj-bookmarks'&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although there are some minor bugs in Marginalia, I am mostly happy with the result. I might continue this experiment.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>The Functional Elegance of Ring Middleware</title>
   <link href="http://jgre.org/2010/10/04/ring-middleware"/>
   <updated>2010-10-04T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/10/04/ring-middleware</id>
   <content type="html">&lt;p&gt;Higher-order functions are to me the most awe-inspiring feature of functional programming languages. However, as with many incredibly elegant concepts, their greatness is not immediately obvious (recursion is another example that comes to mind). When I was exclusively working with imperative programming languages and came across the definition of higher-order functions, I thought the idea was in part trivial and in part irrelevant theoretical nonsense.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[&amp;#8230;] Higher-order functions [&amp;#8230;] are functions which do at least one of the following:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;take one or more functions as an input&lt;/li&gt;

&lt;li&gt;output a function.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;&amp;#8211; &lt;a href='http://en.wikipedia.org/wiki/Higher-order_function'&gt;Wikipedia&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first part, I could relate to. This is like passing a function pointer in C or an anonymous inner class in Java, for example as an event handler. In both cases the syntax is so cumbersome that it feels like honest hard work, not like an awesome technique based on theory.&lt;/p&gt;

&lt;p&gt;The second part &amp;#8211; output a function &amp;#8211; why would I want that? The typical examples are as helpful as Fibonacci numbers are for explaining recursion to someone who is biased towards a narrow notion of practicality. They often look something like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;plus-x&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
  &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;fn &lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;+ &lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;)))&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;plus2&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;plus-x&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;plus2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The function &lt;code&gt;plus-x&lt;/code&gt; takes a parameter &lt;code&gt;x&lt;/code&gt; and returns a function that takes a parameter &lt;code&gt;y&lt;/code&gt; and returns &lt;code&gt;x + y&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;(If you didn&amp;#8217;t believe me about the cumbersome syntax, here is the above example translated into &lt;a href='http://gist.github.com/607543#file_higher_order.java'&gt;Java&lt;/a&gt;. Related: &lt;a href='http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html'&gt;Execution in the Kingdom of Nouns&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m telling you this, because I want to highlight a particularly elegant and practical example of higher-order functions: Ring middleware. &lt;a href='http://github.com/mmcgrana/ring'&gt;Ring&lt;/a&gt; is a Clojure library for writing web apps. It gives you an abstraction on top of HTTP similar to &lt;a href='http://rack.rubyforge.org/'&gt;Rack&lt;/a&gt; for Ruby or &lt;a href='http://wsgi.org/wsgi/'&gt;WSGI&lt;/a&gt; for Python. Higher-level frameworks/libraries such as &lt;a href='http://github.com/weavejester/compojure/wiki'&gt;Compojure&lt;/a&gt;, &lt;a href='http://github.com/cgrand/moustache'&gt;Moustache&lt;/a&gt;, or &lt;a href='http://github.com/brentonashworth/sandbar/wiki'&gt;Sandbar&lt;/a&gt; are built on top of Ring.&lt;/p&gt;

&lt;p&gt;A simple &amp;#8220;Hello, World&amp;#8221; with Ring looks like this (from the &lt;a href='http://github.com/mmcgrana/ring/blob/master//README.md'&gt;README&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;use&lt;/span&gt; &lt;span class='ss'&gt;&amp;#39;ring&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;adapter&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;jetty&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
  &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:status&lt;/span&gt;  &lt;span class='mi'&gt;200&lt;/span&gt;
   &lt;span class='nv'&gt;:headers&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Content-Type&amp;quot;&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;text/html&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;}&lt;/span&gt;
   &lt;span class='nv'&gt;:body&lt;/span&gt;    &lt;span class='s'&gt;&amp;quot;Hello World from Ring&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;run-jetty&lt;/span&gt; &lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:port&lt;/span&gt; &lt;span class='mi'&gt;8080&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;A Ring handler (&lt;code&gt;app&lt;/code&gt; in the example) is simply a function that takes a map representing the incoming request and returns a map representing the response. Such a handler can be given to an adapter (Jetty in this case) that deals with the actual HTTP connection and calls the handler function. Thus the adapter is a higher-order function of the first kind, the one I once thought to be trivial.&lt;/p&gt;
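
&lt;p&gt;Because a handler is just a function from map to map, it can be tried out without any adapter or HTTP connection. The sketch below calls the handler directly with a hand-written request map; the two keys used are only a small subset of what a real adapter would supply.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;(defn app [req]
  {:status  200
   :headers {&amp;quot;Content-Type&amp;quot; &amp;quot;text/html&amp;quot;}
   :body    &amp;quot;Hello World from Ring&amp;quot;})

;; A minimal, hand-written request map (an assumption for this sketch).
(def request {:request-method :get :uri &amp;quot;/&amp;quot;})

(:status (app request))
;=&amp;gt; 200

(:body (app request))
;=&amp;gt; &amp;quot;Hello World from Ring&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;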

&lt;p&gt;If we want to serve static files with this app, we can add middleware that wraps our app to look for requests to files in a given directory:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;use&lt;/span&gt; &lt;span class='ss'&gt;&amp;#39;ring&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;middleware&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;file&lt;/span&gt; &lt;span class='ss'&gt;&amp;#39;ring&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;adapter&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;jetty&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;hello&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
  &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:status&lt;/span&gt; &lt;span class='mi'&gt;200&lt;/span&gt;
   &lt;span class='nv'&gt;:headers&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Content-Type&amp;quot;&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;text/html&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;}&lt;/span&gt;
   &lt;span class='nv'&gt;:body&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;Hello World&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-file&lt;/span&gt; &lt;span class='nv'&gt;hello&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;public&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;

&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;run-jetty&lt;/span&gt; &lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:port&lt;/span&gt; &lt;span class='mi'&gt;8080&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Here, we have a higher-order function of the second kind: &lt;code&gt;wrap-file&lt;/code&gt;. &lt;code&gt;hello&lt;/code&gt; is a handler function in its own right &amp;#8211; it is equivalent to &lt;code&gt;app&lt;/code&gt; in the previous example &amp;#8211; but we do not pass it to the adapter directly. The result of &lt;code&gt;wrap-file&lt;/code&gt; is a new handler function; it has to be, otherwise we couldn&amp;#8217;t pass it to the adapter.&lt;/p&gt;

&lt;p&gt;The &lt;a href='http://github.com/mmcgrana/ring/blob/master/ring-core/src/ring/middleware/file.clj'&gt;implementation&lt;/a&gt; of the wrapper is simple and clean:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;wrap-file&lt;/span&gt;
  &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='o'&gt;#&lt;/span&gt;&lt;span class='nv'&gt;^String&lt;/span&gt; &lt;span class='nv'&gt;root-path&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
  &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;ensure-dir&lt;/span&gt; &lt;span class='nv'&gt;root-path&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
  &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;fn &lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
    &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;if-not&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;= &lt;/span&gt;&lt;span class='nv'&gt;:get&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;:request-method&lt;/span&gt; &lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
      &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;app&lt;/span&gt; &lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
      &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;let &lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;path&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;substring&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;codec/url-decode&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;:uri&lt;/span&gt; &lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)]&lt;/span&gt;
        &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;or&lt;/span&gt;
          &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;response/file-response&lt;/span&gt; &lt;span class='nv'&gt;path&lt;/span&gt;
            &lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:root&lt;/span&gt; &lt;span class='nv'&gt;root-path&lt;/span&gt; &lt;span class='nv'&gt;:index-files?&lt;/span&gt; &lt;span class='nv'&gt;true&lt;/span&gt; &lt;span class='nv'&gt;:html-files?&lt;/span&gt; &lt;span class='nv'&gt;true&lt;/span&gt;&lt;span class='p'&gt;})&lt;/span&gt;
          &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;app&lt;/span&gt; &lt;span class='nv'&gt;req&lt;/span&gt;&lt;span class='p'&gt;))))))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;First the function checks whether the directory is actually there by calling &lt;code&gt;ensure-dir&lt;/code&gt;. The rest of the code in &lt;code&gt;wrap-file&lt;/code&gt; builds the resulting handler function. If the request method is not &lt;code&gt;GET&lt;/code&gt;, the inner handler is called. Otherwise, we extract the path and try to make a response out of it using &lt;code&gt;file-response&lt;/code&gt;. If that returns &lt;code&gt;nil&lt;/code&gt;, the inner handler is called.&lt;/p&gt;

&lt;p&gt;Now, this is all well and good, but the awesome elegance of this approach to middleware becomes apparent when we realize that wrappers can be wrapped around other wrappers in arbitrary numbers:&lt;/p&gt;
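
&lt;p&gt;The same pattern works for your own middleware. The following is a minimal sketch that needs no Ring at all; &lt;code&gt;wrap-served-by&lt;/code&gt; and the &lt;code&gt;X-Served-By&lt;/code&gt; header are hypothetical names chosen for illustration and are not part of Ring. Where &lt;code&gt;wrap-file&lt;/code&gt; decides whether to call the inner handler at all, this wrapper always calls it and then modifies the response.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;;; Hypothetical middleware: takes a handler and returns a new handler
;; that adds one header to whatever response the inner handler produces.
(defn wrap-served-by [handler server-name]
  (fn [req]
    (let [resp (handler req)]
      (assoc-in resp [:headers &amp;quot;X-Served-By&amp;quot;] server-name))))

(defn hello [req]
  {:status  200
   :headers {&amp;quot;Content-Type&amp;quot; &amp;quot;text/html&amp;quot;}
   :body    &amp;quot;Hello World&amp;quot;})

(def app (wrap-served-by hello &amp;quot;demo&amp;quot;))

(:headers (app {:request-method :get :uri &amp;quot;/&amp;quot;}))
;=&amp;gt; {&amp;quot;Content-Type&amp;quot; &amp;quot;text/html&amp;quot;, &amp;quot;X-Served-By&amp;quot; &amp;quot;demo&amp;quot;}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;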
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-params&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-file-info&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-file&lt;/span&gt; &lt;span class='nv'&gt;hello&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;public&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;))))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The resulting handler has a wrapper that makes the request parameters easier to process and one that adds content-type headers to the response.&lt;/p&gt;
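
&lt;p&gt;One consequence of nesting is that the outermost wrapper sees the request first. This can be made visible with a small sketch; &lt;code&gt;wrap-tag&lt;/code&gt; and &lt;code&gt;show-seen&lt;/code&gt; are hypothetical helpers invented for this illustration, not Ring functions.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;;; Hypothetical middleware that records its tag in the request
;; before passing the request on to the wrapped handler.
(defn wrap-tag [handler tag]
  (fn [req]
    (handler (update-in req [:seen] (fnil conj []) tag))))

;; A handler that simply reports which wrappers the request passed through.
(defn show-seen [req]
  {:status 200 :headers {} :body (:seen req)})

(def app (wrap-tag (wrap-tag show-seen :inner) :outer))

(:body (app {}))
;=&amp;gt; [:outer :inner]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;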

&lt;p&gt;As the prefix notation is not the most readable way to express such a chain of calls, you would rather use the &lt;a href='http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/-%3e'&gt;threading macro&lt;/a&gt; to write this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;app&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;-&amp;gt; &lt;/span&gt;&lt;span class='nv'&gt;hello&lt;/span&gt;
             &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-file&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;public&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
             &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-file-info&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
             &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;wrap-params&lt;/span&gt;&lt;span class='p'&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;But that is just syntactic sugar; the beauty of this solution is possible because it uses higher-order functions to great effect.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>IDSC V: Accessing and Modifying Vectors</title>
   <link href="http://jgre.org/2010/07/03/idsc-v-accessing-and-modifying-vectors"/>
   <updated>2010-07-03T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/07/03/idsc-v-accessing-and-modifying-vectors</id>
   <content type="html">&lt;p&gt;In the &lt;a href='/2010/06/idsc-4.html'&gt;previous post&lt;/a&gt; of the &lt;a href='/2010/05/idsc.html'&gt;Immutable Data-Structure Canon&lt;/a&gt; we looked at vectors, their internal structure, how they are created, and how more elements are inserted. In this post we continue where we left off and examine the code used to access and remove values from a vector.&lt;/p&gt;

&lt;h2 id='accessing_elements'&gt;Accessing Elements&lt;/h2&gt;

&lt;p&gt;The elements of a vector are accessible by index. The way vectors are usually implemented, this is a constant time operation, as it only takes the calculation of the offset of a memory location. As we&amp;#8217;ve seen, Clojure vectors are implemented as trees to allow for shared structures between persistent &amp;#8220;modified&amp;#8221; versions, so the access methods need to find their way around that structure.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s say, for example, that we have a vector &lt;code&gt;v&lt;/code&gt; with 1500 elements and we want to get the one at index 1101. There are three different ways to find an element in a vector by index: the &lt;code&gt;nth&lt;/code&gt; function, the &lt;code&gt;get&lt;/code&gt; function, and using the vector as a function. They differ in how they handle the vector being &lt;code&gt;nil&lt;/code&gt;, the index being out of range, and whether they support a &amp;#8220;not found&amp;#8221; argument. For this discussion we&amp;#8217;ll use &lt;code&gt;nth&lt;/code&gt;: it returns &lt;code&gt;nil&lt;/code&gt; if the vector is &lt;code&gt;nil&lt;/code&gt; and throws an exception if the index is out of range, unless you pass an optional argument, which is then returned when the index is not found.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;nth &lt;/span&gt;&lt;span class='nv'&gt;v&lt;/span&gt; &lt;span class='mi'&gt;1101&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
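&lt;p&gt;Before turning to the internals, the differences between the three access forms can be checked at the REPL. The small vector below is only a stand-in for the 1500-element one.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;(def small [:a :b :c])

;; In range, all three forms agree.
(nth small 1) ;=&amp;gt; :b
(get small 1) ;=&amp;gt; :b
(small 1)     ;=&amp;gt; :b

;; Out of range: get returns nil, or the &amp;quot;not found&amp;quot; value if one is given;
;; nth returns the &amp;quot;not found&amp;quot; value if one is given.
(get small 10)       ;=&amp;gt; nil
(get small 10 :none) ;=&amp;gt; :none
(nth small 10 :none) ;=&amp;gt; :none
;; (nth small 10) and (small 10) both throw an IndexOutOfBoundsException.

;; With nil in place of the vector, nth and get return nil.
(nth nil 1) ;=&amp;gt; nil
(get nil 1) ;=&amp;gt; nil
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;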
&lt;p&gt;Internally, this function is mapped to a call to the &lt;code&gt;nth&lt;/code&gt; method in PersistentVector:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='nf'&gt;nth&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;i&lt;/span&gt;&lt;span class='o'&gt;){&lt;/span&gt;
	&lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;arrayFor&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;i&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='mh'&gt;0x01f&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;arrayFor&lt;/code&gt; method handles the lookup in the internal structure and returns one of the 32-element arrays where the elements are stored &amp;#8211; either from one of the leaf nodes or from the tail (line 2). The index in that array is calculated by applying a bit-mask to the index passed into the method (line 3). For our example that local index is 13.&lt;/p&gt;
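&lt;p&gt;The arithmetic behind that number can be checked at the REPL. The mask &lt;code&gt;0x01f&lt;/code&gt; keeps the lowest five bits of the index, which is the same as the remainder after dividing by 32 for non-negative indices; the remaining higher bits say which 32-element block the index falls into.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;;; Position inside the 32-element array: the lowest five bits of the index.
(bit-and 1101 0x1f) ;=&amp;gt; 13
(rem 1101 32)       ;=&amp;gt; 13

;; Number of the 32-element block: the index with those five bits shifted out.
(bit-shift-right 1101 5) ;=&amp;gt; 34

;; Putting both parts back together gives the original index.
(+ (* 34 32) 13) ;=&amp;gt; 1101
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;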
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]&lt;/span&gt; &lt;span class='nf'&gt;arrayFor&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;i&lt;/span&gt;&lt;span class='o'&gt;){&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='n'&gt;tailoff&lt;/span&gt;&lt;span class='o'&gt;())&lt;/span&gt;
			&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='k'&gt;for&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt; &lt;span class='o'&gt;-=&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
			&lt;span class='n'&gt;node&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[(&lt;/span&gt;&lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='mh'&gt;0x01f&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
		&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;node&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;IndexOutOfBoundsException&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;arrayFor&lt;/code&gt; method checks that the index is valid and throws an exception if it is not (lines 2, 11). If the requested index is greater than or equal to the number of values in the tree, the element lives in the tail, so the tail array is returned (lines 4, 5). In our example, this is not the case, so we need to look into the tree.&lt;/p&gt;

&lt;p&gt;As a tree with two layers (root and leaves) can hold 1024 elements, the tree for a vector with 1500 elements needs three layers. The index we&amp;#8217;re looking for is in the second subtree, as it&amp;#8217;s greater than 1024. The field &lt;code&gt;shift&lt;/code&gt;, which holds a multiple of 5 proportional to the height of the tree, is 10 in our case, so that we enter the loop in line 7 with 10 as &lt;code&gt;level&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inside the loop, we find the subtree holding our value by bit-shifting the index by the current level and applying a bit-mask to get it into the 32-element frame. The index for the subtree in our example is 1, as expected. In the second iteration of the loop, we look at the leaves, so that &lt;code&gt;level&lt;/code&gt; is decremented to 5. This time, we find the node with index 2. The loop terminates here, as there is no level further down. The method returns the array attached to the node we found.&lt;/p&gt;

&lt;p&gt;Summarizing all the index-offsets, we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the second subtree of the root (the first subtree contains 1024 elements),&lt;/li&gt;

&lt;li&gt;the third leaf of that subtree (the leaves contain 32 elements each), and&lt;/li&gt;

&lt;li&gt;the fourteenth element of the leaf (calculated in &lt;code&gt;nth&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus we have the 1102nd element of the vector. Bingo!&lt;/p&gt;
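
&lt;p&gt;The arithmetic of this walk can be replayed in a few lines. The following is a minimal sketch, not code from Clojure: the class &lt;code&gt;IndexPath&lt;/code&gt; and its method &lt;code&gt;slot&lt;/code&gt; are hypothetical names, the index 1101 is inferred from the offsets listed above (1024 + 64 + 13), and the remainder &lt;code&gt;% 32&lt;/code&gt; stands in for the &lt;code&gt;0x01f&lt;/code&gt; bit-mask, which yields the same result for non-negative indexes.&lt;/p&gt;

```java
// Hypothetical helper, not part of clojure.lang.PersistentVector.
class IndexPath {
    // Slot chosen at the given level; level is a multiple of 5 as in arrayFor.
    // For non-negative i, "% 32" gives the same result as masking with 0x01f.
    static int slot(int i, int level) {
        return (i >>> level) % 32;
    }

    public static void main(String[] args) {
        int i = 1101;   // assumed example index: the 1102nd element
        int shift = 10; // three layers, as in the example

        // Same loop shape as arrayFor: one step per level above the leaf arrays.
        for (int level = shift; level > 0; level -= 5) {
            System.out.println("level " + level + ": slot " + slot(i, level));
        }
        // Same calculation as in nth: position inside the 32-element array.
        System.out.println("index in leaf array: " + slot(i, 0));
    }
}
```

&lt;p&gt;Running it reports slot 1 at level 10, slot 2 at level 5, and 13 as the index in the leaf array, matching the three offsets in the list.&lt;/p&gt;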

&lt;p&gt;As with the &lt;code&gt;cons&lt;/code&gt; operation we saw last time, the number of steps necessary to find the nth element depends on the height of the tree, so the complexity here is once again &lt;code&gt;O(log32 N)&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id='deleting_elements'&gt;Deleting Elements&lt;/h2&gt;

&lt;p&gt;To finish the discussion of vectors, let&amp;#8217;s look at deleting elements. The only position where efficient deletions are possible is the end. This can be done using the &lt;code&gt;pop&lt;/code&gt; function that is conveniently mapped to the &lt;code&gt;pop&lt;/code&gt; method of PersistentVector:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;PersistentVector&lt;/span&gt; &lt;span class='nf'&gt;pop&lt;/span&gt;&lt;span class='o'&gt;(){&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;IllegalStateException&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Can&amp;#39;t pop empty vector&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;EMPTY&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;withMeta&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;meta&lt;/span&gt;&lt;span class='o'&gt;());&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='n'&gt;tailoff&lt;/span&gt;&lt;span class='o'&gt;()&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;length&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
		&lt;span class='n'&gt;System&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;arraycopy&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;length&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;PersistentVector&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;meta&lt;/span&gt;&lt;span class='o'&gt;(),&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]&lt;/span&gt; &lt;span class='n'&gt;newtail&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;arrayFor&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;popTail&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;newshift&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;EMPTY_NODE&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;shift&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
		&lt;span class='n'&gt;newshift&lt;/span&gt; &lt;span class='o'&gt;-=&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;PersistentVector&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;meta&lt;/span&gt;&lt;span class='o'&gt;(),&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newshift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newtail&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;First, the edge-cases are handled: &lt;code&gt;pop&lt;/code&gt; on an empty vector does not work (lines 2,3), and &lt;code&gt;pop&lt;/code&gt; on a one-element vector returns an empty vector (lines 4,5). Then we look at the easy case that the tail holds more than one element (line 6); we just copy all tail elements except for the last one into a new array and use it to create a new vector that shares the tree with the original (lines 8-10).&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-v/clara-empty-tail.png' alt='Why more than one? Why not one or more?' /&gt;&lt;/p&gt;

&lt;p&gt;The condition for the simple case is phrased so that we don&amp;#8217;t use it on a tail with only one element. This is because we need to change the tree when the tail runs empty; the last node becomes the new tail in that case.&lt;/p&gt;
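
&lt;p&gt;To see where that boundary lies, here is a small sketch of the branch selection. It is an illustration rather than Clojure code: the class &lt;code&gt;PopCases&lt;/code&gt; and the method &lt;code&gt;simpleCase&lt;/code&gt; are hypothetical names, and &lt;code&gt;tailoff&lt;/code&gt; is reimplemented here under the assumption that it returns the number of elements stored in the tree.&lt;/p&gt;

```java
// Hypothetical illustration of the branch selection in pop.
class PopCases {
    // Number of elements stored in the tree: the greatest multiple of 32
    // below cnt (an assumption based on how tailoff is described).
    static int tailoff(int cnt) {
        return ((cnt - 1) / 32) * 32;
    }

    // Mirrors the condition "cnt - tailoff() > 1" from pop.
    static boolean simpleCase(int cnt) {
        return cnt - tailoff(cnt) > 1;
    }

    public static void main(String[] args) {
        // 34 elements: 32 in the tree, 2 in the tail, so copying the tail suffices.
        System.out.println(simpleCase(34)); // true
        // 33 elements: the tail holds a single element, so the tree must change.
        System.out.println(simpleCase(33)); // false
    }
}
```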

&lt;p&gt;Shrinking the tree starts with getting the array from the last node by calling the &lt;code&gt;arrayFor&lt;/code&gt; method (line 12). Next, we call &lt;code&gt;popTail&lt;/code&gt; to get a new root node for a tree without the last node (line 13). This leaves us with three cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The node we removed was the only node, so that our return vector gets an empty node as root (lines 15-18).&lt;/li&gt;

&lt;li&gt;The tree has intermediate levels between the root and the leaves and the removed node was the only leaf in the second subtree, so that we can remove one layer (lines 19-23).&lt;/li&gt;

&lt;li&gt;Otherwise the new tree still has the same height as the old tree.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Like the other vector operations we looked at, &lt;code&gt;pop&lt;/code&gt; also has a complexity of &lt;code&gt;O(log32 N)&lt;/code&gt;, because the number of steps necessary is dependent on the height of the tree.&lt;/p&gt;

&lt;h2 id='summary'&gt;Summary&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Vectors are implemented based on a tree and a tail array.&lt;/li&gt;

&lt;li&gt;All arrays (both in the nodes of the tree and in the tail) have up to 32 elements.&lt;/li&gt;

&lt;li&gt;The leaves always have exactly 32 elements in them.&lt;/li&gt;

&lt;li&gt;Elements can be added and removed efficiently at the end of the vector.&lt;/li&gt;

&lt;li&gt;Addition, removal, and index lookup are &lt;code&gt;O(log32 N)&lt;/code&gt;, which is essentially constant time in practice.&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>IDSC IV: Creating and Growing Vectors</title>
   <link href="http://jgre.org/2010/06/28/idsc-iv-creating-and-growing-vectors"/>
   <updated>2010-06-28T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/06/28/idsc-iv-creating-and-growing-vectors</id>
   <content type="html">&lt;p&gt;Vectors provide constant time random access to any element referenced by an index. Like their fixed length cousins, arrays, they are usually implemented by storing their elements in consecutive memory locations. Such a straightforward implementation, however, doesn&amp;#8217;t allow for immutability &amp;#8211; at least not when we are interested in the performance characteristics, and if we weren&amp;#8217;t, we could go with lists anyway.&lt;/p&gt;

&lt;p&gt;This post is part of the &lt;a href='/2010/05/idsc.html'&gt;immutable data-structure canon&lt;/a&gt;. While the implementation of &lt;a href='/2010/05/idsc-2.html'&gt;immutable lists&lt;/a&gt; is relatively simple, immutable persistent vectors require quite a bit of work. I decided to split this topic into two parts: this post covers how vectors are created and how elements are added; we&amp;#8217;ll also learn about the structure used to store the values. The second part, to be posted next week, covers how vectors are accessed and how elements are modified and deleted.&lt;/p&gt;

&lt;p&gt;Before we dive in, I need to correct an error in the previous parts of the series. Until now, I used &amp;#8220;sequence&amp;#8221; and &amp;#8220;seq&amp;#8221; synonymously as a generic term for data-structures that can hold multiple values. I have now realized that the correct generic term is &lt;em&gt;collection&lt;/em&gt;. A &lt;em&gt;sequence&lt;/em&gt; in the Clojure context is a collection that represents a series of values without reordering them, where the values may or may not exist yet. &lt;em&gt;seq&lt;/em&gt; specifically refers to an API for using collections with the functions &lt;code&gt;first&lt;/code&gt; and &lt;code&gt;rest&lt;/code&gt;. Henceforth I shall use the right terminology. And now: vectors.&lt;/p&gt;

&lt;p&gt;To achieve both immutability and performance, Clojure only stores limited segments of a vector in a row in memory. The overall structure used to represent the vector behind the scenes is a tree. Each of the tree&amp;#8217;s nodes holds an array that contains a segment of the vector. Functions that &amp;#8220;modify&amp;#8221; a vector return a new object that shares all the nodes not affected by the operation with the original. Only the segments that are different get stored in separate node objects.&lt;/p&gt;
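
&lt;p&gt;The sharing idea can be sketched with plain arrays before we look at the real classes. This is a simplified illustration, not the &lt;code&gt;Node&lt;/code&gt; class from Clojure: the class &lt;code&gt;SharingSketch&lt;/code&gt; and the method &lt;code&gt;withSlot&lt;/code&gt; are hypothetical names, and the two-element arrays are only there to keep the example short.&lt;/p&gt;

```java
// Hypothetical sketch of structural sharing; not the Node class from Clojure.
class SharingSketch {
    // Returns a copy of parent in which one slot holds a new value.
    // The parent itself is left untouched.
    static Object[] withSlot(Object[] parent, int slot, Object child) {
        Object[] copy = parent.clone();
        copy[slot] = child;
        return copy;
    }

    public static void main(String[] args) {
        Object[] leafA = {"a", "b"};
        Object[] leafB = {"c", "d"};
        Object[] oldRoot = {leafA, leafB};

        // "Modify" the second segment: only that leaf and the root are new objects.
        Object[] newRoot = withSlot(oldRoot, 1, withSlot(leafB, 0, "x"));

        System.out.println(newRoot[0] == oldRoot[0]);   // true: the first leaf is shared
        System.out.println(oldRoot[1] == leafB);        // true: the original is unchanged
        System.out.println(((Object[]) newRoot[1])[0]); // x
    }
}
```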

&lt;p&gt;The implementation of vectors is in the Java class &lt;code&gt;clojure.lang.PersistentVector&lt;/code&gt;. Instances of that class provide the API through which vectors are used and they hold a reference to the root of the tree. The nodes are implemented in the internal class &lt;code&gt;Node&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id='creating_vectors'&gt;Creating Vectors&lt;/h2&gt;

&lt;p&gt;To create a vector in the Java part of the language, you call the static method &lt;code&gt;create&lt;/code&gt; from &lt;a href='http://github.com/richhickey/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java'&gt;PersistentVector.java&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;static&lt;/span&gt; &lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;PersistentVector&lt;/span&gt; &lt;span class='nf'&gt;create&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;List&lt;/span&gt; &lt;span class='n'&gt;items&lt;/span&gt;&lt;span class='o'&gt;){&lt;/span&gt;
	&lt;span class='n'&gt;TransientVector&lt;/span&gt; &lt;span class='n'&gt;ret&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;EMPTY&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;asTransient&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
	&lt;span class='k'&gt;for&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='n'&gt;item&lt;/span&gt; &lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='n'&gt;items&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='n'&gt;ret&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;ret&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;conj&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;item&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;ret&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;persistent&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The actual work here is done by &lt;code&gt;TransientVector&lt;/code&gt;, an internal class used for efficiently modifying vectors. The method starts by creating an empty &lt;code&gt;TransientVector&lt;/code&gt; (line 2). The elements are inserted into it one by one by calling &lt;code&gt;conj&lt;/code&gt; (lines 3, 4). Finally, the resulting transient is converted to a persistent representation (line 5).&lt;/p&gt;

&lt;p&gt;&lt;a href='http://clojure.org/transients'&gt;Transients&lt;/a&gt; are a feature of Clojure that allows you (and the language&amp;#8217;s internals) to use mutable data-structures in performance critical parts. Transients are thread-safe and they cannot be shared with other code. As we&amp;#8217;re looking at immutable data-structures, we won&amp;#8217;t go into the details of their implementation here. The vector that we get from the call to &lt;code&gt;persistent&lt;/code&gt; has the same structure we would get by starting with an empty persistent vector and adding the values with immutable operations. So let&amp;#8217;s look at that.&lt;/p&gt;

&lt;h2 id='adding_elements'&gt;Adding Elements&lt;/h2&gt;

&lt;p&gt;When elements are added to a &lt;code&gt;PersistentVector&lt;/code&gt;, the &lt;code&gt;cons&lt;/code&gt; method is called.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;PersistentVector&lt;/span&gt; &lt;span class='nf'&gt;cons&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='n'&gt;val&lt;/span&gt;&lt;span class='o'&gt;){&lt;/span&gt;
	&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='c1'&gt;//room in tail?&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='n'&gt;tailoff&lt;/span&gt;&lt;span class='o'&gt;()&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;32&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;length&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
		&lt;span class='n'&gt;System&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;arraycopy&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;length&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;length&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;val&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;PersistentVector&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;meta&lt;/span&gt;&lt;span class='o'&gt;(),&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newTail&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='c1'&gt;//full tail, push into tree&lt;/span&gt;
	&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;edit&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt;&lt;span class='n'&gt;tail&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;newshift&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='c1'&gt;//overflow root?&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;((&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;))&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;edit&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;newPath&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;edit&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt;&lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='n'&gt;newshift&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;else&lt;/span&gt;
		&lt;span class='n'&gt;newroot&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;pushTail&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;shift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nf'&gt;PersistentVector&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;meta&lt;/span&gt;&lt;span class='o'&gt;(),&lt;/span&gt; &lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newshift&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;newroot&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt;&lt;span class='o'&gt;[]{&lt;/span&gt;&lt;span class='n'&gt;val&lt;/span&gt;&lt;span class='o'&gt;});&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;For some reason, the method starts with initializing a variable &lt;code&gt;i&lt;/code&gt; that is never used (line 2). After that, we are confronted with the details of the structure used to store the values.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-iv/clara-tree-tail1.png' alt='Why tail? Wasnt this supposed to be a tree?' /&gt;&lt;/p&gt;

&lt;p&gt;Trees don&amp;#8217;t have tails &amp;#8211; not even the strange computer science kind that grows downward. The tail here is not part of the tree, it&amp;#8217;s an array holding values that are not part of the tree. Up to 32 values end up in the tail, the rest goes into the tree. As the name suggests, the last elements of the vector get stored in the tail.&lt;/p&gt;

&lt;p&gt;32 is a magic number for vectors: the leaves of the tree have exactly 32 values, the other nodes have up to 32 children, and the aforementioned tail holds up to 32 elements. The &lt;code&gt;tailoff&lt;/code&gt; method called in line 4 returns the greatest multiple of 32 that&amp;#8217;s less than the current length, i.e. the number of elements stored in the tree. The field &lt;code&gt;cnt&lt;/code&gt; is the length of the vector.&lt;/p&gt;
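
&lt;p&gt;A few concrete counts make the split between tree and tail easier to follow. The sketch below is an assumption-laden illustration, not code from Clojure: the class &lt;code&gt;TailSketch&lt;/code&gt; and the method &lt;code&gt;roomInTail&lt;/code&gt; are hypothetical names, and &lt;code&gt;tailoff&lt;/code&gt; is reconstructed from the description above rather than copied from &lt;code&gt;PersistentVector&lt;/code&gt;.&lt;/p&gt;

```java
// Hypothetical sketch of the tail bookkeeping; not code from Clojure.
class TailSketch {
    // Elements stored in the tree, reconstructed from the description:
    // the greatest multiple of 32 that is less than cnt (0 for an empty vector).
    static int tailoff(int cnt) {
        return ((cnt - 1) / 32) * 32;
    }

    // Mirrors the "room in tail?" check at the top of cons.
    static boolean roomInTail(int cnt) {
        return 32 > cnt - tailoff(cnt);
    }

    public static void main(String[] args) {
        int[] counts = {0, 31, 32, 33, 64, 65};
        for (int cnt : counts) {
            System.out.println(cnt + " elements: " + tailoff(cnt) + " in tree, "
                    + (cnt - tailoff(cnt)) + " in tail, room: " + roomInTail(cnt));
        }
    }
}
```

&lt;p&gt;With 32 or 64 elements the tail is full and there is no room left, so the next addition has to touch the tree; with 33 or 65 elements the tail holds a single value again.&lt;/p&gt;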

&lt;p&gt;When there is still room in the tail, the tail array is copied into a new array (lines 6, 7) and the new value is appended (line 8). The return value is a new &lt;code&gt;PersistentVector&lt;/code&gt; instance with the new tail array, an incremented count, and otherwise the same properties as the original vector (line 9). In this case adding an element is a constant time operation.&lt;/p&gt;

&lt;p&gt;Only when the vector to which we&amp;#8217;re adding has exactly &lt;code&gt;32*n&lt;/code&gt; elements (with &lt;code&gt;n&lt;/code&gt; being a positive integer) do we need to deal with the tree. In that case, a new tree node to hold the original vector&amp;#8217;s tail is created (line 13). Then we create a tree with the new node in it (lines 14-24), and we return a new instance with the updated tree, the incremented count, and a new tail array that only contains the added element (line 25).&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-iv/clara-hand-wave1.png' alt='Hey, dont just hand-wave about the tree construction' /&gt;&lt;/p&gt;

&lt;p&gt;When constructing the tree, we have two cases: if the tree is completely full at the current height, the resulting tree needs a new level (lines 18-21). Otherwise, we only need to find the right place for the new node (line 24).&lt;/p&gt;
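
&lt;p&gt;The decision between these two cases can be tried out in isolation. The following is a rough sketch under stated assumptions: the class &lt;code&gt;RootOverflow&lt;/code&gt; and its methods are hypothetical names, and the helper &lt;code&gt;leafCapacity&lt;/code&gt; computes with a loop the same value that the original check obtains by shifting 1 left by &lt;code&gt;shift&lt;/code&gt; (the tree height multiplied by 5).&lt;/p&gt;

```java
// Hypothetical sketch of the "overflow root?" decision in cons.
class RootOverflow {
    // Number of leaves a tree with the given shift can hold: a factor of 32
    // for every level above the leaves. This equals the value of 1 shifted
    // left by shift, which is what the original check compares against.
    static int leafCapacity(int shift) {
        int leaves = 1;
        for (int level = shift; level > 0; level -= 5) {
            leaves *= 32;
        }
        return leaves;
    }

    // True if pushing the full tail into the tree requires a new root level.
    static boolean needsNewLevel(int cnt, int shift) {
        return (cnt >>> 5) > leafCapacity(shift);
    }

    public static void main(String[] args) {
        System.out.println(needsNewLevel(32, 5));   // false: the first leaf fits under the root
        System.out.println(needsNewLevel(1024, 5)); // false: the root has room for its 32nd leaf
        System.out.println(needsNewLevel(1056, 5)); // true: 32 leaves plus a full tail
    }
}
```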

&lt;p&gt;The field &lt;code&gt;shift&lt;/code&gt; is the height of the tree multiplied by 5. The height is represented in this form so that it can be used with the count and the capacity in bit-shift operations for additional efficiency. When we add the 33rd element, &lt;code&gt;cnt&lt;/code&gt; is 32, as that is the count of the original vector. &lt;code&gt;shift&lt;/code&gt; is 5, the initial value when there is only an empty root node. As a consequence, the root of the tree for the returned vector is constructed by calling &lt;code&gt;pushTail&lt;/code&gt;.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;private&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='nf'&gt;pushTail&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;parent&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;){&lt;/span&gt;
	&lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;subidx&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;((&lt;/span&gt;&lt;span class='n'&gt;cnt&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;level&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='mh'&gt;0x01f&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;ret&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;parent&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;edit&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;parent&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;clone&lt;/span&gt;&lt;span class='o'&gt;());&lt;/span&gt;
	&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;nodeToInsert&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;level&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;nodeToInsert&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;else&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;Node&lt;/span&gt; &lt;span class='n'&gt;child&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Node&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='n'&gt;parent&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;subidx&lt;/span&gt;&lt;span class='o'&gt;];&lt;/span&gt;
		&lt;span class='n'&gt;nodeToInsert&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;child&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)?&lt;/span&gt;
		                &lt;span class='n'&gt;pushTail&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;level&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt;&lt;span class='n'&gt;child&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		                &lt;span class='o'&gt;:&lt;/span&gt;&lt;span class='n'&gt;newPath&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;root&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;edit&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt;&lt;span class='n'&gt;level&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='o'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tailnode&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='n'&gt;ret&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;array&lt;/span&gt;&lt;span class='o'&gt;[&lt;/span&gt;&lt;span class='n'&gt;subidx&lt;/span&gt;&lt;span class='o'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;nodeToInsert&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;ret&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The method starts by calculating the index at which to add the new node, depending on the length of the original vector and the level of the tree where we are looking to insert the node (line 2). In the example of adding the 33rd value, the index of the new node is 0, as this is the first node to be pushed into the root. When we add the 65th element &amp;#8211; thus having filled up the tail for the second time &amp;#8211; we get the index 1.&lt;/p&gt;
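
&lt;p&gt;To check the two index values mentioned above, the expression from line 2 can be isolated in a small sketch. The class and method names are made up for illustration, and the sketch uses &lt;code&gt;% 32&lt;/code&gt; in place of the bit mask &lt;code&gt;0x01f&lt;/code&gt;; for non-negative values both keep the lowest five bits.&lt;/p&gt;

```java
final class SubIndexDemo {
    // Slot in the node at the given level that receives the pushed tail.
    // cnt is the size of the original vector; level is a multiple of 5.
    // "% 32" stands in for the 0x01f mask (same result for non-negative values).
    static int subidx(int cnt, int level) {
        return ((cnt - 1) >>> level) % 32;
    }

    public static void main(String[] args) {
        System.out.println(subidx(32, 5));   // adding the 33rd element: 0
        System.out.println(subidx(64, 5));   // adding the 65th element: 1
        System.out.println(subidx(1024, 5)); // adding the 1025th element: 31
    }
}
```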

&lt;p&gt;The return value of &lt;code&gt;pushTail&lt;/code&gt; is a new node that clones the references to all the children of the original parent node, so that the child nodes are shared between the old and the new vector (line 3). If we are at the level where the child nodes are leaves, we directly insert the node we&amp;#8217;re supposed to add at the calculated index of the returned node (lines 7, 16, 17).&lt;/p&gt;

&lt;p&gt;When the tree is already higher, we cannot add the leaf node directly to the parent node. Instead we find the child node at the calculated index (line 11) and use the recursive nature of trees to call &lt;code&gt;pushTail&lt;/code&gt; again for that subtree with a lower level (line 13). If no child node exists at the calculated index, we call &lt;code&gt;newPath&lt;/code&gt; to create a new subtree of the appropriate height that contains nodes with exactly one child down to our newly added leaf node (line 14). These operations only copy the nodes on the path to the place where the new value is added. Because every node has up to 32 children, this takes at most &lt;code&gt;O(log32 N)&lt;/code&gt; steps.&lt;/p&gt;
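
&lt;p&gt;The sharing that results from cloning a node&amp;#8217;s child array can be seen in isolation in the following sketch. It is not the Clojure implementation: &lt;code&gt;Node&lt;/code&gt; is reduced to a bare array and &lt;code&gt;withChild&lt;/code&gt; is a hypothetical helper that performs only the clone-and-set step of &lt;code&gt;pushTail&lt;/code&gt;.&lt;/p&gt;

```java
final class PathCopyDemo {
    // Simplified stand-in for the vector's tree node: just an array of children.
    static final class Node {
        final Object[] array;
        Node(Object[] array) { this.array = array; }
    }

    // Returns a copy of parent with child stored at index.
    // clone() copies the references only, so all other children are shared.
    static Node withChild(Node parent, int index, Object child) {
        Object[] copy = parent.array.clone();
        copy[index] = child;
        return new Node(copy);
    }

    public static void main(String[] args) {
        Node leaf0 = new Node(new Object[32]);
        Node oldRoot = withChild(new Node(new Object[32]), 0, leaf0);

        Node leaf1 = new Node(new Object[32]);
        Node newRoot = withChild(oldRoot, 1, leaf1);

        System.out.println(newRoot.array[0] == oldRoot.array[0]); // true: leaf0 is shared
        System.out.println(oldRoot.array[1] == null);             // true: old root is untouched
    }
}
```

&lt;p&gt;Only the one node on the path is copied; the old root keeps working for anyone still holding the old vector.&lt;/p&gt;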

&lt;p&gt;This covers the part of the insertion process where we can insert into the tree without adding another level. Back in the &lt;code&gt;cons&lt;/code&gt; method, we still need to look at the case where we grow the tree in height (lines 18-21). Here we create an all new root for the returned vector (line 18) that gets the original vector&amp;#8217;s root as the first child (line 19) and a new subtree of the right height with the new leaf node created by &lt;code&gt;newPath&lt;/code&gt; as the second child (line 20). Finally, the field &lt;code&gt;shift&lt;/code&gt; for the new vector is incremented by 5 to reflect the new height (line 21). As the number of steps for this operation is bounded by the height of the tree, the complexity here is also &lt;code&gt;O(log32 N)&lt;/code&gt;, which is consequently the complexity of the whole algorithm.&lt;/p&gt;

&lt;h2 id='summary'&gt;Summary&lt;/h2&gt;

&lt;p&gt;In this post we have mainly looked at adding elements to immutable vectors. The algorithm and data-structure described here are used for both creating and growing vectors, although the creation process uses optimizations based on mutable state which we did not discuss.&lt;/p&gt;

&lt;p&gt;I want to close this post with an illustration of the structure using a before/after example. Let us assume we have a vector with 1056 elements and want to append another value calling &lt;code&gt;cons&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-iv/growing-vector1.png' alt='Growing Vector' /&gt;&lt;/p&gt;

&lt;p&gt;The old vector (shown in blue) has a root node with 32 children each holding 32 values, and a tail with 32 more values. This means the tree is full at its current level. The new vector (shown in red) has another layer in its tree. The first child of the new root is the old root, thus sharing the structure of the old vector without copying. The second child of the new root is a node that has one leaf as child which contains a reference to the old tail &amp;#8211; again, sharing not copying. The tail of the new vector contains a single value, the one we added.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>IDSC III: Lazy Seqs</title>
   <link href="http://jgre.org/2010/06/10/idsc-iii-lazy-seqs"/>
   <updated>2010-06-10T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/06/10/idsc-iii-lazy-seqs</id>
   <content type="html">&lt;p&gt;Lazy evaluation is an important concept in functional programming. Running on the JVM, Clojure does not support general laziness, but it has a data-structure abstraction called &lt;em&gt;lazy sequence&lt;/em&gt; that provides for many of the benefits of the more general strategy.&lt;/p&gt;

&lt;p&gt;This post is a slight detour in the course of the &lt;a href='/2010/05/idsc.html'&gt;immutable data-structure canon&lt;/a&gt;. In the &lt;a href='/2010/05/idsc-2.html'&gt;previous installment&lt;/a&gt;, I described how some operations on lists return lazy sequences to retain immutability and performance characteristics. As this behavior is not specific to lists, and indeed fundamental to all Clojure data-structures, I decided that lazy seqs deserve a closer look.&lt;/p&gt;

&lt;h2 id='the_rationale_of_lazy_evaluation'&gt;The Rationale of Lazy Evaluation&lt;/h2&gt;

&lt;p&gt;In many cases it might not be necessary to use all the values in a data-structure. But it might be hard to determine what is required when the structure is defined, as the creation and the consumption could be in different parts of the program. Laziness allows you to write a general definition and later use only what you need without incurring the cost for needless computations.&lt;/p&gt;

&lt;p&gt;Laziness also allows for an interesting special case: &lt;em&gt;infinite sequences&lt;/em&gt;. You can specify the computation of elements for a series (e.g. Fibonacci numbers) and use parts of that series without problems, as long as you do not try to traverse the entire sequence.&lt;/p&gt;
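
&lt;p&gt;The idea of an infinite sequence can be sketched in plain Java, which is also eager by default. The names &lt;code&gt;Cell&lt;/code&gt;, &lt;code&gt;Thunk&lt;/code&gt;, &lt;code&gt;fib&lt;/code&gt; and &lt;code&gt;nth&lt;/code&gt; below are invented for this illustration and have nothing to do with Clojure&amp;#8217;s own classes; the point is only that each tail is computed when it is first asked for.&lt;/p&gt;

```java
final class LazyFibDemo {
    // A deferred computation that produces the rest of the sequence.
    interface Thunk { Cell invoke(); }

    // One element plus a tail that is computed on first access and then kept.
    static final class Cell {
        final long head;
        private Thunk tailFn;
        private Cell tail;

        Cell(long head, Thunk tailFn) {
            this.head = head;
            this.tailFn = tailFn;
        }

        synchronized Cell tail() {
            if (tailFn != null) {
                tail = tailFn.invoke();
                tailFn = null; // run the computation at most once
            }
            return tail;
        }
    }

    // Infinite Fibonacci sequence starting with a, b.
    // The recursive call sits inside the lambda, so it does not run here.
    static Cell fib(long a, long b) {
        return new Cell(a, () -> fib(b, a + b));
    }

    // Walks n tails and returns the element found there.
    static long nth(Cell cell, int n) {
        for (int i = 0; i != n; i++) {
            cell = cell.tail();
        }
        return cell.head;
    }

    public static void main(String[] args) {
        Cell fibs = fib(0, 1);
        System.out.println(nth(fibs, 10)); // 55
        System.out.println(nth(fibs, 50)); // 12586269025
    }
}
```

&lt;p&gt;Only as many cells exist as have been visited; asking for the whole sequence would never terminate.&lt;/p&gt;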

&lt;h2 id='postponing_calculations'&gt;Postponing Calculations&lt;/h2&gt;

&lt;p&gt;&lt;img src='/images/idsc-iii/lazy-guy.png' alt='Laziness Illustration' /&gt; (c) iStockphoto.com&lt;/p&gt;

&lt;p&gt;In a language with general support for lazy evaluation like Haskell, the compiler takes care that nothing gets evaluated before it is needed. In Clojure, however, eager evaluation is the norm. That means that when you call a function with the result of another function-call as a parameter, that second function gets evaluated immediately. In a lazy language, the second function would only be evaluated when the first function needs the respective parameter.&lt;/p&gt;

&lt;p&gt;For this reason, &lt;code&gt;lazy-seq&lt;/code&gt; (which as we saw last time is used to define lazy sequences), cannot be implemented as a function &amp;#8211; the body of what we pass to it must not be evaluated at the time of definition. The solution is a construct often cited as the most powerful feature in Lisp: &lt;em&gt;macros&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Macros are the extension mechanism of Clojure (and Lisp in general); they allow you as a programmer to add features to the language. This makes it possible to have a very small core language, but still provide for pleasant programming. Unlike functions, macros do not evaluate their arguments immediately. When Clojure comes across code that uses a macro it first &lt;em&gt;expands&lt;/em&gt; the macro and replaces the code with the result before the compilation proceeds normally.&lt;/p&gt;

&lt;p&gt;The definition of the &lt;code&gt;lazy-seq&lt;/code&gt; macro in &lt;a href='http://github.com/richhickey/clojure/blob/master/src/clj/clojure/core.clj'&gt;core.clj&lt;/a&gt; is surprisingly short:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defmacro &lt;/span&gt;&lt;span class='nv'&gt;lazy-seq&lt;/span&gt;
  &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;&amp;amp;&lt;/span&gt; &lt;span class='nv'&gt;body&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
  &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;list &lt;/span&gt;&lt;span class='ss'&gt;&amp;#39;new&lt;/span&gt; &lt;span class='ss'&gt;&amp;#39;clojure&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;lang&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;LazySeq&lt;/span&gt;
    &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;list* &lt;/span&gt;&lt;span class='ss'&gt;&amp;#39;^&lt;/span&gt;&lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;:once&lt;/span&gt; &lt;span class='nv'&gt;true&lt;/span&gt;&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='nv'&gt;fn*&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='nv'&gt;body&lt;/span&gt;&lt;span class='p'&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As macro expansion and the corresponding escaping rules are beyond the scope here, let it suffice to say that &lt;code&gt;(lazy-seq BODY)&lt;/code&gt; would expand to&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;new &lt;/span&gt;&lt;span class='nv'&gt;clojure&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;lang&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='nv'&gt;LazySeq&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;fn*&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='nv'&gt;BODY&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The body that defines the seq is wrapped into an anonymous function and passed to the constructor of the &lt;code&gt;LazySeq&lt;/code&gt; class. (&lt;code&gt;new&lt;/code&gt; is a special form for Java interoperability that allows us to construct a Java object.) The main part of the implementation of lazy seqs is in Java, more specifically in the aforementioned class &lt;code&gt;clojure.lang.LazySeq&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LazySeq&lt;/code&gt; has three fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fn&lt;/code&gt;: a function object to store the definition body,&lt;/li&gt;

&lt;li&gt;&lt;code&gt;sv&lt;/code&gt;: a generic object to store the computed value, and&lt;/li&gt;

&lt;li&gt;&lt;code&gt;s&lt;/code&gt;: a representation of the current view of the sequence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When we construct a &lt;code&gt;LazySeq&lt;/code&gt; instance, only the function object is assigned a value.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-iii/clara-caching2.png' alt='What are the other two fields for?' /&gt;&lt;/p&gt;

&lt;p&gt;For performance reasons, &lt;code&gt;LazySeq&lt;/code&gt; does not simply call the function object and return its result when the contents are accessed. Instead the other two fields are used to do some caching.&lt;/p&gt;

&lt;h2 id='caching'&gt;Caching&lt;/h2&gt;

&lt;p&gt;To understand how the access works, let us look at the &lt;code&gt;first&lt;/code&gt; method in &lt;a href='http://github.com/richhickey/clojure/blob/master/src/jvm/clojure/lang/LazySeq.java'&gt;LazySeq.java&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='nf'&gt;first&lt;/span&gt;&lt;span class='o'&gt;(){&lt;/span&gt;
	&lt;span class='n'&gt;seq&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;s&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;first&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;seq&lt;/code&gt; method that is called in line 2 is the key to how access to lazy seqs works. It is also called from all the other access methods (&lt;code&gt;count&lt;/code&gt;, &lt;code&gt;more&lt;/code&gt;, &lt;code&gt;cons&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='kd'&gt;final&lt;/span&gt; &lt;span class='kd'&gt;synchronized&lt;/span&gt; &lt;span class='kd'&gt;public&lt;/span&gt; &lt;span class='n'&gt;ISeq&lt;/span&gt; &lt;span class='nf'&gt;seq&lt;/span&gt;&lt;span class='o'&gt;(){&lt;/span&gt;
	&lt;span class='n'&gt;sval&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
	&lt;span class='k'&gt;if&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;sv&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
		&lt;span class='o'&gt;{&lt;/span&gt;
		&lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='n'&gt;ls&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;sv&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='n'&gt;sv&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
		&lt;span class='k'&gt;while&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;ls&lt;/span&gt; &lt;span class='k'&gt;instanceof&lt;/span&gt; &lt;span class='n'&gt;LazySeq&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;
			&lt;span class='o'&gt;{&lt;/span&gt;
			&lt;span class='n'&gt;ls&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;((&lt;/span&gt;&lt;span class='n'&gt;LazySeq&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt;&lt;span class='n'&gt;ls&lt;/span&gt;&lt;span class='o'&gt;).&lt;/span&gt;&lt;span class='na'&gt;sval&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
			&lt;span class='o'&gt;}&lt;/span&gt;
		&lt;span class='n'&gt;s&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;RT&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;seq&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;ls&lt;/span&gt;&lt;span class='o'&gt;);&lt;/span&gt;
		&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;s&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;sval&lt;/code&gt; method called from line 2 executes the function object and stores the result in the &lt;code&gt;sv&lt;/code&gt; field. &lt;code&gt;sval&lt;/code&gt; also takes care that the function is not executed more than once. In line 5 the calculated value is assigned to the temporary variable &lt;code&gt;ls&lt;/code&gt;, and the field for the calculated value is set to &lt;code&gt;null&lt;/code&gt; in line 6. Lines 7-10 resolve nested lazy seqs until we get a value of a different type assigned to the temporary variable &lt;code&gt;ls&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The return value is computed and assigned to the field &lt;code&gt;s&lt;/code&gt; in line 11. &lt;code&gt;RT&lt;/code&gt; is the runtime class that provides the fundamental Clojure functions for the Java code implemented as static methods. The &lt;code&gt;seq&lt;/code&gt; method in &lt;code&gt;RT&lt;/code&gt; turns its parameter into a sequence if it implements the &lt;code&gt;Iterable&lt;/code&gt; Java interface.&lt;/p&gt;

&lt;p&gt;This computation happens only once. In subsequent calls to &lt;code&gt;seq&lt;/code&gt; the cached value in field &lt;code&gt;s&lt;/code&gt; is returned, as &lt;code&gt;sv&lt;/code&gt; is set to &lt;code&gt;null&lt;/code&gt; (line 6) so that the condition in line 3 is false and the block in lines 4-12 is not executed. The &lt;code&gt;sval&lt;/code&gt; method does not set &lt;code&gt;sv&lt;/code&gt; again in later invocations either.&lt;/p&gt;
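
&lt;p&gt;The interplay of the three fields can be modelled in a few lines. The following is a simplified sketch and not the real &lt;code&gt;LazySeq&lt;/code&gt;: the field names are borrowed from the description above, while &lt;code&gt;Thunk&lt;/code&gt;, &lt;code&gt;Lazy&lt;/code&gt; and &lt;code&gt;get&lt;/code&gt; are hypothetical, and &lt;code&gt;get&lt;/code&gt; stores the resolved value directly instead of passing it through &lt;code&gt;RT.seq&lt;/code&gt;. The body of &lt;code&gt;sval&lt;/code&gt; is my assumption of how &amp;#8220;run at most once&amp;#8221; could be implemented.&lt;/p&gt;

```java
final class CachedThunkDemo {
    // Hypothetical stand-in for a Clojure function object.
    interface Thunk { Object invoke(); }

    static final class Lazy {
        private Thunk fn;  // deferred body, cleared after its first run
        private Object sv; // raw result of the body, held only until resolved
        private Object s;  // cached, fully resolved result

        Lazy(Thunk fn) { this.fn = fn; }

        // Runs the body at most once and returns whatever is currently known.
        private synchronized Object sval() {
            if (fn != null) {
                sv = fn.invoke();
                fn = null;
            }
            return (sv != null) ? sv : s;
        }

        // Plays the role of seq(): resolve nested Lazy values once, then cache.
        synchronized Object get() {
            sval();
            if (sv != null) {
                Object ls = sv;
                sv = null;
                while (ls instanceof Lazy) {
                    ls = ((Lazy) ls).sval();
                }
                s = ls;
            }
            return s;
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Lazy inner = new Lazy(() -> { calls[0]++; return "value"; });
        Lazy outer = new Lazy(() -> inner);

        System.out.println(calls[0]);    // 0: nothing has been evaluated yet
        System.out.println(outer.get()); // value
        outer.get();
        System.out.println(calls[0]);    // 1: the second call used the cache
    }
}
```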

&lt;h2 id='the__example'&gt;The &lt;code&gt;concat&lt;/code&gt; Example&lt;/h2&gt;

&lt;p&gt;To clarify what&amp;#8217;s happening in &lt;code&gt;seq&lt;/code&gt;, let us go through an example. We&amp;#8217;ll use an abbreviated version of &lt;code&gt;concat&lt;/code&gt;.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;concat&lt;/span&gt;
  &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
    &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;lazy-seq&lt;/span&gt;
      &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;let &lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;seq &lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)]&lt;/span&gt;
        &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;if &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;
          &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;cons &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;first &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;concat &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;rest &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;)))&lt;/span&gt;
          &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;))))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;What happens when we take the seq resulting from a concatenation and call &lt;code&gt;first&lt;/code&gt; on it? Via the &lt;code&gt;first&lt;/code&gt; method we get to &lt;code&gt;seq&lt;/code&gt; which in turn calls &lt;code&gt;sval&lt;/code&gt;, so that the body gets executed. Assuming &lt;code&gt;x&lt;/code&gt; is not empty, the body of the &lt;code&gt;lazy-seq&lt;/code&gt; form in &lt;code&gt;concat&lt;/code&gt; evaluates to the call to &lt;code&gt;cons&lt;/code&gt; (line 6). The result of &lt;code&gt;cons&lt;/code&gt; is an instance of &lt;code&gt;clojure.lang.Cons&lt;/code&gt; &amp;#8211; an abstraction with list semantics, i.e. it has a head and a tail. The &lt;code&gt;Cons&lt;/code&gt; is assigned to the &lt;code&gt;sv&lt;/code&gt; field and &amp;#8211; in the &lt;code&gt;seq&lt;/code&gt; method &amp;#8211; to the temporary variable &lt;code&gt;ls&lt;/code&gt; (line 5). The loop for nested lazy seqs (lines 7-10) is skipped, because a &lt;code&gt;Cons&lt;/code&gt; is not an instance of &lt;code&gt;LazySeq&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In line 11, the &lt;code&gt;Cons&lt;/code&gt; gets assigned to the field &lt;code&gt;s&lt;/code&gt; (&lt;code&gt;RT.seq&lt;/code&gt; does not do anything to it). Back to &lt;code&gt;first&lt;/code&gt;. &lt;code&gt;s&lt;/code&gt; is not &lt;code&gt;null&lt;/code&gt;, so we call &lt;code&gt;first&lt;/code&gt; on the &lt;code&gt;Cons&lt;/code&gt; and return the result. The first element of the &lt;code&gt;Cons&lt;/code&gt; is the first element of the concatenated seq, just as expected.&lt;/p&gt;

&lt;p&gt;Accessing the tail of the seq happens analogously; the tail of the &lt;code&gt;Cons&lt;/code&gt; is another lazy seq due to the recursive definition of &lt;code&gt;concat&lt;/code&gt; which we discussed last time.&lt;/p&gt;

&lt;h2 id='summary'&gt;Summary&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Lazy sequences allow computations to be postponed. Even infinite sequences are possible&lt;/li&gt;

&lt;li&gt;Lazy seqs are created using the &lt;code&gt;lazy-seq&lt;/code&gt; macro&lt;/li&gt;

&lt;li&gt;Using a macro avoids immediately evaluating the definition of the seq&lt;/li&gt;

&lt;li&gt;The implementation of lazy seqs is in the Java class &lt;code&gt;clojure.lang.LazySeq&lt;/code&gt;&lt;/li&gt;

&lt;li&gt;Lazy seqs cache the results of the computation&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>The Design Of Design</title>
   <link href="http://jgre.org/2010/06/04/the-design-of-design"/>
   <updated>2010-06-04T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/06/04/the-design-of-design</id>
   <content type="html">&lt;blockquote&gt;
&lt;p&gt;Mediocre design provably wastes the world&amp;#8217;s resources, corrupts the environment, affects international competitiveness. Design is important; teaching design is important.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Fred Brooks, &amp;#8220;The Design of Design&amp;#8221;. Page x&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Frederick Brooks, the author of &lt;a href='http://en.wikipedia.org/wiki/The_Mythical_Man-Month'&gt;&amp;#8220;The Mythical Man-Month&amp;#8221;&lt;/a&gt; and the project manager of the &lt;a href='http://en.wikipedia.org/wiki/IBM_System/360'&gt;IBM System/360&lt;/a&gt; (the most successful mainframe computer series back in the day), has written a new book called &amp;#8220;The Design of Design&amp;#8221;. Like &amp;#8220;The Mythical Man-Month&amp;#8221;, it is a collection of essays, but in this one the focus is on how good design can be achieved.&lt;/p&gt;

&lt;p&gt;While computer-related projects are a main source of examples in the book, Brooks also draws from experience in designing buildings and other projects. The book is aimed at designers and project managers of many kinds, but computer science is clearly Brooks&amp;#8217;s home discipline.&lt;/p&gt;

&lt;p&gt;The essays are by no means how-tos trying to teach you a particular design process. Nor are there any singularly novel ideas you have never heard before, if you have some background in designing software. The value of the book is that it makes you think about the different aspects of design. The texts are well written and opinionated, inviting you to look at the issues presented in them from angles that are different from what you might be used to. In the rest of this post, I summarize some ideas that resonated with me, but I really recommend reading the book.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/design-of-design/brooks.jpeg' alt='Fred Brooks' /&gt; Fred Brooks (c) sd&amp;amp;m&lt;/p&gt;

&lt;p&gt;The book is organized in six parts, and I follow that structure for my notes here (not all parts are mentioned, however).&lt;/p&gt;

&lt;h2 id='models_of_designing'&gt;Models of Designing&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The hardest part of design is deciding what to design&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 22&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the first part of the book, Brooks explains that he rejects the waterfall model, because it is impossible to know enough about requirements and other factors influencing the design up front.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Rational Model, in any of its forms [the waterfall model being one of them], leads us to demand up-front statements of design requirements. It leads us to believe that such can be formulated. [&amp;#8230;]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;The Waterfall Model is wrong and harmful; we must outgrow it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Pages 33,34&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is old news to anyone sympathizing with agile styles of design, but it is interesting to hear it from someone who worked on mainframes for IBM in the 1950s and 1960s.&lt;/p&gt;

&lt;p&gt;The rejection of the waterfall model raises two questions: (1) why are project plans still often based on it and (2) what is a better model?&lt;/p&gt;

&lt;p&gt;There are two plausible answers to the first question. The model seems obvious, clean and straightforward. But more importantly, most projects involve some sort of contract between the designer and the customer, where the customer would not be too happy about an open-ended, pay-by-the-hour, it&amp;#8217;s-done-when-it&amp;#8217;s-done deal.&lt;/p&gt;

&lt;p&gt;Brooks proposes decoupling the contracts for design from the contracts for implementation, but concedes that this is not a complete solution especially in software, where the boundaries between design and construction are blurry.&lt;/p&gt;

&lt;p&gt;So what about a better model? The purpose of a model for design is to serve as a means of teaching and as a map to answer the question &amp;#8220;where are we?&amp;#8221; in the course of a project.&lt;/p&gt;

&lt;p&gt;Brooks has no definitive and final answer to this question, but he is in favor of something based on the &lt;a href='http://portal.acm.org/citation.cfm?doid=12944.12948'&gt;spiral&lt;/a&gt; &lt;a href='http://en.wikipedia.org/wiki/Spiral_model'&gt;model&lt;/a&gt; proposed by Barry Boehm. In this model, development repeatedly goes through stages of planning, determining requirements and constraints, prototyping, risk analysis, and verification.&lt;/p&gt;

&lt;h2 id='collaboration_and_telecollaboration'&gt;Collaboration and Telecollaboration&lt;/h2&gt;

&lt;p&gt;In the second part, Brooks questions the notion that collaboration and design in teams are good per se.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is generally assumed that collaboration is, in and of itself, a &amp;#8220;good thing.&amp;#8221; &amp;#8220;Plays well with others&amp;#8221; is high praise from kindergarten onward. &amp;#8220;All of us are smarter than any of us.&amp;#8221; &amp;#8220;The more participation in design, the better.&amp;#8221; Now, these attractive propositions are far from self-evident. I will argue that they are not &lt;em&gt;universally&lt;/em&gt; true.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 64&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although there are good reasons for collaborative design, such as technological complexity and time constraints, the challenge is conceptual integrity. As a consequence, Brooks argues that a design team should always have one system architect who calls the shots in favor of a consistent concept.&lt;/p&gt;

&lt;h2 id='design_perspectives'&gt;Design Perspectives&lt;/h2&gt;

&lt;p&gt;The third part of the book contains essays with ideas that are similar to those of agile development.&lt;/p&gt;

&lt;p&gt;Brooks starts with an interesting perspective on design where he compares different approaches to the philosophical schools of &lt;em&gt;empiricism&lt;/em&gt; and &lt;em&gt;rationalism&lt;/em&gt;. Rationalists believe that given sufficient experience and careful consideration, you can come up with a perfect design. The empiricist view on the other hand is that anything you design, no matter how well considered, will be flawed. The flaws need to be fixed by trial and error.&lt;/p&gt;

&lt;p&gt;Brooks is clearly in the empiricist camp:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can I, by sufficient thought alone, design a complex object correctly? No; testing and iteration are in practice necessary. But careful thought helps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 109&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An important factor for the design is the model you have of your product&amp;#8217;s user. While it is impractical to know everything about the users, Brooks argues that you should explicitly state the assumptions you make in great detail, even if some of them are wrong.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Wrong explicit assumptions are much better than vague ones. Wrong ones will perhaps be questioned; vague ones won&amp;#8217;t.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 117&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Reminding me of &lt;a href='http://gettingreal.37signals.com/ch03_Embrace_Constraints.php'&gt;37signals&lt;/a&gt;, he supports the notion that constraints are your friends.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since constraints are the designer&amp;#8217;s friend, if the task originally seems unconstrained, first think harder about what is really desired, about the user and use models, and you will probably find some narrowing constraints, to the benefit of both designer and user.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 135&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Brooks then goes on to examine esthetics and style in technical design. In his analysis &amp;#8220;logical beauty&amp;#8221; comes from &lt;em&gt;parsimony&lt;/em&gt;, &lt;em&gt;structural clarity&lt;/em&gt;, and &lt;em&gt;consistency&lt;/em&gt;. Parsimony roughly means &amp;#8220;accomplishing a great deal with few elements&amp;#8221;. Parsimony alone, however, is not enough. Without any redundancy, a design can be cryptic and ineffective.&lt;/p&gt;

&lt;p&gt;According to Brooks, &amp;#8220;consistency underlies all principles of quality.&amp;#8221;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A good architecture is consistent in the sense that, given partial knowledge of the system, one can predict the remainder.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 143&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three design principles are important for consistency: &lt;strong&gt;orthogonality (do not link what is independent), propriety (do not introduce what is immaterial), and generality (do not restrict what is inherent)&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id='great_designers'&gt;Great Designers&lt;/h2&gt;

&lt;p&gt;In the fifth part, Brooks examines the value of design processes, individual designers, and education in design. He maintains that &amp;#8220;great designs come from great designers, not from great design processes.&amp;#8221;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I believe that standard corporate product design processes do indeed work against truly great and innovative design.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 233&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This does not mean, however, that he rejects processes entirely. The key is some middle ground where processes prevent grave errors but are not followed to the letter at the cost of stifling creativity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;First and foremost, the top leader of the organization must passionately want innovative products with great designs&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Page 238&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Concerning education, the most important idea in the book is that you should learn by &lt;em&gt;critiqued practice&lt;/em&gt;, instead of getting taught the basics before starting to make stuff. Another component in learning is teaching others. A third part in the education of a designer should be studying exemplars &amp;#8211; designs by other designers.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>IDSC II: Lists</title>
   <link href="http://jgre.org/2010/05/27/idsc-ii-lists"/>
   <updated>2010-05-27T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/05/27/idsc-ii-lists</id>
   <content type="html">&lt;p&gt;Lists are the simplest data-structures that can easily grow. In this post, as part of the &lt;a href='/2010/05/idsc.html'&gt;immutable data-structure canon&lt;/a&gt;, we&amp;#8217;re going to look at how immutable lists are implemented in Clojure.&lt;/p&gt;

&lt;p&gt;Let us review the definition of list in general. A list is either empty or has a &lt;em&gt;head&lt;/em&gt; and a &lt;em&gt;tail&lt;/em&gt;. The head contains a data element and the tail is another list. A simple recursive definition. The list &lt;code&gt;(1 2 3)&lt;/code&gt; for example has &lt;code&gt;1&lt;/code&gt; as its head and the list &lt;code&gt;(2 3)&lt;/code&gt; as tail. The head of that list is &lt;code&gt;2&lt;/code&gt; and the tail is the list &lt;code&gt;(3)&lt;/code&gt;. Finally, that list has &lt;code&gt;3&lt;/code&gt; as head and the empty list as tail.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-ii/list-1.png' alt='List Illustration' /&gt;&lt;/p&gt;

&lt;p&gt;This structure is implemented as &lt;a href='http://github.com/richhickey/clojure/blob/3da8a12112332d15a91b140fab5e535f0d2528e8/src/jvm/clojure/lang/PersistentList.java'&gt;clojure.lang.PersistentList&lt;/a&gt; in the Java part of Clojure.&lt;/p&gt;

&lt;p&gt;We can create a list using the &lt;code&gt;list&lt;/code&gt; function:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;l1&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;list &lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;We also saved the list in a binding named &lt;code&gt;l1&lt;/code&gt;. The general convention (at least in Lisps) is that adding elements happens at the front of the list. To add an element, we use &lt;code&gt;cons&lt;/code&gt;.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;l2&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;cons &lt;/span&gt;&lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='nv'&gt;l1&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='nv'&gt;l2&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nv'&gt;l1&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;&lt;img src='/images/idsc-ii/list-21.png' alt='List Illustration' /&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cons&lt;/code&gt; returns a list with the added element, but &lt;code&gt;l1&lt;/code&gt; still contains only &lt;code&gt;1 2 3&lt;/code&gt;. This property is called &lt;em&gt;persistence&lt;/em&gt;. Conceptually this is easy. We create a new list with &lt;code&gt;42&lt;/code&gt; as the head and what we called &lt;code&gt;l1&lt;/code&gt; as the tail. &lt;code&gt;l1&lt;/code&gt; still refers to the same list as before.&lt;/p&gt;

&lt;p&gt;Because the addition does not have to go through the existing list, this operation takes constant time.&lt;/p&gt;
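&lt;p&gt;To make the constant-time claim concrete, here is a minimal Java sketch of an immutable singly-linked list. It is not Clojure&amp;#8217;s actual &lt;code&gt;PersistentList&lt;/code&gt;; the names &lt;code&gt;Cell&lt;/code&gt;, &lt;code&gt;cons&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt;, and &lt;code&gt;show&lt;/code&gt; are made up for illustration, and &lt;code&gt;null&lt;/code&gt; is assumed to stand for the empty list.&lt;/p&gt;

```java
// Hypothetical sketch of an immutable cons list; not Clojure's PersistentList.
final class Cell {
    final Object head; // the data element
    final Cell tail;   // the rest of the list; null stands for the empty list

    Cell(Object head, Cell tail) {
        this.head = head;
        this.tail = tail;
    }

    // Constant time: allocate one cell that points at the existing list.
    static Cell cons(Object x, Cell list) {
        return new Cell(x, list);
    }

    // Build a list from the given elements, back to front.
    static Cell list(Object... xs) {
        Cell result = null;
        for (int i = xs.length - 1; i != -1; i--) {
            result = cons(xs[i], result);
        }
        return result;
    }

    // Render a list as "(1 2 3)"; the empty list renders as "()".
    static String show(Cell list) {
        StringBuilder sb = new StringBuilder("(");
        for (Cell c = list; c != null; c = c.tail) {
            if (c != list) sb.append(' ');
            sb.append(c.head);
        }
        return sb.append(')').toString();
    }
}

class ConsDemo {
    public static void main(String[] args) {
        Cell l1 = Cell.list(1, 2, 3);
        Cell l2 = Cell.cons(42, l1);
        System.out.println(Cell.show(l2)); // (42 1 2 3)
        System.out.println(Cell.show(l1)); // (1 2 3)
        System.out.println(l2.tail == l1); // true: the tail is shared, not copied
    }
}
```

&lt;p&gt;Because both fields are &lt;code&gt;final&lt;/code&gt;, nobody can change a cell after it is created, so sharing the old list as the tail of the new one is safe.&lt;/p&gt;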

&lt;p&gt;Given the recursive definition of list, immutably and persistently removing elements from the front of the list is simple as well. Things get trickier when we concatenate two lists.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;ls&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;concat &lt;/span&gt;&lt;span class='nv'&gt;l1&lt;/span&gt; &lt;span class='nv'&gt;l2&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='nv'&gt;l1&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nv'&gt;l2&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nv'&gt;ls&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt; &lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;In a mutable implementation, we could simply change the last element of the first list to refer to the second list as tail, but that would destroy the persistence of &lt;code&gt;l1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-ii/list-31.png' alt='List Illustration' /&gt;&lt;/p&gt;

&lt;p&gt;What is the performance expectation here? You might be tempted to say that it is constant, because we only need to change one pointer. But remember that we are talking about a singly-linked list. That means we can only find the last element of &lt;code&gt;l1&lt;/code&gt; by traversing the whole list. Thus the operation takes linear time.&lt;/p&gt;

&lt;p&gt;Back to our original question, how do we concatenate two lists without modifying them?&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/idsc-ii/clara-copy1.png' alt='Then we need to copy the lists, right?' /&gt;&lt;/p&gt;

&lt;p&gt;For Clojure the answer is both yes and no.&lt;/p&gt;

&lt;p&gt;Clojure&amp;#8217;s solution here could be described as &amp;#8220;copying light&amp;#8221;. We create a new data-structure that has new entries for all the elements, but each entry is only really created when it is used. This is called a &lt;em&gt;lazy sequence&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A lazy sequence behaves like a normal collection. The special power of lazy seqs is that they don&amp;#8217;t contain all the data elements, they only know how they are generated. This means that the values in the seq are only created when they are accessed, allowing expensive calculations to be postponed. Lazy seqs can even represent infinite data sets &amp;#8211; as long as the data-structure is not traversed completely. We will look into this construct in greater depth in the next part of this series.&lt;/p&gt;

&lt;p&gt;Back to our immutable lists. How do lazy seqs help with concatenating two lists? The &lt;code&gt;concat&lt;/code&gt; function does not actually return a list; it returns a lazy seq. As we walk through that seq, a new list entry is created for each element of the first list we reach, and once the first list is exhausted the second list is used as the tail unchanged. Traversing the whole result therefore costs time linear in the length of the first list. The data elements, however, are not copied as we can be sure that they are also values and thus won&amp;#8217;t change.&lt;/p&gt;

&lt;p&gt;Here is the implementation of &lt;code&gt;concat&lt;/code&gt; in &lt;a href='http://github.com/richhickey/clojure/blob/master/src/clj/clojure/core.clj'&gt;core.clj&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;defn &lt;/span&gt;&lt;span class='nv'&gt;concat&lt;/span&gt;
  &lt;span class='s'&gt;&amp;quot;Returns a lazy seq representing the concatenation of the elements in the supplied colls.&amp;quot;&lt;/span&gt;
  &lt;span class='p'&gt;([]&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;lazy-seq&lt;/span&gt; &lt;span class='nv'&gt;nil&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
  &lt;span class='p'&gt;([&lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;lazy-seq&lt;/span&gt; &lt;span class='nv'&gt;x&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
  &lt;span class='p'&gt;([&lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
    &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;lazy-seq&lt;/span&gt;
      &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;let &lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;seq &lt;/span&gt;&lt;span class='nv'&gt;x&lt;/span&gt;&lt;span class='p'&gt;)]&lt;/span&gt;
        &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;if &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;
          &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;if &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;chunked-seq?&lt;/span&gt; &lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;chunk-cons&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;chunk-first&lt;/span&gt; &lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;concat &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nf'&gt;chunk-rest&lt;/span&gt; &lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
            &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;cons &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;first &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;concat &lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;rest &lt;/span&gt;&lt;span class='nv'&gt;s&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;)))&lt;/span&gt;
          &lt;span class='nv'&gt;y&lt;/span&gt;&lt;span class='p'&gt;))))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;(I&amp;#8217;ve omitted the implementation of concatenating more than two lists, as that is not essential to this discussion.)&lt;/p&gt;

&lt;p&gt;Let us walk through this code. There are three definitions of the function here; the dispatch depends on arity (i.e. on how many parameters the function is called with). Lines 3 and 4 handle the cases where &lt;code&gt;concat&lt;/code&gt; is called with zero or one parameter in the obvious way, by just returning a lazy seq with the same contents that were passed as parameters.&lt;/p&gt;

&lt;p&gt;Starting in line 5, we have the &amp;#8220;real&amp;#8221; case for &lt;code&gt;concat&lt;/code&gt;: two parameters. Line 6 tells us that the result is a lazy seq. Lines 7-12 are only executed when an element of the list is read &amp;#8211; that&amp;#8217;s the laziness.&lt;/p&gt;

&lt;p&gt;Line 7 makes sure the rest of the function is working with a sequence type, and line 8 makes sure that the first list to concatenate is not nil, otherwise the second list is returned in line 12.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://clojure.googlegroups.com/web/chunks.pdf?gda=WIF8ADwAAAC-wnUK1KQ919yJcmM1ACuZUsYXlXWR5Y8qvjzEXQCX1uwyCdwt79_BXi8-B36MGsn9Wm-ajmzVoAFUlE7c_fAt'&gt;Chunked seqs&lt;/a&gt; are an optimization that we&amp;#8217;ll ignore here, so we can ignore lines 9 and 10. That leaves us with line 11. Here, we take the head of the first list, and add it (using &lt;code&gt;cons&lt;/code&gt;) to the result of a recursive call to &lt;code&gt;concat&lt;/code&gt; passing the tail of the first list and the second list.&lt;/p&gt;

&lt;p&gt;When we go through the above example of concatenating &lt;code&gt;(1 2 3)&lt;/code&gt; and &lt;code&gt;(42 1 2 3)&lt;/code&gt;, we get a lazy seq back immediately, without the function looking at any of the elements. Only when we go through that result seq do lines 7-12 get invoked. For the first element, we get an entry that contains &lt;code&gt;1&lt;/code&gt; as head and another lazy seq &amp;#8211; the result of &lt;code&gt;(concat (list 2 3) (list 42 1 2 3))&lt;/code&gt; &amp;#8211; as the tail. Getting the second happens the same way, as we resolve the second lazy seq. The third, ditto. After the third element, the first list passed to the inner call to &lt;code&gt;concat&lt;/code&gt; is empty, so that the else part of the &lt;code&gt;if&lt;/code&gt; in line 8 kicks in, and the result of the call is simply the second list (line 12).&lt;/p&gt;
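&lt;p&gt;The walk-through above can be mimicked in Java. The following is only a rough sketch of the idea, not a port of Clojure&amp;#8217;s &lt;code&gt;LazySeq&lt;/code&gt;: the types &lt;code&gt;Thunk&lt;/code&gt;, &lt;code&gt;Entry&lt;/code&gt;, and &lt;code&gt;LazyList&lt;/code&gt; are invented for this example, chunked seqs are left out, and the &lt;code&gt;created&lt;/code&gt; counter exists purely to make the laziness visible.&lt;/p&gt;

```java
// Hypothetical sketch of a lazy concat; Clojure's LazySeq differs in detail.
interface Thunk {
    Entry force(); // computes the first entry, or null for an empty sequence
}

final class Entry {
    final Object head;
    final LazyList tail;

    Entry(Object head, LazyList tail) {
        this.head = head;
        this.tail = tail;
    }
}

final class LazyList {
    static int created = 0; // entries built by concat so far (demo only)

    private Thunk thunk;    // pending computation; null once forced
    private Entry entry;    // cached result; null means empty

    LazyList(Thunk thunk) {
        this.thunk = thunk;
    }

    // Run the pending computation at most once and cache its result.
    Entry force() {
        if (thunk != null) {
            entry = thunk.force();
            thunk = null;
        }
        return entry;
    }

    static LazyList of(final Object... xs) {
        return from(xs, 0);
    }

    private static LazyList from(final Object[] xs, final int i) {
        return new LazyList(new Thunk() {
            public Entry force() {
                return i == xs.length ? null : new Entry(xs[i], from(xs, i + 1));
            }
        });
    }

    // Returns immediately; no element of x or y is touched until forced.
    static LazyList concat(final LazyList x, final LazyList y) {
        return new LazyList(new Thunk() {
            public Entry force() {
                Entry s = x.force();
                if (s == null) {
                    return y.force(); // x is exhausted: reuse y as it is
                }
                created++;
                return new Entry(s.head, concat(s.tail, y));
            }
        });
    }

    static String show(LazyList list) {
        StringBuilder sb = new StringBuilder("(");
        for (Entry e = list.force(); e != null; e = e.tail.force()) {
            if (sb.length() != 1) sb.append(' ');
            sb.append(e.head);
        }
        return sb.append(')').toString();
    }
}

class LazyConcatDemo {
    public static void main(String[] args) {
        LazyList ls = LazyList.concat(LazyList.of(1, 2, 3), LazyList.of(42, 1, 2, 3));
        System.out.println(LazyList.created);  // 0: nothing forced yet
        System.out.println(ls.force().head);   // 1
        System.out.println(LazyList.created);  // 1: only the first entry exists
        System.out.println(LazyList.show(ls)); // (1 2 3 42 1 2 3)
        System.out.println(LazyList.created);  // 3: entries of the second list were reused
    }
}
```

&lt;p&gt;In this sketch the counter stops at three because the branch for an exhausted first list hands back the second list as it is, which corresponds to line 12 of the Clojure code.&lt;/p&gt;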

&lt;h2 id='in_summary'&gt;In summary&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Lists in Clojure are immutable&lt;/li&gt;

&lt;li&gt;By default, elements are added to the front of the list&lt;/li&gt;

&lt;li&gt;The linked structure makes it easy to implement immutable adding of elements&lt;/li&gt;

&lt;li&gt;Concatenating lists results in a &lt;em&gt;lazy sequence&lt;/em&gt; where new list entries are created &amp;#8220;on-demand&amp;#8221;&lt;/li&gt;

&lt;li&gt;The data elements are not copied, as they are immutable values&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>The Immutable Data-Structure Canon</title>
   <link href="http://jgre.org/2010/05/20/the-immutable-data-structure-canon"/>
   <updated>2010-05-20T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/05/20/the-immutable-data-structure-canon</id>
   <content type="html">&lt;p&gt;In the April issue of Communications of the ACM, George V. Neville-Neil (a.k.a. Kode Vicious) reviews the fundamental data-structures &amp;#8211; array, list, tree, and hash table. Because of the importance of data-structures for all programming, he calls it the &amp;#8220;data-structure canon&amp;#8221; (requires subscription). Taking a cue from that article, I&amp;#8217;m starting a series of posts about a special kind of implementation of the basic data-structures: the immutable lists, vectors, and maps in &lt;a href='http://clojure.org/'&gt;Clojure&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='clojure'&gt;Clojure&lt;/h2&gt;

&lt;p&gt;Clojure is a Lisp dialect for the Java Virtual Machine with a focus on functional programming. It has strong support for clean multithreaded design, and a key ingredient is its immutable, persistent data-structures.&lt;/p&gt;

&lt;p&gt;Immutability is important for clean multithreaded programs, because it allows you to separate &lt;em&gt;values&lt;/em&gt; from &lt;em&gt;identities&lt;/em&gt;. Rich Hickey, Clojure&amp;#8217;s creator, &lt;a href='http://clojure.org/state'&gt;defines these terms&lt;/a&gt; as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By identity I mean a stable logical entity associated with a series of different values over time. Models need identity for the same reasons humans need identity &amp;#8211; to represent the world. How could it work if identities like &amp;#8216;today&amp;#8217; or &amp;#8216;America&amp;#8217; had to represent a single constant value for all time? Note that by identities I don&amp;#8217;t mean names (I call my mother Mom, but you wouldn&amp;#8217;t).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;So, for this discussion, an identity is an entity that has a state, which is its value at a point in time. And a value is something that doesn&amp;#8217;t change. 42 doesn&amp;#8217;t change. June 29th 2008 doesn&amp;#8217;t change. Points don&amp;#8217;t move, dates don&amp;#8217;t change, no matter what some bad class libraries may cause you to believe. Even aggregates are values. The set of my favorite foods doesn&amp;#8217;t change, i.e. if I prefer different foods in the future, that will be a different set.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you have pure functions dealing with values, problems with multiple threads simply go away, because there is no danger of a value being changed from under a function&amp;#8217;s backside. Of course, the world is not as easy as that, and programs need to model state that can change.&lt;/p&gt;

&lt;p&gt;In Clojure, an identity can be in different states &amp;#8211; i.e. have different values &amp;#8211; over time. Identities are implemented as atomic references to values using so-called &lt;a href='http://clojure.org/Refs'&gt;Refs&lt;/a&gt; and &lt;a href='http://clojure.org/Agents'&gt;Agents&lt;/a&gt;. But these constructs do not concern us in this series; I mention them only to give a background on the rationale for using immutable data-structures to model values.&lt;/p&gt;

&lt;p&gt;Abstractly speaking, new values are &lt;em&gt;functions&lt;/em&gt; of old values. For example, let us define a list and bind it to the name &lt;code&gt;l1&lt;/code&gt;.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;l1&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;list &lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;When we want to add another element, we do not modify the existing list, but we call the function &lt;code&gt;cons&lt;/code&gt; to create a new list with one more element.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='clojure'&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;def &lt;/span&gt;&lt;span class='nv'&gt;l2&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;cons &lt;/span&gt;&lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='nv'&gt;l1&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='nv'&gt;l2&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;42&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='nv'&gt;l1&lt;/span&gt;
&lt;span class='nv'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If we had passed &lt;code&gt;l1&lt;/code&gt; to a function in another thread, it could happily work on the initial three elements without being bothered by our addition of &lt;code&gt;42&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your initial impression is probably that there is a lot of copying involved in the implementation of the immutable data-structures, but that ain&amp;#8217;t so. Clojure&amp;#8217;s data-structures maintain close to the same performance guarantees that their mutable counterparts make.&lt;/p&gt;

&lt;p&gt;In the following posts, we look at the implementation of lists, vectors, and maps one by one.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>More on HTML5</title>
   <link href="http://jgre.org/2010/05/16/more-on-html5"/>
   <updated>2010-05-16T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/05/16/more-on-html5</id>
   <content type="html">&lt;p&gt;Some more links about the state of affairs around HTML5.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href='http://blog.mefeedia.com/html5-video-stats'&gt;survey&lt;/a&gt; of video on the web shows that 26% of videos are available through HTML5 video tags, up from 10% in January. (via the &lt;a href='http://www.theregister.co.uk/2010/05/16/mefeedia_html5_survey/'&gt;Register&lt;/a&gt;)&lt;/li&gt;

&lt;li&gt;An &lt;a href='http://html5readiness.com/'&gt;overview&lt;/a&gt; of which browsers support which features of HTML5 and CSS3&lt;/li&gt;

&lt;li&gt;A &lt;a href='http://alteredqualia.com/canvasmol/'&gt;visualization&lt;/a&gt; of molecules using the canvas tag&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>How Hard Can It Be to Read a File?</title>
   <link href="http://jgre.org/2010/05/09/how-hard-can-it-be-to-read-a-file"/>
   <updated>2010-05-09T00:00:00+02:00</updated>
   <id>http://jgre.org/2010/05/09/how-hard-can-it-be-to-read-a-file</id>
   <content type="html">&lt;p&gt;Reading code is good for you. Everyone knows this, but few actually follow the advice. It&amp;#8217;s like healthy eating: Nobody would oppose it, but it is often ignored out of inertia. I&amp;#8217;m certainly guilty of skipping my healthy dose of code reading in the past, but I&amp;#8217;m going to make a habit out of it from now on. To support the process, I will write about the code I&amp;#8217;m reading.&lt;/p&gt;

&lt;p&gt;&lt;img src='/images/2010-05-09/veggies.jpg' alt='Healthy Vegetables' /&gt; (c)iStockphoto.com/skodonnell&lt;/p&gt;

&lt;p&gt;For the first exercise, I&amp;#8217;m going to take a look at a simple bit of library code: reading a file into a string. I chose this example because it turns out that reading a file with Java is more involved than I thought it should be.&lt;/p&gt;

&lt;p&gt;Coming from Ruby I&amp;#8217;m used to being able to read the contents of a file into a string using one line of code like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ruby'&gt;&lt;span class='n'&gt;str&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='no'&gt;File&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='n'&gt;read&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;bla.txt&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The simplest solution I found for Java is this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='java'&gt;&lt;span class='n'&gt;String&lt;/span&gt; &lt;span class='n'&gt;str&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='o'&gt;{&lt;/span&gt;
	&lt;span class='n'&gt;BufferedReader&lt;/span&gt; &lt;span class='n'&gt;in&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;BufferedReader&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;FileReader&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;bla.txt&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;));&lt;/span&gt;
	&lt;span class='n'&gt;String&lt;/span&gt; &lt;span class='n'&gt;line&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='k'&gt;while&lt;/span&gt;&lt;span class='o'&gt;((&lt;/span&gt;&lt;span class='n'&gt;line&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;in&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;readLine&lt;/span&gt;&lt;span class='o'&gt;())&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='kc'&gt;null&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;{&lt;/span&gt;
  		&lt;span class='n'&gt;str&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='n'&gt;line&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;\n&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;;&lt;/span&gt;
	&lt;span class='o'&gt;}&lt;/span&gt;
	&lt;span class='n'&gt;in&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='na'&gt;close&lt;/span&gt;&lt;span class='o'&gt;();&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt;&lt;span class='o'&gt;(&lt;/span&gt;&lt;span class='n'&gt;IOException&lt;/span&gt; &lt;span class='n'&gt;e&lt;/span&gt;&lt;span class='o'&gt;)&lt;/span&gt; &lt;span class='o'&gt;{&lt;/span&gt;
&lt;span class='o'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Let us ignore the missing error handling and the inefficiency of concatenating strings for the moment. The number of lines is not of interest either &amp;#8211; this is not some &amp;#8220;Java sucks&amp;#8221; rant; there are more than enough of those already.&lt;/p&gt;

&lt;p&gt;The difficulty of this trivial task illustrates a concept I have been thinking about regularly lately: the &lt;a href='http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools'&gt;Radius of Comprehension&lt;/a&gt;, which I wrote about &lt;a href='/2010/04/25/engineering-and-framework-fever/'&gt;here&lt;/a&gt;. It is a property of a codebase defined as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are looking at a given fragment of code, how far away from that bit of the code do you need to have in your mind at that time in order to understand the fragment at hand?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So how far from the code in our examples do we have to have in our heads? In Ruby, we need to understand one method in one class. (I&amp;#8217;m not including &lt;code&gt;String&lt;/code&gt; here, as I consider that to be part of fundamental understanding of the language.) However, to actually find the &lt;code&gt;read&lt;/code&gt; method, you need to look into the &lt;code&gt;IO&lt;/code&gt; class, the parent of &lt;code&gt;File&lt;/code&gt;. The names of the classes and the method are pretty obvious, so you quickly find them when looking through the documentation.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m going to make a naive attempt at quantifying the radius of comprehension here. The Ruby solution tentatively gets a score of three: two classes and one method each with meaningful names. I&amp;#8217;ll talk about the merits of this quantification at the end of the post, but first let&amp;#8217;s get the comparison with Java.&lt;/p&gt;

&lt;p&gt;In the Java version, we have two classes, &lt;code&gt;BufferedReader&lt;/code&gt; and &lt;code&gt;FileReader&lt;/code&gt;, and one method, &lt;code&gt;readLine&lt;/code&gt;. Having to loop through the file and concatenate all the lines makes the code more verbose and thus harder to read, but I wouldn&amp;#8217;t say it makes it harder to understand. A source of confusion in this code is the indirect relationship between the &lt;code&gt;FileReader&lt;/code&gt; and the actual reading. &lt;code&gt;FileReader&lt;/code&gt; is a self-explanatory name, but &lt;code&gt;BufferedReader&lt;/code&gt;? Having one reader and passing it to another reader from which you can actually read strings in your program increases the radius of comprehension by more than an obvious inheritance. As a consequence, I assign the Java code a radius of four: one for the method, one for the &lt;code&gt;FileReader&lt;/code&gt;, and two for &lt;code&gt;BufferedReader&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id='error_handling'&gt;Error Handling&lt;/h2&gt;

&lt;p&gt;There is another factor I want to take into account here: error handling. In both examples an exception could be thrown, and we need to understand what kind of exception that is in order to write robust code. So I would add one to both radius scores.&lt;/p&gt;

&lt;p&gt;In the Java code there is an additional complication: an error could occur when creating the readers or in the call to &lt;code&gt;readLine&lt;/code&gt;. In the latter case, we should close the reader to avoid leaking resources. In Ruby we don&amp;#8217;t have to worry about closing anything. Understanding this additional error case in Java adds to the radius. Thus, with error handling we end up with: Ruby: 4, Java: 6.&lt;/p&gt;
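&lt;p&gt;For illustration, here is one way the Java version could deal with both points that were set aside earlier, the leaked reader and the repeated string concatenation. This is a sketch under assumptions: it relies on try-with-resources, which needs Java 7 or later, and the class name &lt;code&gt;ReadFile&lt;/code&gt; and method name &lt;code&gt;slurp&lt;/code&gt; are invented for this example.&lt;/p&gt;

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper; the names ReadFile and slurp are made up for this sketch.
class ReadFile {
    // Reads the whole file into a string, appending "\n" after every line.
    static String slurp(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even when readLine throws.
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the file name used in the example above.
        System.out.print(slurp(args.length == 0 ? "bla.txt" : args[0]));
    }
}
```

&lt;p&gt;The exception is passed on to the caller instead of being swallowed, so the reader of this code still has to know about &lt;code&gt;IOException&lt;/code&gt;; the radius does not shrink, it only becomes harder to get the cleanup wrong.&lt;/p&gt;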

&lt;p&gt;Again, the intention of this post is not to prove that Java stinks and Ruby is great. I&amp;#8217;m looking at a very limited use case here and actively ignore scenarios where the added complexity in Java&amp;#8217;s library might be useful. The point is to understand more about the radius of comprehension.&lt;/p&gt;

&lt;p&gt;What I like about this metric is that the difference between the examples is a factor of 1.5, while the more straightforward metric of lines of code differs by a factor of 10. This score feels about right. The Java code is more complicated, but not hugely so.&lt;/p&gt;

&lt;h2 id='why_quantify_the_radius_of_comprehension'&gt;Why Quantify the Radius of Comprehension?&lt;/h2&gt;

&lt;p&gt;While I&amp;#8217;m satisfied with the result of my calculation here, the ad-hoc way I pulled the number from my pants is nowhere near scientific or generally useful. Maybe I can improve and formalize the process while reading more code. There is, however, another deeper question: Is it even worthwhile to try and formalize the radius of comprehension?&lt;/p&gt;

&lt;p&gt;Mike Taylor, who invented the term, &lt;a href='http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools'&gt;writes&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’m talking about a human issue here (and therefore, sadly, an all but impossible one to measure, though we know it when we see it)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While we may not be able to measure it given the vagueness of the definition, it might be possible to approximate it. Like any metric about code quality, it brings the danger of creating an arbitrary goal that is pursued at the expense of others, with a negative overall effect on quality. But I also see potential in quantifying the radius of comprehension.&lt;/p&gt;

&lt;p&gt;Code always has two audiences: the compiler or interpreter and other programmers who have to use or maintain it. While we can reliably and immediately assess how well the machine understands our code, feedback on how well others understand it is rare and fuzzy. If we had a way to determine the radius of comprehension, we could judge how well other programmers can understand what we wrote &amp;#8211; and take steps to improve it.&lt;/p&gt;</content>
 </entry>
 
 
</feed>

