IDSC V: Accessing and Modifying Vectors
In the previous post of the Immutable Data-Structure Canon we looked at vectors, their internal structure, how they are created, and how more elements are inserted. In this post we continue where we left off and examine the code used to access and remove values from a vector.
Accessing Elements
The elements of a vector are accessible by index. The way vectors are usually implemented, this is a constant time operation, as it only takes the calculation of the offset of a memory location. As we’ve seen, Clojure vectors are implemented as trees to allow for shared structures between persistent “modified” versions, so the access methods need to find their way around that structure.
Let’s say, for example, that we have a vector v with 1500 elements and we want to get the one at index 1101. There are three different ways to find an element in a vector by index: the nth function, the get function, and using the vector as a function. They differ in how they handle the vector being nil, the index being out of range, and whether they support a “not found” argument. For this discussion we’ll use nth; it returns nil if the vector is nil, throws an exception, if the index is out of range, but you can pass an optional argument that is returned, when the index is not found.
(nth v 1101)
Internally, this function is mapped to a call to the nth method in PersistentVector:
public Object nth(int i){
Object[] node = arrayFor(i);
return node[i & 0x01f];
}
The arrayFor method handles the lookup in the internal structure and returns on of the 32-element arrays where the elements are stored – either from one of the leaf nodes or from the tail (line 2). The index in that array is calculated by applying a bit-mask to the index passed into the method (line 3). For our example that local index is 13.
public Object[] arrayFor(int i){
if(i >= 0 && i < cnt)
{
if(i >= tailoff())
return tail;
Node node = root;
for(int level = shift; level > 0; level -= 5)
node = (Node) node.array[(i >>> level) & 0x01f];
return node.array;
}
throw new IndexOutOfBoundsException();
}
The arrayFor method checks that the index is valid and throws an exception if it is not (lines 2, 11). If the requested index is greater than the number of values in the tree, the tail array is returned (lines 4,5). In our example, this is not the case, we need to look into the tree.
As a tree with two layers (root and leaves) can hold 1024 elements, the tree in for a vector with 1500 elements needs three layers. The index we’re looking for is in the second subtree, as it’s greater than 1024. The field shift which holds a multiple of 5 proportional to the height of the tree is 10 in our case, so that we enter the loop in line 7 with 10 as level.
Inside the loop, we find the subtree holding our value by bit-shifting the index by the current level and applying a bit-mask to get it into the 32-element frame. The index for the subtree in our example is 1, as expected. In the second iteration of the loop, we look at the leaves, so that level is decremented to 5. This time, we find the node with index 2. The loop terminates here, as there is not level further down. The method returns the array attached to the node we found.
Summarizing all the index-offsets, we have:
- the second subtree of the root (the first subtree contains 1024 elements),
- the third leaf of that subtree (the leaves contain 32 elements each), and
- the fourteenth element of the leaf (calculated in
nth).
Thus we have the 1102nd element of the vector. Bingo!
Like the cons operation we saw last time, the number of steps necessary to find the nth element depends on the height of the tree, so the complexity here is once again O(log32 N).
Deleting Elements
To finish the discussion of vectors, let’s look at deleting elements. The only position where efficient deletions are possible is the end. This can be done using the pop function that is conveniently mapped to the pop method of PersistentVector:
public PersistentVector pop(){
if(cnt == 0)
throw new IllegalStateException("Can't pop empty vector");
if(cnt == 1)
return EMPTY.withMeta(meta());
if(cnt-tailoff() > 1)
{
Object[] newTail = new Object[tail.length - 1];
System.arraycopy(tail, 0, newTail, 0, newTail.length);
return new PersistentVector(meta(), cnt - 1, shift, root, newTail);
}
Object[] newtail = arrayFor(cnt - 2);
Node newroot = popTail(shift, root);
int newshift = shift;
if(newroot == null)
{
newroot = EMPTY_NODE;
}
if(shift > 5 && newroot.array[1] == null)
{
newroot = (Node) newroot.array[0];
newshift -= 5;
}
return new PersistentVector(meta(), cnt - 1, newshift, newroot, newtail);
}
First, the edge-cases are handled: pop on an empty vector does not work (lines 2,3), and pop on a one-element vector returns an empty vector (lines 4,5). Then we look at the easy case that the tail holds more than one element (line 6); we just copy all tail elements except for the last one into a new array and use it to create a new vector that shares the tree with the original (lines 8-10).

The condition for the simple case is phrased so that we don’t use it on a tail with only one element. This is because we need to change the tree when the tail run empty; the last node becomes the new tail in that case.
Shrinking the tree starts with getting the array from the last node by calling the arrayFor method (line 12). Next, we call popTail to get a new root node for a tree without the last node (line 13). This leaves us with three cases:
- The node we removed was the only node, so that our return vector gets an empty node as root (lines 15-18).
- The tree has intermediate levels between the root and the leaves and the removed node was the only leaf in the second subtree, so that we can remove one layer (lines 19-23).
- Otherwise the new tree still has the same height as the old tree.
Like the other vector operations we looked at, pop also has a complexity of O(log32 N), because the number of steps necessary is dependent on the height of the tree.
Summary
- Vectors are implemented based on a tree and a tail array.
- All arrays (both in the nodes of the tree and in the tail) have up to 32 elements.
- The leaves always have exactly 32 elements in them.
- Elements can added and removed efficiently at the end of the vector.
- Addition, removal, and index lookup are
O(log32 N), which is essentially constant time in practice.
