The problem with arrays
In functional programming, we never mutate or change our
data-structures. Instead, all operations on data-structures that needs
to change something must return a new version without modifying the
old one. An example of this is the
concat method on arrays. The code
arrayA.concat(arrayB) changes neither
returns a brand new array.
Arrays are stored in memory as a consequetive sequence of bits.
Therefore, when a pure function operates on arrays the only thing it
can do in order to return a new array is to copy the entire array.
For instance, to append a single element to a list by using
append the entire array that is being appended to must be
copied. As another example, to remove the first element of an array
one can write
array.slice(1), use the function
R.init from Ramda. All of these functions make a copy of
the entire slice. If the array is 1000 elements long the code will
construct a brand new array of length 999.
When doing functional programming with arrays a lot of time is wasted copying arrays all the time. This not only makes our code slower, it also gives the garbage collector extra work because it has to free all the memory again. We pay twice.
How List solves the problem
List is an implementation of an immutable data-structure called relaxed radix balanced trees. Like most immutable data-structures it uses a technique called structural sharing to avoid the unnecessary copying we saw with arrays. Internally a sequence of elements is stored by List in a format that looks a bit like this.
One way of looking at the above is that List stores elements in several small chunks. Whenever we need to create a new list we can reuse all the chunks that didn’t change. This type of reusing is structural sharing. Arrays, on the other hand, store everything in one big chunk so it’s not possible to use share anything when changing an array.
If we appended an element to the above list we would get a new list that, internally, looked something like this.
The dashed boxes represent the new list. Note how everything in the old list is reused in the new list. This is one part of why List can perform so well. Appending an element to a list takes essentially the same time no matter how large a list we’re appending to. This can be seen in benchmarks.
Appending an element to an array takes more and more time as the array gets larger. For List the time is constant. Appending an element to a list takes the same time on a list of length 10 as it does on a list of length 10.000.
This performance improving technique applies to a wide range of
operations. Sharing is useful for things such as changing a single
element in a list, inserting elements into a list, concatenating
lists, and slicing a list. The graph below compares the performance of
slice between List and arrays.
Again, every time we want to change something in an array we must make a copy of the entire array. But, to change something in a list we can reuse large parts that can be shared between the new and the old list.
Another aspect that makes List faster than native arrays is that many
methods on native arrays are actually quite slow. This applies to
reduce, and more. They have a lot of complexity
that slows them down. For instance
must handle sparse arrays and must accommodate for the filtering
predicate mutating the array during the filter. These things are
arrays. The graph below shows the performance of filtering a List
versus filtering an array.
Even though List is very fast across the board no data-structure can be the fastest at everything. There are a few operations where List is slower than native arrays. One of these is random accessing. I.e. accessing arbitrary elements out of order. But, even in the few cases where List is behind native arrays, it is not behind by much.
Another caveat is that some functions in List are slower than arrays
for very small lists—even though they scale better to larger lists.
This can be seen in the plot of
slice above. List’s
slice is a lot
faster on large lists but for small lists, it is slightly slower. This
is something that I’m planning to improve in the future.
By using structural sharing List can implement a lot of operations in a way that is much faster than what is possible with arrays. This means that for most functional use cases using List will be faster than arrays. However, that is only part of the reason why someone may use List. Other benefits, that in many cases are more important, are enforced immutability and the extensive API that List offers.