What does optimised pure lisp look like?
In my previous post on Calculating a DOT product I mentioned making a fast pure lisp implementation of the dot product function. Let’s dig into that.
Let’s see if we can compete with the magicl
code, or the highly optimised C
code in numpy
. My goals to explore are twofold:
- Make pure lisp code comparable in speed to the R and python best cases
- Make the code readable, we’re not going to use fancy tricks or algorithms, or reduce readability (too highly optimised code leads to hard to identify bugs and difficulties maintaining!)
This post is heavily motivated by the magicl dot product code and it appears in the following categories:
- code
- AI / ML in common lisp —
Implementing AI and Machine Learning tools in common lisp.
Initial setup
As we did previously, let’s create two lists, each being 10 million random numbers between 0 and 1.
(defun randvect (length)
"A simple list (length LENTH) of random numbers [0,1)"
(loop for i below length collect (random 1.0)))
(defvar n 10000000 "The length of our test data")
(defvar a (randvect n) "A list, length N, of random numbers")
(defvar b (randvect n) "Another list, length N, of random numbers")
Thinking about how we should store our data
- lists versus arrays
- lists aren’t ideal for storing vectors, lookup speed grows with length, whilst arrays have a constant time. This is important when building models, where the list/array might be 10 million (or more) observations of our experiment
(defparameter array-a (make-array (list (length a)) :initial-contents a))
(defparameter array-b (make-array (list (length b)) :initial-contents b))
Previous version of the code
This is the code I had for naïve-dot-v2
(which I’m calling v1
here)
(loop for i below (length array-a)
sum (* (aref array-a i) (aref array-b i)))
v1 - 270.0504ms
Let’s turn this into a function so we can optimise it…
(defun dot-v2 (array1 array2)
(loop for i below (length array1)
sum (* (aref array1 i) (aref array2 i))))
v2 - 234.21423ms
Adding a check
Before we optimise the code, we should add a check to make sure the arrays are the same size:
(defun dot-v3 (array1 array2)
(let ((size1 (length array1))
(size2 (length array2)))
(assert (cl:= size1 size2))
(loop for i below size1
sum (* (aref array1 i) (aref array2 i)))))
v3 - 220.24509ms
I’ve created two new variables size1
and size2
and assert
that they are
equal.
I’m not really sure why this was faster, it was probably a function of whatever else was (or wasn’t) running on my machine at the time, but it’s not really material.
I want speed!
Let’s be clear - we want to optimise for SPEED, so be explicit in line 2.
|
|
v4 - 222.48372ms
No improvement!
Why? We haven’t given any hints to the how the compiler/interpreter can improve things. Let’s remove the optimise command and add some type hints.
Adding data type information
Now let’s use lisp’s gradual typing features. I’m planning to write about this in another post (and I’ll update this one to point to it).
- data type
- Python and R are making assumptions on the data we’re storing, they’re both assuming floating point numbers.
How do I this in lisp? Easily. We just add :element-type 'single-float
to the
make-array
command. This is exactly comparable to the dtype=float32
flag
you’d use in python’s NumPy/TensorFlow/etc.
(defparameter array-a-v2 (make-array (list (length a))
:initial-contents a
:element-type 'single-float))
(defparameter array-b-v2 (make-array (list (length b))
:initial-contents b
:element-type 'single-float))
Gradual typing is where we can define the type of data in a particular function - but we don’t have to everywhere. (Most) Other languages have an all-or-nothing approach, but lisp allows you to choose.
Below, in line 2, we simply declare
that the type of the inputs array1
and
array2
are simple-array single-float (*)
, which means:
simple-array
- just take this as it reads - a simple array - or dig into the CLHS for details
single-float
- elements are floating point numbers (as opposed to, say double precision or complex numbers)
(*)
- indicates this is a single dimensional array, i.e. vector, of unknown size
|
|
v5 - 120.991104ms
We’ve almost halved the run time - because the multiplication on line 7 now
can be simplified to only handle single-float
arguments rather than a generic
operation that works for integers, single and double floats, ratios, complex
numbers, …
Adding type information to the loop: the accumulator
We have now optimised our multiplication to work for out single-float
arguments, but the accumulator (i.e. the sum
part) in the loop
function
doesn’t know the type of it’s arguments. However, we do. The sum of two values
of type single-float
is another single-float
. In line 7 we use that
information:
|
|
We’ve also had to add lines 8 and 9 to replicate what the sum
keyword is
providing in the loop.
The results were staggering when I first saw them.
v6 - 11.116839ms
We took an order of magnitude off the run-time and are now in the realms of the NumPy - hand tuned, optimised and mostly unreadable, C.
Optimise for speed, again
Again, let’s be explicit that we want SPEED.
(defun dot-v7 (array1 array2)
(declare (optimize speed)
(type (simple-array single-float (*)) array1 array2))
(let ((size1 (length array1))
(size2 (length array2)))
(assert (cl:= size1 size2))
(loop for i below size1
with s of-type single-float = 0.0
do (incf s (* (aref array1 i) (aref array2 i)))
finally (return s))))
The lisp implementation I’m using (SBCL) is already a high performance Common Lisp compiler, so there’s not much to be squeezed out here.
v7 - 10.600901ms
Adding type information to the loop: array index
We can help the compiler out a little more by letting it know that the loop
variable i
is for indexing arrays:
(defun dot-v8 (array1 array2)
(declare (optimize speed)
(type (simple-array single-float (*)) array1 array2))
(let ((size1 (length array1))
(size2 (length array2)))
(assert (cl:= size1 size2))
(loop for i of-type alexandria:array-index below size1
with s of-type single-float = 0.0
do (incf s (* (aref array1 i) (aref array2 i)))
finally (return s))))
With this we can squeeze out a little more speed, but to be honest this result fluctuated depending on what else my computer was doing.
v8 - 10.57616ms
Almost the end
At this point I would look into defgeneric
where I can define multiple dot
functions each optimised for different inputs, so double-float
,
complex-float
, etc. This is similar to overloading in other languages.
However, I’ll leave that for another day.
Now, if this was a language benchmark post, I’d stop here. However, I want to use this dot function in a practical use case. That’s what I do in some following articles where I build a AI/ML model (a neural network in fact) in lisp.
Finally
Let’s compare the initial function and the optimised one:
|
|
- Line 1
- Defining the function [Same as line 13]
- Line 2
- Tell the compiler we want fast code
- Line 3
- Tell the compiler the arguments are
single-float
- Lines 4 – 6
- Check to see that the vectors are the same length
- Line 7
- Create our iterating variable [Effectively, same as line 14]
- Line 8
- Tell the compiler that our accumulator variable
s
is asingle-float
- Lines 9 & 10
- Sum up the product of the array elements [Same as line 15]
Effectively, by adding lines 3 and 8 we have taken the time to calculate the dot product from 270ms down to 10.576ms, speeding it up by more than 25 times and ending with a run time that is comparable to Numpy (11.154ms - previously ).
An aside…
Why did I say the lines 9 & 10 are the same as line 15?
The loop
macro in common lisp is incredibly versatile.
The code generated by these two loops is virtually identical
;; Loop version 1
(loop for i below 5
sum i)
;; Loop version 2
(loop for i below 5
with s = 0
do (incf s i)
finally (return s))
The only real difference is the variable #:LOOP-SUM-708
in this first code
block acting as a proxy for S
in the second loop.
This is the expansion of the first loop (using (macroexpand-1 '(loop for i below 5 sum i))
):
|
|
The expansion of the second loop, where lines 5 & 6 (defining the variable S
)
are the only material differences.
|
|
In case you were wondering about the (incf s i)
in line 10
(macroexpand-1 '(incf s i))
(setq s (+ i s))
which is the same form as line 10 in the first expansion.
Aren’t lisp macros amazing!?