It should go without saying: Choose the right programming language for your problem.
tl;dr - You can get high-speed numerical results without having to resort to bug-prone, highly tuned, unreadable code.
Here I explore the code and efficiency (or, more accurately, the run time) of taking the dot product of two vectors that are each ten million elements long. I’ll look at python, R and common lisp, and in each case implement the dot product myself as well as use an optimised library. The fastest implementation may surprise you.
This is not a post on benchmarking: the timings in the final section aren’t taken on a particularly powerful machine1 or intended to be rigorous; they are simply there to show that rapid prototyping, followed by optimising what matters, can be worthwhile. It was, however, motivated by my question on Reddit.
What’s a dot product and why do I care?
A dot product between two vectors \(A\) and \(B\) can be thought of as the sum of the products of their corresponding elements, i.e.
\begin{align} A \cdot B &= \sum_{i=1}^{n}a_{i}\,b_{i} \\ &= a_{1}\,b_{1} + a_{2}\,b_{2} + \ldots \end{align}
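For example, \((1, 2, 3) \cdot (4, 5, 6) = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32\).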
So why do I (or should you) care? Dot products are pretty fundamental to machine learning - as I’ll show in some upcoming posts. That means, when you have lots of data, you’re doing this operation lots of times. So how well your language implements this matters.
Let’s look at two common (and one not so common2) languages and timings.
A python implementation
Python is currently the go-to language for machine-learning practitioners. There are many extremely powerful frameworks, like PyTorch and TensorFlow among others, and much of the numerical ecosystem around them is built upon NumPy. If you’re reading this post, you probably know all of this.
Let’s set up our random vectors…
import time
import numpy as np
# How big do we want the vector to be? 10 million!
n = 10000000
# Make two arrays of 10 million random numbers
a = np.random.rand(n)
b = np.random.rand(n)
Naïve loop
If we were to evaluate the dot product with a loop over each of the elements it would look something like
def dot_prod_loop():
    n = len(a)
    tic = time.time()
    c = 0
    for i in range(n):
        c += a[i] * b[i]
    toc = time.time()
    print(f"Python: loop version: {1000*(toc-tic):.3f}ms")

# Run it
dot_prod_loop()
Python: loop version: 3632.499ms
NumPy
Naturally, we can do better by using the NumPy implementation:
def dot_prod_numpy():
    tic = time.time()
    c = np.dot(a, b)
    toc = time.time()
    print(f"Python: numpy version: {1000*(toc-tic):.3f}ms")

# Run it
dot_prod_numpy()
Python: numpy version: 11.154ms
So the “vectorised” python version is about 330 times faster!
Of course this is a somewhat unfair comparison, because the NumPy version is actually highly optimised and compiled C code. It’s not pretty or concise (and that’s coming from someone who chooses to code in C!), but it’s fast.
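As an aside, NumPy gives you more than one spelling of the same operation: for two 1-D arrays the @ (matmul) operator also computes the inner product and dispatches to the same kind of optimised, compiled code, so you’d expect similar numbers. A quick sketch in the same style (the function name is mine, and it isn’t one of the timings in the table at the end):
def dot_prod_matmul():
    tic = time.time()
    # For 1-D arrays, @ computes the inner product, like np.dot
    c = a @ b
    toc = time.time()
    print(f"Python: matmul version: {1000*(toc-tic):.3f}ms")

# Run it
dot_prod_matmul()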
An alternative implementation in R
I used to write a lot in R, so I thought I’d dust it off for a comparison.
Setting our random vectors again
n <- 10000000
a <- runif(n)
b <- runif(n)
Naïve loop
As with the python case, we loop over each of the elements
# Loop
tic = Sys.time()
result = 0
for (i in 1:length(a))
  result = result + a[i] * b[i]
toc = Sys.time()
cat("R: loop version: ", round(1000*(toc-tic), 3), "ms\n")
R: loop version: 533.698 ms
Around 7 times faster than the naïve python implementation. Not bad.
Sum the element-wise multiplication
An alternative is to use the built-in element-wise multiplication feature and sum up the results
# Another simplistic approach
tic = Sys.time()
result = sum(a*b)
toc = Sys.time()
cat("R: summing element-wise multiplication version: ", round(1000*(toc-tic), 3), "ms\n")
R: summing element-wise multiplication version: 1599.384 ms
Ouch. This is not what I would have expected from native, vectorised functions.
Built-in matrix multiplication
As R is a language built for mathematical operations, we could just use the built-in matrix multiplication operation.
# Built in dot product
tic = Sys.time()
result = a %*% b
toc = Sys.time()
cat("R: Base matrix multiplication version: ", round(1000*(toc-tic), 3), "ms\n")
R: Base matrix multiplication version: 32.981 ms
As expected, this looks to have been optimised reasonably well.
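Base R also has crossprod(), which the documentation describes as formally equivalent to, and usually a touch faster than, t(a) %*% b. A sketch in the same style (not one of the timings in the table at the end):
# Base R crossprod(): equivalent to t(a) %*% b
tic = Sys.time()
result = crossprod(a, b)
toc = Sys.time()
cat("R: crossprod version: ", round(1000*(toc-tic), 3), "ms\n")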
Geometry library
How about a library that provides a dot function? It’s compiled code, so you’d expect it to be fast. I saw this suggestion on StackOverflow.
library(geometry)
tic = Sys.time()
result = dot(a, b)
toc = Sys.time()
cat("R: geometry dot version: ", round(1000*(toc-tic), 3), "ms\n")
R: geometry dot version: 386.445 ms
This is unexpectedly slow.
Like python, a lot of R’s packages aren’t actually written in R; they’re written in C, C++ and Fortran to get the speed you want.
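To make that concrete, dropping down to C++ yourself is only a few lines with Rcpp. A toy sketch (it assumes Rcpp and a working compiler are installed, and I haven’t timed it here):
library(Rcpp)

# Compile a small C++ loop and expose it to R as dot_cpp()
cppFunction('
double dot_cpp(NumericVector x, NumericVector y) {
  double acc = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    acc += x[i] * y[i];
  }
  return acc;
}')

result <- dot_cpp(a, b)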
Common lisp implementations
How about common lisp?
Set up our random vectors yet again. It’s lisp, so let’s just use lists.
(defun randvect (length)
  "A simple list (length LENGTH) of random numbers [0,1)"
  (loop for i below length collect (random 1.0)))

(defparameter n 10000000)
(defparameter a (randvect n))
(defparameter b (randvect n))
Now, let’s use some of the power of common lisp and write a macro to make generating the timing functions easier. If that sentence doesn’t make sense, this is just a quick way to duplicate the same code across all of my timing functions.
(defmacro mytime (name description &body body)
  "Create a function called NAME to get timing when running BODY"
  `(defun ,name ()
     (let ((tic)
           (toc))
       (setf tic (get-internal-real-time))
       ,@body
       (setf toc (get-internal-real-time))
       (format t "lisp: ~a version: ~fms~&"
               ,description
               (/ (- toc tic) (/ internal-time-units-per-second 1000))))))
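If the macro itself is unfamiliar, asking the compiler to expand a call shows the boilerplate it writes for us. The example-dot and some-work names below are just placeholders of mine, and the exact printing of the expansion will vary between implementations:
(macroexpand-1 '(mytime example-dot "example" (some-work)))
;; =>
;; (DEFUN EXAMPLE-DOT ()
;;   (LET ((TIC) (TOC))
;;     (SETF TIC (GET-INTERNAL-REAL-TIME))
;;     (SOME-WORK)
;;     (SETF TOC (GET-INTERNAL-REAL-TIME))
;;     (FORMAT T "lisp: ~a version: ~fms~&" "example"
;;             (/ (- TOC TIC) (/ INTERNAL-TIME-UNITS-PER-SECOND 1000)))))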
Naïve loop
Let’s try two different looping approaches.
v1
- One of the simplest data types in lisp is the list; it’s not a numerical array type like in other languages, just a linked list of elements…
v2
- Let’s test an array object instead; perhaps this would be preferable?
(mytime naive-dot-v1 "naive loop v1"
  (loop for i in a
        for j in b
        sum (* i j)))

;; Let's make our lists arrays
(defparameter aa (make-array (list (length a)) :initial-contents a))
(defparameter ab (make-array (list (length b)) :initial-contents b))

(mytime naïve-dot-v2 "naïve loop v2"
  (loop for i below (length aa)
        sum (* (aref aa i) (aref ab i))))

(naive-dot-v1)
(naïve-dot-v2)
lisp: naive loop v1 version: 163.385ms
lisp: naïve loop v2 version: 270.050ms
Slower than the best python and R implementations, but not bad for a first-pass approach in a language that isn’t built around numerical arrays.
Magicl library
We can also reach for a library: MAGICL’s pure-lisp implementation shows just how fast optimised lisp can be…
(defparameter ma (magicl:from-list a (list n)))
(defparameter mb (magicl:from-list b (list n)))
(mytime magicl-dot "magicl pure lisp"
  (magicl:dot ma mb))
(magicl-dot)
lisp: magicl pure lisp version: 11.042ms
Essentially level with the NumPy version (the two are within a fraction of a millisecond of each other), which is not bad at all for pure lisp.
So, what does “optimised” pure lisp look like?
Well, that’s for another write-up on Making (pure) lisp functions faster. I’ve put quotes around “optimised” because this isn’t a fancy algorithm; it’s simply a matter of putting variable type hints in the code. The implementation I’ll discuss is a little bit of a cheat compared to MAGICL: I haven’t generalised it to work for complex numbers, unlike the MAGICL implementation.
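To give a flavour of what those hints look like, here is a minimal sketch of my own (not the array-dot-lisp implementation from the follow-up post). It assumes the inputs are simple arrays of single-floats, which the plain make-array calls above don’t guarantee; they’d need :element-type 'single-float for the declarations to hold:
(defun dot-typed-sketch (x y)
  "Dot product sketch with type declarations for the compiler."
  (declare (type (simple-array single-float (*)) x y)
           (optimize (speed 3) (safety 0)))
  (loop for i of-type fixnum below (length x)
        sum (* (aref x i) (aref y i)) of-type single-float))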
What I will do in this post is share the result…
lisp: array-dot-lisp version: 10.576ms
Python vs. R vs. lisp - timing
What does the following table show us? Python (NumPy), R and common lisp can all produce code that is comparably fast, but in each case you need to think a little bit about your problem set and choose the right approach.
The other take-away that I think is important is that (for my use case!) it appears to take considerably less effort and development skill - and to be less bug-prone - to take a rapid prototype in lisp and turn it into a fast, production-ready implementation than it would be in python or R.
| Language | Implementation | Time (ms) | Rel. to NumPy |
| --- | --- | --- | --- |
| lisp | Pure lisp | 10.576 | 0.9 |
| Python | NumPy | 11.154 | 1.0 |
| lisp | MAGICL pure lisp | 11.042 | 1.0 |
| R | Matrix multiplication | 32.981 | 3.0 |
| lisp | Naïve loop (v1) | 163.385 | 14.6 |
| lisp | Naïve loop (v2) | 270.050 | 24.2 |
| R | Geometry library | 386.445 | 34.6 |
| R | Naïve loop | 533.698 | 47.8 |
| R | Sum element product | 1599.384 | 143.4 |
| Python | Naïve loop | 3632.499 | 325.7 |