Nathan Stien

Abstractions & Applications

Typed Clojure by Example

A Guided Tour of Some Interesting Types from the Core Library

I’ve been exploring Clojure over the last year or so, and it’s one of the most joyful languages I’ve ever used. That said, I all too frequently cause silly runtime errors that I know could have been prevented statically in other languages such as Haskell. When I first heard about Ambrose Bonnaire-Sergeant’s work to create a static type system for Clojure, I began to hope that I could get back some of that safety I had been missing without sacrificing the other advantages of Clojure. Ambrose recently announced a production-ready release, so I finally decided to dig in and make an effort to grok it.

Explaining what I know to others has always helped me shore up my own understanding, so I decided to write down some of what I’ve learned so far. I’m going to walk through a series of interesting types I’ve encountered while working with some standard clojure.core library functions. As we go along, the types will get more complex and show off additional features of the type system. My hope is that this will be a useful tutorial for people who want to learn Typed Clojure.

Some Initial Concepts

Before we get into the types, we should lay down some groundwork.

Type Unions

For those familiar with the type systems of the C-like languages or even that of Haskell, Typed Clojure is quite a departure. The first thing I noticed about it was the pervasive use of ad-hoc type unions, which represent a set of possible types a value is permitted to take.

For example, if a value can be an Integer or a String, then its type would be described as (U Integer String), the union of those two types. You can supply a simple Integer or a String anywhere (U Integer String) is expected.

Importantly, the converse is not true – a (U Integer String) is itself neither an Integer or a String. The type system will require you to resolve the ambiguity before you pass such a value to a function that requires only an Integer. Fortunately, this is not hard to do, as will be explained below.

Clojure was designed as a dynamically typed language, and type unions are a convenient way to describe the behavior of a dynamic language in static terms.

Value Types

Typed Clojure has value types, which represent the type of all variables that have a certain literal value. The string literal "foo" has the type (Value "foo"). Value types are subtypes of their literals’ types, so (Value "foo") will typecheck anywhere String is required.

nil Is Explicit

Unlike in Java, where all non-primitive types are implicitly nullable, types in core.typed are what they say they are. The value nil has its own value type, also called nil, and it can inhabit no other type. This is immensely useful because it forces you to deal with any possible nils that may arise, and thereby avoid the dreaded NullPointerException.

Bring on the Types

That’s enough prerequisites for now, so let’s start checking out some types. We’ll pick up the rest along the way.

AnyInteger

The simplest type I’ll show you is just a union of several java.lang.* boxed integer types. A function written in terms of AnyInteger can accept any of the concrete types mentioned in the union.

(def-alias AnyInteger (U Integer 
                         Long 
                         clojure.lang.BigInt 
                         BigInteger 
                         Short 
                         Byte))

If you write functions in terms of AnyInteger, they can be polymorphic with respect to the particular integer type used, and thus more flexible.

(U x nil) a.k.a. (Option x)

This is an extremely useful type: the union of the nil type with some other type. It’s often seen under the alias (Option x), and a great many functions in the standard library mention it. It’s Typed Clojure’s analogue of option types in languages like Haskell, Scala, and Rust, which represent a value that might not be present.

If a function returns (Option String), you are forced by the type system to handle the possibility of nil in some way. Thanks to the magic of occurrence typing, a conditional check against nil is enough to satisfy the checker, and pluck out the String from the union.

;; let `foo` be a variable of type (Option String)
(if foo
    (...within this branch, `foo` has type String 
      since all Strings are truthy)
    (...but in the else-branch, 
      occurences of `foo` have type nil))

That is, core.typed is examining your conditional logic and deducing what types can exist in each branch. In the above, if foo were a String instance, then the if would consider it truthy and evaluate the first branch, so the checker is free to assume that it’s not nil for that branch.

You can even just (assert foo), and the nil will be removed from the type union in subsequent occurrences of foo, since the rest of the function would be unreachable if foo were nil. (But then of course you might get a runtime assertion exception.)

Occurence typing works for other types besides nil. Make a conditional check with a runtime predicate like number?, string?, keyword?, etc. and core.typed will narrow the types in your branches. This is a really elegant mechanism because you don’t have to do any extra annotations or write code with some kind of foreign smell – it just makes you do a runtime check that you should really be doing anyway.

clojure.core/name

It’s time for our first function. name is a standard Clojure function to give you the name of something. It’s normally used to turn a keyword or symbol into a string. If you pass in a String, you get the same String back, making the function a no-op.

=> (name :foo)
"foo"
=> (name "bar")
"bar"
=> (name nil)
NullPointerException ; d'oh!

The notation for function types resembles many other type systems, with the -> arrow indicating the return type, though in Clojure it must be surrounded in square brackets.

(ann clojure.core/name [(U String Named) -> String])

This type says that name can take a value that is either a String or a Named, and always gives you a non-nil String. The type Named is clojure.lang.Named, a simple Java interface implemented by Clojure objects that have a concept of a namespaced name. We can seamlessly mix types originating in Java into our Clojure-based types, so in a sense, Clojure’s new type system is a superset of Java’s. Rules about interfaces and base classes are exactly what you’d expect.

clojure.core/str

The str function is a bit more interesting because it’s variadic. You can give it any number of arguments, of any type, and it will turn them all into strings and return their concatenation. It has the following type:

(ann clojure.core/str [Any * -> String])

The Any type will match any other type. That is, all types are subtypes of Any, even nil. You could also think of it as the union of all possible types. (Thankfully, it’s not really implemented that way, as infinite sets tend to be a bit expensive on RAM.)

The asterisk here means that our function takes an arbitrary number of arguments, each of type Any. It often coincides with usage of the rest-parameter operator (&) in Clojure argument destructuring.

clojure.core/iterate

iterate takes a function, an initial value, and produces a lazy sequence of results from successively applying that function.

=> (iterate inc 0) 
(0 1 2 3 ...)

It’s a higher order function, as it takes another function as an argument. It’s also polymorphic, since iterate doesn’t care about the particular type of the collection elements it deals with.

(ann clojure.core/iterate 
  (All [x] 
    [[x -> x] x -> (LazySeq x)]))

That (All [x]) form is introducing a type variable. This is similar to type parameters for generics in Java and C#, for example the T in List<T>.

So the type above says that for any given type x, iterate accepts a function [x -> x] and an initial value also of type x. It then produces a lazy sequence containing values of the same type x.

clojure.core/nth

nth accesses a member of a collection by index. It’s our first example of a function with multiple type clauses.

(ann clojure.core/nth
  (All [x y]
    (Fn [(Indexed x) AnyInteger -> x]
        [(Indexed x) AnyInteger y -> (U x y)]))

Functions with multiple clauses are introduced with (Fn). The clauses are tried in the given order, so you usualy want the most specific types first, and a catch-all general type last. Here, though, we’re simply matching on two different arities. The first case covers invocations that only supply an Indexed collection and an index value. The second clause handles an extra parameter returned as a default value if the index is not found.

Note that nth allows the default value to be of a different type (y) than that of the collection elements (x), and this is also reflected in the return type of (U x y). Often the default value will be of the same type as x, so the union (U x x) will collapse into simply x. That extra degree of freedom might seem like an odd design decision if you were designing this function with static types in mind, but the Clojure core libraries and cultural idioms were developed in a dynamically typed environment. Typed Clojure takes idiomatic Clojure code as given, and tries to provide as much safety as possible given that constraint.

clojure.core/first

The simple function first, which extracts the first element of a sequence, has a really interesting signature.

(ann clojure.core/first
  (All [x]
    (Fn [(Option (EmptySeqable x)) -> nil]
        [(NonEmptySeqable x) -> x]
        [(Option (Seqable x)) -> (Option x)]))

We have three clauses in the type, even though they are all for the same arity.

The first case, [(Option (EmptySeqable x)) -> nil], describes the fact that if we pass in a sequence that we know statically is empty, then we will always get a nil output. The Option means that this case also covers static nils.

Secondly, [(NonEmptySeqable x) -> x] guarantees that a statically non-empty input sequence will always give us a non-nil result.

Finally, we have the general case of [(Option (Seqable x)) -> (Option x)], which handles normal operation of first when we don’t have any static information about the argument. You can give it nil or a possibly-empty sequence, and it will give you nil or a value. In this case, the type system will force us to check against nil, which is what we want.

clojure.core/second

The type for second is even more interesting than that of first.

(ann clojure.core/second
  (All [x]
    (Fn [(Option (I (Seqable x) (CountRange 0 1))) -> nil]
        [(I (Seqable x) (CountRange 2)) -> x]
        [(Option (Seqable x)) -> (Option x)]))

Let’s break down each clause here.

[(Option (I (Seqable x) (CountRange 0 1))) -> nil]

This clause is using a type intersection that matches any type that’s both Seqable (that is, able to produce a seq of x-typed values) and statically known to contain fewer than 2 elements. As there is no second element to retrieve, you’re guaranteed to get a nil result. And since the argument type is wrapped in an Option, a statically-nil input will always produce a nil result.

[(I (Seqable x) (CountRange 2)) -> x]

The second clause is the happy path, when we statically know that we have a non-nil input collection that has at least 2 elements. For that, we always get our non-nil result, and don’t have to do a nil check.

[(Option (Seqable x)) -> (Option x)]

Finally we have the catch-all case. If we don’t have any static knowledge about this collection, all we can say is that you’ll get either a nil or your value of type x.

clojure.core/map

The last stop on the today’s type tour is the venerable lisp map function. You can pass any number of sequences to map, and they will all be passed element-wise to the supplied function. This calls for a type of a form we haven’t seen yet.

(ann clojure.core/map
  (All [c a b ...]
    (Fn [[a b ... b -> c] 
         (NonEmptySeqable a) (NonEmptySeqable b) ... b 
         -> (NonEmptyLazySeq c)]
        [[a b ... b -> c] 
         (U nil (Seqable a)) (U nil (Seqable b)) ... b
         -> (LazySeq c)]))

This is our first example of using an unbounded number of type variables. The ... in the All form is how we introduce these.

Each clause starts with [a b ... b -> c], which describes a function of an arbitrary number of arguments, each of which may have a distinct type. The remaining variadic arguments, such as (NonEmptySeqable a) (NonEmptySeqable b) ... b in the first clause, represent some number of arguments, with non-empty sequences carrying elements of the same types, in the same order, as the function passed as the first argument. That is, if you pass a function of type [String AnyInteger Double -> Whatever], then you have to feed it exactly three collections containing values of the corresponding types. That’s a really nice piece of type-safety that I wouldn’t have expected to be able to express.

I’ll admit I find the syntax for specifying the collection arguments here a bit confusing. I suppose I might have expected to see the right side of the ellipsis to be (NonEmptySeqable b) instead of merely b.

By now you can probably read these types well enough to discern that the first clause guarantees a non-empty result for non-empty inputs, and that the second clause is the general case for possibly-empty or possibly-nil input sequences.

Going Further

I found these examples by spelunking through the core.typed source code. I’m nowhere near grokking it all yet, but there’s a lot of fairly accessible stuff in base_env.clj, which is where I got all of the above types.

There are also some other interesting aspects of the type system I haven’t talked about, such as heterogeneous maps, filters, predicates, type functions, and variance. I am particularly fond of the heterogeneous maps.

As for theory, Typed Clojure borrows liberally from Typed Racket, and Ambrose has mentioned a few papers in particular:

Support Typed Clojure!

There’s still some of time left in the Typed Clojure crowdfunding campaign. If you think this work is valuable, consider chipping in!

comments powered by Disqus