Introduction to the Dex language

Dex is a functional, statically typed language for array processing. Here we introduce the language in a tutorial format. We assume reading familiarity with functional programming in the Haskell/ML style and numerical programming in the NumPy/MATLAB style. This is a literate program with outputs generated mechanically. You can follow along in the REPL, dex repl, or, on Linux, in the live notebook: dex web examples/tutorial.dx. See the README for installation instructions.

Expression syntax

We can evaluate and print expressions with :p

:p (1. + 2.) * 3.
9.0

We can write let bindings as x = <expr>; <expr>. Bindings can be separated with a line break or a semicolon.

:p x = 1. -- let binding y = (z = 2.; z + 1.) -- let binding of a nested let expression -- -- pretend empty line (double newline separates top-level decls) x + y -- body of let expression
4.0

We also have lambda, tuple construction and (tuple) pattern-matching (destructuring):

:p (x, y) = (1., 2.) -- let binding (with pattern-matching), tuple construction f = lam z. x + z * y -- let binding of a lambda function f 1. + f 2. -- body of let expression
8.0

We use white space for function application (we write f x y instead of f(x, y)). To see how subexpressions are grouped, it can be helpful to pretty-print the internal AST using :parse. For example, we see that function application associates to the left and binds more tightly than infix operators.

:parse f 1 2 3
(((f 1) 2) 3)
:parse f 1 + g 2
(%fadd((f 1), (g 2)))

To reduce parentheses, we sometimes use $ for function application. $ binds very loosely:

:parse f $ x + y
(f (%fadd(x, y)))
:parse f (x + y)
(f (%fadd(x, y)))

We can combine a let binding with a lambda expression, writing f = lam x. <expr> as f x = <expr>. This is just syntactic sugar. These three expression all parse to exactly the same thing:

:parse f x y = 1; f 2 3
(f = (lam x . (lam y . 1)); ((f 2) 3))
:parse f = lam x y. 1; f 2 3
(f = (lam x . (lam y . 1)); ((f 2) 3))
:parse f = lam x. lam y. 1; f 2 3
(f = (lam x . (lam y . 1)); ((f 2) 3))

Types

Why did we write 1. + 2. instead of just 1 + 2? Let's try that:

:p 1 + 2
Type error: Expected: Real Actual: Int In: 1 :p 1 + 2 ^^

The problem is that 1 is an integer whereas + operates on reals. (Note that Haskell overloads + and literals using typeclasses. We could do the same, but we're keeping it simple for now.) We can query the type of an expression with :t.

:t 1
Int
:t 1.
Real
:t lam x y. x + y
(Real -> (Real -> Real))

The type system is completely static. As a consequence, type errors appear at a function's call site rather than in its implementation.

:p f x = x + x f 1
Type error: Expected: Real Actual: Int In: 1 f 1 ^

The expressions we've seen so far have been implicitly typed. There have been no type annotations at all. The Dex compiler fills in the types using very standard Hindley-Milner-style type inference. The result of this process, is an explicitly typed IR, similar to System F, in which all binders are annotated. We can look at the explicitly typed IR with :typed:

:typed f x = x * x z = 2.0 f z
(f::(Real -> Real) = (lam x::Real . (%fmul(x, x))); (z::Real = 2.0; (f z)))

We can also add some explicit annotations if we like. Type inference then becomes type checking.

:typed f :: Real -> Real f x::Real = x * x z::Real = 2.0 f z
(f::(Real -> Real) = (lam x::Real . (%fmul(x, x))); (z::Real = 2.0; (f z)))

Polymorphism and let generalization

Unusually for Hindley-Milner-style languages, user-supplied type annotation are mandatory for let-bound polymorphic expressions. That is, we don't do let generalization. For example, although we can write this:

:typed (lam x. x) 1
((lam x::Int . x) 1)

It's an error to write this:

:typed myid x = x myid 1
Type error:Ambiguous type variables: [?_3] (myid::(?_3 -> ?_3), (lam x::?_3 . x))

Instead, we have to give the type explicitly:

:typed myid :: a -> a myid x = x -- myid 1
(myid :: A a. (a -> a) myid = (lam x::a . x); (myid @Int 1))

The motivation for this choice is a bit subtle. It's related to the reasons for Haskell's ("dreaded") monomorphism restriction: automatic let generalization can lead to surprising runtime work duplication in the presence of polymorphism that's not purely parametric. We'll say more about it later, when we discuss index sets as types.

In the explicitly typed IR, polymorphism is represented by functions which take types as arguments. Explicit types are supplied at call sites, as in myid @Int 1 above. This type application is available (but optional) in the source language too.

:p myid :: a -> a myid x = x myid @Int 1
1

Generally, a design goal has been to make sure that all well-typed terms in the explicitly typed IR can be written in the source langauge too.

Tables (arrays)

The central data structure in Dex is the "table". We could also call it an "vector" or "array" and we'll use these terms interchangeably. A table is a mapping from indices to elements. We can build one like this:

:t [10,20,30]
(3=>Int)

Notice its type, (3=>Int). Here 3 is a type representing the index set {0, 1, 2}. The => arrow is the type constructor for table types. A table a=>b is a mapping from the index set a to the element type b. Our particular table, [10,20,30], maps 0 to 10, 1 to 20 and 2 to 30.

Index set types are quite constrainted, as we'll see later, but elements can be any type, including tables. We can construct a two-dimensional table as a table of tables:

:t [[10,20,30],[100,200,300]]
(2=>(3=>Int))

Notice rectangularity is enforced by the type system. The elements of the outer table all have the same type 3=>Int and therefore the same size. We'll get a type error if we try to construct a non-rectangular two-dimensional table:

:t [[10,20,30],[100]]
Type error: Expected: (3=>Int) Actual: (1=>Int) In: [100] :t [[10,20,30],[100]] ^^^^^

This has an important consequence for the implementation: the elements of a table all have the same size, and we can use an efficient unboxed runtime representation (i.e. flat buffers and strided indices). Later, we'll see how to represent ragged tables, with irregularly sized elements using existential types, with corresponding run-time cost.

What can we do with a table? Surely we can index into it. In Dex, we use . for indexing, so we might try to index in to a table like this:

:p i = 0 xs = [1,2,3] xs.i
Type error: Expected: 3 Actual: Int In: i xs.i ^

Unfortunately, it's a type error. i is an integer, but we need an index of type 3 (the index set {0,1,2}). The error might be surprising because we know i=0 in this case, but that's only true dynamically and the static type system doesn't know about that. If we're willing to put up with runtime errors we can cast the integer to the required index set.

:p i = 0 xs = [1,2,3] xs.(asidx i)
1

But we're better off using indices of the right type to begin with. Where would such an index come from? The main means of introducing an index is using Dex's for construct:

:p xs = [1., 2., 3.] ys = for i::3 . xs.i * xs.i ys
[1.0, 4.0, 9.0]

for is to tables as lam is to functions. It builds a new table whose elements are the result of evaluating its body at each index i in the given index set. Like lam, the type annotation (::3) is optional and can be inferred. Also like lam, we can combine the let binding with the for expression in a way that looks like mathematical index notation. Type inference gives us the same thing as if we'd annotated the binders manually.

:typed xs = [1., 2., 3.] ys.i = xs.i * xs.i ys
(xs::(3=>Real) = [1.0, 2.0, 3.0]; (ys::(3=>Real) = (for i::3 . (%fmul(xs.i, xs.i))); ys))

Index set types interact nicely with polymorphism. The language of types, with type variables, is a very natural way to express shape constraints. For example, adding two vectors of the same size:

:p addVec :: n=>Real -> n=>Real -> n=>Real addVec x y = for i. x.i + y.i -- addVec [1.0, 2.0] [0.1, 0.2]
[1.1, 2.2]

Transposing an array

:p myTranspose :: n=>m=>a -> m=>n=>a myTranspose x = for i j. x.j.i -- myTranspose [[1,2,3],[10,20,30]]
[[1, 10], [2, 20], [3, 30]]

Extracting the diagonal of a square array

:p diag :: n=>n=>a -> n=>a diag x = for i. x.i.i -- diag [[1,2],[10,20]]
[1, 20]

TODO:

  • table/function contrast (cheap to create, expensive to apply and vice versa)
  • structured index sets like pairs
  • visible type application
  • existentials, packing/unpacking
  • input and output
  • compiler internals -- normalized IR, imperative IR, LLVM
  • currying tables
  • mention prelude
  • splittable PRNG
  • automatic differentiation
  • more syntax
  • scan, fold