Domain-specific languages are very useful, and ones that allow computation are more useful still. And as it turns out, they are not that difficult to write. Assuming you are competent in Haskell, I’ll walk you through the basic steps of writing an interpreter for such a language, and you can fill in the details.

Our language will be

**interpreted**,- based on
**the lambda calculus**, with variable references, function applications, and function abstractions, **lexically scoped**, meaning that names are resolved at their use in the source code,**dynamically-typed**(or**unityped**as Robert Harper might say), meaning that there is one type for all values that is a sum of various useful types for text, numbers, lists, etc., and most importantly, functions,**pure**, meaning that functions have no side-effects, although it’s not hard to adapt this to`IO`

or some other monad if you want.

I won’t spend much time on syntax, but for familiarity, any examples of the language itself will look roughly like Haskell where possible.

### 1. Value type

You’ll need a type to represent the values your language works on, a sum of all the types of values you want to work with. Since the language is based on the lambda calculus, it will need to include a constructor for functions. For example:

```
data Value =
VBool Bool |
VText Text |
VInt Int |
VList [Value] |
VFunction (Value -> Value) |
VError String
```

Note that we’re handling errors as a kind of value (`VError`

), but there are other approaches to error-handling.

### 2. Apply function

This is a function takes a “function” value and an “argument” value and applies them. It’ll probably look something like this:

```
vapply :: Value -> Value -> Value
vapply (VFunction f) a = f a
vapply (VError err) _ = VError err
vapply _ _ = VError "not a function" -- or a more informative error message
```

Note how we pass through errors that are being used as functions. if we want to, we could also pass through errors that are being used as arguments, making all functions “strict”.

### 3. The expression type

You’ll need a type for expressions. An expressions has some set of free (unbound) variables, which can then be bound in composition with other expressions. For example:

`f a 3`

has free variables `f`

`a`

`let p=q in p`

has free variables `q`

`(\a -> a) 1`

has no free variables

Expressions with no free variables are “closed”, those with one or more are “open”.

Here’s the magic type for expressions with lexical scope:

`data Expr a = ClosedExpr a | OpenExpr Name (Expr (Value -> a))`

There are lots of variations and generalisations of this type, but this is the basic form. (`Name`

is of course the type of names of variables, probably a synonym for `String`

or `Text`

.) Here’s how it represents expressions, all of type `Expr Value`

:

text | value |
---|---|

`4` | `ClosedExpr $ VInt 4 ` |

`a` | `OpenExpr "a" $ ClosedExpr $ \a -> a ` |

`f 1` | `OpenExpr "f" $ ClosedExpr $ \f -> vapply f (VInt 1)` |

`f a` | `OpenExpr "f" $ OpenExpr "a" $ ClosedExpr $ \f a -> vapply f a` — be careful about argument order |

### 4. instance Applicative Expr

You’ll need to create `Functor`

and `Applicative`

instances for your expression type, which I’ll leave for you to do. It’s easy to get the argument ordering wrong, but we’ll catch this with tests. The `(<*>)`

function gives you the “application” operation of the lambda calculus, while `pure`

creates constant expressions.

### 5. var, abstract and let

You’ll need to create functions for the other lambda calculus operations:

```
exprVar :: Name -> Expr Value
exprAbstract :: Name -> Expr a -> Expr (Value -> a)
exprLet :: Name -> Expr Value -> Expr a -> Expr a
```

It’s easy to write `exprLet`

using `exprAbstract`

, you simply make this transformation:

`let var = bind in val`

⇒

`(\var -> val) bind`

Again, if you implement these wrong your tests will catch them.

### 6. Recursive let-binding

Of course, this let-binding operation doesn’t allow recursion. The definition cannot refer to the name being defined, nor can you have a set of definitions that refer to each other cyclically. This might be what you want, but if it isn’t, no worries, Haskell is a lazy language and it’s straightforward to implement recursive let-binding (`letrec`

) in terms of sequential let-binding (`letseq`

).

To show how, let’s first see how it works in the case of a single binding. It’s simply this transformation:

`letrec var = bind in val`

⇒

`letseq var = fix (\var -> bind) in val`

Now for a set of bindings, the idea is to gather them together into a single binding, and perform the same transformation:

`letrec var0=bind0; var1=bind1; var2=bind2 in val`

⇒

```
letseq
vars = fix (\vars -> let var0=extract0 vars; var1=extract1 vars; var2=extract2 vars in join bind0 bind1 bind2);
var0=extract0 vars;
var1=extract1 vars;
var2=extract2 vars
in val
```

Of course, you’ll need to determine a suitable type for `vars`

, with corresponding `join`

and `extractN`

functions, paying attention to strictness and irrefutability in matching.

### 7. Evaluation

Evaluation is turning an expression into a resulting value. This is pretty easy: if an expression is closed, you have a result; if it’s open, then variables have been left unbound:

```
exprEvaluate :: Monad m => Expr a -> m a
exprEvaluate (ClosedExpr a) = return a
exprEvaluate (OpenExpr name _) = fail $ "unbound variable: " ++ name -- or list all of them instead
```

### 8. Parsing, generally

So now you’ll need to parse text into expressions. The parsec library is well-suited for this; the `Parser`

monad basically does what you want. The main caveat is how it handles failure in choice:

`p <|> q`

: if `p`

fails *and consumed no input*, then `q`

(otherwise propagate `p`

‘s failure)

`try p <|> q`

: if `p`

fails then `q`

The key is to pay attention to the first character to be parsed in each branch of a choice. If they’re all different, you don’t need to use `try`

.

### 9. Parsing whitespace

First thing is to create a parser for whitespace, including comments, which we’ll call `readWS`

. After that, you can use the convention that all parsing functions munch trailing (but not leading) whitespace. I generally find these functions useful:

```
readCharAndWS :: Char -> Parser ()
readCharAndWS c = do
_ <- char c
readWS
readStringAndWS :: String -> Parser ()
readStringAndWS s = do
_ <- string s
readWS
```

Of course, your top-level parser will also have to munch leading whitespace.

```
parse :: Parser (Expr Value)
parse = do
readWS -- munch leading whitespace
readExpression
```

### 10. Parsing identifiers

It’s up to you of course, but names of variables and other identifiers typically start with a letter, followed by zero-or-more alphanumeric characters. You might want to also include underscores and some other characters. Use the character-test functions in `Data.Char`

and you’ll get Unicode support for free.

If you have keywords in your language then you must specifically test and exclude them from identifiers, otherwise you’re going to have a bad time.

Don’t forget to add a trailing `readWS`

!

### 11. Parsing expressions

You’ll want to pay careful attention to ambiguity and precedence, with functions for parsing “loose” and “tight” expressions. For numerical and other infix expressions, you can use the notion of “terms” (which contribute to a sum) and “factors” (which contribute to a product). For example, the sum `a + b * c`

is compose of two terms, the latter of which is a product of two factors. For example:

```
readExpression :: Parser (Expr Value)
readExpression = ... readTerm ...
readTerm :: Parser (Expr Value)
readTerm = ... readFactor ...
readFactor :: Parser (Expr Value)
readFactor = ... parenthetical readExpression ...
```

### 12. Predefined functions

You’ll probably want to supply a set of bindings to useful functions, that you’ll need to convert to your `Value`

type. The easiest way to do this is to create `FromValue`

and `ToValue`

classes:

```
class FromValue t where
fromValue :: Value -> Either String t
class ToValue t where
toValue :: t -> Value
-- Int example instance
instance FromValue Int where
fromValue (VInt v) = Right v
fromValue (VError err) = Left err
fromValue _ = Left "expected int"
instance ToValue Int where
toValue = VInt
```

Then you can create a system of instances that allows easy conversions of Haskell functions.

```
instance FromValue Value
instance ToValue Value
instance (ToValue a,FromValue b) => FromValue (a -> b)
instance (FromValue a,ToValue b) => ToValue (a -> b)
instance FromValue Int
instance ToValue Int
instance FromValue Text
instance ToValue Text
```

… etc.

You can then let-bind your library of values to expressions created by the user.

### 13. Putting it all together

Now you have all the pieces, the pipeline looks like this:

- Parse the user’s text to get an expression (of type
`Expr Value`

). - Let-bind the library of predefined values.
- Evaluate the result to get a value (of type
`Value`

).

And that’s it!

### 14. Tests

You’ll want hundreds of these. I use tasty-hunit as a basic test framework. The good news is that it’s very easy to write lots, as the basic form is just a language string and an expected result (fail, or pass with this value). The only awkwardness is that you probably don’t have an `instance Eq Value`

, because of its function constructor, so you’ll need to create a value-comparison function, or even an orphan `Eq`

instance just for testing.

Be sure to test

- whitespace,
- comments,
- boolean literals,
- text literals,
- numeric literals,
- list construction,
- parsing ambiguity,
- function abstraction,
- function application,
- let-binding,
- recursive let-binding,
- scoping,
- predefined functions,
- error propagation,
- anything else you can think of…

And if you’re really serious, you can use QuickCheck to generate language strings and expected results.

— Ashley Yakeley