Domain-specific languages are very useful, and ones that allow computation are more useful still. And as it turns out, they are not that difficult to write. Assuming you are competent in Haskell, I’ll walk you through the basic steps of writing an interpreter for such a language, and you can fill in the details.
Our language will be
- interpreted,
- based on the lambda calculus, with variable references, function applications, and function abstractions,
- lexically scoped, meaning that names are resolved at their use in the source code,
- dynamically-typed (or unityped as Robert Harper might say), meaning that there is one type for all values that is a sum of various useful types for text, numbers, lists, etc., and most importantly, functions,
- pure, meaning that functions have no side-effects, although it’s not hard to adapt this to
IO
or some other monad if you want.
I won’t spend much time on syntax, but for familiarity, any examples of the language itself will look roughly like Haskell where possible.
1. Value type
You’ll need a type to represent the values your language works on, a sum of all the types of values you want to work with. Since the language is based on the lambda calculus, it will need to include a constructor for functions. For example:
data Value =
VBool Bool |
VText Text |
VInt Int |
VList [Value] |
VFunction (Value -> Value) |
VError String
Note that we’re handling errors as a kind of value (VError
), but there are other approaches to error-handling.
2. Apply function
This is a function takes a “function” value and an “argument” value and applies them. It’ll probably look something like this:
vapply :: Value -> Value -> Value
vapply (VFunction f) a = f a
vapply (VError err) _ = VError err
vapply _ _ = VError "not a function" -- or a more informative error message
Note how we pass through errors that are being used as functions. if we want to, we could also pass through errors that are being used as arguments, making all functions “strict”.
3. The expression type
You’ll need a type for expressions. An expressions has some set of free (unbound) variables, which can then be bound in composition with other expressions. For example:
f a 3
has free variables f
a
let p=q in p
has free variables q
(\a -> a) 1
has no free variables
Expressions with no free variables are “closed”, those with one or more are “open”.
Here’s the magic type for expressions with lexical scope:
data Expr a = ClosedExpr a | OpenExpr Name (Expr (Value -> a))
There are lots of variations and generalisations of this type, but this is the basic form. (Name
is of course the type of names of variables, probably a synonym for String
or Text
.) Here’s how it represents expressions, all of type Expr Value
:
text | value |
---|---|
4 | ClosedExpr $ VInt 4 |
a | OpenExpr "a" $ ClosedExpr $ \a -> a |
f 1 | OpenExpr "f" $ ClosedExpr $ \f -> vapply f (VInt 1) |
f a | OpenExpr "f" $ OpenExpr "a" $ ClosedExpr $ \f a -> vapply f a — be careful about argument order |
4. instance Applicative Expr
You’ll need to create Functor
and Applicative
instances for your expression type, which I’ll leave for you to do. It’s easy to get the argument ordering wrong, but we’ll catch this with tests. The (<*>)
function gives you the “application” operation of the lambda calculus, while pure
creates constant expressions.
5. var, abstract and let
You’ll need to create functions for the other lambda calculus operations:
exprVar :: Name -> Expr Value
exprAbstract :: Name -> Expr a -> Expr (Value -> a)
exprLet :: Name -> Expr Value -> Expr a -> Expr a
It’s easy to write exprLet
using exprAbstract
, you simply make this transformation:
let var = bind in val
⇒
(\var -> val) bind
Again, if you implement these wrong your tests will catch them.
6. Recursive let-binding
Of course, this let-binding operation doesn’t allow recursion. The definition cannot refer to the name being defined, nor can you have a set of definitions that refer to each other cyclically. This might be what you want, but if it isn’t, no worries, Haskell is a lazy language and it’s straightforward to implement recursive let-binding (letrec
) in terms of sequential let-binding (letseq
).
To show how, let’s first see how it works in the case of a single binding. It’s simply this transformation:
letrec var = bind in val
⇒
letseq var = fix (\var -> bind) in val
Now for a set of bindings, the idea is to gather them together into a single binding, and perform the same transformation:
letrec var0=bind0; var1=bind1; var2=bind2 in val
⇒
letseq
vars = fix (\vars -> let var0=extract0 vars; var1=extract1 vars; var2=extract2 vars in join bind0 bind1 bind2);
var0=extract0 vars;
var1=extract1 vars;
var2=extract2 vars
in val
Of course, you’ll need to determine a suitable type for vars
, with corresponding join
and extractN
functions, paying attention to strictness and irrefutability in matching.
7. Evaluation
Evaluation is turning an expression into a resulting value. This is pretty easy: if an expression is closed, you have a result; if it’s open, then variables have been left unbound:
exprEvaluate :: Monad m => Expr a -> m a
exprEvaluate (ClosedExpr a) = return a
exprEvaluate (OpenExpr name _) = fail $ "unbound variable: " ++ name -- or list all of them instead
8. Parsing, generally
So now you’ll need to parse text into expressions. The parsec library is well-suited for this; the Parser
monad basically does what you want. The main caveat is how it handles failure in choice:
p <|> q
: if p
fails and consumed no input, then q
(otherwise propagate p
‘s failure)
try p <|> q
: if p
fails then q
The key is to pay attention to the first character to be parsed in each branch of a choice. If they’re all different, you don’t need to use try
.
9. Parsing whitespace
First thing is to create a parser for whitespace, including comments, which we’ll call readWS
. After that, you can use the convention that all parsing functions munch trailing (but not leading) whitespace. I generally find these functions useful:
readCharAndWS :: Char -> Parser ()
readCharAndWS c = do
_ <- char c
readWS
readStringAndWS :: String -> Parser ()
readStringAndWS s = do
_ <- string s
readWS
Of course, your top-level parser will also have to munch leading whitespace.
parse :: Parser (Expr Value)
parse = do
readWS -- munch leading whitespace
readExpression
10. Parsing identifiers
It’s up to you of course, but names of variables and other identifiers typically start with a letter, followed by zero-or-more alphanumeric characters. You might want to also include underscores and some other characters. Use the character-test functions in Data.Char
and you’ll get Unicode support for free.
If you have keywords in your language then you must specifically test and exclude them from identifiers, otherwise you’re going to have a bad time.
Don’t forget to add a trailing readWS
!
11. Parsing expressions
You’ll want to pay careful attention to ambiguity and precedence, with functions for parsing “loose” and “tight” expressions. For numerical and other infix expressions, you can use the notion of “terms” (which contribute to a sum) and “factors” (which contribute to a product). For example, the sum a + b * c
is compose of two terms, the latter of which is a product of two factors. For example:
readExpression :: Parser (Expr Value)
readExpression = ... readTerm ...
readTerm :: Parser (Expr Value)
readTerm = ... readFactor ...
readFactor :: Parser (Expr Value)
readFactor = ... parenthetical readExpression ...
12. Predefined functions
You’ll probably want to supply a set of bindings to useful functions, that you’ll need to convert to your Value
type. The easiest way to do this is to create FromValue
and ToValue
classes:
class FromValue t where
fromValue :: Value -> Either String t
class ToValue t where
toValue :: t -> Value
-- Int example instance
instance FromValue Int where
fromValue (VInt v) = Right v
fromValue (VError err) = Left err
fromValue _ = Left "expected int"
instance ToValue Int where
toValue = VInt
Then you can create a system of instances that allows easy conversions of Haskell functions.
instance FromValue Value
instance ToValue Value
instance (ToValue a,FromValue b) => FromValue (a -> b)
instance (FromValue a,ToValue b) => ToValue (a -> b)
instance FromValue Int
instance ToValue Int
instance FromValue Text
instance ToValue Text
… etc.
You can then let-bind your library of values to expressions created by the user.
13. Putting it all together
Now you have all the pieces, the pipeline looks like this:
- Parse the user’s text to get an expression (of type
Expr Value
). - Let-bind the library of predefined values.
- Evaluate the result to get a value (of type
Value
).
And that’s it!
14. Tests
You’ll want hundreds of these. I use tasty-hunit as a basic test framework. The good news is that it’s very easy to write lots, as the basic form is just a language string and an expected result (fail, or pass with this value). The only awkwardness is that you probably don’t have an instance Eq Value
, because of its function constructor, so you’ll need to create a value-comparison function, or even an orphan Eq
instance just for testing.
Be sure to test
- whitespace,
- comments,
- boolean literals,
- text literals,
- numeric literals,
- list construction,
- parsing ambiguity,
- function abstraction,
- function application,
- let-binding,
- recursive let-binding,
- scoping,
- predefined functions,
- error propagation,
- anything else you can think of…
And if you’re really serious, you can use QuickCheck to generate language strings and expected results.
— Ashley Yakeley