Our language will be

**interpreted**,- based on
**the lambda calculus**, with variable references, function applications, and function abstractions, **lexically scoped**, meaning that names are resolved at their use in the source code,**dynamically-typed**(or**unityped**as Robert Harper might say), meaning that there is one type for all values that is a sum of various useful types for text, numbers, lists, etc., and most importantly, functions,**pure**, meaning that functions have no side-effects, although it’s not hard to adapt this toIO

or some other monad if you want.

I won’t spend much time on syntax, but for familiarity, any examples of the language itself will look roughly like Haskell where possible.

You’ll need a type to represent the values your language works on, a sum of all the types of values you want to work with. Since the language is based on the lambda calculus, it will need to include a constructor for functions. For example:

data Value = VBool Bool | VText Text | VInt Int | VList [Value] | VFunction (Value -> Value) | VError String

Note that we’re handling errors as a kind of value (

VError), but there are other approaches to error-handling.

This is a function takes a “function” value and an “argument” value and applies them. It’ll probably look something like this:

vapply :: Value -> Value -> Value vapply (VFunction f) a = f a vapply (VError err) _ = VError err vapply _ _ = VError "not a function" -- or a more informative error message

Note how we pass through errors that are being used as functions. if we want to, we could also pass through errors that are being used as arguments, making all functions “strict”.

You’ll need a type for expressions. An expressions has some set of free (unbound) variables, which can then be bound in composition with other expressions. For example:

f a 3has free variables

f

a

let p=q in phas free variables

q

(\a -> a) 1has no free variables

Expressions with no free variables are “closed”, those with one or more are “open”.

Here’s the magic type for expressions with lexical scope:

data Expr a = ClosedExpr a | OpenExpr Name (Expr (Value -> a))

There are lots of variations and generalisations of this type, but this is the basic form. (

Nameis of course the type of names of variables, probably a synonym for

Stringor

Text.) Here’s how it represents expressions, all of type

Expr Value:

text | value |
---|---|

4 |
ClosedExpr $ VInt 4 |

a |
OpenExpr "a" $ ClosedExpr $ \a -> a |

f 1 |
OpenExpr "f" $ ClosedExpr $ \f -> vapply f (VInt 1) |

f a |
OpenExpr "f" $ OpenExpr "a" $ ClosedExpr $ \f a -> vapply f a— be careful about argument order |

You’ll need to create

Functorand

Applicativeinstances for your expression type, which I’ll leave for you to do. It’s easy to get the argument ordering wrong, but we’ll catch this with tests. The

(<*>)function gives you the “application” operation of the lambda calculus, while

purecreates constant expressions.

You’ll need to create functions for the other lambda calculus operations:

exprVar :: Name -> Expr Value exprAbstract :: Name -> Expr a -> Expr (Value -> a) exprLet :: Name -> Expr Value -> Expr a -> Expr a

It’s easy to write

exprLetusing

exprAbstract, you simply make this transformation:

let var = bind in val

⇒

(\var -> val) bind

Again, if you implement these wrong your tests will catch them.

Of course, this let-binding operation doesn’t allow recursion. The definition cannot refer to the name being defined, nor can you have a set of definitions that refer to each other cyclically. This might be what you want, but if it isn’t, no worries, Haskell is a lazy language and it’s straightforward to implement recursive let-binding (

letrec) in terms of sequential let-binding (

letseq).

To show how, let’s first see how it works in the case of a single binding. It’s simply this transformation:

letrec var = bind in val

⇒

letseq var = fix (\var -> bind) in val

Now for a set of bindings, the idea is to gather them together into a single binding, and perform the same transformation:

letrec var0=bind0; var1=bind1; var2=bind2 in val

⇒

letseq vars = fix (\vars -> let var0=extract0 vars; var1=extract1 vars; var2=extract2 vars in join bind0 bind1 bind2); var0=extract0 vars; var1=extract1 vars; var2=extract2 vars in val

Of course, you’ll need to determine a suitable type for

vars, with corresponding

joinand

extractNfunctions, paying attention to strictness and irrefutability in matching.

Evaluation is turning an expression into a resulting value. This is pretty easy: if an expression is closed, you have a result; if it’s open, then variables have been left unbound:

exprEvaluate :: Monad m => Expr a -> m a exprEvaluate (ClosedExpr a) = return a exprEvaluate (OpenExpr name _) = fail $ "unbound variable: " ++ name -- or list all of them instead

So now you’ll need to parse text into expressions. The parsec library is well-suited for this; the

Parsermonad basically does what you want. The main caveat is how it handles failure in choice:

p <|> q: if

pfails

q(otherwise propagate

p’s failure)

try p <|> q: if

pfails then

q

The key is to pay attention to the first character to be parsed in each branch of a choice. If they’re all different, you don’t need to use

try.

First thing is to create a parser for whitespace, including comments, which we’ll call

readWS. After that, you can use the convention that all parsing functions munch trailing (but not leading) whitespace. I generally find these functions useful:

readCharAndWS :: Char -> Parser () readCharAndWS c = do _ <- char c readWS readStringAndWS :: String -> Parser () readStringAndWS s = do _ <- string s readWS

Of course, your top-level parser will also have to munch leading whitespace.

parse :: Parser (Expr Value) parse = do readWS -- munch leading whitespace readExpression

It’s up to you of course, but names of variables and other identifiers typically start with a letter, followed by zero-or-more alphanumeric characters. You might want to also include underscores and some other characters. Use the character-test functions in

Data.Charand you’ll get Unicode support for free.

If you have keywords in your language then you must specifically test and exclude them from identifiers, otherwise you’re going to have a bad time.

Don’t forget to add a trailing

readWS!

You’ll want to pay careful attention to ambiguity and precedence, with functions for parsing “loose” and “tight” expressions. For numerical and other infix expressions, you can use the notion of “terms” (which contribute to a sum) and “factors” (which contribute to a product). For example, the sum

a + b * cis compose of two terms, the latter of which is a product of two factors. For example:

readExpression :: Parser (Expr Value) readExpression = ... readTerm ... readTerm :: Parser (Expr Value) readTerm = ... readFactor ... readFactor :: Parser (Expr Value) readFactor = ... parenthetical readExpression ...

You’ll probably want to supply a set of bindings to useful functions, that you’ll need to convert to your

Valuetype. The easiest way to do this is to create

FromValueand

ToValueclasses:

class FromValue t where fromValue :: Value -> Either String t class ToValue t where toValue :: t -> Value -- Int example instance instance FromValue Int where fromValue (VInt v) = Right v fromValue (VError err) = Left err fromValue _ = Left "expected int" instance ToValue Int where toValue = VInt

Then you can create a system of instances that allows easy conversions of Haskell functions.

instance FromValue Value instance ToValue Value instance (ToValue a,FromValue b) => FromValue (a -> b) instance (FromValue a,ToValue b) => ToValue (a -> b) instance FromValue Int instance ToValue Int instance FromValue Text instance ToValue Text

… etc.

You can then let-bind your library of values to expressions created by the user.

Now you have all the pieces, the pipeline looks like this:

- Parse the user’s text to get an expression (of type
Expr Value

). - Let-bind the library of predefined values.
- Evaluate the result to get a value (of type
Value

).

And that’s it!

You’ll want hundreds of these. I use tasty-hunit as a basic test framework. The good news is that it’s very easy to write lots, as the basic form is just a language string and an expected result (fail, or pass with this value). The only awkwardness is that you probably don’t have an

instance Eq Value, because of its function constructor, so you’ll need to create a value-comparison function, or even an orphan

Eqinstance just for testing.

Be sure to test

- whitespace,
- comments,
- boolean literals,
- text literals,
- numeric literals,
- list construction,
- parsing ambiguity,
- function abstraction,
- function application,
- let-binding,
- recursive let-binding,
- scoping,
- predefined functions,
- error propagation,
- anything else you can think of…

And if you’re really serious, you can use QuickCheck to generate language strings and expected results.

]]>openFile :: FilePath -> IOMode -> IO Handle hClose :: Handle -> IO ()

… or you can use a function that takes care of the open-close lifecycle:

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

Which is the better interface? It’s a trade-off:

withFileguarantees that every open file is closed, while

openFileand

hCloseallow you to open and close files in any order.

Let’s create types to represent these two kinds of interface:

type OpenClose h = IO (h,IO ()) -- opens a resource, returns a handle and a "closer" type With h = forall r. (h -> IO r) -> IO r openCloseFile :: FilePath -> IOMode -> OpenClose Handle openCloseFile path mode = do h <- openFile path mode return (h, hClose h) withFile :: FilePath -> IOMode -> With Handle -- already defined

Is it possible to convert between the two? Here’s how to obtain a

Withinterface from an

OpenCloseinterface:

openCloseToWith :: OpenClose h -> With h openCloseToWith oc f = do (h, closer) <- oc finally (f h) closer

It’s also possible to obtain an

OpenCloseinterface from a

Withinterface. Can you see how?

withToOpenClose :: With h -> OpenClose h withToOpenClose = ...?]]>

- Haskell is the promise that you can write it as cleanly as your understanding of it. Have faith.
- Always be looking for patterns. Abstract them always and only when it simplifies.
- Persevere in getting an abstraction just right. When you find it, everything will magically fall into place.
- The implementation is the design.
- Hide whatever the caller shouldn’t care about. In particular, you can remove type parameters with appropriate quantification.
- There’s a reason
fst3

,snd3

,thd3

are not in base. Triple or bigger: create a data type with fields instead. - Never make an instance unless it morally follows the class’s rules.
- Instances of simpler classes are more valuable than of more complex generalisations.
Monoid

: not a flashy class, but still definitely worthwhile.Applicative

: hugely underappreciated, and good for types that have “static information”. Lets you do deep magic withtraverse

andsequenceA

.Monad

: potential instances are usually easy to spot.- The fewer type parameters in a class, the better. Can you turn any into associated types? Can you split the class into two classes? Can you hive off some of the parameters into a superclass?
- Don’t worry about strictness until it’s time to optimise.
- Intuition about optimisation tends to be bad. Before profiling, limit yourself to reasoning about complexity classes.
- An orphan instance is a very minor wart.
- Types themselves are weightless (i.e., erased). For example,
*

dropped from kind to type (withTypeInType

) carries no information. - You probably won’t need
IORef

. - Template Haskell comes with a high cost in intelligibility. Sometimes it’s worth it.

— Ashley Yakeley

]]>Note: writing in Agda is *very slow*.

We first discover the category of Haskell types and functions, predictably called **Hask**. This is a very nice sort of category: it has a terminal object (the unit type), products (pair tuple types) and exponentiation (function types), indeed a well-known sort of category known as *Cartesian closed*. For good measure, it also has sums (Either types) and an initial object (any empty type). Fabulous! A programming language with the beauty and rigour of mathematics.

Then we discover that almost none of this is true. **Hask** is indeed a category, but it does not have products or sums, and the unit type is not a terminal object, and there is no initial object. It turns out we actually had a *different* category in mind, a kind of **IdealHask**. In our **IdealHask**, typically, all functions terminate. In the actual **Hask**, every object is a Haskell type, and every Haskell type includes “bottom values” representing non-termination and other awkward conditions. (There’s actually a faithful functor from **IdealHask** to **Hask** that allows us to write our programs.)

Oh dear. These “extra” values have ruined everything. Dan Piponi was right, you shouldn’t have even asked.

But wait! As it happens, simple non-termination can be represented by *topologies*, which are awesome. We associate a topology with every type, and the definable functions are exactly the continuous ones. For example, the type

Boolnominally has two values,

Trueand

False. We add non-termination as a third value

bottom. The topology we define on this has these open sets: {{}, {

True}, {

False}, {

True,

False}, {

True,

False,

bottom}}. The eleven definable functions are the nine strict ones (i.e., where

f bottom=

bottom):

\x -> if x then True else True \x -> if x then True else False \x -> if x then True else bottom \x -> if x then False else True \x -> if x then False else False \x -> if x then False else bottom \x -> if x then bottom else True \x -> if x then bottom else False \x -> if x then bottom else bottom

and the two non-strict ones:

\_ -> True \_ -> False

These are precisely the continuous functions from our topological space to itself. And it seems perhaps straightforward to generalise this to any type?

Yay topology! We have some of our mathematicalliness back! We think.

Ah, but Haskell types are more complicated than this. There are many different bottom values, that are included in every type. Non-termination is one, but there are also *exceptions*. Plus, there’s a function called

mapExceptionthat can inspect exception values and arbitrarily map them to bottom values. Can we create a topology for each Haskell type such that the definable functions are exactly the continuous ones?

Sadly, we cannot. To show this, let us consider different models of Haskell’s unit type

(), and what can happen when we attempt to evaluate a term of that type, which we’ll collectively refer to as

instance Eq (a -> b)”.

{-# LANGUAGE ScopedTypeVariables, TypeSynonymInstances, FlexibleInstances, OverlappingInstances #-} module Main where import Control.Applicative (liftA,liftA2,liftA3) import Data.List (intercalate) import Data.Countable import Data.Searchable

First some set theory:

implies :: Bool -> Bool -> Bool implies False _ = True implies True x = x type Set x = x -> Bool -- is it a member? empty :: Set x empty _ = False full :: Set x full _ = True union :: Set x -> Set x -> Set x union p q a = p a || q a intersection :: Set x -> Set x -> Set x intersection p q a = p a && q a instance (Finite x,Show x) => Show (Set x) where show s = "{" ++ (intercalate "," (fmap show (filter s allValues))) ++ "}"

And this is how we do topology:

-- Given a function, find the preimage of a set. preimage :: (x -> y) -> Set y -> Set x preimage f s a = s (f a) type Topology x = Set (Set x) -- set of open sets isTopology :: (Countable x) => Topology x -> Bool isTopology top = hasEmpty && hasFull && hasIntersections && hasUnions where hasEmpty = top empty hasFull = top full hasIntersections = forevery ((p,q) -> (top p && top q) `implies` top (intersection p q)) hasUnions = forevery ((p,q) -> (top p && top q) `implies` top (union p q)) -- Given topologies on x and y, find the set of continuous functions continuous :: (Countable y) => Topology x -> Topology y -> Set (x -> y) continuous topx topy f = forevery (set -> implies (topy set) (topx (preimage f set)))

Now we need to model Haskell types in Haskell:

class (Finite x) => Model x where -- definable functions in this model definable :: Set (x -> x)

The simplest model of the unit type has only one possibility, termination with the unit value, in a language where all functions terminate. We’ll call that model U1, and its sole possibility T1. The only possible function there can be maps T1 to T1, and it is definable:

-- U1 models the unit type in a strict language -- T1 represents the only possibility, termination with the unit value data U1 = T1 deriving (Eq, Show) instance Countable U1 where countPrevious = finiteCountPrevious countMaybeNext = finiteCountMaybeNext instance Searchable U1 where search = finiteSearch instance Finite U1 where allValues = [T1] assemble afb = liftA (t a -> case a of T1 -> t ) (afb T1) instance Model U1 where definable f = af (f T1) where af T1 = True -- there is only one possible function, which is definable

Our second model, U2, adds a non-termination possibility N2 (termination with the unit value is T2). A function is definable if either

- it ignores its input and behaves the same whatever it is passed
- it examines its input and therefore doesn’t terminate when the input doesn’t terminate

-- U2 represents the unit type in a lazy language with no exceptions besides non-termination -- T2 is termination with the unit value, -- N2 is non-termination data U2 = T2 | N2 deriving (Eq, Show) instance Countable U2 where countPrevious = finiteCountPrevious countMaybeNext = finiteCountMaybeNext instance Searchable U2 where search = finiteSearch instance Finite U2 where allValues = [T2,N2] assemble afb = liftA2 (t n a -> case a of T2 -> t N2 -> n ) (afb T2) (afb N2) instance Model U2 where definable f = af (f T2) (f N2) where af T2 T2 = True -- f _ = always termination af N2 N2 = True -- f _ = always non-termination af T2 N2 = True -- f x = x af N2 T2 = False -- "inversion", not definable

Our third model, U3, adds an “exception” possibility. An exception is a termination, but still considered by Haskell to be “bottom”. The previous rule applies, but in addition, Haskell ensures that if a function examines its input, it must yield a bottom when passed a bottom. However, it’s allowed to map exception to a different bottom using

mapException.

-- U3 represents the unit type in a lazy language with an additional "exception" possibility -- T3 is termination with the unit value, -- E3 is termination with the exception -- N3 is non-termination -- (E3 and N3 are both "bottoms") data U3 = T3 | E3 | N3 deriving (Eq, Show) instance Countable U3 where countPrevious = finiteCountPrevious countMaybeNext = finiteCountMaybeNext instance Searchable U3 where search = finiteSearch instance Finite U3 where allValues = [T3,E3,N3] assemble afb = liftA3 (t e n a -> case a of T3 -> t E3 -> e N3 -> n ) (afb T3) (afb E3) (afb N3) instance Model U3 where definable f = af (f T3) (f E3) (f N3) where -- functions that ignore the argument are definable af t e n | n == e && n == t = True -- otherwise, the function must not terminate when passed non-termination: af _ _ T3 = False af _ _ E3 = False -- more generally, the function must return a bottom when passed a bottom: af _ T3 _ = False -- all other functions are definable af _ _ _ = True

Now we want to search for topologies. Thanks to the countable package, we can do this exhaustively.

-- this topology matches if its continous functions are exactly the definable functions matching :: (Model x) => Topology x -> Bool matching top = (continuous top top) == definable -- all matching topologies goodtops :: (Model x) => [Topology x] goodtops = filter (top -> matching top && isTopology top) allValues printOnLines :: (Show s) => [s] -> IO () printOnLines = mapM_ (putStrLn . show) main :: IO () main = do putStrLn "Topologies for U1:" printOnLines (goodtops :: [Topology U1]) putStrLn "Topologies for U2:" printOnLines (goodtops :: [Topology U2]) putStrLn "Topologies for U3:" printOnLines (goodtops :: [Topology U3])

And that’s it. This is what we get when we run our program:

Topologies for U1: {{},{T1}} Topologies for U2: {{},{T2},{T2,N2}} {{},{N2},{T2,N2}} Topologies for U3:

U1 has the one-point topology. U2 has the Sierpiński topologies. But there is no matching topology for U3.

So what about the *actual* Haskell unit type? Consider the equivalence relation R with three equivalence classes: the unit value, non-termination, and all exceptions, corresponding to the possibilities of U3. For each “definable” function U3 -> U3, we can define a Haskell function () -> () that corresponds to it under R. For “disallowed” functions, we cannot.

So if there were a topology X on the Haskell unit type such that the continuous functions were the definable ones, then X/R would be a quotient topology on U3 where the continuous functions were the definable ones. But this doesn’t exist, so that doesn’t either.

]]>