Pinafore 0.1

We all generate a lot of information in our lives and in our work:

contacts, events, emails, tasks, photos, plans, budgets, financial records, media collections…

Computers are supposed to help us organise it all. So how’s that working out?

For the most part, we use application programs. Each application program works with information of a particular kind, with a polished and specific user interface. However, it fixes a particular schema for that information, and it is typically difficult to combine information from multiple applications.

Alternatively, we can work with information more loosely and flexibly in a spreadsheet. However, this provides a more limited user interface experience, and the very looseness makes it difficult to reason abstractly about the type and schema of the information.

Pinafore is an attempt to reimagine how computers represent and store information, and how users interact with it. It allows users to create their own schemas for information, and create their own interfaces to it.

This 0.1 release includes some of the major pieces: a type system for information, a language, a storage system, some composable user interface elements. Enough to get the gist of the project, and perhaps suggest some future possibilities. But much more needs to be done.

Language

Pinafore is an interpreted language. Information is stored as predicate/subject/object triples in a database in your home directory, and the user interfaces are created with GTK+.

Pinafore generally resembles Haskell. It has a type system derived from Hindley-Milner, and features pattern-matching, lazy evaluation, and separation of pure functions from executable actions. There are some differences however:

  • There is no “top level”. A Pinafore file consists of a single expression. Type declarations, like bindings, are declared within let expressions.
  • Layout is not syntactically significant. Instead, lines are terminated by semicolons, and do and case expressions are terminated with the end keyword.
  • The colon is used for type signatures, while the double colon is used for list construction, the other way around from Haskell.
  • Line comments start with #, while (nestable) block comments are marked with {# and #}.
  • Only one equation is allowed for a function definition. Argument patterns can be matched with case expressions.

Of course, there are many features of Haskell that Pinafore lacks, and vice versa.

Type System

Pinafore is a strongly-typed language. The type system implements Stephen Dolan’s Algebraic Subtyping, which is an extension of Hindley-Milner to allow subtyping. This type system is decidable: if an expression has a type, Pinafore can always infer a principal type for it. Like Haskell, however, you can also add type signatures to definitions.

A subtype relation P <: Q is a relationship between two types, “P is a subtype of Q”, which simply means “every P is a Q”, or “allow a P where a Q is expected”. Of course, this implies an inclusion function P → Q that actually converts the P to the Q. These functions do not have to be injective, nor does there need to be any kind of reverse function Q → Maybe P, though these do exist in Pinafore in some cases.

Two types are equivalent if each is a subtype of the other.

Polarity

The type system distinguishes positive and negative types. This is necessary, because certain type operations are only permitted with certain polarities:

  • A positive type is a type that can appear in a positive position. Think of this as the type of a value you’ve defined in your program. The type signature of a value is such a positive position.
  • A negative type is a type that can appear in a negative position. Think of this as the type of acceptance of values, such as the argument of a function type (that is itself positive).
  • An ambipolar type is a type that is both positive and negative. This includes simple types such as Text, Integer, and so forth.
  • If P and Q are positive types, then P | Q is a positive type. You can read this as “a P or a Q, not telling you which”. As you might expect:
    • P <: P | Q
    • Q <: P | Q
    • If P <: R and Q <: R, then P | Q <: R.
  • If P and Q are negative types, then P & Q is a negative type. Think of this as “must be both a P and a Q“. Likewise:
    • P & Q <: P
    • P & Q <: Q
    • If R <: P and R <: Q, then R <: P & Q.
  • None is a positive type, that is empty (and is a subtype of every type). None | P = P.
  • Any is a negative type, that accepts anything (and is a supertype of every type). Any & P = P.

Here are some examples of expressions with the principal types that Pinafore will infer. Note that Pinafore uses a single rather than double colon for type signatures:

\x -> 3: Any -> Integer
Nothing: Maybe None
\b -> if b then "hi" else 3: Boolean -> (Text | Integer)
\x -> x + textlength x: (Integer & Text) -> Integer

Type Constructors

Like Haskell, Pinafore has type constructors such as Maybe, [] (list), (,) (pair), -> (function) and so on. But all type parameters must be types (i.e. as if of Haskell’s kind *): in addition, each parameter must be either covariant or contravariant. This gives subtype relations. For example, suppose F is a type constructor with one argument:

  • If F is covariant in its argument, then,
    • F x has the same polarity as x.
    • P <: Q implies F P <: F Q.
  • If F is contravariant in its argument, then,
    • F x has the opposite polarity as x.
    • P <: Q implies F Q <: F P.

Of course, some types are, morally, neither contravariant nor covariant in their arguments. For these we use a pair of type parameters in a special syntax, one contravariant (marked with -) and one covariant (marked with +). For example, the WholeRef type constructor represents references with get and set operations:

WholeRef {-p,+q}
get: WholeRef {-p,+q} -> Action q
(:=): WholeRef {-p,+q} -> p -> Action ()

Pinafore has some abbreviations to make working with these a little easier, e.g. WholeRef T = WholeRef {-T,+T}, and WholeRef +T = WholeRef {-Any,+T}, etc.

Recursive Types

Pinafore has “equirecursive” types, written in the form rec v. T, where T is a type expression where v appears only covariantly. The key fact of recursive types is that they are equivalent to their unrolling. For example, these two types are equivalent:

rec a. (a, Maybe a)
(rec a. (a, Maybe a), Maybe (rec a. (a, Maybe a))

Recursive types are necessary for principality (that all typeable expressions have a principal type), though they’re not much used in practice.

Data Types

Pinafore allows you to create your own algebraic data types, like the data keyword in Haskell. Here’s an example:

datatype StopwatchState = StoppedState Duration | RunningState Time;

Typed Storage

Pinafore stores “knowledge” as relationships between entities of various types. The relationship types are called morphisms, which can be composed as the name suggests.

Types of entities are all subtypes of the Entity type. These include:

  • Literal types for small pieces of data, such as Integer, Number, Boolean etc.
  • Open entity types, that simply represent arbitrary points, declared with the opentype keyword. Values of open entity types can be declared statically (with an anchor) with the openEntity keyword, or generated at run-time with the newOpenEntity function.
  • Closed entity types, that have constructors, declared with the closedtype keyword. These are similar to data types, except that each constructor has an anchor, and the contained types must themselves be entity types.

Anchors are 256-bit values usually hashed from a literal string in your program. Pinafore erases types when storing information in its storage: it does not store the structure of types nor does it store which values have which types. Instead, it uses anchors to identify information in storage.

Open Entity Example

Here’s an example. Let us suppose to store two relationships concerning people:

  • The name of some person p is “James”.
  • The mother of p is some person q.

Firstly, we will need a type for people. This can be an open entity type: it has no information of its own besides identity: all information about people comes from morphisms.

opentype Person;

We also need properties for “name” and “mother”. We need to give these anchors, since this is what will identify them in storage, not the names of the language bindings we happen to use.

“Name” is a property from Person to Text, because the name of a person is text. We give it the anchor !"myschema.name".

“Mother” is a property from Person to Person, because the mother of a person is a person. We give it the anchor !"myschema.mother".

Properties are morphisms, so the type of them is a morphism type, indicated by ~>. In fact, properties generate morphisms: you can compose morphisms together that are strings of properties. In this case, you can compose these two to get a morphism for “name of mother”.

name: Person ~> Text;
name = property @Person @Text !"myschema.name";
mother: Person ~> Person;
mother = property @Person @Person !"myschema.mother";

Now we need entities p and q. These might be generated at run-time or obtained elsewhere, but here we’ll declare them statically. Again, it is the anchor that identifies them in storage, not the bindings p and q.

p: Person;
p = openEntity @Person !"someperson";
q: Person;
q = openEntity @Person !"otherperson";

Actually storing the relationships is an Action (similar to IO in Haskell). And like Haskell, Pinafore has do notation to make working with actions easier:

do
    name !$ {p} := "James";
    mother !$ {p} := q;
end

The !$ operator applies a morphism to reference to get another reference. Since p is an entity, not a reference, we must first convert it to a reference using “reference notation” {p}.

The := notation sets the value of a reference.

Here’s a typed breakdown of that first action:

p: Person
{p}: WholeRef +Person
name !$ {p}: WholeRef Text
name !$ {p} := "James": Action ()

Here’s what it looks like put altogether:

let
    opentype Person;
    name: Person ~> Text;
    name = property @Person @Text !"myschema.name";
    mother: Person ~> Person;
    mother = property @Person @Person !"myschema.mother";
    p: Person;
    p = openEntity @Person !"someperson";
    q: Person;
    q = openEntity @Person !"otherperson";
in do
    name !$ {p} := "James";
    mother !$ {p} := q;
end

This is a complete Pinafore program. Running it will store those two relations in Pinafore’s persistent storage.

Of course, we can combine the morphisms in other ways:

# everyone who's mother's name is Kate
(name !. mother) !@ {"Kate"}

# the (name, mother) pair of p
(name !** mother) !$ {p}

It is important to note that Pinafore does not store entities per se, it stores relations between entities. One cannot, for example, retrieve all entities of type Person. Types of entities such as Person are erased for storage. However, one can retrieve all entities that have name “James”, or whose mother is some entity q.

Closed Entity Types

Closed entity types resemble data types, but they are all subtypes of Entity, and so can be stored. Every constructor of a closed entity type includes an anchor, to identify that constructor in storage. Here’s an example:

closedtype CelestialLocation
    = EquatorialLoc Number Number !"EquatorialLoc"
    | EclipticLoc Number Number !"EclipticLoc";

Reinterpretability

Any item of information retrieved from storage can be “unknown”, and Pinafore is robust with regards to what it happens to find in storage. This gives a certain amount of flexibility in modifying an existing “schema”, or system of entity types, without having to transform data in storage. For example, if you remove a property from your program, Pinafore will simply ignore that information in storage. If you add a property, Pinafore will initially find all values of that property to be “unknown”. Constructors can be added and removed from closed entity types. If Pinafore finds something it doesn’t recognise or cannot parse as it expects, it treats it as “unknown”.

Composable User Interface Models

In terms of the Model/View/Controller way of looking at user interface, the view and controller are represented by user interface elements (of type UI), while the model is represented by references (of type WholeRef, SetRef, and FiniteSetRef).

A reference represents the state of some thing. The user may wish to retrieve some part of that state, or make some change to it, or be notified when it changes. References are thus “live”: when connected to a UI element, the user can use the UI to change the reference, but also the reference can update the UI when its state changes.

  • Whole references (WholeRef) represent a single value (which might be “unknown”). There are operations for getting, setting, and deleting (making unknown).
  • Set references (SetRef) represent some arbitrary set or predicate of some type. There are operations for adding and removing members, and for checking membership of some value.
  • Finite set references (FiniteSetRef) are set references that have a finite number of members, that can be retrieved.

The various kinds of references can be composed in various ways, such as various set operations (union, intersection, Cartesian product and sum).

Pinafore’s “reference notation” makes working with whole references a little easier. For example, given two whole references to integers, we can create a new whole reference that is the sum of them:

p: WholeRef Integer;
q: WholeRef Integer;
pq: WholeRef +Integer;
pq = {%p + %q};

Whenever p or q updates, then pq also updates. Of course, pq is read-only: attempts to set it will fail.

User Interface Elements

User interface elements are things such as text areas, buttons, check boxes, and tables. They are constructed from the references they control, and can be composed by horizontal or vertical layout, and put in windows.

Release

Version 0.1 of Pinafore is available from Github as a Debian package.

The Pinafore website has all the documentation.

— Ashley Yakeley

Forbidden Haskell Types

Sometimes, writing Haskell is like having an argument with the compiler. You give it your reasoning, and it checks it over for flaws. And if it thinks it finds one, it will tell you all about it. You then have to look over what it told you, and figure out exactly what its complaint is. Did I just express myself badly? Or am I actually wrong? Or, very occasionally, the compiler is just being petulant and you have to work around it. Of course, if you’re really upset about GHC not accepting your perfectly reasonably argued program, you can go tell its parents.

Haskell’s type system is pretty great, but one thing it doesn’t have, that some other type systems do have, is recursive types, by which I mean, types directly constructed from themselves. Recursive types are forbidden in Haskell.

Let’s say we want to create a recursive type T = Maybe T. The official introductory Haskell answer to this is that one cannot do this directly, but one can easily create a type T that contains Maybe T, sometimes called a “newtype wrapper”. Like this:

newtype T = MkT (Maybe T)

Simple enough. But can we do better than this? Can we create a type T that is actually equal to Maybe T? Can we create the forbidden recursive type?

Let’s start with won’t work:

type T = Maybe T
Example.hs:6:1: error:
    Cycle in type synonym declarations:
      Example.hs:6:1-16: type T = Maybe T

The type keyword creates an alias for a type, and aliases cannot be recursive as they must eventually refer to an actual type.

Oddly enough, GHC does accept this:

type family T where
    T = Maybe T

Type families are strange things. They’re not quite “real” types, but neither are they mere aliases for types, as they don’t have to be fully resolved. Indeed you can declare a type family in one module, use it all over as if it were a real type, and then define the actual instances of that family in another module.

Sadly, though, GHC won’t let us actually use the instance. It rejects the obvious proof that T and Maybe T are the same type:

type family T where
    T = Maybe T

proof :: T :~: Maybe T
proof = Refl
Example.hs:10:9: error:
    • Reduction stack overflow; size = 201
      When simplifying the following type: T

Why is this? It’s because GHC can’t just accept what it’s been told, and instead endlessly tries to “reduce” (that is, expand) T = Maybe T = Maybe (Maybe T) etc, giving up when it reaches a safety limit. You can think of it as a kind of intellectual black hole, or an “infohazard” for the compiler: if it ever starts examining the strange secrets of your recursive type, it will get lost in the infinite abyss, until a built-in emergency mechanism kicks it awake after it’s twisted in the vortex 200 times (you can change this number if you want, or even let it run forever).

It turns out we can construct a type T together with a proper (non-bottom) proof value of T :~: Maybe T, we just have to be discreet. The key is to make sure GHC never comes across the unfathomable constraint T ~ Maybe T, because it will immediately become ensnared in the contemplation of its endlessness.

Instead, we prepare a lemma that is both harmless and more general, and then carefully specialise it to get the terrifying conclusion we want, using the TypeApplications extension, thus avoiding the need for any kind of inference:

type family A p q where
    A () x = Maybe (A x ())

lemma :: forall x. (A () x) :~: Maybe (A x ())
lemma  = Refl

type T = A () ()

proof :: T :~: Maybe T
proof = lemma @()

Actually constructing and deconstructing values of T will require a similar circumspection. The trick is, we don’t let GHC infer types. We musn’t let GHC think, because thinking about the bad thing leads it to madness. Instead, we apply all types by hand. Here’s the full program, including all the necessary extensions:

{-# LANGUAGE RankNTypes, TypeApplications, TypeOperators,
    TypeFamilies, UndecidableInstances, AllowAmbiguousTypes #-}
module Main where
import Data.Type.Equality

type family A p q where
    A () x = Maybe (A x ())

lemma :: forall x. (A () x) :~: Maybe (A x ())
lemma  = Refl

type T = A () ()

proof :: T :~: Maybe T
proof = lemma @()

convert1 :: forall a b. a :~: b -> a -> b
convert1 Refl = id

convert2 :: forall a b. a :~: b -> b -> a
convert2 Refl = id

tconvert1 :: T -> Maybe T
tconvert1 = convert1 @T @(Maybe T) proof

tconvert2 :: Maybe T -> T
tconvert2 = convert2 @T @(Maybe T) proof

nothing :: T
nothing = tconvert2 $ Nothing @T

just :: T -> T
just x = tconvert2 $ Just @T x

count :: T -> Int
count t = case tconvert1 t of
    Nothing -> 0
    Just t' -> succ $ count t'

t3 :: T
t3 = just $ just $ just nothing

main :: IO ()
main = putStrLn $ show $ count t3

This program defines a truly recursive type T = Maybe T, constructs a value of it, and then deconstructs that value. (It does indeed correctly print “3”.) On the other hand, it’s much more complicated than just using a newtype like we’re supposed to…

— Ashley Yakeley

Empty and Unit Types

Remember when you were in school, and your mathematics teacher told you that 00 = 1, and you got very confused? This is the type theory equivalent of that.

So what’s this all about?

It’s about three type constructors:

0 is an “empty” type. It has no values.
1 is a “unit” type. It has one value.
A → B is the type of maps from A to B, where A and B are types.

“An” empty type?

Sure, there might be more than one empty type, and more than one unit type. But in each case they’re all isomorphic (formal way of saying “basically the same”) so we’ll treat them as one empty type and one unit type.

How would you store the unit type?

The unit type has one value, so it needs zero storage space. Any more than that would be wasting space.

How would you store the empty type?

The empty type has no values, so it doesn’t matter how much storage space it requires, as you’ll never have a value of it to store. You could decide to allocate 5G for the type, if you want, and it wouldn’t cause any problems, because those allocations would never actually happen. (Although, it might cause problems with the way you allocate non-empty types constructed with the empty type.)

What about the void type?

In some languages, “Void” is used to refer to an empty type, or some approximation thereof.

In C, C++, Java, C#, etc., the keyword void refers to the zero-byte function return structure. As such it has 2560 = 1 value, so it’s a unit type. You can think of void functions as returning only one possible “value” and therefore no information. (Functions in these languages are not maps, by the way.)

By “map”, you mean function, right?

Well, a pure function that always terminates and doesn’t do anything else. A → B just means assigning a B for every A. Two values of this type are the same if they assign the same B to each A. This kind of equality is called “extensional”, basically meaning that the things look the same from the outside.

How many values do maps have?

We’ll use |A| to indicate the number of values in type A. Then we have:

|0| = 0
|1| = 1
|A → B| = |B||A|.

So for example, |1A| = |A|. And |0A| = 1.

Wait, there’s a value of 0A ?

Yes. While 0 has no values, 0A has one value. That value is called the absurd map, and it exists but never gets looked up, or called, or whatever.

How is that possible?

Supposing I told you I could give you a value of 0. Then I’d be lying, because there are no values of 0.

But what if I told you that if you give me a value of 0, I could give you anything? Then I’d be telling the truth, because you can’t give me a value of 0. That promise is the absurd map.

“If you give me a value of 0, I’ll eat my hat.” Clearly I’m not going to break this promise.

Could there be more than one value of 0A ?

If you have two of these absurd maps, you have nothing you can do with them, so you have no way of telling them apart from the outside. So they’re the same map.

Doesn’t that mean 0A is itself a unit type?

I guess?? Sure, whatever.

How would you implement the absurd map as a function?

It doesn’t matter, since such a function will never get called. You could just make it the null pointer if that’s how you represent functions, or whatever you want.

What about the type A0 ?

If A has any values, then A0 has no values. But if A has no values, then A is basically the same as 0, and A0 has one value, the absurd map.

What’s this in category theory?

There’s a category where the objects are types, and the morphisms are maps. 0 is the initial object, and 1 is the final object. A → B is the exponential object BA.

How is all this done in Haskell?

Haskell does not have any of these types. It has a type of pure functions, but they are not guaranteed to terminate. It does not have an empty type or a unit type, because non-termination and other “bottom” values are included in all types, but it does have approximations of these types.

— Ashley Yakeley

How To Write an Interpreter for a Lambda-Calculus-Based Language

Domain-specific languages are very useful, and ones that allow computation are more useful still. And as it turns out, they are not that difficult to write. Assuming you are competent in Haskell, I’ll walk you through the basic steps of writing an interpreter for such a language, and you can fill in the details.

Our language will be

  • interpreted,
  • based on the lambda calculus, with variable references, function applications, and function abstractions,
  • lexically scoped, meaning that names are resolved at their use in the source code,
  • dynamically-typed (or unityped as Robert Harper might say), meaning that there is one type for all values that is a sum of various useful types for text, numbers, lists, etc., and most importantly, functions,
  • pure, meaning that functions have no side-effects, although it’s not hard to adapt this to IO or some other monad if you want.

I won’t spend much time on syntax, but for familiarity, any examples of the language itself will look roughly like Haskell where possible.

1. Value type

You’ll need a type to represent the values your language works on, a sum of all the types of values you want to work with. Since the language is based on the lambda calculus, it will need to include a constructor for functions. For example:

data Value =
    VBool Bool |
    VText Text |
    VInt Int |
    VList [Value] |
    VFunction (Value -> Value) |
    VError String

Note that we’re handling errors as a kind of value (VError), but there are other approaches to error-handling.

2. Apply function

This is a function takes a “function” value and an “argument” value and applies them. It’ll probably look something like this:

vapply :: Value -> Value -> Value
vapply (VFunction f) a = f a
vapply (VError err) _ = VError err
vapply _ _ = VError "not a function" -- or a more informative error message

Note how we pass through errors that are being used as functions. if we want to, we could also pass through errors that are being used as arguments, making all functions “strict”.

3. The expression type

You’ll need a type for expressions. An expressions has some set of free (unbound) variables, which can then be bound in composition with other expressions. For example:

f a 3 has free variables f a

let p=q in p has free variables q

(\a -> a) 1 has no free variables

Expressions with no free variables are “closed”, those with one or more are “open”.

Here’s the magic type for expressions with lexical scope:

data Expr a = ClosedExpr a | OpenExpr Name (Expr (Value -> a))

There are lots of variations and generalisations of this type, but this is the basic form. (Name is of course the type of names of variables, probably a synonym for String or Text.) Here’s how it represents expressions, all of type Expr Value:

textvalue
4 ClosedExpr $ VInt 4
a OpenExpr "a" $ ClosedExpr $ \a -> a
f 1 OpenExpr "f" $ ClosedExpr $ \f -> vapply f (VInt 1)
f a OpenExpr "f" $ OpenExpr "a" $ ClosedExpr $ \f a -> vapply f a

— be careful about argument order

4. instance Applicative Expr

You’ll need to create Functor and Applicative instances for your expression type, which I’ll leave for you to do. It’s easy to get the argument ordering wrong, but we’ll catch this with tests. The (<*>) function gives you the “application” operation of the lambda calculus, while pure creates constant expressions.

5. var, abstract and let

You’ll need to create functions for the other lambda calculus operations:

exprVar :: Name -> Expr Value
exprAbstract :: Name -> Expr a -> Expr (Value -> a)
exprLet :: Name -> Expr Value -> Expr a -> Expr a

It’s easy to write exprLet using exprAbstract, you simply make this transformation:

let var = bind in val

(\var -> val) bind

Again, if you implement these wrong your tests will catch them.

6. Recursive let-binding

Of course, this let-binding operation doesn’t allow recursion. The definition cannot refer to the name being defined, nor can you have a set of definitions that refer to each other cyclically. This might be what you want, but if it isn’t, no worries, Haskell is a lazy language and it’s straightforward to implement recursive let-binding (letrec) in terms of sequential let-binding (letseq).

To show how, let’s first see how it works in the case of a single binding. It’s simply this transformation:

letrec var = bind in val

letseq var = fix (\var -> bind) in val

Now for a set of bindings, the idea is to gather them together into a single binding, and perform the same transformation:

letrec var0=bind0; var1=bind1; var2=bind2 in val

letseq
vars = fix (\vars -> let var0=extract0 vars; var1=extract1 vars; var2=extract2 vars in join bind0 bind1 bind2);
var0=extract0 vars;
var1=extract1 vars;
var2=extract2 vars
in val

Of course, you’ll need to determine a suitable type for vars, with corresponding join and extractN functions, paying attention to strictness and irrefutability in matching.

7. Evaluation

Evaluation is turning an expression into a resulting value. This is pretty easy: if an expression is closed, you have a result; if it’s open, then variables have been left unbound:

exprEvaluate :: Monad m => Expr a -> m a
exprEvaluate (ClosedExpr a) = return a
exprEvaluate (OpenExpr name _) = fail $ "unbound variable: " ++ name -- or list all of them instead

8. Parsing, generally

So now you’ll need to parse text into expressions. The parsec library is well-suited for this; the Parser monad basically does what you want. The main caveat is how it handles failure in choice:

p <|> q: if p fails and consumed no input, then q (otherwise propagate p ‘s failure)

try p <|> q: if p fails then q

The key is to pay attention to the first character to be parsed in each branch of a choice. If they’re all different, you don’t need to use try .

9. Parsing whitespace

First thing is to create a parser for whitespace, including comments, which we’ll call readWS . After that, you can use the convention that all parsing functions munch trailing (but not leading) whitespace. I generally find these functions useful:

readCharAndWS :: Char -> Parser ()
readCharAndWS c = do
    _ <- char c
    readWS

readStringAndWS :: String -> Parser ()
readStringAndWS s = do
    _ <- string s
    readWS

Of course, your top-level parser will also have to munch leading whitespace.

parse :: Parser (Expr Value)
parse = do
    readWS -- munch leading whitespace
    readExpression

10. Parsing identifiers

It’s up to you of course, but names of variables and other identifiers typically start with a letter, followed by zero-or-more alphanumeric characters. You might want to also include underscores and some other characters. Use the character-test functions in Data.Char and you’ll get Unicode support for free.

If you have keywords in your language then you must specifically test and exclude them from identifiers, otherwise you’re going to have a bad time.

Don’t forget to add a trailing readWS !

11. Parsing expressions

You’ll want to pay careful attention to ambiguity and precedence, with functions for parsing “loose” and “tight” expressions. For numerical and other infix expressions, you can use the notion of “terms” (which contribute to a sum) and “factors” (which contribute to a product). For example, the sum a + b * c is compose of two terms, the latter of which is a product of two factors. For example:

readExpression :: Parser (Expr Value)
readExpression = ... readTerm ...

readTerm :: Parser (Expr Value)
readTerm = ... readFactor ...

readFactor :: Parser (Expr Value)
readFactor = ... parenthetical readExpression ...

12. Predefined functions

You’ll probably want to supply a set of bindings to useful functions, that you’ll need to convert to your Value type. The easiest way to do this is to create FromValue and ToValue classes:

class FromValue t where
    fromValue :: Value -> Either String t
class ToValue t where
    toValue :: t -> Value

-- Int example instance
instance FromValue Int where
    fromValue (VInt v) = Right v
    fromValue (VError err) = Left err
    fromValue _ = Left "expected int"
instance ToValue Int where
    toValue = VInt

Then you can create a system of instances that allows easy conversions of Haskell functions.

instance FromValue Value
instance ToValue Value
instance (ToValue a,FromValue b) => FromValue (a -> b)
instance (FromValue a,ToValue b) => ToValue (a -> b)
instance FromValue Int
instance ToValue Int
instance FromValue Text
instance ToValue Text

… etc.

You can then let-bind your library of values to expressions created by the user.

13. Putting it all together

Now you have all the pieces, the pipeline looks like this:

  • Parse the user’s text to get an expression (of type Expr Value).

  • Let-bind the library of predefined values.
  • Evaluate the result to get a value (of type Value).

And that’s it!

14. Tests

You’ll want hundreds of these. I use tasty-hunit as a basic test framework. The good news is that it’s very easy to write lots, as the basic form is just a language string and an expected result (fail, or pass with this value). The only awkwardness is that you probably don’t have an instance Eq Value, because of its function constructor, so you’ll need to create a value-comparison function, or even an orphan Eq instance just for testing.

Be sure to test

  • whitespace,
  • comments,
  • boolean literals,
  • text literals,
  • numeric literals,
  • list construction,
  • parsing ambiguity,
  • function abstraction,
  • function application,
  • let-binding,
  • recursive let-binding,
  • scoping,
  • predefined functions,
  • error propagation,
  • anything else you can think of…

And if you’re really serious, you can use QuickCheck to generate language strings and expected results.

— Ashley Yakeley

With-Resource Puzzle

If you want to use a file in Haskell, you have two options. You can open and close the file yourself:

openFile :: FilePath -> IOMode -> IO Handle
hClose :: Handle -> IO ()

… or you can use a function that takes care of the open-close lifecycle:

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

Which is the better interface? It’s a trade-off: withFile guarantees that every open file is closed, while openFile and hClose allow you to open and close files in any order.

Let’s create types to represent these two kinds of interface:

type OpenClose h = IO (h,IO ()) -- opens a resource, returns a handle and a "closer"
type With h = forall r. (h -> IO r) -> IO r

openCloseFile :: FilePath -> IOMode -> OpenClose Handle
openCloseFile path mode = do
    h <- openFile path mode
    return (h, hClose h)
    
withFile :: FilePath -> IOMode -> With Handle
-- already defined

Is it possible to convert between the two? Here’s how to obtain a With interface from an OpenClose interface:

openCloseToWith :: OpenClose h -> With h
openCloseToWith oc f = do
    (h, closer) <- oc
    finally (f h) closer

It’s also possible to obtain an OpenClose interface from a With interface. Can you see how?

withToOpenClose :: With h -> OpenClose h
withToOpenClose = ...?

(answer here)

(first posted here)

— Ashley Yakeley

Haskell Maxims and Arrows

I’ve been writing Haskell on and off since about 2001, including three years as a full-time job. This is what I’ve learnt…

  1. Haskell is the promise that you can write it as cleanly as your understanding of it. Have faith.
  2. Always be looking for patterns. Abstract them always and only when it simplifies.
  3. Persevere in getting an abstraction just right. When you find it, everything will magically fall into place.
  4. The implementation is the design.
  5. Hide whatever the caller shouldn’t care about. In particular, you can remove type parameters with appropriate quantification.
  6. There’s a reason fst3, snd3, thd3 are not in base. Triple or bigger: create a data type with fields instead.
  7. Never make an instance unless it morally follows the class’s rules.
  8. Instances of simpler classes are more valuable than of more complex generalisations.
  9. Monoid: not a flashy class, but still definitely worthwhile.
  10. Applicative: hugely underappreciated, and good for types that have “static information”. Lets you do deep magic with traverse and sequenceA.
  11. Monad: potential instances are usually easy to spot.
  12. The fewer type parameters in a class, the better. Can you turn any into associated types? Can you split the class into two classes? Can you hive off some of the parameters into a superclass?
  13. Don’t worry about strictness until it’s time to optimise.
  14. Intuition about optimisation tends to be bad. Before profiling, limit yourself to reasoning about complexity classes.
  15. An orphan instance is a very minor wart.
  16. Types themselves are weightless (i.e., erased). For example, * dropped from kind to type (with TypeInType) carries no information.
  17. You probably won’t need IORef.
  18. Template Haskell comes with a high cost in intelligibility. Sometimes it’s worth it.

— Ashley Yakeley

There is No Haskell Topology

Hey, it’s Haskell Hwednesday! For those learning Haskell who are already educated in mathematics, or vice versa, the experience can be a bit of a rollercoaster.

We first discover the category of Haskell types and functions, predictably called Hask. This is a very nice sort of category: it has a terminal object (the unit type), products (pair tuple types) and exponentiation (function types), indeed a well-known sort of category known as Cartesian closed. For good measure, it also has sums (Either types) and an initial object (any empty type). Fabulous! A programming language with the beauty and rigour of mathematics.

Then we discover that almost none of this is true. Hask is indeed a category, but it does not have products or sums, and the unit type is not a terminal object, and there is no initial object. It turns out we actually had a different category in mind, a kind of IdealHask. In our IdealHask, typically, all functions terminate. In the actual Hask, every object is a Haskell type, and every Haskell type includes “bottom values” representing non-termination and other awkward conditions. (There’s actually a faithful functor from IdealHask to Hask that allows us to write our programs.)

Oh dear. These “extra” values have ruined everything. Dan Piponi was right, you shouldn’t have even asked.

But wait! As it happens, simple non-termination can be represented by topologies, which are awesome. We associate a topology with every type, and the definable functions are exactly the continuous ones. For example, the type Bool nominally has two values, True and False. We add non-termination as a third value bottom. The topology we define on this has these open sets: {{}, {True}, {False}, {True, False}, {True, False, bottom}}. The eleven definable functions are the nine strict ones (i.e., where f bottom = bottom):

\x -> if x then True else True
\x -> if x then True else False
\x -> if x then True else bottom
\x -> if x then False else True
\x -> if x then False else False
\x -> if x then False else bottom
\x -> if x then bottom else True
\x -> if x then bottom else False
\x -> if x then bottom else bottom

and the two non-strict ones:

\_ -> True
\_ -> False

These are precisely the continuous functions from our topological space to itself. And it seems perhaps straightforward to generalise this to any type?

Yay topology! We have some of our mathematicalliness back! We think.

Ah, but Haskell types are more complicated than this. There are many different bottom values, that are included in every type. Non-termination is one, but there are also exceptions. Plus, there’s a function called mapException that can inspect exception values and arbitrarily map them to bottom values. Can we create a topology for each Haskell type such that the definable functions are exactly the continuous ones?

Sadly, we cannot. To show this, let us consider different models of Haskell’s unit type (), and what can happen when we attempt to evaluate a term of that type, which we’ll collectively refer to as possibilities. We’ll define the models as types in Haskell, specifying exactly which functions are definable. Then we’ll search exhaustively for topologies that match. You’ll need to install the countable package, which among other things includes an extensional “instance Eq (a -> b)“.

{-# LANGUAGE ScopedTypeVariables, TypeSynonymInstances, FlexibleInstances, OverlappingInstances #-}
module Main where

import Control.Applicative (liftA,liftA2,liftA3)
import Data.List (intercalate)
import Data.Countable
import Data.Searchable

First some set theory:

implies :: Bool -> Bool -> Bool
implies False _ = True
implies True x = x

type Set x = x -> Bool -- is it a member?

empty :: Set x
empty _ = False

full :: Set x
full _ = True

union :: Set x -> Set x -> Set x
union p q a = p a || q a

intersection :: Set x -> Set x -> Set x
intersection p q a = p a && q a

instance (Finite x,Show x) => Show (Set x) where
    show s = "{" ++ (intercalate "," (fmap show (filter s allValues))) ++ "}"

And this is how we do topology:

-- Given a function, find the preimage of a set.
preimage :: (x -> y) -> Set y -> Set x
preimage f s a = s (f a)

type Topology x = Set (Set x) -- set of open sets

isTopology :: (Countable x) => Topology x -> Bool
isTopology top = hasEmpty && hasFull && hasIntersections && hasUnions where
    hasEmpty = top empty
    hasFull = top full
    hasIntersections = forevery (\ (p,q) -> (top p && top q) `implies` top (intersection p q))
    hasUnions        = forevery (\ (p,q) -> (top p && top q) `implies` top (union p q))

-- Given topologies on x and y, find the set of continuous functions
continuous :: (Countable y) => Topology x -> Topology y -> Set (x -> y)
continuous topx topy f = forevery (\set -> implies (topy set) (topx (preimage f set)))

Now we need to model Haskell types in Haskell:

class (Finite x) => Model x where
    -- definable functions in this model
    definable :: Set (x -> x)

The simplest model of the unit type has only one possibility, termination with the unit value, in a language where all functions terminate. We’ll call that model U1, and its sole possibility T1. The only possible function there can be maps T1 to T1, and it is definable:

-- U1 models the unit type in a strict language
-- T1 represents the only possibility, termination with the unit value
data U1 = T1 deriving (Eq, Show)

instance Countable U1 where
    countPrevious = finiteCountPrevious
    countMaybeNext = finiteCountMaybeNext

instance Searchable U1 where
    search = finiteSearch

instance Finite U1 where
    allValues = [T1]
    assemble afb = liftA (\t a -> case a of
        T1 -> t
        ) (afb T1)

instance Model U1 where
    definable f = af (f T1) where
        af T1 = True  -- there is only one possible function, which is definable

Our second model, U2, adds a non-termination possibility N2 (termination with the unit value is T2). A function is definable if either

  • it ignores its input and behaves the same whatever it is passed
  • it examines its input and therefore doesn’t terminate when the input doesn’t terminate
-- U2 represents the unit type in a lazy language with no exceptions besides non-termination
-- T2 is termination with the unit value,
-- N2 is non-termination
data U2 = T2 | N2 deriving (Eq, Show)

instance Countable U2 where
    countPrevious = finiteCountPrevious
    countMaybeNext = finiteCountMaybeNext

instance Searchable U2 where
    search = finiteSearch

instance Finite U2 where
    allValues = [T2,N2]
    assemble afb = liftA2 (\t n a -> case a of
        T2 -> t
        N2 -> n
        ) (afb T2) (afb N2)

instance Model U2 where
    definable f = af (f T2) (f N2) where
        af T2 T2 = True  -- f _ = always termination
        af N2 N2 = True  -- f _ = always non-termination
        af T2 N2 = True  -- f x = x
        af N2 T2 = False -- "inversion", not definable

Our third model, U3, adds an “exception” possibility. An exception is a termination, but still considered by Haskell to be “bottom”. The previous rule applies, but in addition, Haskell ensures that if a function examines its input, it must yield a bottom when passed a bottom. However, it’s allowed to map exception to a different bottom using mapException.

-- U3 represents the unit type in a lazy language with an additional "exception" possibility
-- T3 is termination with the unit value,
-- E3 is termination with the exception
-- N3 is non-termination
-- (E3 and N3 are both "bottoms")
data U3 = T3 | E3 | N3 deriving (Eq, Show)

instance Countable U3 where
    countPrevious = finiteCountPrevious
    countMaybeNext = finiteCountMaybeNext

instance Searchable U3 where
    search = finiteSearch

instance Finite U3 where
    allValues = [T3,E3,N3]
    assemble afb = liftA3 (\t e n a -> case a of
        T3 -> t
        E3 -> e
        N3 -> n
        ) (afb T3) (afb E3) (afb N3)

instance Model U3 where
    definable f = af (f T3) (f E3) (f N3) where
        -- functions that ignore the argument are definable
        af t e n | n == e && n == t = True
        -- otherwise, the function must not terminate when passed non-termination:
        af _ _ T3 = False
        af _ _ E3 = False
        -- more generally, the function must return a bottom when passed a bottom: 
        af _ T3 _ = False
        -- all other functions are definable
        af _ _ _ = True

Now we want to search for topologies. Thanks to the countable package, we can do this exhaustively.

-- this topology matches if its continous functions are exactly the definable functions
matching :: (Model x) => Topology x -> Bool
matching top = (continuous top top) == definable

-- all matching topologies
goodtops :: (Model x) => [Topology x]
goodtops = filter (\top -> matching top && isTopology top) allValues

printOnLines :: (Show s) => [s] -> IO ()
printOnLines = mapM_ (putStrLn . show)

main :: IO ()
main = do
    putStrLn "Topologies for U1:"
    printOnLines (goodtops :: [Topology U1])
    putStrLn "Topologies for U2:"
    printOnLines (goodtops :: [Topology U2])
    putStrLn "Topologies for U3:"
    printOnLines (goodtops :: [Topology U3])

And that’s it. This is what we get when we run our program:

Topologies for U1:
{{},{T1}}
Topologies for U2:
{{},{T2},{T2,N2}}
{{},{N2},{T2,N2}}
Topologies for U3:

U1 has the one-point topology. U2 has the Sierpiński topologies. But there is no matching topology for U3.

So what about the actual Haskell unit type? Consider the equivalence relation R with three equivalence classes: the unit value, non-termination, and all exceptions, corresponding to the possibilities of U3. For each “definable” function U3 -> U3, we can define a Haskell function () -> () that corresponds to it under R. For “disallowed” functions, we cannot.

So if there were a topology X on the Haskell unit type such that the continuous functions were the definable ones, then X/R would be a quotient topology on U3 where the continuous functions were the definable ones. But this doesn’t exist, so that doesn’t either.