In this module, we will explore how to support a language with variables.
We’ll start with the familiar Arith language.
{-# LANGUAGE GADTSyntax #-}
import Prelude hiding ((<$>), (<$), (<*>), (<*), (*>))
import Parsing
data Arith where
Lit :: Integer -> Arith
Bin :: Op -> Arith -> Arith -> Arith
deriving (Show)
data Op where
Plus :: Op
Minus :: Op
Times :: Op
deriving (Show, Eq)
interpArith :: Arith -> Integer
Lit i) = i
interpArith (Bin Plus e1 e2) = interpArith e1 + interpArith e2
interpArith (Bin Minus e1 e2) = interpArith e1 - interpArith e2
interpArith (Bin Times e1 e2) = interpArith e1 * interpArith e2
interpArith (
lexer :: TokenParser u
= makeTokenParser $ emptyDef
lexer = ["let", "in"] }
{ reservedNames -- tell the lexer that "let" and "in" are reserved keywords
-- which may not be used as variable names
parens :: Parser a -> Parser a
= getParens lexer
parens
reservedOp :: String -> Parser ()
= getReservedOp lexer
reservedOp
reserved :: String -> Parser ()
= getReserved lexer
reserved
integer :: Parser Integer
= getInteger lexer
integer
whiteSpace :: Parser ()
= getWhiteSpace lexer
whiteSpace
identifier :: Parser String
= getIdentifier lexer
identifier
parseArithAtom :: Parser Arith
= (Lit <$> integer) <|> parens parseArith
parseArithAtom
parseArith :: Parser Arith
= buildExpressionParser table parseArithAtom
parseArith where
= [ [ Infix (Bin Times <$ reservedOp "*") AssocLeft ]
table Infix (Bin Plus <$ reservedOp "+") AssocLeft
, [ Infix (Bin Minus <$ reservedOp "-") AssocLeft
,
]
]
arith :: Parser Arith
= whiteSpace *> parseArith <* eof
arith
eval :: String -> Maybe Integer
= case parse arith s of
eval s Left _ -> Nothing
Right e -> Just (interpArith e)
We will now add variables to this language. In particular, we’re going to add let-expressions, which look something like this:
>>> let x = 4*12 in x*x + 3-x
2259
This locally defines x
as a name for the value of
4*12
within the expression x*x + 3-x
.
Substituting 48
for each occurrence of x
and
evaluating the result yields the final value of 2259
.
Haskell has let-expressions too, so you can try typing the above expression at a GHCi prompt.
We need to make two changes to the syntax of the language.
Arith expressions may now contain variables, represented as
String
s.
Arith expressions may now contain let-expressions, which have the
concrete syntax
'let' <var> '=' <arith> 'in' <arith>
where <var>
is a variable name (a
String
) and the two occurrences of
<arith>
are Arith expressions.
Add two new constructors to the definition of the
Arith
data type to represent these two new syntactic forms.
(Of course, since Arith
represents abstract
syntax, you do not need to worry about storing specific pieces of
concrete syntax such as the words let
or in
;
for a let
expression you need only store the variable name
and two Arith
expressions.)
Modify the parser to parse the given concrete syntax. A few hints:
identifier :: Parser String
to
parse a variable name.reserved :: String -> Parser ()
to parse the keywords
let
and in
. Notice how these keywords are
specified as “reserved names” in the lexer, which means they may not be
used as variable names.parseArithAtom
with
two new cases, corresponding to the two new constructors of
Arith
. To keep the code a bit cleaner, you may want to
create a new auxiliary definition called parseLet
to parse
a let-expression; or you can simply inline it into the definition of
parseArithAtom
.Be sure to test your parser on some sample inputs! For example, you might try things like
x+y-foo
let x = 4*48 in x*x + x-3
let x = 3 in let y = x*x in let z = y*y*x*x in z*z*y*y*x*x
(let baz = 19 in baz*baz) - (let foo = 12 in foo*foo)
but you should make up your own examples to try as well. All of these should parse successfully.
Now we will extend the interpreter appropriately. The presence of variables introduces two new wrinkles to our interpreter:
let x = 2 in x + y
will generate a runtime error, since
y
is undefined. (Later we will see how to make this a
compile-time error instead.)For now, you should deal with undefined variables simply by calling
the error
function with an appropriate error message, which
will make the interpreter crash. In later modules we will explore better
ways to handle runtime errors.
To keep track of the values of variables, we will use a mapping from
variable names to values, called an environment. To represent
such a mapping, we can use the Data.Map
module, which works similarly to Python dictionaries or Java’s
Map
interface. Add the following line to the imports at the
top of this file:
import qualified Data.Map as M
You can now refer to things from Data.Map
by prefixing
them with M.
, for example, M.empty
,
M.insert
, M.lookup
. (The reason for prefixing
with M
like this is that otherwise there would be conflicts
with functions from the Prelude
.)
Define an environment as a map from variable names to values:
type Env = M.Map String Integer
Now add an extra Env
parameter to
interpArith
which keeps track of the current values of
variables.
Extend interpArith
to interpret variables and
let-expressions. You will probably find the M.insert
and
M.lookup
functions useful. See the Data.Map
documentation for information on these (and other)
functions.
(Optional, just for fun): try to write down an
expression which evaluates to an integer which is longer than the
expression, that is, the integer has more digits than the number of
characters in the original expression. What is the shortest expression
you can write with this property? This was not possible with the
previous version of Arith
but it is possible now that we
have let
and variables.