We won’t spend much time in this course talking about compilers. But for this first project you will explore a very simple compiler for the Arith language.
First, download the files you will need:
ArithCompiler.lhs is the file you will edit for your project. You don’t need to worry about Parsing.hs; just download it and put it in the same directory as ArithCompiler.lhs. You should be able to use repl.it if you wish. Just upload this file along with Parsing.hs to your repl.it project and :load ArithCompiler.lhs as usual.
ArithCompiler.lhs
Parsing.hs
repl.it
:load ArithCompiler.lhs
If you are using Haskell installed on your own computer, note that depending on what version you have, you may need to start ghci with a flag to tell it to use the parsec library, like so:
ghci
parsec
ghci -package parsec ArithCompiler.lhs
First, some extensions and imports we will need for the parser; you don’t need to worry about these.
{-# LANGUAGE GADTs #-} import Prelude hiding ((<$>), (<$), (<*>), (<*), (*>)) import Parsing
Here are the data types we used to represent Arith abstract syntax in class, along with a simple interpreter.
data Op where Plus :: Op Minus :: Op Times :: Op deriving (Show, Eq) data Arith where Lit :: Integer -> Arith Bin :: Op -> Arith -> Arith -> Arith deriving (Show) interp :: Arith -> Integer interp (Lit n) = n interp (Bin op a1 a2) = interpOp op (interp a1) (interp a2) interpOp :: Op -> Integer -> Integer -> Integer interpOp Plus = (+) interpOp Minus = (-) interpOp Times = (*)
A parser has been provided for your convenience, to help you test your functions. You can use the readArith function to parse concrete Arith syntax into an AST.
readArith
readArith :: String -> Arith readArith s = case parse parseArith s of Left err -> error (show err) Right a -> a
For example, try evaluating interp (readArith "(2+3)*4"), which should result in 20. This is much more convenient than typing interp (Bin Times (Bin Plus (Lit 2) (Lit 3)) (Lit 4)).
interp (readArith "(2+3)*4")
interp (Bin Times (Bin Plus (Lit 2) (Lit 3)) (Lit 4))
Instead of compiling Arith programs to machine code, you will compile them to an abstract machine. An abstract machine is just like a real machine except for the fact that it is imaginary.
Our imaginary machine is quite simple. It keeps track of a list of instructions to execute, and a stack of integers (recall that Haskell lists can also be used as stacks). There are four instructions it knows how to execute:
PUSH n
n
ADD
SUB
MUL
Instruction
Our machine can also be in one of three states. Each state may additionally store some information.
WORKING: this state corresponds to normal operation of the machine. It should contain a list of remaining instructions to execute and a stack of integers.
WORKING
DONE: this state means there are no more instructions to execute. It should contain only the final stack.
DONE
ERROR: something has gone terribly, horribly wrong. In this state, the machine does not need to remember any instructions or stack.
ERROR
Make a data type called MachineState to represent the possible states of the machine, as described above. Each different state should contain whatever information the machine needs to remember in that state.
MachineState
Write a function step :: MachineState -> MachineState which executes a single step of the machine. For example, in the WORKING state it should try executing the next instruction and return an appropriate next state for the machine.
step :: MachineState -> MachineState
Write execute :: [Instruction] -> MachineState, which takes a program and runs the machine (starting with an empty stack) until the machine won’t run anymore (that is, it has reached a DONE or ERROR state).
execute :: [Instruction] -> MachineState
Finally, write run :: [Instruction] -> Maybe Integer, which executes the program and then returns Nothing if the machine halted with an ERROR or an empty stack, or Just the top integer on the stack if the machine successfully finished and left at least one integer on the stack.
run :: [Instruction] -> Maybe Integer
Nothing
Just
Now that you have a working abstract machine, you can compile Arith expressions into equivalent programs that run on the abstract machine.
compile
Arith
Of course, your compiler should output not just any list of instructions! It should output a program which, when run on the abstract machine, successfully produces the same integer result as the Arith interpreter would. That is, for any a :: Arith,
a :: Arith
run (compile a) == Just (interp a)
To test the above it will be convenient to write a function which finally puts some of these things together:
exec :: String -> Maybe Integer
String
run
You should now be able to test that if s is any String, then eval s == exec s.
s
eval s == exec s
To complete this project to Level 1, do the above steps completely and correctly. eval s == exec s must be true for all strings s.
To complete this project to Level 2, in addition to the requirements for Level 1:
camelCase
=
Make sure your code is simplified as much as possible, for example, without redundant pattern-matching.
Turn on {-# OPTIONS_GHC -Wall #-} and make sure your code generates no warnings.
{-# OPTIONS_GHC -Wall #-}
Write informative, grammatically correct comments explaining your code, its operation, and any choices you made along with the reasons for those choices.
Consider a different virtual machine with the following characteristics:
ACCUM n
STORE a
a
ADD a
SUB a
MUL a
You should implement:
Pay no attention to the man behind the curtain…
lexer :: TokenParser u lexer = makeTokenParser emptyDef parens :: Parser a -> Parser a parens = getParens lexer reservedOp :: String -> Parser () reservedOp = getReservedOp lexer integer :: Parser Integer integer = getInteger lexer whiteSpace :: Parser () whiteSpace = getWhiteSpace lexer parseAtom :: Parser Arith parseAtom = Lit <$> integer <|> parens parseExpr parseExpr :: Parser Arith parseExpr = buildExpressionParser table parseAtom where -- Each list of operators in the table has the same precedence, and -- the lists are ordered from highest precedence to lowest. So -- in this case '*' has the highest precedence, and then "+" and -- "-" have lower (but equal) precedence. table = [ [ binary "*" (Bin Times) AssocLeft ] , [ binary "+" (Bin Plus) AssocLeft , binary "-" (Bin Minus) AssocLeft ] ] binary name fun assoc = Infix (reservedOp name >> return fun) assoc parseArith :: Parser Arith parseArith = whiteSpace *> parseExpr <* eof eval :: String -> Maybe Integer eval s = case parse parseArith s of Left _ -> Nothing Right a -> Just (interp a)