Input
<b>Hello 1
Can use string.find(), look for space.
"hello"[1:3] = "el"
"hello"[1:] = "ello""Jane Eyre".split() = ["Jane", "Eyre"]import re
re.findall(r"[0-9]", "1+2==3") = ["1", "2", "3"]r"[a-c][1-2]"
"a1", "a2", "b1", …impore re
r = r"[a-z]+|[0-9]+"
re.findall(r, "Goethe 1749") = ["oethe", "1749"][0-2] = “0|1|2”
Question mark, ?, zero or one copies.
import re
r = r"-?[0-9]+"
re.findall(r, "1861-1941 R. Tagore") = ["1861", "-1941"]Star, *, zero or more copies.
a+ === aa*
Escape special characters using \.
r = r"[a-z]+-?[a-z]+".: any character except new-line.[^ab]: any characters that aren’t a or b.(?:xyz)+Above is grouped, so matches:
xyz
xyzxyz
…
e.g. we want to match any number of copies of do, re, mi, in any order.
r = r"do+|re+|mi+"r = r"(?:do|re|mi)+"regexp = r'"(?:(?:\\.)*|[^\\])*"'edges[(1, 'a')] = 2accepting = [3]edges.edges = {(1, 'a') : 2,
         (2, 'a') : 2,
         (2, '1') : 3,
         (3, '1') : 3}
accepting = [3]
def fsmsim(string, current, edges, accepting):
    if len(string) == 0:
        return current in accepting
    letter = string[0]
    next_state = edges.get((current, letter), None)
    if next_state is None:
        return False
    return fsmsim(string[1:], next_state, edges, accepting)
What are edges and accepting for r"q*"?
edges = {(1, 'q'): 1}
accepting = [1]
What are edges and accepting for r"[a-b][c-d]?"?
edges = {(1, 'a'): 2,
         (1, 'b'): 2,
         (2, 'c'): 3,
         (2, 'd'): 3}
accepting = [2, 3]
Another regexp: regexp = r'[0-9]+(?:[0-9]|-[0-9])*'
The fsmsim function can handle DFAs. For problems that use re.findall(), there are equivalent formulations that do not use re.findall().
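Sanity check (a sketch, reusing fsmsim with the edges and accepting for r"[a-b][c-d]?" just defined):
print fsmsim("a", 1, edges, accepting)   # True: "a" matches [a-b]
print fsmsim("bd", 1, edges, accepting)  # True: "bd" matches [a-b][c-d]
print fsmsim("bdc", 1, edges, accepting) # False: trailing input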
A single call to fsmsim() gives you re.match(): only one anchored search. To build re.findall() on top of it, consider:
s1 = "12+34"
Run fsmsim() for '[0-9]+':
call fsmsim("1"), it matches.
call fsmsim("2"), it matches.
call fsmsim("12+"), it doesn't match. Hence one 'token' is '12', and advance input to '3'.
call fsmsim("3"), it matches.
call fsmsim("4"), it matches.
end of string.
result is ["12", "34"].Given this fragment
Given this fragment:
Wollstonecraft</a> wrote
Want the following output:
word                 Wollstonecraft
start of closing tag </
word                 a
end of closing tag   >
word                 wrote
e.g.
 LANGLE        <
 LANGLESLASH   </
 RANGLE        >
 EQUAL         =
 STRING        "google.com"
 WORD          Welcome!
Names of tokens are arbitrary, but we'd like them to be uppercase.
def t_RANGLE(token):
    r'>' # I am a regexp!
    return token # return text unchanged, but can transform it.
     
def t_LANGLESLASH(token):
    r'</'
    return token
def t_NUMBER(token):
    r'[0-9]+'
    token.value = int(token.value)
    return token
def t_STRING(token):
    r'"[^"]*"'
    return token
def t_WHITESPACE(token):
    r' '
    pass
And if we define a word as any number of characters except <, >, or space, leaving the value unchanged:
def t_WORD(token):
    r'[^<> ]+'
    return token
Strip the quotes from a string's value:
def t_STRING(token):
    r'"[^"]*"'
    token.value = token.value[1:-1]
    return token
Making a lexer
import ply.lex as lex
tokens = (
    'LANGLE',        # <
    'LANGLESLASH',   # </
    'RANGLE',        # >
    'EQUAL',         # =
    'STRING',        # ".."
    'WORD'           # dada
)   
t_ignore = ' ' # shortcut for whitespace
# note this is before t_LANGLE, want it to win
def t_LANGLESLASH(token):
    r'</'
    return token
    
def t_LANGLE(token):
    r'<'
    return token
    
def t_RANGLE(token):
    r'>'
    return token
    
def t_EQUAL(token):
    r'='
    return token
   
def t_STRING(token):
    r'"[^"]*"'
    token.value = token.value[1:-1]
    return token
    
def t_WORD(token):
    r'[^ <>]+'
    return token
    
webpage = "This is <b>my</b> webpage!"
htmllexer = lex.lex()
htmllexer.input(webpage)
while True:
    tok = htmllexer.token()
    if not tok: break
    print tok
Tokens print as LexToken(TYPE, value, line, lexpos), giving the line and the character position on that line.
To keep line numbers accurate, count newlines:
def t_newline(token):
    r'\n'
    token.lexer.lineno += 1
    pass
We must also exclude \n from t_WORD, just like we're currently ignoring spaces.
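Running the webpage demo above prints one LexToken per token, roughly (a sketch; lexpos is the character offset into the input):
LexToken(WORD,'This',1,0)
LexToken(WORD,'is',1,5)
LexToken(LANGLE,'<',1,8)
LexToken(WORD,'b',1,9)
LexToken(RANGLE,'>',1,10)
LexToken(WORD,'my',1,11)
LexToken(LANGLESLASH,'</',1,13)
LexToken(WORD,'b',1,15)
LexToken(RANGLE,'>',1,16)
LexToken(WORD,'webpage!',1,18)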
HTML comments start with <!-- and end with -->. To add them to the lexer we use states:
states = (
    ('htmlcomment', 'exclusive'),
)
If we are in the state htmlcomment we cannot be doing anything else at the same time, like looking for strings or words.
def t_htmlcomment(token):
    r'<!--'
    token.lexer.begin('htmlcomment')
    
def t_htmlcomment_end(token):
    r'-->'
    token.lexer.lineno += token.value.count('\n')
    token.lexer.begin('INITIAL')
    
def t_htmlcomment_error(token):
    token.lexer.skip(1)
INITIAL is the default state, i.e. whatever we were doing before coming into htmlcomment.
By skipping instead of erroring we gather up all the characters inside the comment, so that we can count newlines when the comment ends.
An identifier token:
def t_identifier(token):
    r'[A-Za-z][A-Za-z0-9_]*' # * not +, so one-letter names like x match
    return token
def t_NUMBER(token):
    r'-?[0-9]+(?:\.[0-9]*)?'
    token.value = float(token.value)
    return token
Comments run to the end of the line in JavaScript:
def t_eolcomment(token):
    r'//[^\n]*'
    pass
ply library
We never pass the t_ rules anywhere explicitly; ply can use reflection to find them.
e.g.
sentence
-> subject verb
-> students verb
-> students think
Can perform multiple derivations, e.g.
sentence
-> subject verb
-> subject write
-> teachers write
Adding just one recursive rule gives phenomenal power!
Sentence -> Subject Verb
Subject -> students
Subject -> teachers
Subject -> Subject and Subject
Verb -> think
Verb -> write
Formally, the number of strings in this language is countably infinite.
Arithmetic grammar example:
Exp -> Exp + Exp
Exp -> Exp - Exp
Exp -> number
e.g. number number is not valid; number + number - number is valid.
Valid in grammar == is in the language of the grammar.
Word rules + sentence rules = creativity!
Exp -> Exp + Exp
Exp -> Exp - Exp
Exp -> Number
and
def t_NUMBER(token):
    r'[0-9]+'
    token.value = int(token.value)
    return token
Can now check for a valid sequence of tokens:
1 + 2, good
7 + 2 - 2, good
- - 2, bad.
    Stmt -> identifier = Exp
    Exp -> Exp + Exp
    Exp -> Exp - Exp
    Exp -> number
    lata = 1, good
    lata = lata + 1, bad (Exp cannot derive an identifier)
We can give two rewrite rules for the same non-terminal, where one of them goes to epsilon, i.e. the empty string.
Sentence -> OptionalAdjective Subject Verb
Subject -> william
Subject -> tell
OptionalAdjective -> accurate
OptionalAdjective -> \epsilon
Verb -> shoots
Verb -> bows
8 possible utterances! (2 adjectives × 2 subjects × 2 verbs)
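A quick way to list all 8 (a sketch; not course code):
from itertools import product
adjectives = ["accurate", ""]  # "" stands in for \epsilon
subjects = ["william", "tell"]
verbs = ["shoots", "bows"]
for adj, subj, verb in product(adjectives, subjects, verbs):
    print " ".join(w for w in (adj, subj, verb) if w)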
Grammars can encode regular languages.
number - r'[0-9]+'
Number -> Digit MoreDigits
MoreDigits -> Digit MoreDigits
MoreDigits -> \epsilon
Digit -> 0
Digit -> 1
…
Digit -> 9
Number
-> Digit MoreDigits
-> Digit Digit MoreDigits
-> Digit Digit \epsilon
-> Digit 2
-> 42
Grammar >= Regexp
regexp = r'p+i?' # e.g. p, pp, pi, ppi
Regexp -> Pplus Iopt
Pplus -> p Pplus
Pplus -> p
Iopt -> i
Iopt -> \epsilon
Context-free grammars describe context-free languages.
A -> B
xyzAxyz -> xyzBxyz
The rule A -> B applies regardless of the symbols around A; that is what "context-free" means.
Here are three different regular expression forms, and equivalent context-free grammars.
r'ab'   => G -> ab
r'a*'   => G -> \epsilon
           G -> aG
r'a|b'  => G -> a
        => G -> b
But regular languages != context-free languages.
Consider:
P -> ( P )
P -> \epsilon
Let's try a regexp:
r'\(*\)*'
But it doesn't balance parentheses :(.
We want:
(^N )^N
But all we can write is:
(* )*
Ambiguity: e.g. "I saw Jane Austen using binoculars".
1 - 2 + 3
could be 2 or -4!
A grammar is ambiguous if at least one string in its language has more than one different parse tree.
Parentheses can come to the ( Rescue ).
exp -> exp + exp
exp -> exp - exp
exp -> number
exp -> ( exp )
A grammar for HTML:
    <b>Welcome to <i>my</i> webpage!</b>
    
    Html -> Element Html
    Html -> \epsilon
    Element -> word
    Element -> TagOpen Html TagClose
    TagOpen -> < word >
    TagClose -> </ word >
In Python:
def absval(x):
    if x < 0:
        return 0 - x
    else:
        return x
In JavaScript:
function absval(x) {
    if (x < 0) {
        return 0 - x;
    } else {
        return x;
    }
}
JavaScript uses braces to signify lexical scope. Python uses indentation.
In Python
print "hello" + "!"In JavaScript:
document.write("hello" + "!")
or
write("hello" + "!")All JavaScript function calls require brackets.
Partial grammar for JavaScript:
    Exp -> identifier
    Exp -> TRUE
    Exp -> FALSE
    Exp -> number
    Exp -> string
    Exp -> Exp + Exp
    Exp -> Exp - Exp
    Exp -> Exp * Exp
    Exp -> Exp / Exp
    Exp -> Exp < Exp
    Exp -> Exp == Exp
    Exp -> Exp && Exp
    Exp -> Exp || Exp
    Statements ~= Sentences
-> j = 3;
Statement -> identifier = Exp
Statement -> return Exp
Statement -> if Exp CompoundStatement
Statement -> if Exp CompoundStatement else CompoundStatement
CompoundStatement -> { Statements }
Statements -> Statement; Statements
Statements -> \epsilon
Could CompoundStatement also be a single Statement without braces?
i.e. should a Statement contain a CompoundStatement or a bare Statement?
The grammar for a whole JavaScript program has the same recursive shape as HTML!
   Js -> Element Js
   Js -> \epsilon
   Element -> function identifier ( OptParams ) CompoundStatement // function definition
   Element -> Statement;
   OptParams -> Params
   OptParams -> \epsilon
   Params -> identifier, Params
   Params -> identifier
Encoding optional, comma-separated lists with a pair of recursive rules is a cute property of Context-Free Grammars.
   Exp -> … // as before
   Exp -> identifier( OptArgs ) // function call
   OptArgs -> Args
   OptArgs -> \epsilon
   Args -> Exp, Args
   Args -> Exp
The grammar can't check arity: we can define sin(x) but then call it as sin(50, 60).
It does check syntax: (1 + (2 + 3)) is valid; 1 + + + ) 3 is not.
Lambda (make me a function, anonymous function)
   def addtwo(x): return x+2
   addtwo(2) # = 4
   mystery = lambda(x): x+2
   mystery(3) # = 5
   pele = mystery
   pele(4) # = 6
def mysquare(x): return x*x
map(mysquare, [1,2,3,4,5]) # = [1,4,9,16,25]
map(lambda(x): x*x, [1,2,3,4,5]) # same!
[x*x for x in [1,2,3,4,5]] # same!
def odds_only(numbers):
    for n in numbers:
        if n % 2 == 1:
            yield n
yield: not return! A generator.
[x for x in [1,2,3,4,5] if x % 2 == 1] # same, as a comprehension
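Quick check of the two spellings (a sketch; generators must be forced with list()):
print list(odds_only([1,2,3,4,5]))            # [1, 3, 5]
print [x for x in [1,2,3,4,5] if x % 2 == 1]  # [1, 3, 5]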
Python program to check a string is in a grammar.
Exp -> Exp + Exp
Exp -> Exp - Exp
Exp -> ( Exp )
Exp -> num
grammar = [
    ("Exp", ["Exp", "+", "Exp"]),
    ("Exp", ["Exp", "-", "Exp"]),
    ("Exp", ["(", "Exp", ")"],
    ("Exp", ["num"]),
]
Given e.g. print exp; as the utterance:
utterance = ["print", "exp", ";"]
we can rewrite the exp at position 1 into:
["print", "exp", "-", "exp", ";"]
pos = 1
result = utterance[0:pos] + rule[1] + utterance[pos+1:]
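Putting the pieces together (a sketch; the rule tuple is the Exp -> Exp - Exp entry, spelled to match the utterance):
utterance = ["print", "exp", ";"]
rule = ("exp", ["exp", "-", "exp"])
pos = 1
result = utterance[0:pos] + rule[1] + utterance[pos+1:]
print result # ['print', 'exp', '-', 'exp', ';']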
    start with "a exp"
    with depth 1, get:
    "a exp + exp"
    "a exp - exp"
    "a (exp)"
    "a num"Let’s code it up:
grammar = … (as above)
def expand(tokens, grammar):
    for i, token in enumerate(tokens):
        for (rule_lhs, rule_rhs) in grammar:
            if token == rule_lhs:
                result = tokens[0:i] + rule_rhs + tokens[i+1:]
                yield result
depth = 2
utterances = [["exp"]]
for x in xrange(depth):
    for sentence in utterances:
        utterances = utterances + [ i for i in expand(sentence, grammar)]
for sentence in utterances:
    print sentence
For grammars with infinitely many utterances this enumeration is pretty useless!
S -> (S)
S -> \epsilon
Is '(()' in the grammar? Enumeration may never tell us. We need memoization:
def memofibo(n, chart = None):
    if chart is None:
        chart = {}
    if n <= 2:
        chart[n] = 1
    if n not in chart:
        chart[n] = memofibo(n-1, chart) + memofibo(n-2, chart)
    return chart[n]
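Quick check (a sketch):
print memofibo(10) # 55
print memofibo(60) # 1548008755920, fast because the chart caches sub-results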
To parse, we keep a finger on our place in the input - and we will need more than one finger!
S -> E
E -> E + E
E -> E - E
E -> 1
E -> 2
input = 1 + 2
After seeing "1 +", where am I? Example of a parsing state.
If the red dot ends up on the right of the start symbol’s rule, you’ve parsed the string! i.e.
S -> E <dot>
A parsing state is a rewrite rule from the grammar augmented with one red dot on the right-hand side of the rule.
    Input: 1 +
    State: E -> 1 + <dot> E
    This is not a valid parsing state: E -> 1 + E is not a rule from the grammar.
parse([t_1, t_2, …, t_n, …, t_last])
chart[N] = all parse states we could be in after seeing t_1, t_2, …, t_n only!
e.g.
E -> E + E
E -> int
Input = int + int
chart[0] =
    [E -> <dot> E + E,
     E -> <dot> int]
chart[1] = 
    [E -> int <dot>,
     E -> E <dot> + E]
chart[2] =
    [E -> E + <dot> E]
We'll need to keep track of one extra piece of information per state: how many tokens we had already seen when the rule started.
E -> E + E
E -> int
Input = int + int
chart[0] =
    [E -> <dot> E + E,
     E -> <dot> int - seen 0]
chart[1] = 
    [E -> int <dot>,
     E -> E <dot> + E]
chart[2] =
    [E -> <dot> int - seen 2,
     E -> E + <dot> E - seen 0]
Why? Because we want to parse. Parsing is the inverse of producing strings.
int + int
E + int        # apply E -> int
E + E          # apply E -> int
E              # apply E -> E + E
Parsing reads this list going down; generating is going up.
If you build the chart, you have solved parsing!
S -> E
E -> …
S -> E <dot> - starting at 0 => we've parsed it.
# We want to be in this state!
If the input is T tokens long:
S -> E <dot> from 0 in chart[T]
If we can build the chart, and the above state is in chart[T], then the string is in the language of the CFG.
Start:
chart[0] has S -> <dot> E from 0
End:
chart[T] has S -> E <dot> from 0.
Suppose:
S -> E + <dot> E, from j, in chart[i] (seen i tokens)
We need to find all rules that go to E and "bring them in". This is the closure.
Let’s say:
chart[i] has X -> ab <dot> cd, from j.
For all grammar rules:
c -> pqr
We add:
c -> <dot> pqr, from i
to chart[i].
Suppose:
    E -> E - E
    E -> (F)
    E -> int
    F -> string
    
    Input: int - int
    Seen 2 tokens so far
 
    chart[2] has E -> E - <dot> E, from 0
Then the result of computing the closure is:
    E -> <dot> int from 2
    E -> <dot> (F) from 2
    E -> <dot> E - E from 2
    The following are not in the result:
    E -> <dot> E - E from 0 # wrong from
    F -> <dot> string from 2 # wrong LHS (we need an E, not an F)
Shifting, aka consuming the input, is another operation.
Recall parsing state:
   X -> ab <dot> cd, from j, in chart[i]
If c is a terminal, => shift (i.e. consume the terminal), giving:
  X -> abc <dot> d, from j, into chart[i+1]
This applies when we have seen i tokens and the (i+1)th token is exactly c, a terminal.
We do not update from, because that records where the rule started.
    x -> ab <dot> cd
    c is a non-terminal => closure
    c is a terminal => shift
    cd is \epsilon, i.e. nothing after the dot => reduce
Reduction: apply rewrite rules / productions in reverse.
E -> E + E
E -> int
<dot> int + int + int
int <dot> + int + int
# magical reduction!
E <dot> + int + int
E + <dot> int + int
E + int <dot> + int
# magical reduction!
E + E <dot> + int
# magical reduction!
E <dot> + int
E + <dot> int
...
But how do we apply reductions?
    E -> E + E <dot> from B in chart [A]
    We had seen the input up to position B when this rule started; since then we have seen E + E:
input_1 input_2 … input_B | E + E
It's as if we saw the LHS at this point:
input_1 input_2 … input_B | E
Where did we come from? Suppose chart[B] has:
E -> E - <dot> E from CSo add:
E -> E - E <dot> from C to chart[A]
Example!
T -> aBc
B -> bb
input: abbc
N = 0
chart[0]
    T -> <dot>aBc, from 0
N = 1, a
chart[1]
    # shift
    T -> a <dot>Bc, from 0
    # and we see a non-terminal, so bring in closure
    B -> <dot>bb, from 1
N = 2, ab
    # shift
    B -> b<dot>b, from 1
N = 3, abb
    # shift
    B -> bb<dot>, from 1
    # - red dot at end of rule, so reduce.
    # - came from state 1.
    # - Does anyone in state 1 want to see B? 
    # - Yes! T -> a<dot>Bc is looking for one.    
    # - So transplant that rule here
    T -> aB<dot>c, from 0
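One more shift finishes the trace:
N = 4, abbc
chart[4]
    # shift
    T -> aBc<dot>, from 0
    # dot at the end of the start rule, from 0, all input seen => parsed!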
Adding a state to the chart:
def addtochart(chart, index, state):
    if not state in chart[index]:
        chart[index] = [state] + chart[index]
        return True
    else:
        return False
Grammar:
    S -> P
    P -> (P)
    P ->
    In Python:
grammar = [
    ("S", ["P"]),
    ("P", ["(", "P", ")"]),
    ("P", []),
]
Parser state:
    X -> ab<dot>cd from j
    In Python:
state = ("x", ["a", "b"], ["c", "d"], j)chart[0].chart[n] for n tokens to see if we’re in the final state.chart[i], we see x -> ab<dot>cd from j.We’ll call:
next_states = closure(grammar, i, x, ab, cd, j)
for next_state in next_states:
    any_changes = addtochart(chart, i, next_state)
                  or any_changes
What is closure()?
def closure(grammar, i, x, ab, cd, j):
    next_states = [
        (rule[0], [], rule[1], i)
        for rule in grammar
        if len(cd) > 0 and
           rule[0] == cd[0]
    ]
    return next_states
Shift: in chart[i] we see x -> ab<dot>cd from j, with tokens the input list. We'll call:
next_state = shift(tokens, i, x, ab, cd, j)
if next_state is not None:
    any_changes = addtochart(chart, i+1, next_state)
                  or any_changes
What is shift()?
def shift(tokens, i, x, ab, cd, j):
    if len(cd) > 0 and tokens[i] == cd[0]:
        return (x, ab + [cd[0]], cd[1:], j)
    else:
        return None
Reduction: in chart[i] we see x -> ab<dot>cd from j. We'll call:
next_states = reductions(chart, i, x, ab, cd, j)
for next_state in next_states:
    any_changes = addtochart(chart, i, next_state)
                  or any_changes
                  
def reductions(chart, i, x, ab, cd, j):
    # x -> ab<dot> from j
    # chart[j] has y -> ... <dot>x ... from k
    return [
        (jstate[0],
         jstate[1] + [x],
         jstate[2][1:],
         jstate[3])
        for jstate in chart[j]
        if len(cd) == 0 and
           len(jstate[2]) > 0 and
           jstate[2][0] == x
    ]
# see notes/src/programming_languages/ps4_parser.py
# above has closure, shift, and reductions in-lined.
def parse(tokens, grammar):
    tokens = tokens + ["end_of_input_marker"]
    chart = {}
    start_rule = grammar[0]
    for i in xrange(len(tokens) + 1):
        chart[i] = []
    start_state = (start_rule[0], [], start_rule[1], 0)
    chart[0] = [start_state]
    for i in xrange(len(tokens)):
        while True:
            changes = False
            for state in chart[i]:
                # State === x -> ab<dot>cd, j
                (x, ab, cd, j) = state
                
                # Current state == x -> ab<dot>cd, j
                # Option 1: For each grammar rule
                # c -> pqr (where the c's match)
                # make a next state:
                #
                # c -> <dot>pqr, i
                #
                # English: We're about to start
                # parsing a "c", but "c" may be
                # something like "exp" with its
                # own production rules. We'll bring
                # those production rules in.
                next_states = closure(grammar, i, x, ab, cd, j)
                for next_state in next_states:
                    changes = addtochart(chart, i, next_state) or changes
                    
                # Current State == x -> ab<dot>cd, j
                # Option 2: If tokens[i] == c,
                # make a next state:
                #
                # x -> abc<dot>d, j
                #
                # English: We're looking for a parse
                # token c next and the current token
                # is exactly c! Aren't we lucky!
                # So we can shift over it and move
                # to chart[i+1].
                next_state = shift(tokens, i, x, ab, cd, j)
                if next_state is not None:
                    changes = addtochart(chart, i+1, next_state) or changes
                    
                # Current state == x -> ab<dot>cd, j
                # Option 3: if cd is [], the state is
                # just x -> ab<dot>, j
                # For each p -> q<dot>xr, l in chart[j]
                # Make a new state:
                #
                # p -> qx<dot>r, l
                #
                # in chart[i].
                #
                # English: We've just finished parsing
                # an "x" with this token, but that
                # may have been a sub-step (like
                # matching "exp->2" in "2+3"). We
                # should update the higher-level
                # rules as well.
                next_states = reductions(chart, i, x, ab, cd, j)
                for next_state in next_states:
                    changes = addtochart(chart, i, next_state) or changes
                    
            if not changes:
                break
    accepting_state = (start_rule[0], start_rule[1], [], 0)
    return accepting_state in chart[len(tokens)-1]
result = parse(tokens, grammar)
print result
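For example, with the parenthesis grammar above (a sketch; closure, shift, and reductions as defined earlier):
grammar = [
    ("S", ["P"]),
    ("P", ["(", "P", ")"]),
    ("P", []),
]
print parse(["(", "(", ")", ")"], grammar) # True
print parse(["(", "(", ")"], grammar)      # False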
# tokens
def t_STRING(t):
    r'"[^"]*"'
    t.value = t.value[1:-1]
    return t
    
# parsing rules
def p_exp_number(p):
    'exp : NUMBER' # exp -> NUMBER
    p[0] = ("number", p[1])
    # p[0] is returned parse tree
    # p[0] refers to exp
    # p[1] refers to NUMBER.
    
def p_exp_not(p):
    'exp : NOT exp' # exp -> NOT exp
    p[0] = ("not", p[2])
    # p[0] refers to exp
    # p[1] refers to NOT
    # p[2] refers to exp
The elements of p are parse trees.
def p_html(p):
    'html : elt html'
    p[0] = [p[1]] + p[2]
    
def p_html_empty(p):
    'html : '
    p[0] = []
    
def p_elt_word(p):
    'elt : WORD'
    p[0] = ("word-element", p[1])def p_elt_tag(p):
    # <span color="red">Text!</span>:
    'elt : LANGLE WORD tag_args RANGLE html LANGLESLASH WORD RANGLE'
    p[0] = ("tag-element", p[2], p[3], p[5], p[7])def p_exp_binop(p):
    """exp : exp PLUS exp
           | exp MINUS exp
           | exp TIMES exp"""
    p[0] = ("binop", p[1], p[2], p[3])1 - 3 - 5
(1-3)-5 = -71-(3-5) = 3def p_exp_call(p):
    'exp : IDENTIFIER LPAREN optargs RPAREN'
    p[0] = ("call", p[1], p[3])def p_exp_number(p):     
    'exp : NUMBER'
    p[0] = ("number", p[1])precedence = (
    # lower precedence at the top
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    # higher precedence at the bottom 
)
def p_exp_call(p):
    'exp : IDENTIFIER LPAREN optargs RPAREN'
    p[0] = ("call", p[1], p[3])
    
def p_exp_number(p):
    'exp : NUMBER'
    p[0] = ("number", p[1])
    
def p_optargs(p):
    """optargs : exp COMMA optargs 
               | exp
               | """
    if len(p) == 1:
        p[0] = []
    elif len(p) == 2:
        p[0] = [p[1]]
    else:
        p[0] = [p[1]] + p[3]
        
# or can separate out parsing rules in OR statement
# into its own function. separate rules give better
# performance, as the parser has done all of your 
# len() work for you.
What is in chart[2], given:
    S -> id(OPTARGS)
    OPTARGS ->
    OPTARGS -> ARGS
    ARGS -> exp,ARGS
    ARGS -> exp
    input: id(exp,exp)
    chart[0]
        S -> <dot>id(OPTARGS)$, from 0
        
    chart[1]
        # shift
        S -> id<dot>(OPTARGS)$, from 0
        
    chart[2]
        # shift
        S -> id(<dot>OPTARGS)$, from 0
        
        # OPTARGS could be epsilon, hence
        # in one world:
        S -> id(OPTARGS<dot>)$, from 0
        
        # In another world we see OPTARGS
        # and it isn't epsilon, so we closure.
        OPTARGS -> <dot>ARGS, from 2
        OPTARGS -> <dot>, from 2
        
        # !!AI I think by recursion we apply closure to ARGS;
        # reminiscent of epsilon-closure during NFA->DFA conversion.
        ARGS -> <dot>exp,ARGS from 2
        ARGS -> <dot>exp from 2
Programming examples
1 + 2 # = 3
"hello" + " world" # = "hello world"
1 + "hello" # ???len for string vs. list.+ for numbers, strings, and lists.Interpreting by walking a parse tree.
("word-element", "Hello")
("tag-element", "b", ..., "b")
("javascript-element", "function fibo(N) { ...")
 # Embedded JavaScript in HTML.
Python libraries used: re for regexps, ply for lexing and parsing, timeit for benchmarking.
The graphics API:
graphics.word(string)
# draw on screen
graphics.begintag(string, dictionary)
# doesn't draw, just makes a note. like changing pen colours.
# dictionary passes in attributes, e.g. href.
graphics.endtag()
# most recent tag.
graphics.warning(string)
# debugging, in bold red color.
Example:
Nelson Mandela <b>was elected</b> democratically.
# how this calls into graphics API
graphs.word("Nelson")
graphics.word("Mandela")
graphics.begintag("b", {})
graphics.word("was")
graphics.word("elected")
graphics.endtag("b")
graphics.word("democratically.")Interpret code.
import graphics
def interpret(trees): # Hello, friend
    for tree in trees: # Hello,
        # ("word-element","Hello")
        nodetype=tree[0] # "word-element"
        if nodetype == "word-element":
            graphics.word(tree[1])
        elif nodetype == "tag-element":
            # <b>Strong text</b>
            tagname = tree[1] # b
            tagargs = tree[2] # []
            subtrees = tree[3] # ...Strong Text!...
            closetagname = tree[4] # b
            # QUIZ: (1) check that the tags match
            # if not use graphics.warning()
            if tagname != closetagname:
                graphics.warning("Mismatched tag. start: '%s', end: '%s'" % (tagname, closetagname))
            else:
                #  (2): Interpret the subtree
                # HINT: Call interpret recursively
                graphics.begintag(tagname, {})
                interpret(subtrees)
                graphics.endtag()
word-element - done. tag-element - done. javascript-element - not done.
e.g.
input: (1*2) + (3*4)
Expressions like this are handled by eval_exp. Code:
def eval_exp(tree):
    # ("number" , "5")
    # ("binop" , ... , "+", ... )
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        left_child = tree[1]
        operator = tree[2]
        right_child = tree[3]
        # QUIZ: (1) evaluate left and right child
        left_value = eval_exp(left_child)
        right_value = eval_exp(right_child)
        
        # (2) perform "operator"'s work
        assert(operator in ["+", "-"])
        if operator == "+":
            return left_value + right_value
        elif operator == "-":
            return left_value - right_value
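Quick check on the tree for 1 + 2 (a sketch):
print eval_exp(("binop", ("number", "1"), "+", ("number", "2"))) # 3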
Variables require an environment:
def env_lookup(environment, variable_name):
    ...
def eval_exp(tree, environment):
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        # ...
    elif nodetype == "identifier":
        # ("binop", ("identifier","x"), "+", ("number","2"))
        # QUIZ: (1) find the identifier name
        # (2) look it up in the environment and return it
        return env_lookup(environment, tree[1])
Statements like if, while, return change the flow of control; expressions like 2+3 or x+1 simply have values.
def eval_stmts(tree, environment):
    stmttype = tree[0]
    if stmttype == "assign":
        # ("assign", "x", ("binop", ..., "+",  ...)) <=== x = ... + ...
        variable_name = tree[1]
        right_child = tree[2]
        new_value = eval_exp(right_child, environment)
        env_update(environment, variable_name, new_value)
    elif stmttype == "if-then-else": # if x < 5 then A;B; else C;D;
        conditional_exp = tree[1] # x < 5
        then_stmts = tree[2] # A;B;
        else_stmts = tree[3] # C;D;
        # QUIZ: Complete this code
        # Assume "eval_stmts(stmts, environment)" exists
        if eval_exp(conditional_exp, environment):
            return eval_stmts(then_stmts, environment)
        else:
            return eval_stmts(else_stmts, environment)
    Python:
        x = 0
        print x + 1
    JavaScript:
        var x = 0
        write(x+1)
A variable can have multiple values in different contexts.
x = "outside"
def myfun(x):
    print x
myfun("inside")
# get "inside"def env_lookup(var_name, env):
    # env = (parent, dictionary)
    if var_name in env[1]:
        # do we have it?
        return (env[1])[var_name]
    elif env[0] is None:
        # am global?
        return None
    else:
        # ask parents
        return env_lookup(var_name, env[0])
def env_update(var_name, value, env):
    if var_name in env[1]:
        # do we have it?
        (env[1])[var_name] = value
    elif not (env[0] is None):
        # if not global, ask parents.
        env_update(var_name, value, env[0])
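Quick check of the environment chain (a sketch; environments are (parent, dictionary) tuples):
global_env = (None, {"x": "outside"})
local_env = (global_env, {"x": "inside"})
print env_lookup("x", local_env)  # "inside": found locally
print env_lookup("y", local_env)  # None: not found anywhere
env_update("x", "changed", local_env)
print env_lookup("x", global_env) # still "outside": only the local x changed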
def mean(x):
    return x
    print "one thousand and one nights" # unreachable: return exits immediately
To implement return in the interpreter we can use exceptions: try, except.
def eval_stmt(tree, environment):
    stmttype = tree[0]
    if stmttype == "return":
        return_exp = tree[1] # return 1 + 2
        retval = eval_exp(return_exp, environment)
        raise Exception(retval)
Function calls:
def eval_stmt(tree,environment):
    stmttype = tree[0]
    if stmttype == "call": # ("call", "sqrt", [("number","2")])
        fname = tree[1] # "sqrt"
        args = tree[2] # [ ("number", "2") ]
        fvalue = env_lookup(fname, environment)
        if fvalue[0] == "function":
            # We'll make a promise to ourselves:
            # ("function", params, body, env)
            fparams = fvalue[1] # ["x"]
            fbody = fvalue[2]
            fenv = fvalue[3]
            if len(fparams) != len(args):
                print "ERROR: wrong number of args"
            else:
                #QUIZ: Make a new environment frame
                newfenv = (fenv, {})
                for param, value in zip(fparams, args):
                    newfenv[1][param] = None
                    eval_value = eval_exp(value, environment)
                    env_update(param, eval_value, newfenv)
                try:
                    # QUIZ : Evaluate the body
                    eval_stmts(fbody, newfenv)
                    return None
                except Exception as return_value:
                    return return_value
        else:
            print  "ERROR: call to non-function"
    elif stmttype == "return": 
        retval = eval_exp(tree[1],environment) 
        raise Exception(retval) 
    elif stmttype == "exp": 
        eval_exp(tree[1],environment)
In Python and JavaScript functions can be values, hence we must represent function values.
def myfun(x):
    return x+1
function myfun(x) {
    return x+1;
}
("function", fparams, fbody, fenv)fenv.Code:
def eval_elt(tree, env):
    elttype = tree[0]
    if elttype == "function":
        fname = tree[1]
        fparams = tree[2]
        fbody = tree[3]
        fvalue = ("function", fparams, fbody, env)
        add_to_env(env, fname, fvalue)
Can use JavaScript to simulate any Python program, and vice versa. Consider:
x = 0
while True:
    x = x + 1
print x
Suppose we had halts(), which takes a procedure as an argument and returns True if that procedure halts and False if it loops forever.
def tsif():
    if halts(tsif):
        x = 0
        while True:
            x = x + 1
    else:
        return 0
If tsif halts, then it loops forever. If tsif loops forever, then it halts.
Contradiction, hence halts() cannot exist.
explode(), aka Python's string.split(), assigns to local variables and trusts the user to be friendly. Nope!
We gather write() output and call the graphics library.
5<7 or a>b is valid JavaScript, but would confuse an HTML lexer, so we lex JavaScript in its own state:
def t_javascript(token):
    r'\<script\ type=\"text\/javascript\"\>'
    token.lexer.code_start = token.lexer.lexpos
    token.lexer.begin('javascript')
    # note that lexpos is such that we've already
    # stripped off the initial text/javascript part.
def t_javascript_end(token):
    r'\<\/script\>' # </script>
    token.value = token.lexer.lexdata[token.lexer.code_start:token.lexer.lexpos-9]
    token.type = 'JAVASCRIPT'
    token.lexer.lineno += token.value.count('\n')
    token.lexer.begin('INITIAL')
    return token
    # note that lexdata is such that we need to
    # manually strip off </script>
def p_element_word(p):
    'element : WORD'
    p[0] = ("word-element", p[1])
    # p[0] is the parse tree
    # p[1] is the child parse tree
def p_element_javascript(p):
    'element : JAVASCRIPT'
    p[0] = ("javascript-element", p[1])JAVASCRIPT in the parser is the same as token.type in the lexer. This is intentional: this is the link between the lexer and the parser.HTML input:
    hello my
    <script type="text/javascript">document.write(99);</script>
    luftballons
Parse tree:
[("word-element", "hello"),
 ("word-element", "my"),
 ("javascript-element", "document.write(99)"),
 ("word-element", "luftballons")]def interpret(trees):
    for tree in trees:
        treetype = tree[0]
        if treetype == "word-element":
            graphics.word(tree[1])
        # covered HTML tags in another quiz...
        elif tree.type == "javascript-element":
            jstext = tree[1] # "document.write(55);"
            # jstokens is an external module
            jslexer = lex.lex(module=jstokens)
            # jsgrammar is another external module
            jsparser = yacc.yacc(module=jsgrammar)
            # jstree is a parse tree for JavaScript
            jstree = jsparser.parse(jstext, lexer=jslexer)
            # We want to call the interpreter on our AST
            result = jsinterp.interpret(jstree)
            graphics.word(result)
A program may call document.write() more than once, but we still want to return just one string from the jsinterp.interpret() call: write appends to the special "javascript output" variable in the global environment.
def interpret(trees):
    # recall env = (parent, dictionary), and as this is the global environment the parent pointer is None
    global_env = (None, {"javascript output": ""})
    for elt in trees:
        eval_elt(elt, global_env)
    return (global_env[1])["javascript output"]"javascript output", in particular the space, is important; the user is not allowed to use a space in an identifier so they can’t ever collide with this.def eval_exp(tree, env):
    exptype = tree[0]
    if exptype == "call":
        fname = tree[1] # myfun in myfun(a,3+4)
        fargs = tree[2] # [a,3+4] in myfun(a,3+4)
        fvalue = env_lookup(fname,env) # None for "write"; built-in
        if fname == "write":
            argval = eval_exp(fargs[0],env)
            output_sofar = env_lookup("javascript output",env)
            env_update("javascript output", \
                output_sofar + str(argval), env)
            return None
function factorial(n) {
    if (n == 0) {
        return 1;
    }
    return n * factorial(n-1);
}
document.write(1260 + factorial(6));
We call factorial 7 times (n = 6 down to 0), each call with its own environment frame binding n (see "def eval_stmt(tree,environment)" above). Since factorial(6) = 720, the program writes 1980.
Software maintenance (i.e. testing, debugging, refactoring) carries a huge cost.
When comparing a function's output to the expected output, read the code and see which features of our interpreter we're exercising. If you're not using it, you're not testing it!
def env_lookup(vname,env):
    # env = (parent-pointer, {"x": 22, "y": 33})
    if vname in env[1]:
        return (env[1])[vname]
    else: # BUG
        return None # BUG: should ask the parent environment
var a = 1;
function mistletoe(baldr) {
    baldr = baldr + 1;
    a = a + 2;
    baldr = baldr + a;
    return baldr;
}
write(mistletoe(5));
With the buggy env_lookup, looking up the global a returns None and the program breaks; if the function only used the local baldr, up to return baldr it works!
greeting = "hola"
def makegreeter(greeting):
    def greeter(person):
        print greeting + " " + person
    return greeter
sayhello = makegreeter("hello")
sayhello("gracie")var greeting = "hola";
function makegreeter(greeting) {
    var greeter = function(person) {
        write(greeting + " " + person);
    }
    return greeter;
}
var sayhello = makegreeter("hello");def eval_exp(tree,env):
    exptype = tree[0]
    # function(x,y) { return x+y }
    if exptype == "function":
        # ("function", ["x","y"], [ ("return", ("binop", ...) ])
        fparams = tree[1]
        fbody = tree[2]
        return ("function", fparams, fbody, env)
        # "env" allows local functions to see local variables
        # can see variables that were in scope *when the function was defined*
By contrast, returning ("function", fparams, fbody, global_env) would break nested functions like makegreeter: greeter would no longer see greeting.
Optimization. Consider:
function factorial(n) {
    if (n == 0) { return 1; }
    return 1 * n * factorial(n-1);
}
This multiplies by 1 a lot. 1 * n can be replaced by n.
Smaller AST, faster recursive walk, fewer multiplications.
Think of optimizations
x * 1 == x
x + 0 == x
Transform the parse tree:
x/x can't be optimized to 1, because if x=0 it raises an exception, and we want to keep the same semantics after optimization.
def optimize(tree):
    etype = tree[0]
    if etype == "binop": # a * 1 = a
        a = tree[1]
        op = tree[2]
        b = tree[3]
        if op == "*" and b == ("number","1"):
            return a
        elif op == "*" and b == ("number","0"):
            return ("number","0")
        elif op == "+" and b == ("number","0"):
            return a
    return tree # also reached when tree isn't a binop
i.e. this:
("binop",
    ("number", "5"),
    ("*"),
    ("number", "1")
)
becomes:
("number", "5")a \* 1 \* 1.def optimize(tree): # Expression trees only
    etype = tree[0]
    if etype == "binop":
        # Fix this code so that it handles a + ( 5 * 0 )
        # recursively! QUIZ!
        a = optimize(tree[1])
        op = tree[2]
        b = optimize(tree[3])
        if op == "*" and b == ("number","1"):
            return a
        elif op == "*" and b == ("number","0"):
            return ("number","0")
        elif op == "+" and b == ("number","0"):
            return a
    return ("binop", a, op, b) # return optimized tree, not original+, -, /, len().x * 1 === xx / x !== 1document.write().a^N: it’s a+.a^N b^N, because this involves memory / context / counting. Same as balancing parantheses.A context-free grammar (CFG) can capture a^N b^N. It looks like this:
Recap: we covered +, -, /, len(); optimizations like x * 1 === x but x / x !== 1; and document.write().
a^N is regular: it's just a+. a^N b^N is not, because it involves memory / context / counting - same as balancing parentheses.
A context-free grammar (CFG) can capture a^N b^N. It looks like this:
S -> aSb
S -> \epsilon
Now consider this grammar:
S -> aSb
S -> \epsilon
S -> c
Input: acb
What parsing states are in chart[2]?
S -> <dot> a S b is in chart[0] from 0, so no.
S -> a <dot> S b is in chart[1] from 0, so no.
S -> <dot> is in chart[1] from 1, so no.
S -> c <dot> is in chart[2] from 1, so yes.
S -> a S <dot> b is in chart[2] from 0, so yes.
Note its from is 0, not 1: from 1 would mean one input token was hidden before this rule started, whereas here the rule spans all the input we can see.
Suppose optimization_OK(f, g) compares a function before and after optimization and tells you whether the optimization is safe.
We can write an optimization_OK that returns a safe answer in all cases - just never optimize!
But we cannot write an optimization_OK that works precisely in all cases - that is undecidable, like the Halting Problem.
If we had a precise optimization_OK then we could compare any function to an infinite loop def loops(), and hence we'd have solved the Halting Problem.
optimization_OK is commutative: O(f,g) == O(g,f).
Further reading: Regular Expressions in Python (Google Code University).
tan() and tanh().