May 31, 2020
Syntax Analysis #
Syntax analysis happens after the Lexical phase, and it is responsible for detecting syntax errors.
Be design, computer languages have defined structure of what constitutes a valid program. in Python, a program is made up of functions/classes/imports, a function requires declarations and/or statements and so on. In C, a valid program needs to have a least a function called main, otherwise the GNU’s linker is unable to link the program.
Programmers may define language grammars using the Backus-Naur Form (BNF)
notation. Defining grammars offer a lot of benefits for the language designer.
Like code, language grammars may change overtime. For example, one of Python’s
3.5 new feature was the support to coroutines with
The following diff shows some of the changes the developer needed to add
support for the
async before the function definition or allow an
function definition to be used as a decorator.
diff -r ccac513ee610 Grammar/Grammar --- a/Grammar/Grammar Mon Apr 20 21:05:23 2015 +0200 +++ b/Grammar/Grammar Mon Apr 20 15:54:06 2015 -0400 @@ -21,8 +21,11 @@ decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE decorators: decorator+ -decorated: decorators (classdef | funcdef) -funcdef: 'def' NAME parameters ['->' test] ':' suite +decorated: decorators (classdef | funcdef | async_funcdef) + +async_funcdef: ASYNC funcdef +funcdef: ('def' NAME parameters ['->' test] ':' suite) + parameters: '(' [typedargslist] ')' typedargslist: (tfpdef ['=' test] (',' tfpdef ['=' test])* [',' ['*' [tfpdef] (',' tfpdef ['=' test])* [',' '**' tfpdef] | '**' tfpdef]] @@ -37,18 +40,19 @@ simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
Other than making it easier for the language designer to design the language, it allows the grammar to be easily documented and maintained by other language designers.
Some grammars allow parsers (syntax analyzers) to be constructed to determine the syntactic structure of the program. By relying on the structured grammar, it allows the parsers to be developed more easily and testable.
Types of parsers #
There are 3 types of parsers we can use to write a syntax parser.
- Universal: this method can parse any grammar
- Top-down: this method builds parse trees from the top (root) to the button (leaves)
- Bottom-up: this method builds parse trees from the button (leaves) to the top (root)
The last two methods reads the input from left to right and one symbol at a time.
The top-down and bottom-up are known to be more efficient in production use1.
I am not really sure why this is, as I haven’t studied Universal methods] ↩︎