NAME
Parser::Combinators - A library of building blocks for parsing, similar
to Haskell's Parsec
SYNOPSIS
use Parser::Combinators;
my $parser = < a combination of the parser building blocks from Parser::Combinators >
(my $status, my $rest, my $matches) = $parser->($str);
my $parse_tree = getParseTree($matches);
DESCRIPTION
Parser::Combinators is a simple parser combinator library inspired by
the Parsec parser combinator library in Haskell. It is not complete
(i.e. not all Parsec combinators have been implemented), I have just
implemented what I needed:
whiteSpace : parses any white space, always returns success. I
Lexeme parsers (they remove trailing whitespace):
word : (\w+)
number : (\d+)
symbol : parses a given symbol, e.g. symbol('int')
comma : parses a comma
char : parses a given character
Combinators:
sequence( [ $parser1, $parser2, ... ], $optional_sub_ref )
choice( $parser1, $parser2, ...) : tries the specified parsers in order
try : normally, the parser consums matching input. try() stops a parser from consuming the string
maybe : is like try() but always reports success
parens( $parser ) : parser '(', then applies $parser, then ')'
many( $parser) : applies $parser zero or more times
sepBy( $separator, $parser) : parses a list of $parser separated by $separator
oneOf( [$patt1, $patt2,...]): like symbol() but parses the patterns in order
Dangerous: the following parsers take a regular expression
upto( $patt )
greedyUpto( $patt)
regex( $patt)
As there is no Haskell-style syntactic sugar in Perl, I use the
sequence() combinator where in Haskell you would use the do-notation.
sequence() takes a ref to a list of parsers and optionally a code ref to
a sub that can manipulate the result before returning it.
Also, you can label any parser in a sequence using an anonymous hash,
for example:
sub type_parser {
sequence [
{Type => word},
maybe parens choice(
{Kind => number},
sequence [
symbol('kind'),
symbol('='),
{Kind => number}
]
)
]
}
Applying this parser returns a tuple as follows:
my $str = 'integer(kind=8), '
(my $status, my $rest, my $matches) = type_parser($str);
Here,`$status` is 0 if the match failed, 1 if it succeeded. `$rest`
contains the rest of the string. The actual matches are stored in the
array $matches. As every parser returns its resuls as an array ref,
$matches contains the concrete parsed syntax, i.e. a nested array of
arrays of strings.
Dumper($matches) ==> [{'Type' => ['integer']},[['kind'],['\\='],{'Kind' => ['8']}]]
You can extract only the labeled matches using `getParseTree`:
my $parse_tree = getParseTree($matches);
Dumper($parse_tree) ==> [{'Type' => 'integer'},{'Kind' => '8'}]
PS: I have also implemented bind() and enter() (as 'return' is reserved)
for those who like monads ^_^
AUTHOR
Wim Vanderbauwhede <Wim.Vanderbauwhede@gmail.com>
COPYRIGHT
Copyright 2013- Wim Vanderbauwhede
LICENSE
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
SEE ALSO