Parsing library based on combinators and ppx extension to write languages

Pacomb is a parsing library that compiles grammars to combinators prior to parsing. It comes with a PPX extension for writing parsers inside OCaml files.

The advantages of Pacomb are:

  • Grammars as first-class values defined in your OCaml files. This is an example from the distribution:

    (* The three levels of priorities *)
    type p = Atom | Prod | Sum

    let%parser rec
      (* This includes each priority level in the next one *)
      expr p = Atom < Prod < Sum
      (* all other rule are selected by their priority level *)
      ; (p=Atom) (x::FLOAT)                        => x
      ; (p=Atom) '(' (e::expr Sum) ')'             => e
      ; (p=Prod) (x::expr Prod) '*' (y::expr Atom) => x*.y
      ; (p=Prod) (x::expr Prod) '/' (y::expr Atom) => x/.y
      ; (p=Sum ) (x::expr Sum ) '+' (y::expr Prod) => x+.y
      ; (p=Sum ) (x::expr Sum ) '-' (y::expr Prod) => x-.y

  • Good performance:

    • on unambiguous grammars, 2 to 3 times slower than ocamlyacc

    • on ambiguous grammars O(N^3 ln(N)) can be achieved.

  • Parsing proceeds from left to right (despite the use of combinators), so the whole input need not be kept in memory and streams can be parsed.

  • Dependent sequences, allowing for self-extensible grammars (such as defining a new infix operator with a given priority, as in one of the examples).

  • Management of blanks which, for instance, allows nested languages that use different kinds of comments or blanks.

  • Support for cache and merge for ambiguous grammars (to get O(N^3 ln(N))).

  • Enough UTF-8 support to write a parser for a language that uses UTF-8.

  • Comes with documentation and various examples illustrating most possibilities.
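
To make the priority levels of the example grammar above more concrete, here is a minimal hand-written sketch in plain OCaml (standard library only) of an evaluator for the same kind of expressions. It is not Pacomb code and says nothing about how Pacomb compiles grammars: `eval`, `loop` and `Parse_error` are hypothetical names introduced for illustration, number syntax is simplified to digits and dots, and only spaces are treated as blanks. It only illustrates how `Atom < Prod < Sum` leads to the usual precedence and left associativity.

```ocaml
(* Hand-written sketch, NOT Pacomb: all names except [p] are hypothetical. *)
type p = Atom | Prod | Sum

exception Parse_error of int (* input position where parsing failed *)

let eval (s : string) : float =
  let n = String.length s in
  let pos = ref 0 in
  (* Blanks are reduced to spaces in this sketch. *)
  let rec skip () =
    if !pos < n && s.[!pos] = ' ' then (incr pos; skip ()) in
  let peek () = skip (); if !pos < n then Some s.[!pos] else None in
  (* Simplified number syntax: digits and dots only. *)
  let is_num c = (c >= '0' && c <= '9') || c = '.' in
  let rec expr p =
    match p with
    | Atom ->
      (match peek () with
       | Some '(' ->
         (* '(' (e::expr Sum) ')' *)
         incr pos;
         let e = expr Sum in
         if peek () = Some ')' then (incr pos; e)
         else raise (Parse_error !pos)
       | Some c when is_num c ->
         let start = !pos in
         while !pos < n && is_num s.[!pos] do incr pos done;
         (match float_of_string_opt (String.sub s start (!pos - start)) with
          | Some x -> x
          | None -> raise (Parse_error start))
       | _ -> raise (Parse_error !pos))
    (* Each level starts with an expression of the level just below,
       which mirrors the inclusion Atom < Prod < Sum. *)
    | Prod -> loop Prod (expr Atom)
    | Sum -> loop Sum (expr Prod)
  (* Left recursion is replaced by a loop; [acc] is the left operand,
     so operators associate to the left. *)
  and loop p acc =
    match p, peek () with
    | Prod, Some '*' -> incr pos; loop p (acc *. expr Atom)
    | Prod, Some '/' -> incr pos; loop p (acc /. expr Atom)
    | Sum, Some '+' -> incr pos; loop p (acc +. expr Prod)
    | Sum, Some '-' -> incr pos; loop p (acc -. expr Prod)
    | _ -> acc
  in
  let v = expr Sum in
  (* Reject trailing input. *)
  if peek () <> None then raise (Parse_error !pos);
  v
```

With this sketch, `eval "1+2*3"` evaluates to `7.` and `eval "8/2/2"` to `2.`. The grammar in the first item states the same rules declaratively, and left-recursive rules such as `(x::expr Prod) '*' (y::expr Atom)` are written directly there instead of being turned into a loop by hand.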

All this makes Pacomb a promising solution for writing languages in OCaml.

Authors: Christophe Raffalli and Rodolphe Lepigre
Maintainer: Christophe Raffalli
