Welcome to pyparsing-highlighting’s documentation!

Syntax highlighting with pyparsing, supporting both HTML output and prompt_toolkit–style terminal output. The PPHighlighter class can also be used as a lexer for syntax highlighting as you type in prompt_toolkit. It is compatible with existing Pygments styles.

The main benefit of pyparsing-highlighting over Pygments is that pyparsing parse expressions are both more powerful and easier to understand than Pygments lexers. pyparsing implements parsing expression grammars using parser combinators: higher-level parse expressions are built up in Python code from lower-level ones, which keeps grammars straightforward to construct, readable, modular, and easy to maintain.

See the official pyparsing documentation or my unofficial (epydoc) documentation.

Requirements

Python 3.5+
pyparsing
prompt_toolkit 2.0+
Pygments (optional; needed to use Pygments styles)

Note that PyPy, an implementation of Python with a JIT compiler, is often able to achieve around 5x the performance of CPython, the reference Python implementation.

Installation

pip3 install -U pyparsing-highlighting

Or, after cloning the repository on GitHub:

python3 setup.py install

(or, with PyPy):

pypy3 setup.py install

Examples

The following code demonstrates the use of PPHighlighter:

from pp_highlighting import PPHighlighter
from prompt_toolkit.styles import Style
import pyparsing as pp
from pyparsing import pyparsing_common as ppc

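def parser_factory(styler):
    # Style every integer with the 'int' class; the commas stay unstyled.
    a = styler('class:int', ppc.integer)
    return pp.delimitedList(a)

pph = PPHighlighter(parser_factory)
# Map the 'int' class name to a color.
style = Style([('int', '#528f50')])
pph.print('1, 2, 3', style=style)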

This prints 1, 2, 3 to the terminal, with each integer rendered in the color #528f50 (a shade of green) and the separators left unstyled.

The following code generates HTML:

pph.highlight_html('1, 2, 3')

The output is:

<pre class="highlight"><span class="int">1</span>, <span class="int">2</span>, <span class="int">3</span></pre>

There is also a lower-level API—pph.highlight('1, 2, 3') returns the following:

FormattedText([('class:int', '1'), ('', ', '), ('class:int', '2'), ('', ', '), ('class:int', '3')])

A FormattedText instance can be passed to prompt_toolkit.print_formatted_text(), along with a Style mapping the class names to colors, for display on the terminal. See the prompt_toolkit formatted text documentation and formatted text API documentation.
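As a minimal sketch of that step, the fragments can be built by hand and printed with a matching Style. Only prompt_toolkit is needed here; the fragment list is copied from the highlight() output shown above rather than produced by a parser.

```python
from prompt_toolkit import print_formatted_text
from prompt_toolkit.formatted_text import FormattedText
from prompt_toolkit.styles import Style

# The same fragments that pph.highlight('1, 2, 3') is documented to return.
fragments = FormattedText([
    ('class:int', '1'), ('', ', '),
    ('class:int', '2'), ('', ', '),
    ('class:int', '3'),
])

# Map the 'int' class name to a color, then print the styled text.
style = Style([('int', '#528f50')])
print_formatted_text(fragments, style=style)
```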

PPHighlighter can also be passed to a prompt_toolkit.PromptSession as the lexer argument, which performs syntax highlighting as you type. For examples of this, see examples/calc.py, examples/json_pph.py, examples/repr.py, and examples/sexp.py. To run the examples from the project root directory:

python3 -m examples.calc
python3 -m examples.json_pph
python3 -m examples.repr
python3 -m examples.sexp

Error Handling

If the parse expression fails to match at some location, it is tried again at successive locations until it succeeds. Text skipped over while retrying is passed through unstyled. For example:

from pp_highlighting import PPHighlighter
import pyparsing as pp
from pyparsing import pyparsing_common as ppc

def parser_factory(styler):
    return styler('ansicyan', ppc.integer) + styler('ansired', ppc.identifier)

pph = PPHighlighter(parser_factory)
pph.print('1a 2b three 4c')

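In the output, each integer (1, 2, 4) is styled ansicyan and each identifier (a, b, c) is styled ansired, while the word three passes through unstyled.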

Note that this parse expression does not explicitly match more than one integer/identifier pair. After the first match, it is retried on the space after the first pair, which fails, and then retried starting on the first character of the second pair, which succeeds. It then fails at every location within three and is retried until it reaches 4c, where it succeeds again.
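The retry behavior described above can be sketched with the standard library alone. This is an illustration of the idea, not pyparsing-highlighting's implementation: a regular expression stands in for the parse expression, and the function name highlight_with_retry is hypothetical.

```python
import re

# Stand-in for styler('ansicyan', integer) + styler('ansired', identifier).
PAIR = re.compile(r'(\d+)([A-Za-z_]\w*)')


def highlight_with_retry(s):
    """Return (style, text) fragments, retrying the match at each location."""
    fragments = []
    unstyled_start = 0  # start of the text skipped since the last match
    loc = 0
    while loc < len(s):
        m = PAIR.match(s, loc)
        if m is None:
            # No match here: move one character forward and try again.
            loc += 1
            continue
        if unstyled_start < loc:
            # Text skipped while retrying passes through unstyled.
            fragments.append(('', s[unstyled_start:loc]))
        fragments.append(('ansicyan', m.group(1)))
        fragments.append(('ansired', m.group(2)))
        loc = unstyled_start = m.end()
    if unstyled_start < len(s):
        fragments.append(('', s[unstyled_start:]))
    return fragments


fragments = highlight_with_retry('1a 2b three 4c')
print(fragments)
# [('ansicyan', '1'), ('ansired', 'a'), ('', ' '), ('ansicyan', '2'),
#  ('ansired', 'b'), ('', ' three '), ('ansicyan', '4'), ('ansired', 'c')]
```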

It is often possible to take advantage of pyparsing-highlighting’s error handling to write a simplified parse expression that does not parse a language fully but which still does ‘lexer-like’ analysis in a way that is robust to errors, and which continues to work even while the user is still typing. examples/repr.py is an example along these lines.

Testing

(From the project root directory):

To run the unit tests:

python3 -m unittest

To run the regression benchmark:

python3 -m tests.benchmark

Module pp_highlighting

Syntax highlighting for prompt_toolkit and HTML with pyparsing.

pp_highlighting.dummy_styler = <pp_highlighting.pp_highlighter.DummyStyler object>

An importable instance of DummyStyler to pass to parser factories.

Type:	DummyStyler

class pp_highlighting.DummyStyler

Bases: pp_highlighting.pp_highlighter.Styler

A drop-in replacement for Styler which, when called, merely returns a copy of the given parse expression without capturing text or applying styles. To aid in testing whether a parser factory has been passed a DummyStyler object, bool(DummyStyler()) is False.

__call__(style, expr)

Returns a copy of the given parse expression.

Parameters:
    style (Union[pygments.token.Token, str]) – Ignored.
    expr (Union[pyparsing.ParserElement, str]) – Copied, unless it is a string literal, in which case it will be wrapped by `pyparsing.ParserElement._literalStringClass` (default `pyparsing.Literal`).
Returns:	pyparsing.ParserElement – A copy of the input parse expression.

class pp_highlighting.PPHighlighter(parser_factory, *, uses_pygments_tokens=False)

Bases: prompt_toolkit.lexers.base.Lexer

Syntax highlighting for prompt_toolkit and HTML with pyparsing.

This class can be used to highlight text via its highlight() method (for prompt_toolkit.print_formatted_text()—see the prompt_toolkit documentation for details), its highlight_html() method, its print() method, and by passing it as the lexer argument to a prompt_toolkit.PromptSession.

__init__(parser_factory, *, uses_pygments_tokens=False)

Constructs a new PPHighlighter.

You should supply a parser factory, a function that takes one argument and returns a parse expression. PPHighlighter will pass a Styler object as the argument (see Styler for more details).

Examples

>>> import pyparsing as pp
>>> from pyparsing import pyparsing_common as ppc
>>> from pp_highlighting import PPHighlighter
>>> def parser_factory(styler):
...     a = styler('class:int', ppc.integer)
...     return pp.delimitedList(a)
...
>>> pph = PPHighlighter(parser_factory)
>>> pph.highlight('1, 2, 3')
FormattedText([('class:int', '1'), ('', ', '), ('class:int', '2'),
('', ', '), ('class:int', '3')])

FormattedText instances can be passed to prompt_toolkit.print_formatted_text().

Parameters:
    parser_factory (Callable[[Styler], pyparsing.ParserElement]) – The parser factory.
    uses_pygments_tokens (bool) – Whether or not the parser is styled using Pygments tokens.
Raises:	`ImportError` – If `uses_pygments_tokens` is `True` and Pygments is not installed.

highlight(s)

Highlights a string, returning a list of fragments suitable for prompt_toolkit.print_formatted_text().

Parameters:	s (str) – The input string.
Returns:	prompt_toolkit.formatted_text.FormattedText – The resulting list of prompt_toolkit text fragments.

lex_document(document)

Takes a Document and returns a callable that takes a line number and returns a list of (style_str, text) tuples for that line.

Note: in the past, this was supposed to return a list of (Token, text) tuples, just like a Pygments lexer.
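The shape of that return value can be illustrated with a standard-library sketch that splits a flat fragment list into per-line lists. This is not pp_highlighting's code: the name make_line_getter is hypothetical, and returning an empty list for out-of-range line numbers is an assumption.

```python
def make_line_getter(fragments):
    """Split (style_str, text) fragments into per-line lists."""
    lines = [[]]
    for style_str, text in fragments:
        for i, part in enumerate(text.split('\n')):
            if i > 0:
                # Each newline in the text starts a new line of fragments.
                lines.append([])
            if part:
                lines[-1].append((style_str, part))

    def get_line(lineno):
        # Assumption: out-of-range line numbers yield an empty list.
        if 0 <= lineno < len(lines):
            return lines[lineno]
        return []

    return get_line


get_line = make_line_getter([('class:int', '1'), ('', ',\n'), ('class:int', '2')])
print(get_line(0))  # [('class:int', '1'), ('', ',')]
print(get_line(1))  # [('class:int', '2')]
```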

highlight_html(s, *, css_class='highlight')

Highlights a string, returning HTML.

Only CSS class names are currently supported. Parts of the style string that do not begin with class: will be ignored. If there are dots in the class name, they will be turned into hyphens.

Parameters:
    s (str) – The input string.
    css_class (str) – The CSS class for the wrapping tag.
Returns:	str – The generated HTML.
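The class-name rules above can be illustrated with a small standard-library sketch that turns fragments into HTML. It is an approximation rather than the library's implementation; the function names are hypothetical, and escaping the text with html.escape is an assumption.

```python
from html import escape


def css_classes(style_str):
    """Extract CSS class names from a prompt_toolkit-style string."""
    classes = []
    for part in style_str.split():
        # Parts that do not begin with 'class:' are ignored.
        if part.startswith('class:'):
            # Dots in the class name become hyphens.
            classes.append(part[len('class:'):].replace('.', '-'))
    return classes


def fragments_to_html(fragments, css_class='highlight'):
    """Render (style_str, text) fragments inside a <pre> wrapper."""
    out = ['<pre class="{}">'.format(escape(css_class))]
    for style_str, text in fragments:
        classes = css_classes(style_str)
        if classes:
            out.append('<span class="{}">{}</span>'.format(
                ' '.join(classes), escape(text)))
        else:
            out.append(escape(text))
    out.append('</pre>')
    return ''.join(out)


fragments = [('class:int', '1'), ('', ', '), ('class:int', '2'),
             ('', ', '), ('class:int', '3')]
html = fragments_to_html(fragments)
print(html)
```

For the fragments shown, this reproduces the HTML from the Examples section.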

print(*values, file=sys.stdout, **kwargs)

Highlights and prints the values to a stream, or to sys.stdout by default. It calls prompt_toolkit.print_formatted_text() internally and accepts the same keyword arguments, which are compatible with those of the builtin print().

Default values of keyword-only arguments:

print(*values, sep=' ', end='\n', file=sys.stdout, flush=False,
      style=None, output=None, color_depth=None,
      style_transformation=None, include_default_pygments_style=None)

class pp_highlighting.PPValidator(expr, *, multiline=True, move_cursor_to_end=False)

Bases: prompt_toolkit.validation.Validator

A prompt_toolkit Validator for pyparsing.

__init__(expr, *, multiline=True, move_cursor_to_end=False)

Constructs a new PPValidator.

Parameters:
    expr (pyparsing.ParserElement) – The parser to use for validation.
    multiline (bool) – Whether to include the line number in the error message.
    move_cursor_to_end (bool) – Whether to move the cursor to the end of the input if a non-pyparsing exception was raised during parsing.

validate(document)

Validates the input, raising a ValidationError if it is invalid.

Parameters:	document (prompt_toolkit.document.Document) – The document to validate.

class pp_highlighting.Styler

Bases: object

Wraps pyparsing parse expressions to capture styled text fragments.

__init__()

Initialize self. See help(type(self)) for accurate signature.

__call__(style, expr)

Wraps the given parse expression to capture the original text it matched, and returns the modified parse expression. The style argument can be either a prompt_toolkit style string or a Pygments token.

Parameters:
    style (Union[pygments.token.Token, str]) – The style to set for this text fragment, as a string or a Pygments token.
    expr (Union[pyparsing.ParserElement, str]) – The pyparsing parser to wrap. If a literal string is specified, it will be wrapped by `pyparsing.ParserElement._literalStringClass` (default `pyparsing.Literal`).
Returns:	pyparsing.ParserElement – The wrapped parser.

clear()

Removes all captured styled text fragments.

delete(loc)

Removes the styled text fragment starting at a given location if it exists.

Parameters:	loc (int) – The start location of the styled text fragment to delete.

get(loc)

Returns the styled text fragment starting at a given location if it exists, else None.

Parameters:	loc (int) – The styled text fragment’s start location.
Returns:	Optional[Tuple[Union[pygments.token.Token, str], str]] – The styled text fragment, if it exists.

locs()

Returns a sorted list of styled text start locations.

Returns:	List[int] – A sorted list of styled text start locations.