bx_extras.pyparsing module

pyparsing module - Classes and methods to define and execute parsing grammars

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you don’t need to learn a new syntax for defining grammars or matching expressions - the parsing module provides a library of classes that you use to construct the grammar directly in Python.

Here is a program to parse “Hello, World!” (or any greeting of the form “<salutation>, <addressee>!”):

from pyparsing import Word, alphas

# define grammar of a greeting
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))

The program outputs the following:

Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of ‘+’, ‘|’ and ‘^’ operators.

The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.

The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
  • extra or missing whitespace (the above program will also handle “Hello,World!”, “Hello , World !”, etc.)

  • quoted strings

  • embedded comments

class bx_extras.pyparsing.And(exprs, savelist=True)

Bases: ParseExpression

Requires all given ParseExpressions to be found in the given order. Expressions may be separated by whitespace. May be constructed using the ‘+’ operator.

checkRecursion(parseElementList)
parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.CaselessKeyword(matchString, identChars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$')

Bases: Keyword

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.CaselessLiteral(matchString)

Bases: Literal

Token to match a specified string, ignoring case of letters. Note: the matched results will always be in the case of the given match string, NOT the case of the input text.
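To make the casing rule concrete, here is a minimal pure-Python sketch, not the module's implementation, using a hypothetical helper caseless_literal_match. The comparison ignores case, but the returned token is always the match string exactly as given.

```python
def caseless_literal_match(match_string, instring, loc):
    """Hypothetical sketch of CaselessLiteral matching.

    Returns (new_loc, token) on success, or None on failure. The token is
    always match_string itself, whatever case appears in instring.
    """
    end = loc + len(match_string)
    if instring[loc:end].upper() == match_string.upper():
        return end, match_string
    return None


print(caseless_literal_match("CMD", "cmd arg", 0))   # (3, 'CMD')
print(caseless_literal_match("CMD", "cmx arg", 0))   # None
```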

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.CharsNotIn(notChars, min=1, max=0, exact=0)

Bases: Token

Token for matching words composed of characters not in a given set. Defined with string containing all disallowed characters, and an optional minimum, maximum, and/or exact length. The default value for min is 1 (a minimum value < 1 is not valid); the default values for max and exact are 0, meaning no maximum or exact length restriction.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.Combine(expr, joinString='', adjacent=True)

Bases: TokenConverter

Converter to concatenate all matching tokens to a single string. By default, the matching patterns must also be contiguous in the input string; this can be disabled by specifying ‘adjacent=False’ in the constructor.

ignore(other)

Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

postParse(instring, loc, tokenlist)
class bx_extras.pyparsing.Dict(exprs)

Bases: TokenConverter

Converter to return a repetitive expression as a list, but also as a dictionary. Each element can also be referenced using the first token in the expression as its key. Useful for tabular report scraping when the first column can be used as an item key.
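The keyed-access part of Dict can be sketched in plain Python as follows. The helper dict_from_groups is hypothetical, and the rule that a single remaining token is stored unwrapped while longer rows keep a list is an assumption of this sketch.

```python
def dict_from_groups(groups):
    """Hypothetical sketch of Dict's keyed access over grouped rows.

    The first token of each group becomes the key. In this sketch, a group
    with exactly one remaining token maps to that token; longer groups map
    to the list of remaining tokens.
    """
    result = {}
    for group in groups:
        key, rest = group[0], list(group[1:])
        result[key] = rest[0] if len(rest) == 1 else rest
    return result


# Rows as they might come out of one Group(...) per table line
rows = [["host", "alpha"], ["ports", "22", "80"]]
table = dict_from_groups(rows)
print(table["host"])    # alpha
print(table["ports"])   # ['22', '80']
```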

postParse(instring, loc, tokenlist)
class bx_extras.pyparsing.Each(exprs, savelist=True)

Bases: ParseExpression

Requires all given ParseExpressions to be found, but in any order. Expressions may be separated by whitespace. May be constructed using the ‘&’ operator.

checkRecursion(parseElementList)
parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.Empty

Bases: Token

An empty token that always matches.

class bx_extras.pyparsing.FollowedBy(expr)

Bases: ParseElementEnhance

Lookahead matching of the given parse expression. FollowedBy does not advance the parsing position within the input string; it only verifies that the specified parse expression matches at the current position. FollowedBy always returns a null token list.
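The same zero-width idea exists in Python's re module as a positive lookahead, (?=...). The sketch below is an analogy using re, not pyparsing: the lookahead must succeed, yet the match ends before the colon.

```python
import re

# Analogy only: (?=:) must match, but it consumes no input and
# contributes nothing to the matched text, much like FollowedBy(":").
label = re.compile(r"[A-Za-z]+(?=:)")

m = label.match("key: value")
print(m.group())                  # key
print(m.end())                    # 3, so the ':' has not been consumed
print(label.match("key value"))   # None, because no ':' follows
```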

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.Forward(other=None)

Bases: ParseElementEnhance

Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the ‘<<’ operator.

Note: take care when assigning to Forward not to overlook precedence of operators. Specifically, ‘|’ has a lower precedence than ‘<<’, so that:

fwdExpr << a | b | c
will actually be evaluated as:

(fwdExpr << a) | b | c

thereby leaving b and c out as parseable alternatives. It is recommended that you explicitly group the values inserted into the Forward:

fwdExpr << (a | b | c)
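The grouping can be checked without pyparsing at all. The toy Expr class below is hypothetical; it only overloads << and | and records how Python nests the two operators, which is exactly the precedence issue described above.

```python
class Expr:
    """Toy stand-in for a parser element; records the expression tree as text."""

    def __init__(self, text):
        self.text = text

    def __lshift__(self, other):
        return Expr(f"({self.text} << {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} | {other.text})")


fwd, a, b, c = Expr("fwd"), Expr("a"), Expr("b"), Expr("c")

print((fwd << a | b | c).text)    # (((fwd << a) | b) | c)
print((fwd << (a | b | c)).text)  # (fwd << ((a | b) | c))
```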
copy()

Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.

leaveWhitespace()

Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.

streamline()
validate(validateTrace=None)

Check defined expressions for valid structure, check for infinite recursive definitions.

class bx_extras.pyparsing.GoToColumn(colno)

Bases: _PositionToken

Token to advance to a specific column of input text; useful for tabular report scraping.

parseImpl(instring, loc, doActions=True)
preParse(instring, loc)
class bx_extras.pyparsing.Group(expr)

Bases: TokenConverter

Converter to return the matched tokens as a list - useful for returning tokens of ZeroOrMore and OneOrMore expressions.

postParse(instring, loc, tokenlist)
class bx_extras.pyparsing.Keyword(matchString, identChars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$', caseless=False)

Bases: Token

Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character. Compare with Literal:

Literal("if") will match the leading 'if' in 'ifAndOnlyIf'.
Keyword("if") will not; it will only match the leading 'if' in 'if x=1', or 'if(y==2)'

Accepts two optional constructor arguments in addition to the keyword string: identChars is a string of characters that would be valid identifier characters, defaulting to all alphanumerics + “_” and “$”; caseless allows case-insensitive matching, default is False.
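A rough pure-Python sketch of the difference, with hypothetical helpers literal_match and keyword_match standing in for Literal and Keyword. The identifier characters mirror the default shown in the signature; checking the preceding character as well is an assumption of this sketch.

```python
import string

# Mirrors the default identChars: letters, digits, "_" and "$"
IDENT_CHARS = string.ascii_letters + string.digits + "_$"


def literal_match(s, instring, loc=0):
    """Sketch of Literal: succeed if instring continues with s at loc."""
    return instring.startswith(s, loc)


def keyword_match(s, instring, loc=0, ident_chars=IDENT_CHARS):
    """Sketch of Keyword: like literal_match, but the match must not be
    followed (or, in this sketch, preceded) by an identifier character."""
    if not instring.startswith(s, loc):
        return False
    end = loc + len(s)
    if end < len(instring) and instring[end] in ident_chars:
        return False
    if loc > 0 and instring[loc - 1] in ident_chars:
        return False
    return True


print(literal_match("if", "ifAndOnlyIf"))   # True
print(keyword_match("if", "ifAndOnlyIf"))   # False
print(keyword_match("if", "if x=1"))        # True
print(keyword_match("if", "if(y==2)"))      # True
```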

DEFAULT_KEYWORD_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$'
copy()

Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.

parseImpl(instring, loc, doActions=True)
static setDefaultKeywordChars(chars)

Overrides the default Keyword chars

class bx_extras.pyparsing.LineEnd

Bases: _PositionToken

Matches if current position is at the end of a line within the parse string

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.LineStart

Bases: _PositionToken

Matches if current position is at the beginning of a line within the parse string

parseImpl(instring, loc, doActions=True)
preParse(instring, loc)
class bx_extras.pyparsing.Literal(matchString)

Bases: Token

Token to exactly match a specified string.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.MatchFirst(exprs, savelist=False)

Bases: ParseExpression

Requires that at least one ParseExpression is found. If two expressions match, the first one listed is the one that will match. May be constructed using the ‘|’ operator.

checkRecursion(parseElementList)
parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.NoMatch

Bases: Token

A token that will never match.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.NotAny(expr)

Bases: ParseElementEnhance

Lookahead to disallow matching with the given parse expression. NotAny does not advance the parsing position within the input string; it only verifies that the specified parse expression does not match at the current position. Also, NotAny does not skip over leading whitespace. NotAny always returns a null token list. May be constructed using the ‘~’ operator.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.OneOrMore(expr, savelist=False)

Bases: ParseElementEnhance

Repetition of one or more of the given expression.

parseImpl(instring, loc, doActions=True)
setResultsName(name, listAllMatches=False)

Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.

class bx_extras.pyparsing.OnlyOnce(methodCall)

Bases: object

Wrapper for parse actions, to ensure they are only called once.

reset()
class bx_extras.pyparsing.Optional(exprs, default=<bx_extras.pyparsing._NullToken object>)

Bases: ParseElementEnhance

Optional matching of the given expression. A default return string can also be specified, if the optional expression is not found.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.Or(exprs, savelist=False)

Bases: ParseExpression

Requires that at least one ParseExpression is found. If two expressions match, the expression that matches the longest string will be used. May be constructed using the ‘^’ operator.
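The difference between MatchFirst (‘|’) and Or (‘^’) can be sketched with two hypothetical helpers. For brevity, the alternatives are modeled as regular expressions rather than pyparsing elements.

```python
import re


def match_first(patterns, text, loc=0):
    """Sketch of MatchFirst: return the first alternative that matches."""
    for pat in patterns:
        m = re.compile(pat).match(text, loc)
        if m:
            return m.group()
    return None


def match_longest(patterns, text, loc=0):
    """Sketch of Or: try every alternative and keep the longest match."""
    best = None
    for pat in patterns:
        m = re.compile(pat).match(text, loc)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best


alternatives = [r"\d+", r"\d+\.\d+"]        # integer listed before real number
print(match_first(alternatives, "3.14"))    # 3
print(match_longest(alternatives, "3.14"))  # 3.14
```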

checkRecursion(parseElementList)
parseImpl(instring, loc, doActions=True)
exception bx_extras.pyparsing.ParseBaseException(pstr, loc=0, msg=None, elem=None)

Bases: Exception

Base exception class for all parsing runtime exceptions.

loc
markInputline(markerString='>!<')

Extracts the exception line from the input string, and marks the location of the exception with a special symbol.
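A minimal sketch of what markInputline produces, using a hypothetical mark_input_line helper that takes the input string and error location directly rather than reading them from an exception object.

```python
def mark_input_line(instring, loc, marker=">!<"):
    """Return the line containing loc, with marker inserted at the error column."""
    start = instring.rfind("\n", 0, loc) + 1   # first char of the line
    end = instring.find("\n", loc)             # newline ending the line, if any
    if end == -1:
        end = len(instring)
    line = instring[start:end]
    col = loc - start
    return line[:col] + marker + line[col:]


text = "x = 1\ny = = 2\n"
print(mark_input_line(text, 10))  # y = >!<= 2
```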

msg
parserElement
pstr
class bx_extras.pyparsing.ParseElementEnhance(expr, savelist=False)

Bases: ParserElement

Abstract subclass of ParserElement, for combining and post-processing parsed tokens.

checkRecursion(parseElementList)
ignore(other)

Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

leaveWhitespace()

Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.

parseImpl(instring, loc, doActions=True)
streamline()
validate(validateTrace=None)

Check defined expressions for valid structure, check for infinite recursive definitions.

exception bx_extras.pyparsing.ParseException(pstr, loc=0, msg=None, elem=None)

Bases: ParseBaseException

Exception thrown when parse expressions don’t match; supported attributes by name are:

  • lineno - returns the line number of the exception text

  • col - returns the column number of the exception text

  • line - returns the line containing the exception text

loc
msg
parserElement
pstr
class bx_extras.pyparsing.ParseExpression(exprs, savelist=False)

Bases: ParserElement

Abstract subclass of ParserElement, for combining and post-processing parsed tokens.

append(other)
ignore(other)

Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

leaveWhitespace()

Extends leaveWhitespace defined in base class, and also invokes leaveWhitespace on all contained expressions.

setResultsName(name, listAllMatches=False)

Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.

streamline()
validate(validateTrace=None)

Check defined expressions for valid structure, check for infinite recursive definitions.

exception bx_extras.pyparsing.ParseFatalException(pstr, loc=0, msg=None, elem=None)

Bases: ParseBaseException

User-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately.

loc
msg
parserElement
pstr
class bx_extras.pyparsing.ParseResults(toklist, name=None, asList=True, modal=True)

Bases: object

Structured parse results, providing multiple means of access to the parsed data:

  • as a list (len(results))

  • by list index (results[0], results[1], etc.)

  • by attribute (results.<resultsName>)

asDict()

Returns the named parse results as a dictionary.

asList()

Returns the parse results as a nested list of matching tokens, all converted to strings.

asXML(doctag=None, namedItemsOnly=False, indent='', formatted=True)

Returns the parse results as XML. Tags are created for tokens and lists that have defined results names.

copy()

Returns a new copy of a ParseResults object.

dump(indent='', depth=0)

Diagnostic method for listing out the contents of a ParseResults. Accepts an optional indent argument so that this string can be embedded in a nested display of other data.

get(key, defaultValue=None)

Returns named result matching the given key, or if there is no such name, then returns the given defaultValue or None if no defaultValue is specified.

getName()

Returns the results name for this token expression.

items()

Returns all named result keys and values as a list of tuples.

keys()

Returns all named result keys.

pop(index=-1)

Removes and returns item at specified index (default=last). Will work with either numeric indices or dict-key indices.

values()

Returns all named result values.

exception bx_extras.pyparsing.ParseSyntaxException(pe)

Bases: ParseFatalException

Just like ParseFatalException, but thrown internally when an ErrorStop indicates that parsing is to stop immediately because an unbacktrackable syntax error has been found.

loc
msg
parserElement
pstr
class bx_extras.pyparsing.ParserElement(savelist=False)

Bases: object

Abstract base level parser element class.

DEFAULT_WHITE_CHARS = ' \n\t\r'
addParseAction(*fns, **kwargs)

Add parse action to expression’s list of parse actions. See setParseAction.

checkRecursion(parseElementList)
copy()

Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.

static enablePackrat()

Enables “packrat” parsing, which adds memoizing to the parsing logic. Repeated parse attempts at the same string location (which happens often in many complex grammars) can immediately return a cached value, instead of re-executing parsing/validating code. Memoizing is done of both valid results and parsing exceptions.

This speedup may break existing programs that use parse actions that have side-effects. For this reason, packrat parsing is disabled when you first import pyparsing. To activate the packrat feature, your program must call the class method ParserElement.enablePackrat(). If your program uses psyco to “compile as you go”, you must call enablePackrat before calling psyco.full(). If you do not do this, Python will crash. For best results, call enablePackrat() immediately after importing pyparsing.
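The memoizing idea can be sketched in plain Python. Everything below (ParseFailure, parse_digit, memo_parse) is hypothetical and far simpler than the module's cache, but it shows both halves of the description: results and failures are stored per (rule, input, location), and a cached call does not re-run the rule, which is why side effects in parse actions can be skipped.

```python
class ParseFailure(Exception):
    """Raised by the hypothetical rule below when it does not match."""


calls = 0
_cache = {}


def parse_digit(text, loc):
    """Hypothetical rule: match a single digit at loc and return the new location."""
    global calls
    calls += 1
    if loc < len(text) and text[loc].isdigit():
        return loc + 1
    raise ParseFailure(f"expected a digit at {loc}")


def memo_parse(rule, text, loc):
    """Packrat-style memoization: cache successes and failures per (rule, text, loc)."""
    key = (rule, text, loc)
    if key not in _cache:
        try:
            _cache[key] = (True, rule(text, loc))
        except ParseFailure as exc:
            _cache[key] = (False, exc)
    ok, value = _cache[key]
    if not ok:
        raise value
    return value


for _ in range(3):
    memo_parse(parse_digit, "7a", 0)        # runs the rule once, then hits the cache
for _ in range(3):
    try:
        memo_parse(parse_digit, "7a", 1)    # the failure is cached as well
    except ParseFailure:
        pass
print(calls)  # 2
```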

getException()
ignore(other)

Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

leaveWhitespace()

Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.

parseFile(file_or_filename)

Execute the parse expression on the given file or filename. If a filename is specified (instead of a file object), the entire file is opened, read, and closed before parsing.

parseImpl(instring, loc, doActions=True)
parseString(instring, parseAll=False)

Execute the parse expression with the given string. This is the main interface to the client code, once the complete expression has been built.

If you want the grammar to require that the entire input string be successfully parsed, then set parseAll to True (equivalent to ending the grammar with StringEnd()).

Note: parseString implicitly calls expandtabs() on the input string, in order to report proper column numbers in parse actions. If the input string contains tabs and the grammar uses parse actions that use the loc argument to index into the string being parsed, you can ensure you have a consistent view of the input string by:

  • calling parseWithTabs on your grammar before calling parseString

  • defining your parse action using the full (s,loc,toks) signature, and referencing the input string using the parse action’s s argument

  • explicitly expanding the tabs in your input string before calling parseString
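The mismatch these remedies address can be seen with str.expandtabs alone, the standard string method that parseString is documented to call: after expansion, a location no longer points at the same character of the original string.

```python
s = "a\tb"
expanded = s.expandtabs()    # tab stops every 8 columns by default

print(s.index("b"))          # 2
print(expanded.index("b"))   # 8
```

A parse action that receives loc 8 and indexes into the original, unexpanded string would therefore look past its end.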

parseWithTabs()

Overrides default behavior to expand <TAB>s to spaces before parsing the input string. Must be called before parseString when the input grammar contains elements that match <TAB> characters.

postParse(instring, loc, tokenlist)
preParse(instring, loc)
static resetCache()
scanString(instring, maxMatches=9223372036854775807)

Scan the input string for expression matches. Each match will return the matching tokens, start location, and end location. May be called with optional maxMatches argument, to clip scanning after ‘n’ matches are found.

Note that the start and end locations are reported relative to the string being parsed. See parseString for more information on parsing strings with embedded tabs.

searchString(instring, maxMatches=9223372036854775807)

Another extension to scanString, simplifying the access to the tokens found to match the given parse expression. May be called with optional maxMatches argument, to clip searching after ‘n’ matches are found.

setBreak(breakFlag=True)

Method to invoke the Python pdb debugger when this element is about to be parsed. Set breakFlag to True to enable, False to disable.

setDebug(flag=True)

Enable display of debugging messages while doing pattern matching. Set flag to True to enable, False to disable.

setDebugActions(startAction, successAction, exceptionAction)

Enable display of debugging messages while doing pattern matching.

static setDefaultWhitespaceChars(chars)

Overrides the default whitespace chars

setFailAction(fn)

Define action to perform if parsing fails at this expression. Fail action fn is a callable function that takes the arguments fn(s,loc,expr,err) where:

  • s = string being parsed

  • loc = location where expression match was attempted and failed

  • expr = the parse expression that failed

  • err = the exception thrown

The function returns no value. It may throw ParseFatalException if it is desired to stop parsing immediately.

setName(name)

Define name for this expression, for use in debugging.

setParseAction(*fns, **kwargs)

Define action to perform when successfully matching parse element definition. Parse action fn is a callable method with 0-3 arguments, called as fn(s,loc,toks), fn(loc,toks), fn(toks), or just fn(), where:

  • s = the original string being parsed (see note below)

  • loc = the location of the matching substring

  • toks = a list of the matched tokens, packaged as a ParseResults object

If the functions in fns modify the tokens, they can return them as the return value from fn, and the modified list of tokens will replace the original. Otherwise, fn does not need to return any value.
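One way to honor the 0-3 argument convention is to inspect the callable and pass only the trailing arguments it accepts. The dispatcher call_parse_action below is a hypothetical sketch; the module may determine the arity differently.

```python
import inspect


def call_parse_action(fn, s, loc, toks):
    """Hypothetical dispatcher for fn(s,loc,toks) / fn(loc,toks) / fn(toks) / fn():
    pass only the trailing arguments that fn accepts."""
    nargs = len(inspect.signature(fn).parameters)
    if not 0 <= nargs <= 3:
        raise TypeError("parse actions take 0 to 3 arguments")
    return fn(*(s, loc, toks)[3 - nargs:])


s, loc, toks = "x 42", 2, ["42"]
print(call_parse_action(lambda t: [int(t[0])], s, loc, toks))                 # [42]
print(call_parse_action(lambda where, t: where, s, loc, toks))                # 2
print(call_parse_action(lambda text, where, t: text[where:], s, loc, toks))   # 42
print(call_parse_action(lambda: "no args", s, loc, toks))                     # no args
```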

Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.

setResultsName(name, listAllMatches=False)

Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.

setWhitespaceChars(chars)

Overrides the default whitespace chars

streamline()
suppress()

Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned output.

transformString(instring)

Extension to scanString, to modify matching text with modified tokens that may be returned from a parse action. To use transformString, define a grammar and attach a parse action to it that modifies the returned token list. Invoking transformString() on a target string will then scan for matches, and replace the matched text patterns according to the logic in the parse action. transformString() returns the resulting transformed string.
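As an analogy outside pyparsing, re.sub with a replacement function follows the same scan-and-replace pattern: each match is replaced by the function's return value and all other text passes through unchanged. The double_number function is hypothetical and plays the role of a token-modifying parse action.

```python
import re


def double_number(match):
    """Plays the role of a parse action that rewrites the matched token."""
    return str(int(match.group()) * 2)


text = "width 3, height 40"
print(re.sub(r"\d+", double_number, text))   # width 6, height 80
```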

tryParse(instring, loc)
validate(validateTrace=None)

Check defined expressions for valid structure, check for infinite recursive definitions.

class bx_extras.pyparsing.QuotedString(quoteChar, escChar=None, escQuote=None, multiline=False, unquoteResults=True, endQuoteChar=None)

Bases: Token

Token for matching strings that are delimited by quoting characters.

parseImpl(instring, loc, doActions=True)
exception bx_extras.pyparsing.RecursiveGrammarException(parseElementList)

Bases: Exception

Exception thrown by validate() if the grammar could be improperly recursive.

class bx_extras.pyparsing.Regex(pattern, flags=0)

Bases: Token

Token for matching strings that match a given regular expression. Defined with a string specifying the regular expression in a form recognized by the built-in Python re module.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.SkipTo(other, include=False, ignore=None)

Bases: ParseElementEnhance

Token for skipping over all undefined text until the matched expression is found. If include is set to True, the matched expression is also consumed. The ignore argument is used to define grammars (typically quoted strings and comments) that might contain false matches.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.StringEnd

Bases: _PositionToken

Matches if current position is at the end of the parse string

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.StringStart

Bases: _PositionToken

Matches if current position is at the beginning of the parse string

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.Suppress(expr, savelist=False)

Bases: TokenConverter

Converter for ignoring the results of a parsed expression.

postParse(instring, loc, tokenlist)
suppress()

Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned output.

class bx_extras.pyparsing.Token

Bases: ParserElement

Abstract ParserElement subclass, for defining atomic matching patterns.

setName(name)

Define name for this expression, for use in debugging.

class bx_extras.pyparsing.TokenConverter(expr, savelist=False)

Bases: ParseElementEnhance

Abstract subclass of ParseElementEnhance, for converting parsed results.

class bx_extras.pyparsing.Upcase(*args)

Bases: TokenConverter

Converter to convert all matching tokens to upper case.

postParse(instring, loc, tokenlist)
class bx_extras.pyparsing.White(ws=' \t\r\n', min=1, max=0, exact=0)

Bases: Token

Special matching class for matching whitespace. Normally, whitespace is ignored by pyparsing grammars. This class is included when some whitespace structures are significant. Define with a string containing the whitespace characters to be matched; default is “ \t\r\n”. Also takes optional min, max, and exact arguments, as defined for the Word class.

parseImpl(instring, loc, doActions=True)
whiteStrs = {'\t': '<TAB>', '\n': '<LF>', '\x0c': '<FF>', '\r': '<CR>', ' ': '<SPC>'}
class bx_extras.pyparsing.Word(initChars, bodyChars=None, min=1, max=0, exact=0, asKeyword=False)

Bases: Token

Token for matching words composed of allowed character sets. Defined with string containing all allowed initial characters, an optional string containing allowed body characters (if omitted, defaults to the initial character set), and an optional minimum, maximum, and/or exact length. The default value for min is 1 (a minimum value < 1 is not valid); the default values for max and exact are 0, meaning no maximum or exact length restriction.
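The character-set and length rules can be sketched as a small function. match_word is hypothetical; min_len, max_len, and exact stand in for Word's min, max, and exact, and rejecting (rather than truncating) an over-long word is an assumption of this sketch.

```python
import string


def match_word(instring, loc, init_chars, body_chars=None, min_len=1, max_len=0, exact=0):
    """Sketch of Word: return the matched word at loc, or None.

    A max_len or exact of 0 means "no limit".
    """
    if body_chars is None:
        body_chars = init_chars        # body defaults to the initial character set
    if exact:
        min_len = max_len = exact
    if loc >= len(instring) or instring[loc] not in init_chars:
        return None
    end = loc + 1
    while end < len(instring) and instring[end] in body_chars:
        end += 1
    length = end - loc
    if length < min_len or (max_len and length > max_len):
        return None                    # over-long words are rejected, not truncated
    return instring[loc:end]


ident_start = string.ascii_letters + "_"
ident_body = ident_start + string.digits
print(match_word("x_1 = 2", 0, ident_start, ident_body))          # x_1
print(match_word("1abc", 0, ident_start, ident_body))             # None
print(match_word("abc def", 0, string.ascii_lowercase, exact=3))  # abc
print(match_word("abcd", 0, string.ascii_lowercase, exact=3))     # None
```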

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.WordEnd(wordChars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')

Bases: _PositionToken

Matches if the current position is at the end of a Word, and is not followed by any character in a given set of wordChars (default=printables). To emulate the \b behavior of regular expressions, use WordEnd(alphanums). WordEnd will also match at the end of the string being parsed, or at the end of a line.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.WordStart(wordChars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')

Bases: _PositionToken

Matches if the current position is at the beginning of a Word, and is not preceded by any character in a given set of wordChars (default=printables). To emulate the \b behavior of regular expressions, use WordStart(alphanums). WordStart will also match at the beginning of the string being parsed, or at the beginning of a line.

parseImpl(instring, loc, doActions=True)
class bx_extras.pyparsing.ZeroOrMore(expr)

Bases: ParseElementEnhance

Optional repetition of zero or more of the given expression.

parseImpl(instring, loc, doActions=True)
setResultsName(name, listAllMatches=False)

Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.

bx_extras.pyparsing.col(loc, strg)

Returns current column within a string, counting newlines as line separators. The first column is number 1.

Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See ParserElement.parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.

bx_extras.pyparsing.countedArray(expr)

Helper to define a counted list of expressions. This helper defines a pattern of the form:

integer expr expr expr...

where the leading integer tells how many expr expressions follow. The matched expr tokens are returned as a list; the leading count token is suppressed.
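Over a pre-split token list, the counting rule reduces to a few lines. parse_counted_array is a hypothetical helper that returns the counted items plus any leftover tokens.

```python
def parse_counted_array(tokens):
    """Sketch of countedArray over pre-split tokens: the leading integer says
    how many items follow; the count itself is not returned."""
    count = int(tokens[0])
    items = tokens[1:1 + count]
    if len(items) < count:
        raise ValueError(f"expected {count} items, found {len(items)}")
    return items, tokens[1 + count:]


items, rest = parse_counted_array("3 red green blue extra".split())
print(items)  # ['red', 'green', 'blue']
print(rest)   # ['extra']
```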

bx_extras.pyparsing.delimitedList(expr, delim=',', combine=False)

Helper to define a delimited list of expressions; the delimiter defaults to ‘,’. By default, the list elements and delimiters can have intervening whitespace and comments, but this can be overridden by passing ‘combine=True’. If combine is set to True, the matching tokens are returned as a single token string, with the delimiters included; otherwise, the matching tokens are returned as a list of tokens, with the delimiters suppressed.

bx_extras.pyparsing.dictOf(key, value)

Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes care of defining the Dict, ZeroOrMore, and Group tokens in the proper order. The key pattern can include delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The value pattern can include named results, so that the Dict results can include named token fields.

bx_extras.pyparsing.downcaseTokens(s, l, t)

Helper parse action to convert tokens to lower case.

bx_extras.pyparsing.getTokensEndLoc()

Method to be called from within a parse action to determine the end location of the parsed tokens.

bx_extras.pyparsing.indentedBlock(blockStatementExpr, indentStack, indent=True)

Helper method for defining space-delimited indentation blocks, such as those used to define block statements in Python source code.

Parameters:
  • blockStatementExpr - expression defining syntax of statement that is repeated within the indented block

  • indentStack - list created by caller to manage indentation stack (multiple statementWithIndentedBlock expressions within a single grammar should share a common indentStack)

  • indent - boolean indicating whether block must be indented beyond the current level; set to False for block of left-most statements (default=True)

A valid block must contain at least one blockStatement.

bx_extras.pyparsing.keepOriginalText(s, startLoc, t)

Helper parse action to preserve original parsed text, overriding any nested parse actions.

bx_extras.pyparsing.line(loc, strg)

Returns the line of text containing loc within a string, counting newlines as line separators.

bx_extras.pyparsing.lineno(loc, strg)

Returns current line number within a string, counting newlines as line separators. The first line is number 1.

Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See ParserElement.parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.
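The counting rules for col, line, and lineno can be sketched with plain string methods. The _sketch functions are hypothetical stand-ins, not the module's code, and edge cases such as a location that falls exactly on a newline may be handled differently by the module.

```python
def lineno_sketch(loc, strg):
    """1-based line number of loc: one more than the newlines before it."""
    return strg.count("\n", 0, loc) + 1


def col_sketch(loc, strg):
    """1-based column of loc, counted from the last newline before it."""
    return loc - strg.rfind("\n", 0, loc)


def line_sketch(loc, strg):
    """The full line of text containing loc, without its newline."""
    start = strg.rfind("\n", 0, loc) + 1
    end = strg.find("\n", loc)
    return strg[start:] if end == -1 else strg[start:end]


text = "first\nsecond line\n"
loc = text.index("line")
print(lineno_sketch(loc, text), col_sketch(loc, text), line_sketch(loc, text))
# 2 8 second line
```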

bx_extras.pyparsing.makeHTMLTags(tagStr)

Helper to construct opening and closing tag expressions for HTML, given a tag name

bx_extras.pyparsing.makeXMLTags(tagStr)

Helper to construct opening and closing tag expressions for XML, given a tag name

bx_extras.pyparsing.matchOnlyAtCol(n)

Helper method for defining parse actions that require matching at a specific column in the input text.

bx_extras.pyparsing.matchPreviousExpr(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a ‘repeat’ of a previous expression. For example:

first = Word(nums)
second = matchPreviousExpr(first)
matchExpr = first + ":" + second

will match “1:1”, but not “1:2”. Because this matches by expressions, it will not match the leading “1:1” in “1:10”; the expressions are evaluated first, and then compared, so “1” is compared with “10”. Do not use with packrat parsing enabled.

bx_extras.pyparsing.matchPreviousLiteral(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a ‘repeat’ of a previous expression. For example:

first = Word(nums)
second = matchPreviousLiteral(first)
matchExpr = first + ":" + second

will match “1:1”, but not “1:2”. Because this matches a previous literal, it will also match the leading “1:1” in “1:10”. If this is not desired, use matchPreviousExpr. Do not use with packrat parsing enabled.
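The contrast between the two helpers can be sketched without pyparsing. The hypothetical functions below model the first expression as a run of digits: one compares raw characters, like matchPreviousLiteral, and the other re-parses and compares whole tokens, like matchPreviousExpr.

```python
import re

NUMBER = re.compile(r"\d+")


def literal_repeat_matches(text):
    """Like matchPreviousLiteral: after ':', the text must start with the
    exact characters matched by the first number."""
    m = NUMBER.match(text)
    if not m:
        return False
    return text.startswith(":" + m.group(), m.end())


def expr_repeat_matches(text):
    """Like matchPreviousExpr: parse a second number after ':' with the same
    expression, then compare the two tokens."""
    m = NUMBER.match(text)
    if not m or not text.startswith(":", m.end()):
        return False
    m2 = NUMBER.match(text, m.end() + 1)
    return bool(m2) and m2.group() == m.group()


for sample in ["1:1", "1:2", "1:10"]:
    print(sample, literal_repeat_matches(sample), expr_repeat_matches(sample))
# 1:1 True True
# 1:2 False False
# 1:10 True False
```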

bx_extras.pyparsing.nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)

Helper method for defining nested lists enclosed in opening and closing delimiters (“(” and “)” are the default).

Parameters:
  • opener - opening character for a nested list (default=“(”); can also be a pyparsing expression

  • closer - closing character for a nested list (default=“)”); can also be a pyparsing expression

  • content - expression for items within the nested lists (default=None)

  • ignoreExpr - expression for ignoring opening and closing delimiters (default=quotedString)

If an expression is not provided for the content argument, the nested expression will capture all whitespace-delimited content between delimiters as a list of separate values.

Use the ignoreExpr argument to define expressions that may contain opening or closing characters that should not be treated as opening or closing characters for nesting, such as quotedString or a comment expression. Specify multiple expressions using an Or or MatchFirst. The default is quotedString, but if no expressions are to be ignored, then pass None for this argument.
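As a rough model of the default behaviour (whitespace-delimited content, quoted strings ignored for nesting purposes), here is a small recursive parser using only the standard library. The name nested_sketch and its fixed "(" and ")" delimiters are assumptions; pyparsing builds the equivalent grammar from its own classes and returns ParseResults rather than plain lists.

```python
import re

# A quoted string, a delimiter, or a run of non-space, non-delimiter text.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|[()]|[^\s()]+""")


def nested_sketch(text):
    """Parse one "(...)" group into nested Python lists (quoted strings kept whole)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_group():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(parse_group())   # recurse into a nested list
            else:
                items.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1
        return items

    return parse_group()


print(nested_sketch('(a (b c) "x)y" d)'))
# ['a', ['b', 'c'], '"x)y"', 'd']
```

The ")" inside the quoted string does not close the list, which is the role ignoreExpr plays in nestedExpr.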

bx_extras.pyparsing.nullDebugAction(*args)

‘Do-nothing’ debug action, to suppress debugging output during parsing.

bx_extras.pyparsing.oneOf(strs, caseless=False, useRegex=True)

Helper to quickly define a set of alternative Literals. It makes sure to do longest-first testing when there is a conflict, regardless of the input order (so “<=” is tried before “<”), but returns a MatchFirst for best performance.

Parameters:
  • strs - a string of space-delimited literals, or a list of string literals

  • caseless - (default=False) - treat all literals as caseless

  • useRegex - (default=True) - as an optimization, generate a Regex object; if useRegex is False, if caseless=True, or if creating the Regex raises an exception, a MatchFirst object is generated instead
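Longest-first testing matters because a MatchFirst (or a regex alternation) stops at the first alternative that matches. The sketch below reorders the literals so that no literal comes after one of its own prefixes, drops duplicates, and compiles the result into a single regex, roughly what the useRegex optimization does. The names longest_first and one_of_sketch are hypothetical, and this is a standard-library illustration rather than pyparsing's code.

```python
import re


def longest_first(strs):
    """Reorder alternatives so no literal is shadowed by an earlier prefix."""
    symbols = strs.split() if isinstance(strs, str) else list(strs)
    i = 0
    while i < len(symbols) - 1:
        cur = symbols[i]
        for j in range(i + 1, len(symbols)):
            other = symbols[j]
            if other == cur:
                del symbols[j]            # drop the duplicate, recheck slot i
                break
            if other.startswith(cur):
                del symbols[j]
                symbols.insert(i, other)  # longer literal must be tried first
                break
        else:
            i += 1                        # slot i is settled; move on
    return symbols


def one_of_sketch(strs):
    """Compile the reordered literals into one alternation regex."""
    return re.compile("|".join(re.escape(s) for s in longest_first(strs)))


print(re.match("<|<=", "<= 3").group())                   # < (naive order)
print(longest_first("< <= > >="))                          # ['<=', '<', '>=', '>']
print(one_of_sketch("< <= > >=").match("<= 3").group())   # <=
```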

bx_extras.pyparsing.operatorPrecedence(baseExpr, opList)

Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy. Operators may be unary or binary, left- or right-associative. Parse actions can also be attached to operator expressions.

Parameters:
  • baseExpr - expression representing the most basic element (operand) of the nested operator expressions, such as a number or identifier

  • opList - list of tuples, one for each operator precedence level in the expression grammar; each tuple is of the form (opExpr, numTerms, rightLeftAssoc, parseAction), where:

    • opExpr is the pyparsing expression for the operator; it may also be a string, which will be converted to a Literal; if numTerms is 3, opExpr is a tuple of two expressions, for the two operators separating the 3 terms

    • numTerms is the number of terms for this operator (must be 1, 2, or 3)

    • rightLeftAssoc indicates whether the operator is right or left associative, using the pyparsing-defined constants opAssoc.RIGHT and opAssoc.LEFT

    • parseAction is the parse action to be associated with expressions matching this operator expression (the parse action tuple member may be omitted)
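The precedence-level idea behind opList can be sketched as a small recursive-descent parser in plain Python. Each entry of the hypothetical LEVELS table plays the role of an opList tuple with numTerms=2 and no parse action, listed from tightest to loosest binding. This is an assumption-laden illustration: it handles binary operators only, and pyparsing itself builds a recursive grammar from its own classes rather than using this code. Its nested-list output only loosely resembles pyparsing's grouped results.

```python
import re

LEFT, RIGHT = "left", "right"


def parse_infix(text, levels):
    """Precedence-level parser; levels = [(operators, assoc), ...], tightest first.

    Binary operators only. Left-associative runs are kept flat, e.g.
    ['1', '+', '2', '+', '3']; right-associative ones nest to the right.
    """
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_level(i):
        nonlocal pos
        if i < 0:                          # operand: integer or (subexpression)
            tok = tokens[pos]
            pos += 1
            if tok != "(":
                return tok
            node = parse_level(len(levels) - 1)
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        ops, assoc = levels[i]
        left = parse_level(i - 1)
        if assoc == RIGHT:
            if peek() in ops:
                op = tokens[pos]
                pos += 1
                return [left, op, parse_level(i)]   # recurse at the same level
            return left
        items = [left]
        while peek() in ops:               # left-assoc: collect a flat run
            items.append(tokens[pos])
            pos += 1
            items.append(parse_level(i - 1))
        return items if len(items) > 1 else left

    return parse_level(len(levels) - 1)


LEVELS = [(("^",), RIGHT), (("*", "/"), LEFT), (("+", "-"), LEFT)]
print(parse_infix("1+2*3", LEVELS))    # ['1', '+', ['2', '*', '3']]
print(parse_infix("2^3^4", LEVELS))    # ['2', '^', ['3', '^', '4']]
print(parse_infix("(1+2)*3", LEVELS))  # [['1', '+', '2'], '*', '3']
```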

bx_extras.pyparsing.removeQuotes(s, l, t)

Helper parse action for removing quotation marks from parsed quoted strings. To use, attach it to a quoted string expression:

from pyparsing import quotedString, removeQuotes

quotedString.setParseAction( removeQuotes )

bx_extras.pyparsing.replaceHTMLEntity(t)
bx_extras.pyparsing.replaceWith(replStr)

Helper method for common parse actions that simply return a literal value. Especially useful when used with transformString().
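A parse action receives the original string, the match location, and the matched tokens, and may return replacement tokens. The following standard-library sketch shows a replaceWith-style action factory together with a crude transformString stand-in built on re.sub. Both names (replace_with_sketch, transform_sketch) are hypothetical, and a regex pattern takes the place of a pyparsing expression.

```python
import re


def replace_with_sketch(repl_str):
    """Build a parse-action-style callable (s, loc, toks) that ignores its input."""
    def action(s, loc, toks):
        return [repl_str]
    return action


def transform_sketch(text, pattern, action):
    """Rough stand-in for transformString(): replace each match with the action's tokens."""
    return re.sub(
        pattern, lambda m: "".join(action(text, m.start(), [m.group()])), text
    )


print(transform_sketch("x = N/A; y = N/A", r"N/A", replace_with_sketch("None")))
# x = None; y = None
```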

bx_extras.pyparsing.srange(s)

Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp ‘[]’ string range definitions:

srange("[0-9]")   -> "0123456789"
srange("[a-z]")   -> "abcdefghijklmnopqrstuvwxyz"
srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"

The input string must be enclosed in []’s, and the returned string is the expanded character set joined into a single string. The values enclosed in the []’s may be:

  • a single character

  • an escaped character with a leading backslash (such as \- or \])

  • an escaped hex character with a leading '\0x' (\0x21, which is a '!' character)

  • an escaped octal character with a leading '\0' (\041, which is a '!' character)

  • a range of any of the above, separated by a dash ('a-z', etc.)

  • any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)

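The expansion rules listed above can be modelled in a few lines of standard-library Python. srange_sketch is a hypothetical name, and this sketch does no validation beyond the enclosing brackets; it is meant only to make the escape and range rules concrete.

```python
import string


def srange_sketch(s):
    """Expand a regex-style "[...]" character set into a plain string."""
    if not (s.startswith("[") and s.endswith("]")):
        raise ValueError("range must be enclosed in []")
    body = s[1:-1]

    def read_char(i):
        """Return (character, next index) for the item starting at body[i]."""
        if body[i] != "\\":
            return body[i], i + 1
        if body.startswith("0x", i + 1):          # \0x21 -> '!'
            j = i + 3
            while j < len(body) and body[j] in string.hexdigits:
                j += 1
            return chr(int(body[i + 3:j], 16)), j
        if body[i + 1] == "0":                     # \041 -> '!'
            j = i + 2
            while j < len(body) and body[j] in string.octdigits:
                j += 1
            return chr(int(body[i + 1:j], 8)), j
        return body[i + 1], i + 2                  # \- or \] -> literal char

    out = []
    i = 0
    while i < len(body):
        lo, i = read_char(i)
        if i < len(body) - 1 and body[i] == "-":   # a range such as a-z
            hi, i = read_char(i + 1)
            out.extend(chr(c) for c in range(ord(lo), ord(hi) + 1))
        else:
            out.append(lo)                         # a trailing '-' is literal
    return "".join(out)


print(srange_sketch("[a-z$_]"))        # abcdefghijklmnopqrstuvwxyz$_
print(srange_sketch(r"[\0x21\041]"))   # !!
```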
bx_extras.pyparsing.traceParseAction(f)

Decorator for debugging parse actions.
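A tracing decorator of this kind wraps a parse action with the (s, loc, toks) signature and logs its entry, its return value, or any exception it raises. The sketch below is a standard-library illustration; the name trace_parse_action_sketch, the optional out stream, and the exact log format are assumptions rather than pyparsing's output.

```python
import functools
import sys


def trace_parse_action_sketch(f, out=None):
    """Wrap a parse action (s, loc, toks) to log its entry, result, or exception."""
    @functools.wraps(f)
    def wrapper(s, loc, toks):
        stream = out if out is not None else sys.stderr
        stream.write(f">>entering {f.__name__}(loc={loc}, toks={toks!r})\n")
        try:
            ret = f(s, loc, toks)
        except Exception as exc:
            stream.write(f"<<leaving {f.__name__} (exception: {exc})\n")
            raise
        stream.write(f"<<leaving {f.__name__} (ret: {ret!r})\n")
        return ret
    return wrapper


@trace_parse_action_sketch
def upcase_tokens(s, loc, toks):
    return [t.upper() for t in toks]


upcase_tokens("hello world", 0, ["hello"])
# stderr: >>entering upcase_tokens(loc=0, toks=['hello'])
#         <<leaving upcase_tokens (ret: ['HELLO'])
```

functools.wraps keeps the wrapped action's name, so the log lines identify which action fired.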

bx_extras.pyparsing.upcaseTokens(s, l, t)

Helper parse action to convert tokens to upper case.

bx_extras.pyparsing.withAttribute(*args, **attrDict)

Helper to create a validating parse action to be used with start tags created with makeXMLTags or makeHTMLTags. Use withAttribute to qualify a starting tag with a required attribute value, to avoid false matches on common tags such as <TD> or <DIV>.

Call withAttribute with a series of attribute names and values. Specify the list of filter attribute names and values as:

  • keyword arguments, as in (align="right"), or

  • a list of name-value tuples, as in ( ("ns1:class", "Customer"), ("ns2:align", "right") )

For attribute names with a namespace prefix, or names such as class that are Python reserved words and so cannot be passed as keyword arguments, use the second form. Attribute names are matched case-insensitively.

To verify that the attribute exists, but without specifying a value, pass withAttribute.ANY_VALUE as the value.
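The filtering logic can be sketched in plain Python, assuming a parsed start tag's attributes are available as a dict. with_attribute_sketch and the ANY_VALUE sentinel are hypothetical stand-ins. The real helper returns a validating parse action that makes the parse fail on a mismatch, whereas this sketch simply returns True or False.

```python
ANY_VALUE = object()   # stands in for withAttribute.ANY_VALUE


def with_attribute_sketch(*args, **attr_dict):
    """Build a check that a parsed tag's attributes include the required values.

    Pass name/value tuples positionally, or keyword arguments; names are
    compared case-insensitively.
    """
    required = list(args) if args else list(attr_dict.items())

    def check(tag_attrs):
        found = {name.lower(): value for name, value in tag_attrs.items()}
        for name, value in required:
            if name.lower() not in found:
                return False                       # attribute missing
            if value is not ANY_VALUE and found[name.lower()] != value:
                return False                       # attribute has wrong value
        return True

    return check


right_aligned = with_attribute_sketch(align="right")
print(right_aligned({"ALIGN": "right", "width": "10"}))   # True
print(right_aligned({"align": "left"}))                   # False

customer = with_attribute_sketch(("ns1:class", "Customer"))
print(customer({"ns1:class": "Customer"}))                # True

has_id = with_attribute_sketch(id=ANY_VALUE)
print(has_id({"id": "row-7"}), has_id({}))                # True False
```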