bx_extras.pyparsing module
pyparsing module - Classes and methods to define and execute parsing grammars
The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you don’t need to learn a new syntax for defining grammars or matching expressions - the parsing module provides a library of classes that you use to construct the grammar directly in Python.
Here is a program to parse “Hello, World!” (or any greeting of the form “<salutation>, <addressee>!”):
from pyparsing import Word, alphas
# define grammar of a greeting
greet = Word( alphas ) + "," + Word( alphas ) + "!"
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))
The program outputs the following:
Hello, World! -> ['Hello', ',', 'World', '!']
The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of ‘+’, ‘|’ and ‘^’ operators.
The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.
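The three access styles can be illustrated with a minimal sketch. It assumes the module is importable as pyparsing, as in the greeting program above; the results names "salutation" and "addressee" are chosen here for illustration only.

```python
from pyparsing import Word, alphas

# Attach results names so tokens can be fetched by key or by attribute.
greet = (
    Word(alphas).setResultsName("salutation")
    + ","
    + Word(alphas).setResultsName("addressee")
    + "!"
)
result = greet.parseString("Hello, World!")

print(result.asList())       # list-style access to all tokens
print(result["salutation"])  # dictionary-style access by results name
print(result.addressee)      # attribute-style access by results name
```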
The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
- extra or missing whitespace (the above program will also handle “Hello,World!”, “Hello , World !”, etc.)
- quoted strings
- embedded comments
- class bx_extras.pyparsing.And(exprs, savelist=True)
Bases:
ParseExpression
Requires all given ParseExpressions to be found in the given order. Expressions may be separated by whitespace. May be constructed using the ‘+’ operator.
- checkRecursion(parseElementList)
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.CaselessKeyword(matchString, identChars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$')
Bases:
Keyword
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.CaselessLiteral(matchString)
Bases:
Literal
Token to match a specified string, ignoring case of letters. Note: the matched results will always be in the case of the given match string, NOT the case of the input text.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.CharsNotIn(notChars, min=1, max=0, exact=0)
Bases:
Token
Token for matching words composed of characters not in a given set. Defined with string containing all disallowed characters, and an optional minimum, maximum, and/or exact length. The default value for min is 1 (a minimum value < 1 is not valid); the default values for max and exact are 0, meaning no maximum or exact length restriction.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.Combine(expr, joinString='', adjacent=True)
Bases:
TokenConverter
Converter to concatenate all matching tokens to a single string. By default, the matching patterns must also be contiguous in the input string; this can be disabled by specifying ‘adjacent=False’ in the constructor.
- ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.
- postParse(instring, loc, tokenlist)
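A small sketch of the Combine behavior described above, assuming the module is importable as pyparsing. The variable names are illustrative. It compares the token list with and without Combine, and checks that the default adjacent=True rejects input with embedded spaces.

```python
from pyparsing import Combine, ParseException, Word, nums

# Without Combine, each piece of the number is a separate token.
separate = (Word(nums) + "." + Word(nums)).parseString("3.14").asList()

# With Combine, the pieces are joined into a single string token.
real = Combine(Word(nums) + "." + Word(nums))
joined = real.parseString("3.14").asList()

# By default the pieces must be contiguous, so embedded spaces are rejected.
try:
    real.parseString("3 . 14")
    rejected = False
except ParseException:
    rejected = True

print(separate, joined, rejected)
```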
- class bx_extras.pyparsing.Dict(exprs)
Bases:
TokenConverter
Converter to return a repetitive expression as a list, but also as a dictionary. Each element can also be referenced using the first token in the expression as its key. Useful for tabular report scraping when the first column can be used as an item key.
- postParse(instring, loc, tokenlist)
- class bx_extras.pyparsing.Each(exprs, savelist=True)
Bases:
ParseExpression
Requires all given ParseExpressions to be found, but in any order. Expressions may be separated by whitespace. May be constructed using the ‘&’ operator.
- checkRecursion(parseElementList)
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.FollowedBy(expr)
Bases:
ParseElementEnhance
Lookahead matching of the given parse expression. FollowedBy does not advance the parsing position within the input string; it only verifies that the specified parse expression matches at the current position. FollowedBy always returns a null token list.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.Forward(other=None)
Bases:
ParseElementEnhance
Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the ‘<<’ operator.
Note: take care when assigning to Forward not to overlook precedence of operators. Specifically, ‘|’ has a lower precedence than ‘<<’, so that:
fwdExpr << a | b | c
will actually be evaluated as:
(fwdExpr << a) | b | c
thereby leaving b and c out as parseable alternatives. It is recommended that you explicitly group the values inserted into the Forward:
fwdExpr << (a | b | c)
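As a sketch of a recursive grammar built this way (assuming the module is importable as pyparsing), the following defines bracketed lists of integers that may contain further bracketed lists. The names item and nested are illustrative, and the right-hand side of ‘<<’ is parenthesized as recommended above.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, nums

# Declare the recursive element first; its definition is filled in later.
item = Forward()
nested = Group(Suppress("[") + ZeroOrMore(item) + Suppress("]"))

# Parenthesize the alternatives so both are assigned to the Forward.
item << (Word(nums) | nested)

tree = nested.parseString("[1 [2 3] [[4]]]").asList()
print(tree)
```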
- copy()
Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.
- leaveWhitespace()
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.
- streamline()
- validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
- class bx_extras.pyparsing.GoToColumn(colno)
Bases:
_PositionToken
Token to advance to a specific column of input text; useful for tabular report scraping.
- parseImpl(instring, loc, doActions=True)
- preParse(instring, loc)
- class bx_extras.pyparsing.Group(expr)
Bases:
TokenConverter
Converter to return the matched tokens as a list - useful for returning tokens of ZeroOrMore and OneOrMore expressions.
- postParse(instring, loc, tokenlist)
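The effect of Group on repeated expressions can be sketched as follows, assuming the module is importable as pyparsing; the key=value input format is made up for the example.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

pair = Word(alphas) + Suppress("=") + Word(nums)

# Without Group, all repetitions are flattened into one token list.
flat = OneOrMore(pair).parseString("a=1 b=2").asList()

# With Group, each repetition becomes its own nested list.
grouped = OneOrMore(Group(pair)).parseString("a=1 b=2").asList()

print(flat)
print(grouped)
```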
- class bx_extras.pyparsing.Keyword(matchString, identChars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$', caseless=False)
Bases:
Token
Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character. Compare with Literal:
Literal("if") will match the leading 'if' in 'ifAndOnlyIf'. Keyword("if") will not; it will only match the leading 'if' in 'if x=1', or 'if(y==2)'.
Accepts two optional constructor arguments in addition to the keyword string: identChars is a string of characters that would be valid identifier characters, defaulting to all alphanumerics + “_” and “$”; caseless allows case-insensitive matching, default is False.
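The comparison with Literal can be checked with a short sketch, assuming the module is importable as pyparsing; the variable names are illustrative.

```python
from pyparsing import Keyword, Literal, ParseException

# Literal matches the leading characters regardless of what follows.
literal_hit = Literal("if").parseString("ifAndOnlyIf").asList()

# Keyword matches only when followed by a non-identifier character.
keyword_hit = Keyword("if").parseString("if(y==2)").asList()

try:
    Keyword("if").parseString("ifAndOnlyIf")
    keyword_rejected = False
except ParseException:
    keyword_rejected = True

print(literal_hit, keyword_hit, keyword_rejected)
```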
- DEFAULT_KEYWORD_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$'
- copy()
Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.
- parseImpl(instring, loc, doActions=True)
- static setDefaultKeywordChars(chars)
Overrides the default Keyword chars
- class bx_extras.pyparsing.LineEnd
Bases:
_PositionToken
Matches if current position is at the end of a line within the parse string
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.LineStart
Bases:
_PositionToken
Matches if current position is at the beginning of a line within the parse string
- parseImpl(instring, loc, doActions=True)
- preParse(instring, loc)
- class bx_extras.pyparsing.Literal(matchString)
Bases:
Token
Token to exactly match a specified string.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.MatchFirst(exprs, savelist=False)
Bases:
ParseExpression
Requires that at least one ParseExpression is found. If two expressions match, the first one listed is the one that will match. May be constructed using the ‘|’ operator.
- checkRecursion(parseElementList)
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.NoMatch
Bases:
Token
A token that will never match.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.NotAny(expr)
Bases:
ParseElementEnhance
Lookahead to disallow matching with the given parse expression. NotAny does not advance the parsing position within the input string; it only verifies that the specified parse expression does not match at the current position. Also, NotAny does not skip over leading whitespace. NotAny always returns a null token list. May be constructed using the ‘~’ operator.
- parseImpl(instring, loc, doActions=True)
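A common use of the NotAny lookahead is to stop a repetition at a reserved word. The sketch below assumes the module is importable as pyparsing; the reserved word "end" and the variable names are illustrative.

```python
from pyparsing import Keyword, OneOrMore, Word, alphas

end_kw = Keyword("end")

# A name is any alphabetic word that is not the reserved word "end".
name = ~end_kw + Word(alphas)

# Repetition stops when the lookahead sees "end"; the rest is left unparsed.
names = OneOrMore(name).parseString("alpha beta end gamma").asList()
print(names)
```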
- class bx_extras.pyparsing.OneOrMore(expr, savelist=False)
Bases:
ParseElementEnhance
Repetition of one or more of the given expression.
- parseImpl(instring, loc, doActions=True)
- setResultsName(name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.
- class bx_extras.pyparsing.OnlyOnce(methodCall)
Bases:
object
Wrapper for parse actions, to ensure they are only called once.
- reset()
- class bx_extras.pyparsing.Optional(exprs, default=<bx_extras.pyparsing._NullToken object>)
Bases:
ParseElementEnhance
Optional matching of the given expression. A default return string can also be specified, if the optional expression is not found.
- parseImpl(instring, loc, doActions=True)
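The default argument of Optional can be sketched as follows, assuming the module is importable as pyparsing; the unit strings are made up for the example.

```python
from pyparsing import Optional, Word, alphas, nums

# A number with an optional unit; "px" is returned when the unit is absent.
length = Word(nums) + Optional(Word(alphas), default="px")

with_unit = length.parseString("12 em").asList()
without_unit = length.parseString("12").asList()

print(with_unit, without_unit)
```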
- class bx_extras.pyparsing.Or(exprs, savelist=False)
Bases:
ParseExpression
Requires that at least one ParseExpression is found. If two expressions match, the expression that matches the longest string will be used. May be constructed using the ‘^’ operator.
- checkRecursion(parseElementList)
- parseImpl(instring, loc, doActions=True)
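The difference between MatchFirst (‘|’) and Or (‘^’) shows up when one alternative is a prefix of another. A minimal sketch, assuming the module is importable as pyparsing, with illustrative names:

```python
from pyparsing import Combine, Word, nums

integer = Word(nums)
real = Combine(Word(nums) + "." + Word(nums))

# MatchFirst: the first listed alternative that matches wins.
first_match = (integer | real).parseString("3.14").asList()

# Or: all alternatives are tried and the longest match wins.
longest_match = (integer ^ real).parseString("3.14").asList()

print(first_match, longest_match)
```

With ‘|’, listing the longer alternative first (real | integer) gives the same result as ‘^’ for this input.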
- exception bx_extras.pyparsing.ParseBaseException(pstr, loc=0, msg=None, elem=None)
Bases:
Exception
base exception class for all parsing runtime exceptions
- loc
- markInputline(markerString='>!<')
Extracts the exception line from the input string, and marks the location of the exception with a special symbol.
- msg
- parserElement
- pstr
- class bx_extras.pyparsing.ParseElementEnhance(expr, savelist=False)
Bases:
ParserElement
Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
- checkRecursion(parseElementList)
- ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.
- leaveWhitespace()
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.
- parseImpl(instring, loc, doActions=True)
- streamline()
- validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
- exception bx_extras.pyparsing.ParseException(pstr, loc=0, msg=None, elem=None)
Bases:
ParseBaseException
Exception thrown when a parse expression does not match the input; supported attributes by name are:
- lineno - returns the line number of the exception text
- col - returns the column number of the exception text
- line - returns the line containing the exception text
- loc
- msg
- parserElement
- pstr
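A sketch of reading these attributes when a parse fails, assuming the module is importable as pyparsing. The grammar is the greeting grammar from the module introduction, fed an input with the comma missing.

```python
from pyparsing import ParseException, Word, alphas

greet = Word(alphas) + "," + Word(alphas) + "!"
bad_input = "Hello World!"  # the comma is missing

error = None
try:
    greet.parseString(bad_input)
except ParseException as err:
    error = err  # keep a reference; "err" is cleared after the except block
    print("line %d, column %d" % (err.lineno, err.col))
    print(err.line)             # the offending input line
    print(err.markInputline())  # the same line with a ">!<" marker inserted
```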
- class bx_extras.pyparsing.ParseExpression(exprs, savelist=False)
Bases:
ParserElement
Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
- append(other)
- ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.
- leaveWhitespace()
Extends leaveWhitespace defined in base class, and also invokes leaveWhitespace on all contained expressions.
- setResultsName(name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.
- streamline()
- validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
- exception bx_extras.pyparsing.ParseFatalException(pstr, loc=0, msg=None, elem=None)
Bases:
ParseBaseException
user-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately
- loc
- msg
- parserElement
- pstr
- class bx_extras.pyparsing.ParseResults(toklist, name=None, asList=True, modal=True)
Bases:
object
Structured parse results, to provide multiple means of access to the parsed data:
- as a list (len(results))
- by list index (results[0], results[1], etc.)
- by attribute (results.<resultsName>)
- asDict()
Returns the named parse results as dictionary.
- asList()
Returns the parse results as a nested list of matching tokens, all converted to strings.
- asXML(doctag=None, namedItemsOnly=False, indent='', formatted=True)
Returns the parse results as XML. Tags are created for tokens and lists that have defined results names.
- copy()
Returns a new copy of a ParseResults object.
- dump(indent='', depth=0)
Diagnostic method for listing out the contents of a ParseResults. Accepts an optional indent argument so that this string can be embedded in a nested display of other data.
- get(key, defaultValue=None)
Returns named result matching the given key, or if there is no such name, then returns the given defaultValue or None if no defaultValue is specified.
- getName()
Returns the results name for this token expression.
- items()
Returns all named result keys and values as a list of tuples.
- keys()
Returns all named result keys.
- pop(index=-1)
Removes and returns item at specified index (default=last). Will work with either numeric indices or dict-key indices.
- values()
Returns all named result values.
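Several of the ParseResults methods above can be exercised together. This is a sketch that assumes the module is importable as pyparsing; the year-month format and the results names are illustrative.

```python
from pyparsing import Suppress, Word, nums

date = (
    Word(nums).setResultsName("year")
    + Suppress("-")
    + Word(nums).setResultsName("month")
)
results = date.parseString("2009-07")

tokens = results.asList()       # plain list of matched tokens
named = results.asDict()        # named results as a dictionary
day = results.get("day", "01")  # fallback for a name that was not matched
names = sorted(results.keys())  # all results names

print(tokens, named, day, names)
```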
- exception bx_extras.pyparsing.ParseSyntaxException(pe)
Bases:
ParseFatalException
just like ParseFatalException, but thrown internally when an ErrorStop indicates that parsing is to stop immediately because an unbacktrackable syntax error has been found
- loc
- msg
- parserElement
- pstr
- class bx_extras.pyparsing.ParserElement(savelist=False)
Bases:
object
Abstract base level parser element class.
- DEFAULT_WHITE_CHARS = ' \n\t\r'
- addParseAction(*fns, **kwargs)
Add parse action to expression’s list of parse actions. See setParseAction.
- checkRecursion(parseElementList)
- copy()
Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.
- static enablePackrat()
Enables “packrat” parsing, which adds memoizing to the parsing logic. Repeated parse attempts at the same string location (which happens often in many complex grammars) can immediately return a cached value, instead of re-executing parsing/validating code. Memoizing is done of both valid results and parsing exceptions.
This speedup may break existing programs that use parse actions that have side-effects. For this reason, packrat parsing is disabled when you first import pyparsing. To activate the packrat feature, your program must call the class method ParserElement.enablePackrat(). If your program uses psyco to “compile as you go”, you must call enablePackrat before calling psyco.full(). If you do not do this, Python will crash. For best results, call enablePackrat() immediately after importing pyparsing.
- getException()
- ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.
- leaveWhitespace()
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.
- parseFile(file_or_filename)
Execute the parse expression on the given file or filename. If a filename is specified (instead of a file object), the entire file is opened, read, and closed before parsing.
- parseImpl(instring, loc, doActions=True)
- parseString(instring, parseAll=False)
Execute the parse expression with the given string. This is the main interface to the client code, once the complete expression has been built.
If you want the grammar to require that the entire input string be successfully parsed, then set parseAll to True (equivalent to ending the grammar with StringEnd()).
Note: parseString implicitly calls expandtabs() on the input string, in order to report proper column numbers in parse actions. If the input string contains tabs and the grammar uses parse actions that use the loc argument to index into the string being parsed, you can ensure you have a consistent view of the input string by:
- calling parseWithTabs on your grammar before calling parseString (see parseWithTabs)
- defining your parse action using the full (s,loc,toks) signature, and referencing the input string using the parse action’s s argument
- explicitly expanding the tabs in your input string before calling parseString
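The effect of the parseAll argument described above can be sketched as follows, assuming the module is importable as pyparsing; the names are illustrative.

```python
from pyparsing import ParseException, Word, nums

number = Word(nums)

# By default, parsing stops after the grammar matches; trailing text is ignored.
prefix_only = number.parseString("123 abc").asList()

# With parseAll=True, unparsed trailing text is an error.
try:
    number.parseString("123 abc", parseAll=True)
    whole_input_ok = True
except ParseException:
    whole_input_ok = False

print(prefix_only, whole_input_ok)
```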
- parseWithTabs()
Overrides the default behavior, which is to expand <TAB>s to spaces before parsing the input string. Must be called before parseString when the input grammar contains elements that match <TAB> characters.
- postParse(instring, loc, tokenlist)
- preParse(instring, loc)
- static resetCache()
- scanString(instring, maxMatches=9223372036854775807)
Scan the input string for expression matches. Each match will return the matching tokens, start location, and end location. May be called with optional maxMatches argument, to clip scanning after ‘n’ matches are found.
Note that the start and end locations are reported relative to the string being parsed. See parseString for more information on parsing strings with embedded tabs.
- searchString(instring, maxMatches=9223372036854775807)
Another extension to scanString, simplifying the access to the tokens found to match the given parse expression. May be called with optional maxMatches argument, to clip searching after ‘n’ matches are found.
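A sketch contrasting scanString and searchString on the same input, assuming the module is importable as pyparsing; the sample text is made up.

```python
from pyparsing import Word, nums

integer = Word(nums)
text = "a1 bb22 ccc333"

# scanString yields (tokens, start, end) for every match found in the text.
spans = [(tokens[0], start, end) for tokens, start, end in integer.scanString(text)]

# searchString collects only the tokens of each match.
found = integer.searchString(text).asList()

print(spans)
print(found)
```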
- setBreak(breakFlag=True)
Method to invoke the Python pdb debugger when this element is about to be parsed. Set breakFlag to True to enable, False to disable.
- setDebug(flag=True)
Enable display of debugging messages while doing pattern matching. Set flag to True to enable, False to disable.
- setDebugActions(startAction, successAction, exceptionAction)
Enable display of debugging messages while doing pattern matching.
- static setDefaultWhitespaceChars(chars)
Overrides the default whitespace chars
- setFailAction(fn)
Define action to perform if parsing fails at this expression. Fail action fn is a callable function that takes the arguments fn(s,loc,expr,err) where:
- s = string being parsed
- loc = location where expression match was attempted and failed
- expr = the parse expression that failed
- err = the exception thrown
The function returns no value. It may throw ParseFatalException if it is desired to stop parsing immediately.
- setName(name)
Define name for this expression, for use in debugging.
- setParseAction(*fns, **kwargs)
Define action to perform when successfully matching parse element definition. Parse action fn is a callable method with 0-3 arguments, called as fn(s,loc,toks), fn(loc,toks), fn(toks), or just fn(), where:
- s = the original string being parsed (see note below)
- loc = the location of the matching substring
- toks = a list of the matched tokens, packaged as a ParseResults object
If the functions in fns modify the tokens, they can return them as the return value from fn, and the modified list of tokens will replace the original. Otherwise, fn does not need to return any value.
Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.
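A typical parse action converts matched text to another type. The sketch below assumes the module is importable as pyparsing and uses the full (s,loc,toks) signature; the addition grammar is illustrative.

```python
from pyparsing import Word, nums

# The value returned by the parse action replaces the matched token.
integer = Word(nums).setParseAction(lambda s, loc, toks: int(toks[0]))

sum_expr = integer + "+" + integer
parsed = sum_expr.parseString("2 + 40").asList()

print(parsed)
```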
- setResultsName(name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.
- setWhitespaceChars(chars)
Overrides the default whitespace chars
- streamline()
- suppress()
Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned output.
- transformString(instring)
Extension to scanString, to modify matching text with modified tokens that may be returned from a parse action. To use transformString, define a grammar and attach a parse action to it that modifies the returned token list. Invoking transformString() on a target string will then scan for matches, and replace the matched text patterns according to the logic in the parse action. transformString() returns the resulting transformed string.
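A sketch of transformString, assuming the module is importable as pyparsing. The parse action doubles every integer found in the text, and unmatched text is copied through unchanged; the sample sentence is made up.

```python
from pyparsing import Word, nums

# The parse action returns the replacement text for each match.
doubled = Word(nums).setParseAction(lambda toks: str(int(toks[0]) * 2))

output = doubled.transformString("3 apples and 10 pears")
print(output)
```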
- tryParse(instring, loc)
- validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
- class bx_extras.pyparsing.QuotedString(quoteChar, escChar=None, escQuote=None, multiline=False, unquoteResults=True, endQuoteChar=None)
Bases:
Token
Token for matching strings that are delimited by quoting characters.
- parseImpl(instring, loc, doActions=True)
- exception bx_extras.pyparsing.RecursiveGrammarException(parseElementList)
Bases:
Exception
exception thrown by validate() if the grammar could be improperly recursive
- class bx_extras.pyparsing.Regex(pattern, flags=0)
Bases:
Token
Token for matching strings that match a given regular expression. Defined with string specifying the regular expression in a form recognized by the inbuilt Python re module.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.SkipTo(other, include=False, ignore=None)
Bases:
ParseElementEnhance
Token for skipping over all undefined text until the matched expression is found. If include is set to true, the matched expression is also consumed. The ignore argument is used to define grammars (typically quoted strings and comments) that might contain false matches.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.StringEnd
Bases:
_PositionToken
Matches if current position is at the end of the parse string
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.StringStart
Bases:
_PositionToken
Matches if current position is at the beginning of the parse string
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.Suppress(expr, savelist=False)
Bases:
TokenConverter
Converter for ignoring the results of a parsed expression.
- postParse(instring, loc, tokenlist)
- suppress()
Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned output.
- class bx_extras.pyparsing.Token
Bases:
ParserElement
Abstract ParserElement subclass, for defining atomic matching patterns.
- setName(name)
Define name for this expression, for use in debugging.
- class bx_extras.pyparsing.TokenConverter(expr, savelist=False)
Bases:
ParseElementEnhance
Abstract subclass of ParseElementEnhance, for converting parsed results.
- class bx_extras.pyparsing.Upcase(*args)
Bases:
TokenConverter
Converter to upper case all matching tokens.
- postParse(instring, loc, tokenlist)
- class bx_extras.pyparsing.White(ws=' \t\r\n', min=1, max=0, exact=0)
Bases:
Token
Special matching class for matching whitespace. Normally, whitespace is ignored by pyparsing grammars. This class is included when some whitespace structures are significant. Define with a string containing the whitespace characters to be matched; default is " \t\r\n". Also takes optional min, max, and exact arguments, as defined for the Word class.
- parseImpl(instring, loc, doActions=True)
- whiteStrs = {'\t': '<TAB>', '\n': '<LF>', '\x0c': '<FF>', '\r': '<CR>', ' ': '<SPC>'}
- class bx_extras.pyparsing.Word(initChars, bodyChars=None, min=1, max=0, exact=0, asKeyword=False)
Bases:
Token
Token for matching words composed of allowed character sets. Defined with string containing all allowed initial characters, an optional string containing allowed body characters (if omitted, defaults to the initial character set), and an optional minimum, maximum, and/or exact length. The default value for min is 1 (a minimum value < 1 is not valid); the default values for max and exact are 0, meaning no maximum or exact length restriction.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.WordEnd(wordChars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
Bases:
_PositionToken
Matches if the current position is at the end of a Word, and is not followed by any character in a given set of wordChars (default=printables). To emulate the behavior of regular expressions, use WordEnd(alphanums). WordEnd will also match at the end of the string being parsed, or at the end of a line.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.WordStart(wordChars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
Bases:
_PositionToken
Matches if the current position is at the beginning of a Word, and is not preceded by any character in a given set of wordChars (default=printables). To emulate the behavior of regular expressions, use WordStart(alphanums). WordStart will also match at the beginning of the string being parsed, or at the beginning of a line.
- parseImpl(instring, loc, doActions=True)
- class bx_extras.pyparsing.ZeroOrMore(expr)
Bases:
ParseElementEnhance
Optional repetition of zero or more of the given expression.
- parseImpl(instring, loc, doActions=True)
- setResultsName(name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a copy of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.
- bx_extras.pyparsing.col(loc, strg)
Returns current column within a string, counting newlines as line separators. The first column is number 1.
Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See ParserElement.parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.
- bx_extras.pyparsing.countedArray(expr)
Helper to define a counted list of expressions. This helper defines a pattern of the form:
integer expr expr expr...
where the leading integer tells how many expr expressions follow. The matched tokens are returned as a list of expr tokens; the leading count token is suppressed.
- bx_extras.pyparsing.delimitedList(expr, delim=',', combine=False)
Helper to define a delimited list of expressions - the delimiter defaults to ‘,’. By default, the list elements and delimiters can have intervening whitespace, and comments, but this can be overridden by passing ‘combine=True’ in the constructor. If combine is set to True, the matching tokens are returned as a single token string, with the delimiters included; otherwise, the matching tokens are returned as a list of tokens, with the delimiters suppressed.
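The two return styles of delimitedList can be sketched as follows, assuming the module is importable as pyparsing; the inputs are made up.

```python
from pyparsing import Word, alphas, delimitedList

# Default: whitespace around delimiters is allowed and delimiters are dropped.
as_list = delimitedList(Word(alphas)).parseString("a, b ,c").asList()

# combine=True: elements must be contiguous and are returned as one string.
as_string = delimitedList(Word(alphas), combine=True).parseString("a,b,c").asList()

print(as_list, as_string)
```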
- bx_extras.pyparsing.dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes care of defining the Dict, ZeroOrMore, and Group tokens in the proper order. The key pattern can include delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The value pattern can include named results, so that the Dict results can include named token fields.
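A sketch of dictOf with a suppressed delimiter in the key pattern, assuming the module is importable as pyparsing; the "name: value" input format is illustrative.

```python
from pyparsing import Suppress, Word, alphas, dictOf, nums

# The colon is part of the key pattern but suppressed from the results.
key = Word(alphas) + Suppress(":")
value = Word(nums)

settings = dictOf(key, value).parseString("width: 10 height: 20")

print(settings.asList())   # each key/value pair as a nested list
print(settings["width"])   # lookup by the matched key text
```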
- bx_extras.pyparsing.downcaseTokens(s, l, t)
Helper parse action to convert tokens to lower case.
- bx_extras.pyparsing.getTokensEndLoc()
Method to be called from within a parse action to determine the end location of the parsed tokens.
- bx_extras.pyparsing.indentedBlock(blockStatementExpr, indentStack, indent=True)
Helper method for defining space-delimited indentation blocks, such as those used to define block statements in Python source code.
- Parameters:
- blockStatementExpr - expression defining syntax of statement that is repeated within the indented block
- indentStack - list created by caller to manage indentation stack (multiple statementWithIndentedBlock expressions within a single grammar should share a common indentStack)
- indent - boolean indicating whether block must be indented beyond the current level; set to False for block of left-most statements (default=True)
A valid block must contain at least one blockStatement.
- bx_extras.pyparsing.keepOriginalText(s, startLoc, t)
Helper parse action to preserve original parsed text, overriding any nested parse actions.
- bx_extras.pyparsing.line(loc, strg)
Returns the line of text containing loc within a string, counting newlines as line separators.
- bx_extras.pyparsing.lineno(loc, strg)
Returns current line number within a string, counting newlines as line separators. The first line is number 1.
Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See ParserElement.parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.
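The position helpers col, line, and lineno all take a character offset and the full string. A minimal sketch, assuming the module is importable as pyparsing and using a two-line string without tabs:

```python
from pyparsing import col, line, lineno

text = "ab\ncd"
loc = text.index("d")  # character offset of "d" in the full string

# Line and column numbers are 1-based.
position = (lineno(loc, text), col(loc, text), line(loc, text))
print(position)
```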
- bx_extras.pyparsing.makeHTMLTags(tagStr)
Helper to construct opening and closing tag expressions for HTML, given a tag name
- bx_extras.pyparsing.makeXMLTags(tagStr)
Helper to construct opening and closing tag expressions for XML, given a tag name
- bx_extras.pyparsing.matchOnlyAtCol(n)
Helper method for defining parse actions that require matching at a specific column in the input text.
- bx_extras.pyparsing.matchPreviousExpr(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a ‘repeat’ of a previous expression. For example:
first = Word(nums)
second = matchPreviousExpr(first)
matchExpr = first + ":" + second
will match “1:1”, but not “1:2”. Because this matches by expressions, it will not match the leading “1:1” in “1:10”; the expressions are evaluated first, and then compared, so “1” is compared with “10”. Do not use with packrat parsing enabled.
- bx_extras.pyparsing.matchPreviousLiteral(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a ‘repeat’ of a previous expression. For example:
first = Word(nums)
second = matchPreviousLiteral(first)
matchExpr = first + ":" + second
will match “1:1”, but not “1:2”. Because this matches a previous literal, it will also match the leading “1:1” in “1:10”. If this is not desired, use matchPreviousExpr. Do not use with packrat parsing enabled.
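The difference between matchPreviousLiteral and matchPreviousExpr on the “1:10” case can be sketched as follows. This assumes the module is importable as pyparsing and that packrat parsing is not enabled; the accepts() helper is hypothetical and defined only for this example.

```python
from pyparsing import (
    ParseException, Word, matchPreviousExpr, matchPreviousLiteral, nums,
)

def accepts(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False

# Use a separate first expression for each helper.
first_lit = Word(nums)
by_literal = first_lit + ":" + matchPreviousLiteral(first_lit)

first_expr = Word(nums)
by_expr = first_expr + ":" + matchPreviousExpr(first_expr)

literal_outcomes = [accepts(by_literal, t) for t in ("1:1", "1:2", "1:10")]
expr_outcomes = [accepts(by_expr, t) for t in ("1:1", "1:2", "1:10")]

print(literal_outcomes)
print(expr_outcomes)
```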
- bx_extras.pyparsing.nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)
Helper method for defining nested lists enclosed in opening and closing delimiters (“(” and “)” are the default).
- Parameters:
opener - opening character for a nested list (default="("); can also be a pyparsing expression
closer - closing character for a nested list (default=")"); can also be a pyparsing expression
content - expression for items within the nested lists (default=None)
ignoreExpr - expression for ignoring opening and closing delimiters (default=quotedString)
If an expression is not provided for the content argument, the nested expression will capture all whitespace-delimited content between delimiters as a list of separate values.
Use the ignoreExpr argument to define expressions that may contain opening or closing characters that should not be treated as opening or closing characters for nesting, such as quotedString or a comment expression. Specify multiple expressions using an Or or MatchFirst. The default is quotedString, but if no expressions are to be ignored, then pass None for this argument.
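A short sketch of the default behaviour described above, assuming the standalone pyparsing package. No content expression is supplied, so whitespace-delimited items are returned as nested lists of strings. The sample inputs are illustrative:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import nestedExpr

# Default delimiters "(" and ")", with no content expression given.
parens = nestedExpr()
nested = parens.parseString("(a (b c) d)").asList()

# Custom single-character delimiters.
brackets = nestedExpr("[", "]")
bracketed = brackets.parseString("[1 [2 3] [4 [5]]]").asList()
```

Each level of delimiters becomes one level of list nesting in the result, and the delimiters themselves are not included in the output.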
- bx_extras.pyparsing.nullDebugAction(*args)
‘Do-nothing’ debug action, to suppress debugging output during parsing.
- bx_extras.pyparsing.oneOf(strs, caseless=False, useRegex=True)
Helper to quickly define a set of alternative Literals. It ensures longest-first testing when there is a conflict, regardless of the input order, but returns a MatchFirst for best performance.
- Parameters:
strs - a string of space-delimited literals, or a list of string literals
caseless - (default=False) - treat all literals as caseless
useRegex - (default=True) - as an optimization, will generate a Regex object; otherwise, will generate a MatchFirst object (if caseless=True, or if creating a Regex raises an exception)
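The longest-first guarantee can be illustrated by comparing oneOf with a hand-written alternation. This is a sketch that assumes the standalone pyparsing package; the operator and keyword lists are made up for the example:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import Literal, oneOf

# A hand-written MatchFirst tries alternatives in the order given,
# so "<" wins even though the input starts with "<=".
naive = Literal("<") | Literal("<=")
naive_tokens = naive.parseString("<= 5").asList()

# oneOf reorders conflicting alternatives so the longer one is tested first.
comparison = oneOf("< <= > >= = !=")
oneof_tokens = comparison.parseString("<= 5").asList()

# A list of strings is accepted as well as a space-delimited string.
keyword = oneOf(["if", "else", "elif"])
keyword_tokens = keyword.parseString("elif x").asList()
```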
- bx_extras.pyparsing.operatorPrecedence(baseExpr, opList)
Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy. Operators may be unary or binary, left- or right-associative. Parse actions can also be attached to operator expressions.
- Parameters:
baseExpr - expression representing the most basic element of the nested expression grammar
opList - list of tuples, one for each operator precedence level in the expression grammar; each tuple is of the form (opExpr, numTerms, rightLeftAssoc, parseAction), where:
- opExpr is the pyparsing expression for the operator; may also be a string, which will be converted to a Literal; if numTerms is 3, opExpr is a tuple of two expressions, for the two operators separating the 3 terms
- numTerms is the number of terms for this operator (must be 1, 2, or 3)
- rightLeftAssoc is the indicator whether the operator is right or left associative, using the pyparsing-defined constants opAssoc.RIGHT and opAssoc.LEFT.
- parseAction is the parse action to be associated with expressions matching this operator expression (the parse action tuple member may be omitted)
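A minimal arithmetic grammar shows how the opList tuples fit together. This sketch assumes the standalone pyparsing package. Some releases of that package provide this helper only under the name infixNotation, so the import falls back to it. The grammar itself is illustrative, and the optional parseAction member is omitted from each tuple:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import Word, nums, oneOf, opAssoc

try:
    from pyparsing import operatorPrecedence
except ImportError:
    # Assumption: releases without operatorPrecedence offer the same helper
    # under the name infixNotation.
    from pyparsing import infixNotation as operatorPrecedence

integer = Word(nums)

# Precedence levels are listed from tightest-binding to loosest-binding.
arith = operatorPrecedence(integer, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

by_precedence = arith.parseString("1 + 2 * 3").asList()
by_parens = arith.parseString("(1 + 2) * 3").asList()
```

Each matched operator level is returned as its own nested group, so the grouping in the result reflects the order of evaluation.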
- bx_extras.pyparsing.removeQuotes(s, l, t)
Helper parse action for removing quotation marks from parsed quoted strings. To use, add this parse action to a quoted string expression:
quotedString.setParseAction( removeQuotes )
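The effect can be seen by comparing the tokens with and without the parse action. This sketch assumes the standalone pyparsing package, and it works on a copy so that the shared quotedString expression is not modified:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import quotedString, removeQuotes

# Work on a copy so the module-level quotedString is left unchanged.
unquoted = quotedString.copy().setParseAction(removeQuotes)

with_quotes = quotedString.parseString('"hello world"').asList()
without_quotes = unquoted.parseString('"hello world"').asList()
```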
- bx_extras.pyparsing.replaceHTMLEntity(t)
- bx_extras.pyparsing.replaceWith(replStr)
Helper method for common parse actions that simply return a literal value. Especially useful when used with transformString().
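As a small sketch of the transformString() use case, assuming the standalone pyparsing package, the following replaces every occurrence of a marker with a fixed value. The "N/A" marker and the sample text are invented for the example:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import Literal, replaceWith

# Illustrative grammar: replace every "N/A" marker with the string "0".
missing = Literal("N/A").setParseAction(replaceWith("0"))

cleaned = missing.transformString("3, N/A, 7, N/A")
```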
- bx_extras.pyparsing.srange(s)
Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp ‘[]’ string range definitions:
srange("[0-9]") -> "0123456789"
srange("[a-z]") -> "abcdefghijklmnopqrstuvwxyz"
srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"
The input string must be enclosed in []’s, and the returned string is the expanded character set joined into a single string. The values enclosed in the []’s may be:
a single character
an escaped character with a leading backslash (such as \- or \])
an escaped hex character with a leading '\0x' (\0x21, which is a '!' character)
an escaped octal character with a leading '\0' (\041, which is a '!' character)
a range of any of the above, separated by a dash ('a-z', etc.)
any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)
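The expanded strings are ordinary character sets, so they can be passed straight to Word. The identifier grammar below is an illustrative sketch that assumes the standalone pyparsing package:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import Word, srange

digits = srange("[0-9]")
ident_start = srange("[a-zA-Z_]")
ident_body = srange("[a-zA-Z0-9_]")

# Illustrative identifier: a letter or underscore, followed by
# letters, digits, or underscores.
identifier = Word(ident_start, ident_body)
tokens = identifier.parseString("_tmp42 = 1").asList()
```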
- bx_extras.pyparsing.traceParseAction(f)
Decorator for debugging parse actions.
- bx_extras.pyparsing.upcaseTokens(s, l, t)
Helper parse action to convert tokens to upper case.
- bx_extras.pyparsing.withAttribute(*args, **attrDict)
Helper to create a validating parse action to be used with start tags created with makeXMLTags or makeHTMLTags. Use withAttribute to qualify a starting tag with a required attribute value, to avoid false matches on common tags such as <TD> or <DIV>.
Call withAttribute with a series of attribute names and values. Specify the filter attribute names and values as:
keyword arguments, as in (class="Customer",align="right"), or
a list of name-value tuples, as in ( ("ns1:class", "Customer"), ("ns2:align","right") )
For attribute names with a namespace prefix, you must use the second form. Attribute names are matched case-insensitively.
To verify that the attribute exists, but without specifying a value, pass withAttribute.ANY_VALUE as the value.
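A sketch of qualifying a start tag by attribute, assuming the standalone pyparsing package. The HTML snippet, the "body" results name, and the variable names are invented for this illustration:

```python
# Assumes the standalone pyparsing package; names mirror bx_extras.pyparsing.
from pyparsing import SkipTo, makeHTMLTags, withAttribute

html = (
    '<div class="grid">one</div>'
    '<div type="note">two</div>'
    '<div type="warning">three</div>'
)

div, div_end = makeHTMLTags("div")

# Only accept <div> start tags whose type attribute equals "note".
div_note = div.copy().setParseAction(withAttribute(type="note"))
note_expr = div_note + SkipTo(div_end)("body")
note_bodies = [match.body for match in note_expr.searchString(html)]

# Accept any <div> that has a type attribute, whatever its value.
div_typed = div.copy().setParseAction(
    withAttribute(type=withAttribute.ANY_VALUE))
typed_expr = div_typed + SkipTo(div_end)("body")
typed_bodies = [match.body for match in typed_expr.searchString(html)]
```

The first div is skipped in both cases because it has no type attribute, which is the kind of false match the helper is meant to filter out.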