The bracket matching problem deals with parentheses ()
, square brackets []
, braces {}
and other syntactic elements that provide for nested syntactic structures with balanced sets of delimiters.
To tackle bracket matching with Parabix methods, a key concept is that of a Nesting Depth BixNum. This is a representation of the nesting depth of syntactic elements at each position in an input stream. The following example shows an input stream with square brackets and braces, the corresponding nesting depth of each element in the data stream and the nesting depth bixnum (ND) as a set of 3 parallel bit streams (giving a 3-bit number at each character position).
Bracket Stream ...[..{.}..{...[.[]..]..}...]...
Nesting Depth 00011122211222233443332221111000
ND BixNum[2] .................11.............
ND BixNum[1] ......111..111111..111111.......
ND BixNum[0] ...111...11....11..111...1111...
NestingDepth Kernel
The NestingDepth kernel is a Parabix kernel (to be implemented) that can compute the nesting
depth of characters in an input stream, given two character class bit streams
L
identifying all the left or opening delimiters and R
identifying all the right or closing delimiters. For our example bracket stream above, L
and R
are shown below.
Bracket Stream ...[..{.}..{...[.[]..]..}...]...
L ...1..1....1...1.1..............
R ........1.........1..1..1...1...
The NestingDepth Kernel also takes a parameter 'MaxDepth` identifying the maximum nesting depth expected for an input source. The number of bit streams in the NestingDepth BixNum must be equal to ceillog2(MaxDepth + 1).
Syntax Validation Using Nesting Depth
Consider the problem of validating JSON array syntax (where whitespace may be freely included between tokens):
array ::= '[' [value {',' value}] ']'
object ::= '{' string : [value {',' value}] '}'
value ::= object | array | numeral | string | 'true' | 'false' | 'null'
Assume that valueToken
is a bit stream that marks the end position of
any legal JSON numeral, string or keyword, while anyToken
is a bit stream
marking the end position of any legal or illegal token.
Assume also that LBrak
, RBrak
, and Comma
are streams marking the
position of JSON [
, ]
, and ,
tokens respectively.
Suppose that a BixNum ND
has been computed as the nesting depth involving
the left and right delimiter sets consisting of square brackets and braces.
Then the validation of array syntax at each nesting depth d
can be
determined as follows.
Each array will be a span from an opening '[' to its closing ']'. Between
these brackets will be other tokens as well as nested arrays and objects.
However, nested arrays and objects can be easily computed as those spans
of elements having a nesting depth greater than d. Using a BixNum compiler
bnc
, the spans may be computed as bnc.UGT(ND, d)
. Tokens at depth d
can be identified as those at positions identified by the stream bnc.EQ(ND, d)
.
The following code can be used to validate all arrays at depth d
,
determining whether there are any errors after the opening LBrak or
after any value or Comma.
atDepth = bnc.EQ(ND, d)
nested = bnc.UGT(ND, d)
arrayStart = atDepth & LBrak
arrayEnd = ScanThru(arrayStart, nested, atDepth & ~ (RBrak | RBrace))
errorAtEnd = arrayEnd & RBrace
arraySpan = ExclusiveSpan(arrayStart, arrayEnd)
// Now validate that every value or nested item is followed
// either by a comma or a the end RBrak.
afterNested = Advance(nested & arraySpan, 1) & atDepth
afterToken = Advance(valueToken & arraySpan, 1)
tokenNext = ScanThru(afterNested | afterToken, whitespace)
errAfterValue = tokenNext &~(Comma | RBrak)
//
// Every Comma must be followed by a value
errAfterComma = ScanTo(Advance(Comma & arraySpan, 1), anyToken) & ~ (nested | valueToken)
// After the LBrak we must have either a value or an RBrak.
errAfterLBrak = ScanTo(Advance(arrayStart, 1), anyToken) & ~ (nested | valueToken | RBrak)