The JSnobol Class

SNOBOL pattern matching in Java is implemented as a Java class called JSnobol. It has many public methods: a few that you use constantly, and others that you seldom need.

[Screenshot: the public members of the JSnobol class, as shown in my Eclipse Java development environment (please pardon my custom colors).]


Basically, there are three methods of the class that you use a lot:

Most Commonly-Used Methods

setPattern(name, patternText)

The first (String) parameter is the name of the pattern (available in the JSnobol pattern-matching context). The second (String) parameter is the text of the pattern. The method returns a boolean value indicating whether the pattern compiled successfully (true = success, false = failure).

This method is similar to a SNOBOL statement, such as:

  pattern = 'this' 'is' 'a' ('good' | 'bad') 'pattern'

In the above example, “pattern” (to the left of the '=' sign) is the pattern name, and the text to the right of the '=' sign is the pattern text.

Once the setPattern method has been called, defining a pattern and its name, that pattern can be referenced by its name in subsequent patterns (also defined by a setPattern call).

The pattern is compiled into its own version of byte-codes (different from Java byte-codes).

This method does not throw an exception when a pattern fails to compile; it simply returns false. Information on the cause (and place) of the error can be obtained using the getErrorMessage() and getErrorOffset() methods. The JSnobolApp IDE does that for you.
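If you are calling setPattern from your own program rather than from the IDE, you may want to show the error position yourself. Here is a minimal sketch of one way to do that. The PatternErrorReport class is hypothetical (it is not part of JSnobol), the message and offset in main are made up, and the sketch assumes the offset is a zero-based character index into a single-line pattern text, which the description above does not confirm.

```java
// Hypothetical helper, not part of JSnobol: formats the kind of values that
// getErrorMessage() and getErrorOffset() return into a three-line report.
final class PatternErrorReport {

    // Assumption: offset is a zero-based character index into patternText,
    // and patternText is a single line.
    static String format(String patternText, String message, int offset) {
        // Clamp so an out-of-range offset still produces a usable report.
        int column = Math.max(0, Math.min(offset, patternText.length()));
        StringBuilder sb = new StringBuilder();
        sb.append(message).append('\n');
        sb.append(patternText).append('\n');
        for (int i = 0; i < column; i++) {
            sb.append(' ');
        }
        sb.append('^'); // caret under the reported position
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up message and offset, standing in for a failed setPattern call.
        System.out.println(format("'this' ('good' | 'bad'", "Missing right parenthesis", 22));
    }
}
```

In a real program, the second and third arguments would come from getErrorMessage() and getErrorOffset() after setPattern returns false.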

match(stringToMatch, patternName)

This method attempts to match the string text (the first parameter) against the pattern named in the second parameter. It returns a boolean value (true = success, false = failure) indicating the success or failure of the pattern-match.

If there is an error detected (which is different from failure to match), an Exception is thrown, with an error message describing the problem.

This method is similar to a SNOBOL statement like:

  'text to be matched' pattern

match(stringBufferToMatch, patternName, replacementString)

This method performs a match-and-replace operation, matching the 1st parameter's text, using the 2nd parameter's pattern name, replacing the matched text with the 3rd parameter String text.

Unlike the two-parameter match method, the text to be matched must be specified as a StringBuffer (rather than a String).

It returns a boolean value (true = success, false = failure) indicating the success or failure of the pattern-match.

If there is an error detected (which is different from failure to match), an Exception is thrown, with an error message describing the problem.

This method is similar to a SNOBOL statement like:

  'text to be matched' pattern 'replace matched-text with this'
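To see why the subject of a match-and-replace is a StringBuffer, recall that a Java String cannot be changed once created, while a StringBuffer can be edited in place. The sketch below uses only the standard library and does not call JSnobol at all. The InPlaceReplace class is hypothetical, and the span to replace is found by hand with indexOf, standing in for whatever span a pattern-match would select.

```java
// Illustration only: shows an in-place edit of a StringBuffer.
// It does not use JSnobol; the matched span is located by hand.
final class InPlaceReplace {

    // Replaces the characters in [start, end) of the buffer with the replacement text.
    static void replaceSpan(StringBuffer subject, int start, int end, String replacement) {
        subject.replace(start, end, replacement);
    }

    public static void main(String[] args) {
        StringBuffer subject = new StringBuffer("this is a bad pattern");
        int start = subject.indexOf("bad");
        replaceSpan(subject, start, start + "bad".length(), "good");
        System.out.println(subject); // prints: this is a good pattern
    }
}
```

The caller keeps its reference to the same StringBuffer object and sees the modified text afterward, which would not be possible with a String argument.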

Less Commonly-Used Methods

m_Variables

You can reference all of the variables created/manipulated during pattern-matching as a Hashtable<String, String>. The key (the variable name, as a String) references its value (also a String). Your Java program can also enter variables into this Hashtable.

JSnobol()

This is the constructor for creating an instance of the JSnobol class.

dump()

This method returns a String, which is a dump of the compiled pattern.

getErrorMessage()

When a pattern fails to compile (setPattern returns a boolean value of false), this method is used to find out the cause of the error. The error message is returned as a String.

getErrorOffset()

When a pattern fails to compile (setPattern returns a boolean value of false), this method is used to find out where (in the source pattern) the error occurred. The error offset is returned as an int value.

getVariable(variableName)

This method is used to obtain the value of the variable (whose name is passed as a String). It returns a String value. If the variable was not given a value during pattern-matching, the SNOBOL NULL value (a zero-length String) is returned.
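If it helps to picture how a Hashtable<String, String> of variables and a lookup that yields a zero-length String for unset names fit together, here is a minimal sketch using only java.util.Hashtable. The VariableTableSketch class and its lookup method are hypothetical stand-ins, not the JSnobol source, and the variable names are borrowed from the examples on this page.

```java
import java.util.Hashtable;

// Illustrative stand-in only; this is not the JSnobol source.
final class VariableTableSketch {

    // Returns the stored value, or a zero-length String when the name has no value.
    static String lookup(Hashtable<String, String> variables, String name) {
        return variables.getOrDefault(name, "");
    }

    public static void main(String[] args) {
        Hashtable<String, String> variables = new Hashtable<>();
        variables.put("segmentID", "NK1");

        System.out.println("[" + lookup(variables, "segmentID") + "]");
        System.out.println("[" + lookup(variables, "fieldNum") + "]");
    }
}
```

Note that java.util.Hashtable rejects null keys and null values, so a table of this type can only ever hold real String values.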

setAbend(isSet)

This method is used to set (value = 1) or clear (value = 0) the &ABEND keyword. The boolean argument sets it if true, or clears it if false. It returns the boolean value of its argument.

setAnchor(isSet)

This method is used to set (value = 1) or clear (value = 0) the &ANCHOR keyword. The boolean argument sets it if true, or clears it if false. It returns the boolean value of its argument.

setCaseSensitive(isSet)

This method is used to set (value = 1) or clear (value = 0) the &CASESENSITIVE keyword. The boolean argument sets it if true, or clears it if false. It returns the boolean value of its argument.

setFullscan(isSet)

This method is used to set (value = 1) or clear (value = 0) the &FULLSCAN keyword. The boolean argument sets it if true, or clears it if false. It returns the boolean value of its argument. Only fullscan-mode is currently supported.

setMaxLength(maxStringLength)

This method is used to set the maximum string length, as specified by its argument (an int value). It returns the int value of its argument. This corresponds to the &STRLEN keyword.

setStLimit(maxStatements)

This method is used to set the maximum number of pattern-nodes to be executed in pattern-matching, as specified by its argument (an int value). It returns the int value of its argument. This corresponds to the &STLIMIT keyword.

setVariable(variableName, stringValue)

This method is used to set the pattern-match variable whose name is specified in the first parameter (as a String), to the String value of the second parameter. It returns no value (void). The variable set with this method can be referenced during pattern-matching.

toString()

This method returns the de-compiled patterns defined so far in the JSnobol object, as a String.

Extensions To The SNOBOL Language

There are a few extensions to the SNOBOL language, which make things easier:

Text Specification

Non-space text, within patterns being defined, need not be specified with quotes surrounding it. If such text is the name of a pattern, a reference to that pattern will be generated. If it is the name of a variable (a value has been assigned to it) the value of the variable will be used. Otherwise, the text itself will be used.

This doesn't apply to text that affects the syntax of the language, which must always be enclosed in quotation marks.

If you don't want text interpreted as the name of a pattern (or the name of a variable), simply enclose it in quotation marks, and it will be treated simply as a literal string.

As with SNOBOL, strings may be delimited using either single-quotes (') or double-quotes (").
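The lookup order for unquoted text described above (pattern name first, then variable, then the text itself) can be sketched as follows. This is only an illustration of the rule, not the JSnobol compiler: the BareTextResolution class, the Kind enum, and the Resolved type are all hypothetical names, and the sketch handles one already-separated token at a time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: the class and method names here are hypothetical, not JSnobol's.
final class BareTextResolution {

    enum Kind { PATTERN_REFERENCE, VARIABLE_VALUE, LITERAL }

    static final class Resolved {
        final Kind kind;
        final String text;

        Resolved(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static Resolved resolve(String token, Set<String> patternNames, Map<String, String> variables) {
        // Quoted text is always a literal; the quotes themselves are dropped.
        if (token.length() >= 2) {
            char first = token.charAt(0);
            char last = token.charAt(token.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return new Resolved(Kind.LITERAL, token.substring(1, token.length() - 1));
            }
        }
        // 1. A defined pattern name becomes a reference to that pattern.
        if (patternNames.contains(token)) {
            return new Resolved(Kind.PATTERN_REFERENCE, token);
        }
        // 2. A variable that has been assigned a value contributes that value.
        String value = variables.get(token);
        if (value != null) {
            return new Resolved(Kind.VARIABLE_VALUE, value);
        }
        // 3. Otherwise the text stands for itself.
        return new Resolved(Kind.LITERAL, token);
    }

    public static void main(String[] args) {
        Set<String> patterns = new HashSet<>();
        patterns.add("Sep");
        Map<String, String> variables = new HashMap<>();
        variables.put("image", "bear");

        for (String token : new String[] {"Sep", "image", "bea", "'Sep'"}) {
            Resolved r = resolve(token, patterns, variables);
            System.out.println(token + " -> " + r.kind + " \"" + r.text + "\"");
        }
    }
}
```

The last token in main shows the escape hatch: quoting a name that happens to be a pattern forces it to be treated as plain text.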

Defining Sub-Patterns

Anywhere within a pattern being specified, a sub-pattern can easily be defined.

Where you want the sub-pattern to begin, specify the sub-pattern's name, followed immediately by a colon (:), followed by a space. The sub-pattern can thereafter be referenced by its name, and consists of everything from its name to the end of the current parenthetical level (or the end of the pattern itself).

For example, in the pattern:

(Sep: ("\n" | " " | RPOS(0)))

(LetterCombinations: ((be | bea | bear) (ds | d) | (ro | roo | roos) (ts | t)))

(ArbnoPat: (ARBNO(LetterCombinations $ OUTPUT $ image ?(imageNum = imageNum + 1) Sep)))

The sub-patterns “Sep”, “LetterCombinations”, and “ArbnoPat” are created, and can be used by name within the pattern-matching environment.

Upper And Lower Case

The SNOBOL compiler I used (on a Univac 1100 computer) was strictly upper-case, as was everything documented in my SNOBOL 4 Programming Manual. JSnobol, however, supports both upper-case and lower-case pattern-matching. The keyword “&CASESENSITIVE” has been defined to control case-sensitive (value 1) matching, or case-insensitive (value 0) matching.

Although it is not required (keywords and SNOBOL primitive functions can be written in either upper-case or lower-case), it is good practice (and style) to specify SNOBOL primitive functions and keywords in upper-case.

Assignment Statements Within Patterns

SNOBOL-style assignment statements can be included in patterns, such as the “?(imageNum = imageNum + 1)” evaluated expression in the example above. Evaluated expressions are run in concatenation-mode, as opposed to match-mode, though the success or failure of the evaluated statement can affect the pattern-match.

Comments Within Patterns

In a nutshell, comments within JSnobol pattern statements are the same as comments within a Java program.

You can enter C-style comments within your patterns. These comments are passed over by the compiler (as white-space), and don't take up space in the byte-codes.

The introducer (beginning-delimiter) of a comment is a forward-slash followed by an asterisk (/*). Everything following that delimiter is ignored by the compiler until the ending-delimiter, an asterisk followed by a forward-slash (*/), is encountered. The delimiters, and everything between them, are treated as white-space by the compiler.

For example, here is a comment within a pattern:

(PhraseStart: (over | under | beside | with | into | /*out of |*/ away from) /* beginnings of common phrases */)

Here is an example of a multi-line comment within a pattern:

/*
 * VARIABLES USED FOR SELECTING SEGMENTS, FIELDS, SUBFIELDS, SUB-SUBFIELDS, AND REPETITIONS:
 *
 * segmentID: The name of the desired HL7 segment. Example ?(segmentID = "NK1")
 * segNum: The instance of that segment in the message (1 to n) Example: ?(segNum = 2)
 * fieldNum: The field number in the segment (1 to n). Example ?(fieldNum = 2)
 * repNum: The repetition # of the fieldNum field (1 to n). Example ?(repNum = 1)
 * subfieldNum: The subfield number of the subfield in the field (1 to n). Example: ?(subfieldNum = 2)
 * subSubfieldNum: The sub-subfield number of the sub-subfield (1 to n). Example: ?(subSubfieldNum = 1)
 */

You can also designate an entire line (or remainder of a line) as a comment by using a double forward-slash (//), as in the following example:

// Pattern to initialize, then get the HL7 encoding characters:
(HL7Chars: (
    InitializeChars (GetChars | SUCCEED)
))
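To make the "comments are treated as white-space" rule concrete, here is a minimal sketch of a comment stripper in plain Java. It is not the JSnobol compiler's code: the CommentStripper class is hypothetical, and it makes several assumptions the text above does not state, namely that comment markers inside quoted strings are left alone, that quoted strings have no escape sequences, and that an unterminated block comment runs to the end of the input.

```java
// Hypothetical sketch, not the JSnobol compiler: removes /* ... */ and // comments
// from pattern text, leaving white-space in their place.
final class CommentStripper {

    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = source.length();
        while (i < n) {
            char c = source.charAt(i);
            if (c == '\'' || c == '"') {
                // Copy a quoted string verbatim, so comment markers inside it survive.
                int close = source.indexOf(c, i + 1);
                int end = (close < 0) ? n : close + 1;
                out.append(source, i, end);
                i = end;
            } else if (source.startsWith("/*", i)) {
                // Block comment: skip past the closing delimiter and leave one space.
                int close = source.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
                out.append(' ');
            } else if (source.startsWith("//", i)) {
                // Line comment: skip to the end of the line, keeping the newline itself.
                int eol = source.indexOf('\n', i);
                i = (eol < 0) ? n : eol;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pattern = "(PhraseStart: (over | under | /*out of |*/ away from) /* phrases */)";
        System.out.println(strip(pattern));
    }
}
```

Replacing each block comment with a single space, rather than with nothing, keeps the text on either side of the comment from running together into one token.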

Current Limitations/Differences

There are a few limitations in the JSnobol pattern matching:
