SNOBOL Pattern-Matching In Java

The SNOBOL language was developed at Bell Labs in the 1960s. Its statement syntax, though unique, is based on labels and GOTOs, which are best left in the 60s.

Its pattern-matching capabilities, on the other hand, far surpass the regular expressions used for pattern-matching in modern languages, and are a lot more programmer-friendly.

This ability of the SNOBOL language is best illustrated by the fact that the first, simpler version of the artificial-intelligence program (Heuristic Analysis of Language), available from this web-site, was implemented in SNOBOL, and its source-code was only a page and a half long!

Being used to SNOBOL pattern-matching, I found that having to use Java regular-expressions to do pattern-matching made the task seem needlessly difficult.

Wouldn't it be nice, I thought, if I could have the full generality of SNOBOL pattern-matching within an object-oriented Java application?

Having already developed a compiler (to accomplish earlier tasks), I began to seriously consider making this possible. In 2005, I not only adapted the compiler, but also developed the pattern-matching code, and even an Integrated Development Environment (IDE) for coding and testing SNOBOL patterns.

That IDE also generates the Java source-code statements for putting a pattern into a Java program.

I demonstrated this software (which I call JSnobol) when I worked for Intermountain HealthCare, and after that, the software languished for several years.

Now that I am making software I developed available on the Internet, I took this project up again at the end of 2015. The result of that more recent work appears on this web-site.

What Is JSnobol

The heart of JSnobol is a class named “JSnobol”, which contains the vast majority of the code. It is available in binary form, as a JAR-file, which can be included in your Java project's build-path. It is also available in source-code form (terms negotiated), by sending an e-mail to:

aere@aeresrealm.com

The JSnobol class is more thoroughly documented starting at the web-page link below:

The JSnobol Class

The class was created using information from my manual:

The SNOBOL4 Programming Language, by R.E. Griswold, J.F. Poage, and I.P. Polonsky, printing number 13-815357-4. This is not the most recent version of that manual.

Since you need to know something about the class in order to develop JSnobol patterns, it is more briefly documented here.

[Screen-shot: the public members of the JSnobol class, as shown in the Eclipse Java development environment.]

Basically, there are three methods of the class that you use a lot:

setPattern(name, patternText)

The first (String) parameter is the name of the pattern (available in the JSnobol pattern-matching context). The second (String) parameter is the text of the pattern. The method returns a boolean value indicating whether the pattern was successfully compiled (true = success, false = failure).

This method is similar to a SNOBOL statement, such as:

 pattern = 'this' 'is' 'a' ('good' | 'bad') 'pattern'

In the above example, “pattern” (to the left of the '=' sign) is the pattern name, and the text to the right of the '=' sign is the pattern text.

Once the setPattern method has been called, defining a pattern and its name, that pattern can be referenced by its name in subsequent patterns (also defined by a setPattern call).

The pattern is compiled into its own version of byte-codes (different from Java byte-codes).

This method does not throw an exception when compilation fails; it simply returns false. Information on the cause (and place) of the error can be obtained using the getErrorMessage() and getErrorOffset() methods. The JSnobolApp IDE does that for you.

match(stringToMatch, patternName)

This method attempts to match the string text (specified in the first parameter) against the pattern named in the second parameter. It returns a boolean value (true = success, false = failure) indicating the success or failure of the pattern-match.

If there is an error detected (which is different from failure to match), an Exception is thrown, with an error message describing the problem.

This method is similar to a SNOBOL statement like:

 'text to be matched' pattern
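The calling convention described above (a boolean result from setPattern, a boolean result or an Exception from match) can be sketched as follows. The JSnobol JAR is not assumed to be on the class-path here, so the sketch uses a hypothetical stand-in class, StandInMatcher, that accepts only a single-quoted literal as its pattern text. The return types of getErrorMessage() and getErrorOffset() (String and int) are assumptions, as are the pattern name and the messages. With the real library, an instance of the JSnobol class would presumably take the stand-in's place.

```java
import java.util.HashMap;
import java.util.Map;

class JSnobolCallSketch {

    // Hypothetical stand-in for the JSnobol class: it understands only a
    // single-quoted literal such as 'hello', not real SNOBOL patterns.
    static class StandInMatcher {
        private final Map<String, String> literals = new HashMap<>();
        private String errorMessage = "";
        private int errorOffset = -1;

        // Mirrors setPattern(name, patternText): failure is reported by
        // returning false, never by throwing.
        boolean setPattern(String name, String patternText) {
            String t = patternText.trim();
            if (t.length() < 2 || t.charAt(0) != '\'' || t.charAt(t.length() - 1) != '\'') {
                errorMessage = "stand-in accepts only a single-quoted literal";
                errorOffset = 0;
                return false;
            }
            literals.put(name, t.substring(1, t.length() - 1));
            return true;
        }

        // Assumed return types: String for the message, int for the offset.
        String getErrorMessage() { return errorMessage; }
        int getErrorOffset() { return errorOffset; }

        // Mirrors match(stringToMatch, patternName): false means "no match",
        // while an Exception signals an error such as an unknown pattern name.
        boolean match(String text, String patternName) throws Exception {
            String literal = literals.get(patternName);
            if (literal == null) {
                throw new Exception("unknown pattern: " + patternName);
            }
            return text.contains(literal);
        }
    }

    // Compiles the pattern, then matches the subject, reporting each outcome.
    static String run(String patternText, String subject) {
        StandInMatcher matcher = new StandInMatcher();
        if (!matcher.setPattern("greeting", patternText)) {
            return "compile error at " + matcher.getErrorOffset() + ": " + matcher.getErrorMessage();
        }
        try {
            return matcher.match(subject, "greeting") ? "matched" : "no match";
        } catch (Exception e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("'hello'", "say hello to SNOBOL")); // matched
        System.out.println(run("'hello'", "goodbye"));             // no match
        System.out.println(run("'hello", "say hello"));            // compile error at 0: ...
    }
}
```

The point of the sketch is the three distinct outcomes: a compile failure is checked through the return value of setPattern, a failed match is an ordinary false, and only genuine errors arrive as exceptions.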

match(stringBufferToMatch, patternName, replacementString)

This method performs a match-and-replace operation: it matches the text in the first parameter against the pattern named in the second parameter, and replaces the matched text with the String given in the third parameter.

Unlike the two-parameter match method, the text to be matched must be specified as a StringBuffer (rather than a String), since Java Strings are immutable and cannot be changed in place.

It returns a boolean value (true = success, false = failure) indicating the success or failure of the pattern-match.

If there is an error detected (which is different from failure to match), an Exception is thrown, with an error message describing the problem.

This method is similar to a SNOBOL statement like:

 'text to be matched' pattern 'replace matched-text with this'
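What a match-and-replace on a StringBuffer amounts to can be illustrated with plain JDK calls. This is only a sketch: a literal search with indexOf stands in for the pattern-match, and the class and method names (ReplaceSketch, matchAndReplace) are made up for illustration and are not part of JSnobol.

```java
class ReplaceSketch {

    // Replaces the first occurrence of the literal inside the buffer, in place.
    // Returns true if something was matched and replaced, false otherwise.
    static boolean matchAndReplace(StringBuffer subject, String literal, String replacement) {
        int start = subject.indexOf(literal);
        if (start < 0) {
            return false; // no match: the buffer is left untouched
        }
        subject.replace(start, start + literal.length(), replacement);
        return true;
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("this is a bad pattern");
        boolean ok = matchAndReplace(text, "bad", "good");
        System.out.println(ok + " -> " + text); // true -> this is a good pattern
    }
}
```

Because the buffer itself is modified, the caller sees the replaced text in the same object it passed in, which a String parameter could not provide.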

Additional Methods

In addition to the three methods detailed above, there are methods for getting the values of variables created during the pattern-match, and methods for setting variables to be used in the pattern-matching to be done.

There are also methods for getting detailed error information, and also for setting keyword parameters, as in the SNOBOL statement:

 &ANCHOR = 0

Since upper-case-only went away in the 60s, keywords have been added for controlling whether or not the match is case-sensitive.

Of course, there is a zero-parameter constructor for the class.

Extensions To The SNOBOL Language

There are a few extensions to the SNOBOL language, which make things easier:

Text Specification

Non-space text, within patterns being defined, need not be specified with quotes surrounding it. If such text is the name of a pattern, a reference to that pattern will be generated. If it is the name of a variable (a value has been assigned to it) the value of the variable will be used. Otherwise, the text itself will be used.

This doesn't apply to text that affects the syntax of the language, which must always be enclosed in quotation marks.

If you don't want text interpreted as the name of a pattern (or the name of a variable), simply enclose it in quotation marks, and it will be treated simply as a literal string.

As with SNOBOL, strings may be delimited using either single-quotes (') or double-quotes (").

Defining Sub-Patterns

Anywhere within a pattern being specified, a sub-pattern can easily be defined.

Where you want the sub-pattern to begin, specify the sub-pattern's name, followed immediately by a colon (:), followed by a space. The sub-pattern can thereafter be referenced by its name, and consists of everything from its name to the end of the current parenthetical level (or the end of the pattern itself).

For example, consider the pattern:

(Sep: ("\n" | " " | RPOS(0)))

(LetterCombinations: ((be | bea | bear) (ds | d) | (ro | roo | roos) (ts | t)))

(ArbnoPat: (ARBNO(LetterCombinations $ OUTPUT $ image ?(imageNum = imageNum + 1) Sep)))

The sub-patterns “Sep”, “LetterCombinations”, and “ArbnoPat” are created, and can be used by name within the pattern-matching environment.
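For readers more familiar with regular expressions, the LetterCombinations sub-pattern above corresponds roughly to the following java.util.regex sketch. This is plain JDK code, not JSnobol; the class and method names are made up for illustration, and requiring the whole word to match is a choice made for this sketch only.

```java
import java.util.regex.Pattern;

class LetterCombinationsRegex {

    // Regex counterpart of: (be | bea | bear) (ds | d) | (ro | roo | roos) (ts | t)
    static final Pattern LETTER_COMBINATIONS =
            Pattern.compile("(?:be|bea|bear)(?:ds|d)|(?:ro|roo|roos)(?:ts|t)");

    // True when the entire word is one of the letter combinations.
    static boolean matchesWhole(String word) {
        return LETTER_COMBINATIONS.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"beads", "beards", "roost", "roots", "bears"}) {
            System.out.println(word + " -> " + matchesWhole(word));
        }
    }
}
```

Here "beads", "beards", "roost" and "roots" are accepted, while "bears" is rejected because neither "ds" nor "d" can follow "bear" to complete the word.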

Upper And Lower Case

Though the SNOBOL compiler I used (on a Univac 1100 computer) was strictly upper-case, as was everything documented in my SNOBOL4 programming manual, upper- and lower-case pattern-matching is now supported. The keyword “&CASESENSITIVE” has been defined to select case-sensitive (value 1) or case-insensitive (value 0) matching.

Although it is not required (keywords and SNOBOL primitive functions can be either upper- or lower-case), it is good practice (and style) to write SNOBOL primitive functions and keywords in upper-case.

Assignment Statements Within Patterns

SNOBOL-style assignment statements can be included in patterns, such as the “?(imageNum = imageNum + 1)” evaluated expression in the example above. Evaluated expressions are run in concatenation-mode, as opposed to match-mode, though the success or failure of the evaluated expression can affect the pattern-match.

The JSnobolApp Application

The Integrated Development Environment for developing and testing JSnobol patterns is called “JSnobolApp”, and is a Java application making use of the JSnobol class.

This application allows you to code and test JSnobol patterns in their original syntax, without worrying about supplying the various escape-sequences required within Java strings.

The IDE actually generates the Java code for you, which can then be cut-and-pasted into your Java program.
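To see the kind of escaping the IDE spares you, here is the “Sep” sub-pattern from earlier on this page written by hand as a Java string literal. This is a sketch of the escaping rules of Java itself, not output copied from the IDE; the class and constant names are made up, and it assumes the pattern text should reach JSnobol exactly as it is typed in the IDE.

```java
class EscapeSketch {

    // Pattern text as typed in the IDE:  (Sep: ("\n" | " " | RPOS(0)))
    // In a Java string literal, each double quote is written as \" and the
    // backslash is written as \\ so that the string holds the original text.
    static final String SEP_PATTERN = "(Sep: (\"\\n\" | \" \" | RPOS(0)))";

    public static void main(String[] args) {
        // Prints the pattern text exactly as it was typed in the IDE.
        System.out.println(SEP_PATTERN);
    }
}
```

Even this short pattern needs six escapes, and a single missed backslash silently changes the pattern text, which is why letting the IDE generate the literal is the safer route.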

The application uses (and asks permission to initially create) a folder called “JSnobolGUI”, within your home folder. Everything you develop using the application will be stored in that folder, unless you browse to some other location to save it to, or load it from.

The following links (in the order shown) will tell you of the functionality and details of each pane of the Integrated Development Environment:

Loading A Project

The Main Patterns Pane

The Sub-Patterns Pane

The Patterns Start/Finish Pane

The Parameters Pane

The Run/Execute Pane

The Debug Pane

The Java Code Pane
