Compilers, Python

Getting Started with ANTLR for Python

ANTLR is a compiler-writing tool, similar to Lex/Yacc or Flex/Bison but much more capable, modern, and generally less frustrating. I am currently reading through The Definitive ANTLR 4 Reference by ANTLR’s creator, Terence Parr. It’s a wonderful resource on the workings and usage of ANTLR, but it’s written in Java – one of my least favorite languages. Thankfully ANTLR targets multiple languages, so I figured I’d follow along in Python – also one of my least favorite languages.

ANTLR itself is written in Java, so regardless of which language you’ll be writing in (Python in this case) you’ll need to have Java installed, the ANTLR jar file downloaded, and a Java classpath set up in your shell environment. Assuming you’re using Fish, that looks something like this:
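Whatever shell syntax you use to set things up, it can help to confirm the result before going further. Here is a minimal sketch of such a check in Python. The function names are my own, and the assumption that the jar's file name starts with "antlr" is only a heuristic, so adjust it if your jar is named differently.

```python
import os
import shutil


def find_antlr_jars(classpath):
    """Return the CLASSPATH entries that look like an ANTLR jar.

    Assumption: the jar's file name starts with "antlr" and ends in ".jar".
    """
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [
        entry
        for entry in entries
        if os.path.basename(entry).lower().startswith("antlr")
        and entry.lower().endswith(".jar")
    ]


def check_environment():
    """Report whether Java and an ANTLR jar appear to be reachable."""
    return {
        "java": shutil.which("java"),  # None when java is not on PATH
        "antlr_jars": find_antlr_jars(os.environ.get("CLASSPATH", "")),
    }
```

Calling `check_environment()` from a Python prompt should show a path for `java` and at least one jar; an empty list suggests the classpath was not exported to the environment.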

With that out of the way, we’re pretty much done with Java. We can move on to more Python-specific steps, and the first one is to install the Python ANTLR runtime. That’s as simple as running pip install:

The final bit of setup is to configure an alias for invoking the ANTLR tool itself. While this is purely optional, I find that it saves a lot of typing. This is similar to the script on page 5 of the ANTLR book, but tailored for Python.

Note: Be sure to chmod +x the above script and place it somewhere within your path.
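If you would rather keep the wrapper in Python, the same idea can be sketched as a small module. This is an illustration under assumptions, not the script from the book: the function names are mine, and it relies on the ANTLR jar already being on the classpath so that `java org.antlr.v4.Tool` resolves.

```python
import subprocess


def antlr_command(args, language="Python3"):
    """Build the command line that runs the ANTLR tool for a target language."""
    # -Dlanguage selects the code generation target; everything else
    # (grammar files, -visitor, -no-listener, ...) is passed through as-is.
    return ["java", "org.antlr.v4.Tool", f"-Dlanguage={language}", *args]


def run_antlr(args):
    """Run ANTLR with the given arguments and return its exit code."""
    return subprocess.call(antlr_command(args))
```

For example, `run_antlr(["-visitor", "Hello.g4"])` would generate a Python lexer, parser, listener, and visitor for a hypothetical `Hello.g4` grammar.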

Now we can get back to enjoying learning about ANTLR. For the most part, following along with the book is a straightforward translation process, with the Java class names referenced in the book being very similar (if not identical) to the ones exposed by the Python ANTLR framework.

For example, here is a sample “main” file for invoking your grammar:

You simply swap out “YourLexer” and “YourParser” for the appropriate names of your lexer and parser.
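A minimal sketch of such a main file, assuming a grammar named `Your` and a start rule named `startRule` (both placeholders), might look like the following. The imports sit inside the function only so the file can be loaded before the generated modules exist.

```python
def parse_file(path):
    """Parse the file at `path` and return its parse tree as LISP-style text."""
    # Imported here so this module loads even before the ANTLR runtime is
    # installed or the lexer and parser have been generated.
    from antlr4 import CommonTokenStream, FileStream
    from YourLexer import YourLexer    # generated by ANTLR; placeholder name
    from YourParser import YourParser  # generated by ANTLR; placeholder name

    lexer = YourLexer(FileStream(path, encoding="utf-8"))
    parser = YourParser(CommonTokenStream(lexer))
    tree = parser.startRule()  # placeholder: call your grammar's start rule
    return tree.toStringTree(recog=parser)
```

Printing `parse_file("input.txt")` gives roughly the same output as the `-tree` option of the test rig used in the book.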

Even the creation of visitors and listeners is pretty much a 1:1 translation of their Java examples from the book. For example, a blueprint for a listener might look like:
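As a rough sketch, assuming a grammar named `Your` with a rule called `greeting` (both hypothetical), a listener could be written as below. The `try`/`except` fallback is there only so the sketch can be read and run without a generated grammar; in a real project you would import the generated `YourListener` directly.

```python
try:
    from YourListener import YourListener  # generated by ANTLR; placeholder name
except ImportError:
    class YourListener:
        """Stand-in base class used only when no generated listener exists."""


class GreetingCollector(YourListener):
    """Collect the text of every `greeting` rule seen during a tree walk."""

    def __init__(self):
        super().__init__()
        self.entered = 0
        self.texts = []

    # ANTLR names the callbacks enter<Rule> and exit<Rule>, with the first
    # letter of the rule name capitalised.
    def enterGreeting(self, ctx):
        self.entered += 1

    def exitGreeting(self, ctx):
        self.texts.append(ctx.getText())
```

With a real parse tree, the listener is driven by the runtime's walker: `ParseTreeWalker().walk(GreetingCollector(), tree)`.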

It’s actually very nice that the conventions used by the Java runtime are so similar to those used by the Python one. I suspect this is intentional, so as to reduce the amount of work needed to add new language targets and to reduce the complexity of the documentation.

However, all of this has a downside.

Despite language in the ANTLR book itself that implies the contrary, ANTLR grammars are not language agnostic. It is entirely possible to construct an ANTLR grammar that cannot be compiled as-is to any number of target languages. In fact, we run into this situation rather early in the book, on page 43, where we’re exposed to the listener pattern.

The grammar Java.g4 does not work when targeting Python; ANTLR generates the following errors:

That’s a lot of errors!

Let’s pull out the first actual error and look a little closer.

It is complaining that the rule “type” is causing us problems. To better understand why, let’s look at the applicable part of the grammar:

And also the Java parser generated by this bit of grammar:

And, if it’s not 100% clear, let’s look at the Java code we’d need to port to Python to utilize this grammar:

In essence, the grammar defines a rule called “type”, but “type” is already taken in Python: it is the name of a built-in (not strictly a reserved word, but a name the ANTLR Python target refuses to reuse). ANTLR generates a parser that literally defines a method called “type” for this rule, which apparently is fine in Java, but fails wonderfully in Python.
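To see why a name like “type” is a problem on the Python side, it helps to check rule names against what Python already claims. Strictly speaking, `type` is a built-in name rather than a keyword. The sketch below is only an approximation of the idea: ANTLR keeps its own list of rejected names, and `classify_rule_name` is a helper I made up for illustration.

```python
import builtins
import keyword


def classify_rule_name(name):
    """Say whether a rule name collides with a Python keyword or built-in."""
    if keyword.iskeyword(name):
        return "keyword"   # e.g. class, def, return
    if hasattr(builtins, name):
        return "builtin"   # e.g. type, list, print
    return "ok"


for rule in ("type", "class", "typeSpec"):
    print(rule, "->", classify_rule_name(rule))
```

Renaming the offending rule in the grammar (for instance to `typeSpec`) is the usual workaround, at the cost of diverging from the grammar as printed in the book.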

This problem could have easily been avoided by using some common prefix for rules, or having rules be something looked up in a dictionary. Instead, by taking literal rule names and expecting that each one of them can be defined as a function of the exact same name in the target language, ANTLR is, by definition, not language agnostic. For any grammar, there are likely multiple real or hypothetical backends that such a grammar cannot be compiled for.
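Both alternatives are easy to sketch. The code below is purely illustrative and is not how ANTLR generates parsers: `method_name_for` shows the prefix idea, and the hypothetical `RuleTable` shows rules being looked up in a dictionary so that the rule name never has to be a valid identifier in the target language.

```python
def method_name_for(rule, prefix="rule_"):
    """Prefix idea: map a grammar rule name to a collision-free method name."""
    return prefix + rule


class RuleTable:
    """Dictionary idea: rules are registered and looked up by string name."""

    def __init__(self):
        self._rules = {}

    def rule(self, name):
        """Decorator that registers a function under a grammar rule name."""
        def register(func):
            self._rules[name] = func
            return func
        return register

    def invoke(self, name, *args):
        """Call the function registered for the rule called `name`."""
        return self._rules[name](*args)


table = RuleTable()


@table.rule("type")  # the rule name is just a string here
def _parse_type(text):
    return ("type", text)
```

Here `table.invoke("type", "int")` dispatches to the registered function without anything in the program being named `type`.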

For example, here is a grammar that works perfectly fine in Python, but cannot be compiled for Java:

Sadly, this GitHub issue implies that this is also a known issue, and one the author doesn’t consider worth fixing.

While many, if not most, grammars are likely not an issue, it is rather sad that:

  1. Such an issue occurs rather early in the book
  2. The book strongly implies, and other resources on the Internet outright say, that ANTLR is language agnostic – no it’s not
  3. The author doesn’t seem inclined to more broadly correct either this bug or this misconception about his tool

That aside, ANTLR is actually a wonderful tool, especially if you go into it with knowledge and awareness of its warts. I’m excited to see what wonderful things I can create with this amazing tool.


About Jason

Jason is an experienced entrepreneur & software developer skilled in leadership, mobile development, data synchronization, and SaaS architecture. He earned his Bachelor of Science (B.S.) in Computer Science from Arkansas State University.