Implements #2383 Add syntax modes for FreeMarker template language #2847

blutorange · 2021-12-21T22:04:20Z

Woah, test cases got really large, much larger than the implementation...

Adds syntax highlighting support for Apache FreeMarker, resolves #2383. Includes tests and samples for the website.

Language ID is freemarker2, since FreeMarker 3 will most likely change the syntax.
FreeMarker actually defines 6 slightly different syntaxes. Therefore this commit adds the 6 language modes freemarker2.tag-*.interpolation-*. It also adds the mode freemarker2, which is an alias for freemarker2.tag-angle.interpolation-dollar, the default mode when using FreeMarker via the Java API.
If anybody has a better suggestion for naming the modes, feel free to suggest it.

(from the source code comment)

The grammar for FreeMarker 2.x. This tokenizer is intentionally limited to FreeMarker 2 as the next release FreeMarker 3 is a breaking change that will change the syntax, see:
https://cwiki.apache.org/confluence/display/FREEMARKER/FreeMarker+3

FreeMarker does not just have one grammar, it has 6 (!) different syntaxes.

3 possibilities for the tag syntax: angle, bracket, auto
2 possibilities for the interpolation syntax: dollar, bracket

These can be combined, resulting in 3*2=6 syntaxes. There's another tag syntax, but that one is legacy and therefore ignored by this tokenizer.
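
As a minimal sketch of how the six combinations map onto the freemarker2.tag-*.interpolation-* naming pattern: the helper name `languageId` and the constant names below are hypothetical and are not taken from the PR's source.

```typescript
// Tag and interpolation syntaxes as named in the description above.
const TAG_SYNTAXES = ["angle", "bracket", "auto"] as const;
const INTERPOLATION_SYNTAXES = ["dollar", "bracket"] as const;

type TagSyntax = (typeof TAG_SYNTAXES)[number];
type InterpolationSyntax = (typeof INTERPOLATION_SYNTAXES)[number];

// Hypothetical helper: builds an ID that follows the
// freemarker2.tag-*.interpolation-* pattern.
function languageId(tag: TagSyntax, interpolation: InterpolationSyntax): string {
  return `freemarker2.tag-${tag}.interpolation-${interpolation}`;
}

// Enumerate every tag/interpolation combination (3 * 2 = 6).
const allLanguageIds: string[] = [];
for (const tag of TAG_SYNTAXES) {
  for (const interpolation of INTERPOLATION_SYNTAXES) {
    allLanguageIds.push(languageId(tag, interpolation));
  }
}
```

The plain freemarker2 ID would then simply point at the same definition as freemarker2.tag-angle.interpolation-dollar.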

Angle tag syntax is like <#if true>...</#if>
Bracket tag syntax is like [#if true]...[/#if]
Auto tag syntax inspects the first directive and uses that.

Dollar interpolation syntax is like ${1+2}, bracket syntax like [=1+2].

To prevent duplicate code, there are factory functions that take a syntax mode and dynamically create the tokenizer for that mode. This does not affect performance since the tokenizer is created only once.
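
A rough illustration of the factory idea, assuming nothing about the actual implementation: `createTokenizer`, `SketchTokenizer`, and `DELIMITERS` are hypothetical names, and the two regular expressions only cover the opening and closing of a directive.

```typescript
type TagSyntax = "angle" | "bracket";

interface TagDelimiters {
  open: string;
  close: string;
}

// Delimiters per tag syntax, taken from the examples above.
const DELIMITERS: Record<TagSyntax, TagDelimiters> = {
  angle: { open: "<", close: ">" },
  bracket: { open: "[", close: "]" },
};

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical, heavily reduced stand-in for a real tokenizer definition.
interface SketchTokenizer {
  directiveStart: RegExp;
  directiveEnd: RegExp;
}

// Factory: the rules are written once and parameterized by the tag syntax.
function createTokenizer(tag: TagSyntax): SketchTokenizer {
  const open = escapeRegExp(DELIMITERS[tag].open);
  const close = escapeRegExp(DELIMITERS[tag].close);
  return {
    directiveStart: new RegExp(`${open}#([a-zA-Z_]+)`),
    directiveEnd: new RegExp(`${open}/#([a-zA-Z_]+)${close}`),
  };
}

// Each tokenizer is built exactly once, up front.
const angleTokenizer = createTokenizer("angle");
const bracketTokenizer = createTokenizer("bracket");
```

Because the factory runs only when the language is registered, the extra indirection costs nothing while tokenizing.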

Auto mode is implemented via parser states. Each parser state exists three times, one for each tag syntax mode (e.g. default.auto, default.angle, default.bracket). Auto mode starts in default.auto and switches to default.angle or default.bracket when it encounters the first directive.
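
The state switch can be pictured roughly like this. This is a simplified sketch, not the PR's code: `resolveTagMode` and `stateName` are hypothetical helpers, and the detection only looks for a `<#` or `[#` opener, ignoring everything else that FreeMarker itself may consider.

```typescript
type TagMode = "auto" | "angle" | "bracket";

// State names follow the base.mode convention, e.g. default.auto.
function stateName(base: string, mode: TagMode): string {
  return `${base}.${mode}`;
}

// Once a concrete mode is chosen it never changes; in auto mode the first
// directive-like opener decides which family of states is used from then on.
function resolveTagMode(current: TagMode, text: string): TagMode {
  if (current !== "auto") {
    return current;
  }
  const match = /[<\[]#/.exec(text);
  if (match === null) {
    return "auto"; // no directive seen yet, keep waiting
  }
  return match[0].charAt(0) === "<" ? "angle" : "bracket";
}
```

For example, `stateName("default", resolveTagMode("auto", "Hello [#if x]"))` yields `default.bracket`.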

FreeMarker allows expressions within strings (a${1+2}b), but this tokenizer cannot highlight them. FreeMarker does not implement string interpolation via a mode change when encountering ${. Rather, it tokenizes the string as a literal string first. Then, during the AST build phase, it creates a new parser and parses the unescaped string content.
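
To make the two-phase handling concrete, here is a small sketch of the second phase only: it takes the already unescaped content of a string literal and splits it into text and expression parts. The function `splitStringContent` and the `Part` type are hypothetical, and the pattern assumes an expression contains no closing brace, which the real FreeMarker parser does not require.

```typescript
type Part = { kind: "text" | "expression"; value: string };

// Phase 1 (not shown) treats the whole literal as a single string token.
// Phase 2, sketched here, scans the unescaped content for ${...}.
function splitStringContent(content: string): Part[] {
  const parts: Part[] = [];
  const pattern = /\$\{([^}]*)\}/g;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    if (match.index > last) {
      parts.push({ kind: "text", value: content.slice(last, match.index) });
    }
    parts.push({ kind: "expression", value: match[1] });
    last = match.index + match[0].length;
  }
  if (last < content.length) {
    parts.push({ kind: "text", value: content.slice(last) });
  }
  return parts;
}
```

A single-pass tokenizer that has already emitted the literal as one token has no natural place to run this second step, which is why the interpolations inside strings are left unhighlighted.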

This is adapted from the official JavaCC grammar for FreeMarker: https://github.com/apache/freemarker/blob/2.3-gae/src/main/javacc/FTL.jj

Taken from the above file, a short rundown of the basic parser states:

The lexer portion defines 5 lexical states:
DEFAULT, FM_EXPRESSION, IN_PAREN, NO_PARSE, and EXPRESSION_COMMENT.
The DEFAULT state is when you are parsing
text but are not inside a FreeMarker expression.
FM_EXPRESSION is the state you are in
when the parser wants a FreeMarker expression.
IN_PAREN is almost identical really. The difference
is that you are in this state when you are within
FreeMarker expression and also within (...).
This is a necessary subtlety because the
">" and ">=" symbols can only be used
within parentheses because otherwise, it would
be ambiguous with the end of a directive.
So, for example, you enter the FM_EXPRESSION state
right after a ${ and leave it after the matching }.
Or, you enter the FM_EXPRESSION state right after
an "<if" and then, when you hit the matching ">"
that ends the if directive,
you go back to DEFAULT lexical state.
If, within the FM_EXPRESSION state, you enter a
parenthetical expression, you enter the IN_PAREN
state.
Note that whitespace is ignored in the
FM_EXPRESSION and IN_PAREN states
but is passed through to the parser as PCDATA in the DEFAULT state.
NO_PARSE and EXPRESSION_COMMENT are extremely simple
lexical states. NO_PARSE is when you are in a comment
block and EXPRESSION_COMMENT is when you are in a comment
that is within an FTL expression.
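
The role of IN_PAREN in the excerpt above can be illustrated with a toy state tracker. This is a simplified sketch under several assumptions: it only knows angle-style directives, ignores string literals, comments, and interpolations, and the function name `classifyGreaterThan` is made up for the example.

```typescript
type LexState = "DEFAULT" | "FM_EXPRESSION" | "IN_PAREN";

// Reports, for every ">" in the template, how it would be interpreted.
function classifyGreaterThan(template: string): string[] {
  let state: LexState = "DEFAULT";
  let parenDepth = 0;
  const result: string[] = [];
  for (let i = 0; i < template.length; i++) {
    const ch = template.charAt(i);
    if (state === "DEFAULT") {
      if (ch === "<" && template.charAt(i + 1) === "#") {
        state = "FM_EXPRESSION"; // opening directive such as <#if
        i += 1;
      } else if (ch === "<" && template.substr(i + 1, 2) === "/#") {
        state = "FM_EXPRESSION"; // closing directive such as </#if
        i += 2;
      } else if (ch === ">") {
        result.push("text");
      }
    } else if (state === "FM_EXPRESSION") {
      if (ch === "(") {
        state = "IN_PAREN";
        parenDepth = 1;
      } else if (ch === ">") {
        result.push("directive-end"); // unparenthesized ">" closes the tag
        state = "DEFAULT";
      }
    } else {
      if (ch === "(") {
        parenDepth += 1;
      } else if (ch === ")") {
        parenDepth -= 1;
        if (parenDepth === 0) {
          state = "FM_EXPRESSION";
        }
      } else if (ch === ">") {
        result.push("operator"); // inside (...) it is a comparison
      }
    }
  }
  return result;
}
```

For `<#if (a > b)>yes</#if>`, the first `>` is classified as an operator and the other two as directive ends.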

It should be noted that there is another parser state not mentioned in the above excerpt: NO_DIRECTIVE is used as the initial state when parsing the contents of a string literal, which is allowed to contain interpolations, but no directives. However, note that FreeMarker first tokenizes a string literal as-is; then, during the parsing stage, it takes the (unescaped) content of the string literal and tokenizes and parses that content with a new child parser.

melloware · 2021-12-21T22:51:31Z

This is fantastic. I have been testing it with my real-world FreeMarker templates and it's working exactly the way I would expect it to!

FlipWarthog · 2021-12-21T23:04:44Z

Holy cow, this is amazing. Well done, @blutorange !

hediet · 2022-01-03T09:47:11Z

Thank you for this PR! It seems like you put a lot of effort into this.

Generally, it looks really good. However, I did not review every single line of those added ~25k lines of code, especially not the tests. At ten lines per second, a "proper" review would take more than 40 minutes.

@alexdima how do you think we should proceed here?

alexdima · 2022-01-14T08:47:21Z

This looks very good and thorough, thank you very much!

Implements microsoft#2383 Add syntax modes for FreeMarker template la…

94f81dc

…nguage

melloware approved these changes Dec 21, 2021

FlipWarthog approved these changes Dec 21, 2021

alexdima merged commit 93c7165 into microsoft:main Jan 14, 2022

alexdima added this to the January 2022 milestone Jan 14, 2022

github-actions bot locked and limited conversation to collaborators Feb 28, 2022
