-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New manpage format #1921
base: gh-pages
Are you sure you want to change the base?
New manpage format #1921
Conversation
I just triggered a pair of workflow runs to update the manual pages and to update the translated manual pages, fetched the result and rendered it locally. Here are two examples:
Personally, I cannot spot any difference, apart from the version number (because this here PR branch is based on v2.46.2 while the updated manual pages include v2.47.0) and the incorrect Even looking at the HTML of the synopses (taking the French version, so that there is a known difference), I only see this: diff --git a/before b/after
index 1a87d1348..6185fb72b 100644
--- a/before
+++ b/after
@@ -1,5 +1,5 @@
<pre class="content"><em>git config list</em> [<option-de-fichier>] [<option-d-affichage>] [--includes]
-<em>git config get</em> [<option-de-fichier>] [<option-d-affichage>] [--includes] [--all] [--regexp=<regexp>] [--value=<valeur>] [--fixed-value] [--default=<default>] <nom>
+<em>git config get</em> [<option-de-fichier>] [<option-d-affichage>] [--includes] [--all] [--regexp] [--value=<valeur>] [--fixed-value] [--default=<default>] <nom>
<em>git config set</em> [<option-de-fichier>] [--type=<type>] [--all] [--value=<valeur>] [--fixed-value] <nom> <valeur>
<em>git config unset</em> [<option-de-fichier>] [--all] [--value=<valeur>] [--fixed-value] <nom> <valeur>
<em>git config rename-section</em> [<option-de-fichier>] <ancien-name> <nouveau-name> @jnavila what am I missing? |
The manpage of git-config has not been converted yet. After importing, here is the result: I'm not satisfied with the styles, particularly when dealing with inline formats: you can test by yourself locally, and tell me your judgment. |
The new style makes the code spans lighter and more integrated into the text. The new style also makes the code spans more readable and less intrusive. Signed-off-by: Jean-Noël Avila <[email protected]>
3cb6e5e
to
9bd765a
Compare
@dscho I updated the CSS, so it is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnavila I've added a few questions. Thanks for this contribution.
script/asciidoctor-extensions.rb
Outdated
|
||
def process parent, reader, attrs | ||
outlines = reader.lines.map do |l| | ||
l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a line of comment wouldn't hurt with these regexes. Maybe best with an example:
l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') | |
l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') # wrap ellipsis in backticks: ...something => `...`something |
I think the intended use is for [...<more>]
? Should we include the [
and ]
in the regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is trying to differentiate the three dots in different contexts, where they have a different meaning and require different formatting.
First there is the form <commit1>...<commit2> when describing a range of commits, where the three dots are a "keyword" understood by git and must be formatted as code.
Then there is the forms used in the grammar to express repetition, such as in "<path> ..." with optionally square brackets, such as "[<path>...]" which usually appear at the end of the command line. These three dots must not be formatted as code, but left as is.
This line matches the former case and forces the corresponding format. I'll add a comment in the same line as yours.
script/asciidoctor-extensions.rb
Outdated
def process parent, reader, attrs | ||
outlines = reader.lines.map do |l| | ||
l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') | ||
.gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I don't know what this one is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is the line that matches all the words which are not placeholders and not grammar signs, and format them as code. These words (in the general sense here) are keywords (option names, enum strings, two or three dot notation, etc).
script/asciidoctor-extensions.rb
Outdated
outlines = reader.lines.map do |l| | ||
l.gsub(/(\.\.\.?)([^\]$.])/, '`\1`\2') | ||
.gsub(%r{([\[\] |()>]|^)([-a-zA-Z0-9:+=~@,/_^\$]+)}, '\1{empty}`\2`{empty}') | ||
.gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to dig deep to find what [[:word:]]
does, but it seems to be a Ruby non-POSIX bracket expression: https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-POSIX+Bracket+Expressions. Personally I'm not a fan, what's the advantage over \w
?
Also why are the inner brackets round brackets?
I just wonder if we can simplify to:
.gsub(/(<([[:word:]]|[-0-9.])+>)/, '__\\1__') | |
.gsub(/(<[^>]+>)/, '__\\1__') |
And one more question, why the double backslash in the replacement string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The '\w` is for ascii, but here, we are going to process internationalized texts (because placeholders are translated), and this processing requires the special form with double brackets. I'm not an expert in Ruby regexes; this is the form I have found to work well with the translations.
As for the using a more generic regex (expecting everything between brackets to be the placeholder's name), the placeholder's names are not supposed to contain spaces, which is perfect when we have to match something like:
$ git foo < in-file > out-file
script/asciidoctor-extensions.rb
Outdated
if node.type == :monospaced | ||
node.text.gsub(/(\.\.\.?)([^\]$.])/, '<code>\1</code>\2') | ||
.gsub(%r{([\[\s|()>.]|^|\]|>)(\.?([-a-zA-Z0-9:+=~@,/_^\$]+\.{0,2})+)}, '\1<code>\2</code>') | ||
.gsub(/(<([[:word:]]|[-0-9.])+>)/, '<em>\1</em>') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we more or less need to repeat the regexes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate, but the two regex are very alike, except that this one processes the text after some pre-processing steps, and the transformations need to be in final result form (with tags and escaped characters).
I evaluated the opportunity for factorization, but it makes the code more messy than it is already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really uneasy with this large amount of hard-to-understand regular expressions. Not only makes this bugs easy to hide, it also inadvertently opens the door to DoS attacks. Here is an example where something like this has had a really high impact.
It would probably make much more sense to implement a StringScanner
-based parser that is much easier to reason about and whose performance is well-understood, e.g. following this tutorial.
These regexes are basically the same ones that I already pushed to git/git. They are applied during the conversion phase, not in live, and only on the quoted strings of text, after the initial asciidoc parsing has been performed. Anyway, as they are cryptic, I can try to convert the code to a parser, but I doubt this will be a lot clearer. |
Writing a parser turns out to be more involving than expected, mainly because the new parser must be resilient to formatting mistakes in the translations (particularly in Chinese) and manpages that were not converted to the new format (where the backticks are used for widely varying strings). |
A parser would at least document a lot more cohesively what is being done. As they are, I know what the regular expressions do, but only after studying them extensively, an effort that I would most likely have to repeat were I in the need to fix any bugs in that code in the future. I am extremely uncomfortable with that, source code should be as obvious as possible, and this is not it. As such, I will not review this any further and not merge it. I won't oppose anybody else merging it, but I want nothing to do with the added code, myself. |
This commit adds a upcoming manpage format to the AsciiDoc backend. The new format changes are: * The synopsis is now a section with a dedicated style. This "synopsis" style allows to automatically format the keywords as monospaced and <placeholders> as italic. * the backticks are now used to format synopsis-like syntax in inline elements. The parsing of synopsis is done with a new AsciiDoc extension that makes use of the PEG parser parslet. All the asciidoc manpages sources are processed with this extension. It may upset the formatting for older manpages, making it not consistent across a page, but this will be a mild side effect, as this was not really consistent before. Signed-off-by: Jean-Noël Avila <[email protected]>
9bd765a
to
87aab82
Compare
I've pushed an updated version with a parser. In case the parser fails, we just roll back to the basis code format. |
@dscho Is this version better for you? |
@jnavila I am sorry, I had not planned any involvement in this anymore, due to time constraints. |
Ah cool, I just realize you're trying to fix #1972 here. |
@jnavila I've noticed we have a synopsis processor at https://github.com/git/git/blob/master/Documentation/asciidoctor-extensions.rb.in. Why did you chose to write it yourself? I rather not maintain 2 versions. I've used that extension locally (without the postprocessor) and this is how the synopsis on git-clone looks: Or on git-diff-index: I feel it's very much a "drop-in" extension we can/should use. |
In fact, I wrote both of the extensions.. 😆 . The rewrite here is a request from @dscho which is very convincing to me. The synopsis is a language in itself, so it is more comprehensive to have a dedicated parser for this language instead of a bunch of repelling regexps. This rewrite came after I proposed the regexp version for inclusion in git/git . Now, I'm contemplating reverting the git/git version to the parser flavor, and at the same time removing this ugly pre-processor hack by a38edab. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. This sure is a lot more verbose, but vastly easier to understand than the regular expression-based solution. Thank you for putting in the work.
I have a couple of questions/suggestions below, and also the request: Please rebase to current gh-pages
(resolving the merge conflict in Gemfile
, too). Thank you!
rule(:space) { match('[\s\t\n ]').repeat(1) } | ||
rule(:space?) { space.maybe } | ||
rule(:keyword) { match('[-a-zA-Z0-9:+=~@,\./_\^\$\'"\*%!{}#]').repeat(1) } | ||
rule(:placeholder) { str('<') >> match('[[:word:]]|-').repeat(1) >> str('>') } | ||
rule(:opt_or_alt) { match('[\[\] |()]') >> space? } | ||
rule(:ellipsis) { str('...') >> match('\]|$').present? } | ||
rule(:grammar) { opt_or_alt | ellipsis } | ||
rule(:ignore) { match('[\'`]') } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit that I was quite thrown by the syntax, especially the >>
one. https://kschiess.github.io/parslet/parser.html to the rescue (which we might want to link to, in a code comment).
Narrators voice: The >>
indicates a "simple sequence", for example str('...') >> match('\]|$').present?
means "first match three periods, then ensure that they are either followed by a closing bracket or they are at the end.
rule(:ignore) { match('[\'`]') } | ||
|
||
rule(:token) do | ||
grammar.as(:grammar) | placeholder.as(:placeholder) | space.as(:grammar) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be space.as(:space)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar and space are left unchanged, so I put them in the same bag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean grammar
is handled by this line and this line by leaving the original text unchanged?
It still might be clearer to introduce a corresponding rule(space: simple(:space)) { space.to_s }
line (and to let SynopsisQuoteToHtml5
inherit from SynopsisQuoteToAdoc
, overriding only keyword
and placeholder
).
rule(:space) { match('[\s\t\n ]').repeat(1) } | ||
rule(:space?) { space.maybe } | ||
rule(:keyword) { match('[-a-zA-Z0-9:+=~@,\./_\^\$\'"\*%!{}#]').repeat(1) } | ||
rule(:placeholder) { str('<') >> match('[[:word:]]|-').repeat(1) >> str('>') } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only difference to AdocSynopsisQuote
, why not use class EscapedSynopsisQuote <AdocSynopsisQuote
and override just this rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I haven't tried to use inherit between my classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope it will work, otherwise I'd have caused you a lot of effort for nothing in return.
rule(:ignore) { match('[\'`]') } | ||
|
||
rule(:token) do | ||
grammar.as(:grammar) | placeholder.as(:placeholder) | space.as(:grammar) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, should this be space.as(:space)
?
@jnavila I'm sorry for dropping in completely ignorant!
I haven't looked at it in detail yet, but at first sight that makes sense.
Do you mean git/git@a38edab ? If you mean the hack to fill in That said, can we deduplicate having a synopsis parser here and one in git/git.git ? |
@jnavila Without thinking about how to achieve this, how do you prefer the end result will look like? In a quick mockup I've ended up with the following: Is that something you'd like? |
That would be great 😃. The parser cannot make a difference between the "git commit" part and the options, so they would have the same style ( bold red… for instance) |
Couldn't the parser learn that Git commands start with |
Changes
This PR changes the way the asciidoc source of manpage is processed, by adding the "synopsis" paragraph style and reworking the backtick format.
Context
The style change has been pushed to master and will be applied to git-clone and git-init in the next version.