Add a StandardizeDocumentationComments rule #959

Open
wants to merge 11 commits into
base: main

natecook1000
Member

@natecook1000 natecook1000 commented Mar 7, 2025

This adds a formatting rule that rewraps and restructures documentation comments to be in a consistent order and format. Other comments are left unchanged. The standardization enforces the following rules:

  • All documentation comments are rendered as ///-prefixed.
  • Documentation comments are re-wrapped to the preferred line length.
  • The order of elements in a standardized documentation comment is:
    • Abstract
    • Discussion w/ paragraphs, code samples, lists, etc.
    • Parameter docs (outlined if > 1)
    • Return value docs
    • Throwing docs
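
As a rough illustration of the ordering above, here is a hand-written before/after pair. This is not output captured from the rule: the function names, the `DivisionError` type, and the exact blank-line placement are assumptions made for the example.

```swift
enum DivisionError: Error { case divideByZero }

/**
 Returns the quotient of two integers.

 - Throws: `DivisionError.divideByZero` if `divisor` is zero.
 - Parameter dividend: The value to divide.
 - Returns: The truncated quotient of `dividend` and `divisor`.
 - Parameter divisor: The value to divide by.

 Division truncates toward zero, matching the `/` operator on `Int`.
 */
func quotientBefore(_ dividend: Int, by divisor: Int) throws -> Int {
  guard divisor != 0 else { throw DivisionError.divideByZero }
  return dividend / divisor
}

// The same documentation, hand-arranged in the standardized order:
// abstract, discussion, parameters (outlined, since there are two),
// return value, then throwing docs, all with `///` prefixes.

/// Returns the quotient of two integers.
///
/// Division truncates toward zero, matching the `/` operator on `Int`.
///
/// - Parameters:
///   - dividend: The value to divide.
///   - divisor: The value to divide by.
/// - Returns: The truncated quotient of `dividend` and `divisor`.
/// - Throws: `DivisionError.divideByZero` if `divisor` is zero.
func quotientAfter(_ dividend: Int, by divisor: Int) throws -> Int {
  guard divisor != 0 else { throw DivisionError.divideByZero }
  return dividend / divisor
}
```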

The change needs more tests, especially for parameters with rich documentation (since most of that will get dropped on the floor right now). There are also some slight issues in the way swiftlang/swift-markdown does line-wrapping, particularly around inline code. I've opened a fix for those issues here: swiftlang/swift-markdown#215

In addition to the tests, you can see the result of this rule on ArgumentParser in this branch:
https://github.com/apple/swift-argument-parser/compare/standardized-docs

This adds a formatting rule that rewraps and restructures documentation
comments to be in a consistent order and format. Other comments are left
unchanged.
Member

@allevato allevato left a comment


This is so good. 😍

I'll wait to do a deeper review when things are more settled, but in terms of direction, this is exactly the kind of behavior I wanted to see.

// different for different node types (e.g. an accessor has a `body`, while an
// actor has a `memberBlock`).

public override func visit(_ node: AccessorDeclSyntax) -> DeclSyntax {
Member


Does the Swift compiler or DocC process comments on accessors independently of those on the parent property/subscript?

Likewise for some of the other declarations included here, like deinit and extension.

I suppose it doesn't hurt to include them here so we shouldn't necessarily exclude them, but I'm more curious if there are generally user expectations around these specific ones.

Member Author


Ah, not sure! I just grabbed the list of decls from... here. I'll try it out and see what happens – we could probably just include nominal declarations.

Member Author


Removed this and the other non-documentable declaration types 👍🏻

This behavior matches DocC, which treats all outline list items
within a parameter's documentation as plain markup, rather than
as special fields.
@natecook1000 natecook1000 marked this pull request as ready for review March 10, 2025 14:31
@natecook1000 natecook1000 marked this pull request as draft March 10, 2025 15:12
Instead of parsing nested parameter documentation differently,
which can have source-breaking effects, capture all the unfiltered
body nodes when initializing a `DocumentationComment` from markup.
This lets us choose whether to use the filtered body (at the top level
of a doc comment) or the full list of body nodes (within a parameter's
nested documentation).
@natecook1000 natecook1000 marked this pull request as ready for review March 10, 2025 17:24
@natecook1000 natecook1000 requested a review from allevato March 10, 2025 17:24
@natecook1000
Member Author

Potential controversies in documentation formatting:

  • All indented code blocks are converted to fenced. On the plus side, this means that all code blocks have consistent demarcation, whether with a language specifier or not, and you have four more characters of width to work with. On the minus side, the text may feel more cluttered for the vast majority of code blocks that use the default language (Swift).

  • Reference-style links (e.g. [link title][url] + a later [url]: https://...) are converted to inline links. This is a challenge for line wrapping, since URLs so frequently push a line over (or way over) a line limit. Fixing this to allow (or to push toward) reference-style links would require augmenting swift-markdown; cmark doesn't provide API for distinguishing between different link types.

  • Lists and code blocks always have a leading line break. Many doc comments transition straight from a paragraph to a list or code block. This is allowed by the Markdown spec, but the output always has an empty line.
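
To make the second and third points concrete, here is a hand-written sketch of what the conversion could look like. It is not captured output from the rule; the function names, the URL, and the exact wrapping are assumptions for illustration only.

```swift
// Before: a reference-style link, and a list that starts directly after
// the paragraph with no blank line in between.

/// Counts the arguments. See the [usage guide][guide] for details:
/// - Flags are counted first.
/// - Options are counted second.
///
/// [guide]: https://example.com/guide
func countArgumentsBefore(_ arguments: [String]) -> Int {
  arguments.count
}

// After (illustrative): the link is inlined, which lengthens the line that
// carries the URL, and the list gets a leading blank line.

/// Counts the arguments. See the
/// [usage guide](https://example.com/guide) for details:
///
/// - Flags are counted first.
/// - Options are counted second.
func countArgumentsAfter(_ arguments: [String]) -> Int {
  arguments.count
}
```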

@allevato
Member

> Potential controversies in documentation formatting:

I think these are fine; after all, the rule is named Standardize. Of those you listed, the first and last don't bother me. Not being able to distinguish between inline and reference links isn't ideal, but if we're limited by the underlying library, there's probably not much we can do.


Another issue just occurred to me, which is a little trickier to solve. The structure of the formatter is such that whitespace/indentation isn't handled until the pretty-printing phase, after all the syntax walking rules have done their operations. That means that the calculations you make in that rule may not be the correct ones. Consider this example:

struct S {
/// Some doc comment
func f() {
}
}

This will reflow the doc comment based on the difference between the line width and its leading trivia, which gives it two more spaces than it should have. When the pretty printer indents that function and its comment, the comment could then exceed the line width, and only formatting it a second time would fix it.
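
A minimal sketch of that width problem, using a hypothetical greedy word wrapper (`wrapDocComment` is made up for this example and is not swift-format or swift-markdown API). It wraps the same text once as if the comment sat at column 0 and once with the final two-space indentation known up front:

```swift
/// Greedily wraps `text` into `///`-prefixed lines that fit within
/// `lineLength` columns once `indent` columns of indentation are added.
func wrapDocComment(_ text: String, lineLength: Int, indent: Int) -> [String] {
  let prefix = "/// "
  // Width left for the words themselves; at least 1 so every word is placed.
  let width = max(1, lineLength - indent - prefix.count)
  var lines: [String] = []
  var current = ""
  for word in text.split(separator: " ") {
    if current.isEmpty {
      current = String(word)
    } else if current.count + 1 + word.count <= width {
      current += " " + word
    } else {
      lines.append(current)
      current = String(word)
    }
  }
  if !current.isEmpty { lines.append(current) }
  let indentation = String(repeating: " ", count: indent)
  return lines.map { indentation + prefix + $0 }
}

let text = "Returns the sum of the two given values."
let limit = 30

// Wrapped before indentation is known, then indented by two spaces
// afterwards, the way a later pretty-printing phase would.
let wrappedTooEarly = wrapDocComment(text, lineLength: limit, indent: 0)
  .map { "  " + $0 }

// Wrapped with the final indentation passed in up front.
let wrappedWithIndent = wrapDocComment(text, lineLength: limit, indent: 2)

for line in wrappedTooEarly { print(line.count, line) }
for line in wrappedWithIndent { print(line.count, line) }
```

With these inputs the first line of `wrappedTooEarly` ends up two columns over the limit, while every line of `wrappedWithIndent` fits, which is why the wrapping step needs the real indentation.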

This means that if we want it to be correct on the first format, rendering the doc comment back to source would have to happen in the pretty printer, once we know what the actual indentation is going to be. We could do everything but rewrapping in the rule that you have now, with the drawback being that we'd end up parsing every doc comment twice (once in the rule pipeline to standardize the Markdown structure, then again to get the Markdown doc that we re-render). To avoid that, we could do both in the pretty printer instead.

Fortunately, the way the pretty printer works is that the TokenStreamCreator collects all adjacent docLine comments into a single Comment token, so it might not be too difficult to take the logic you've already implemented and move it into the print method where you're given access to the expected indentation. (We'd need to pass in the line length as well, but that's a reasonable change to make.)

At that point in the code, we don't have access to what kind of node the doc comment was attached to. We could augment the Comment type to carry that info, or we could just say that anything that looks like it's meant to be a doc comment is going to be standardized, no matter what it's attached to. I can see arguments for both sides.
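
The two options could be sketched roughly as follows. Every name here (`SketchComment`, `AttachedDeclKind`, `StandardizePolicy`, `shouldStandardize`) is hypothetical and only stands in for whatever the real `Comment` type would carry; this is not the existing swift-format API.

```swift
/// Hypothetical: the kind of declaration a comment was attached to.
enum AttachedDeclKind { case function, type, variable, other }

/// Hypothetical stand-in for a comment token in the pretty printer.
struct SketchComment {
  enum Kind { case line, docLine, block, docBlock }
  var kind: Kind
  var text: [String]
  /// `nil` when the attachment is unknown at this point in the pipeline.
  var attachedTo: AttachedDeclKind?
}

/// The two policies discussed above.
enum StandardizePolicy {
  /// Only standardize doc comments known to be on documentable declarations.
  case onlyDocumentableDecls
  /// Standardize anything that looks like a doc comment.
  case anyDocComment
}

func shouldStandardize(_ comment: SketchComment, policy: StandardizePolicy) -> Bool {
  // Regular comments are always left unchanged.
  guard comment.kind == .docLine || comment.kind == .docBlock else { return false }
  switch policy {
  case .anyDocComment:
    return true
  case .onlyDocumentableDecls:
    guard let decl = comment.attachedTo else { return false }
    return decl != .other
  }
}
```

The first policy needs the extra `attachedTo` field to be threaded through; the second needs no new data but would also touch doc-style comments that sit on non-documentable nodes.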
