Skip to content

[Syntax] support nul character as garbage text trivia in libSyntax #14962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 5, 2018

Conversation

omochi
Copy link
Contributor

@omochi omochi commented Mar 4, 2018

In previously, Lexer does not handle nul character in lexTrivia function.
So libSyntax lose information about nul character and can not achieve round trip translation.

With this PR, I update lexTrivia and libSyntax handles nul character as garbage text trivia.

I split my work into 2 commits.

In first commit, I introduce Lexer::NulCharacterKind enum to refactor Lexer code.
There are 3 types of meaning about nul character in Lexer.

  • Embedded nul character
  • CodeCompletion marker
  • String buffer terminator

We need to handle correctly and carefully these cases when see nul character in Lexer.
To clear this task and prevent to forget about this, this enum is useful.

How do you think?

In second commit, I update lexTrivia.
This commit achieve main purpose of this PR.
And add test cases to check round trip of source file with nul, trivia construction, diagnostics about embedded nul character.

A new test case Syntax/tokens_nul.swift covers all consideration of Syntax/lexer_invalid_nul.swift which is added by me in previous PR #14954
So I removed this.

@@ -461,6 +461,9 @@ class Lexer {
};

private:
/// nul character meaning kind
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by nit: Please capitalize and punctuate comments.

@omochi omochi force-pushed the syntax-nul-trivia branch from e0ff301 to 5dc1f47 Compare March 4, 2018 06:53
@omochi
Copy link
Contributor Author

omochi commented Mar 4, 2018

@xwu Thanks, I updated.

@@ -1857,6 +1869,16 @@ bool Lexer::tryLexConflictMarker(bool EatNewline) {
return false;
}

Lexer::NulCharacterKind Lexer::getNulCharacterKind(const char *Ptr) const {
assert(Ptr != nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add && *Ptr == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good idea. I will fix it.

/// There are 3 kinds of meaning about nul character in Lexer.
/// 1. BufferEnd is string buffer terminator.
/// 2. Embedded is embedded nul character.
/// 3. CodeCompletion is code completion marker.
enum class NulCharacterKind { BufferEnd, Embedded, CodeCompletion };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this:

/// Nul character meaning kind.
enum class NulCharacterKind {
  /// String buffer terminator.
  BufferEnd,
  /// Embedded is embedded nul character.
  Embedded,
...

numbered list (1,2,3) might be confusable with step-by-step procedure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, I will fix it.

@omochi
Copy link
Contributor Author

omochi commented Mar 4, 2018

Ah, I mistook to apply changes to correct commit.
I will fix history again.

@omochi omochi force-pushed the syntax-nul-trivia branch from 0f0fbfa to 71a13b9 Compare March 4, 2018 13:21
@omochi
Copy link
Contributor Author

omochi commented Mar 4, 2018

Ok, please review it.

diagnoseEmbeddedNul(Diags, CurPtr-1);
LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
goto LoopStart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be replaced with a continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks.
I overlooked continue in switch is bound to outer while not to switch what is different from break.
Interesting...
I will fix it so.

diagnoseEmbeddedNul(Diags, CurPtr - 1);
LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
goto LoopStart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@omochi omochi force-pushed the syntax-nul-trivia branch from 71a13b9 to ecaa097 Compare March 4, 2018 23:46
@omochi
Copy link
Contributor Author

omochi commented Mar 4, 2018

I updated in response to review.

@@ -353,8 +353,13 @@ void Lexer::skipToEndOfLine(bool EatNewline) {
case 0:
// If this is a random nul character in the middle of a buffer, skip it as
// whitespace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be moved into case NulCharacterKind::Embedded: branch.

LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
continue;
case NulCharacterKind::BufferEnd:
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the task in the case of NulCharacterKind::BufferEnd into this branch? i.e.

case NulCharacterKind::BufferEnd:
  // The last line of the file does not have a newline.
  --CurPtr;
  return;
}

LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
continue;
case NulCharacterKind::BufferEnd:
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

diagnoseEmbeddedNul(Diags, CurPtr-1);
goto Restart;
case NulCharacterKind::BufferEnd:
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@omochi omochi force-pushed the syntax-nul-trivia branch from ecaa097 to 7d3aab1 Compare March 5, 2018 05:41
@omochi
Copy link
Contributor Author

omochi commented Mar 5, 2018

I updated.
@rintaro Thanks, I intended to reduce diff at first time. But I agree your review.

LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
continue;
case NulCharacterKind::BufferEnd:
Copy link
Member

@rintaro rintaro Mar 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last one request from me.
Please enclose this particular branch with braces case NulCharacterKind::BufferEnd: { ... }. Lacking braces make us difficult to add another case (or re-order cases).

@omochi omochi force-pushed the syntax-nul-trivia branch from 7d3aab1 to 190af6c Compare March 5, 2018 07:53
@omochi
Copy link
Contributor Author

omochi commented Mar 5, 2018

updated 🙂

@rintaro
Copy link
Member

rintaro commented Mar 5, 2018

@swift-ci Please smoke test

@rintaro rintaro merged commit db47cb1 into swiftlang:master Mar 5, 2018
@omochi omochi deleted the syntax-nul-trivia branch March 7, 2018 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants