[Syntax] support nul character as garbage text trivia in libSyntax #14962
include/swift/Parse/Lexer.h
@@ -461,6 +461,9 @@ class Lexer {
  };

private:
  /// nul character meaning kind
Drive-by nit: Please capitalize and punctuate comments.
e0ff301 to 5dc1f47
@xwu Thanks, I updated it.
lib/Parse/Lexer.cpp
@@ -1857,6 +1869,16 @@ bool Lexer::tryLexConflictMarker(bool EatNewline) {
  return false;
}

Lexer::NulCharacterKind Lexer::getNulCharacterKind(const char *Ptr) const {
  assert(Ptr != nullptr);
Could you add && *Ptr == 0?
Oh, good idea. I will fix it.
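A minimal sketch of the classification with the strengthened assertion follows. It assumes a hypothetical MiniLexer that stores BufferEnd (the address of the terminating nul) and CodeCompletionPtr (the address of the code-completion marker, or null). The names follow this discussion, but the struct is not swift::Lexer, and the order of the two pointer checks is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the lexer state discussed in this PR;
// it is not the real swift::Lexer.
struct MiniLexer {
  // The three meanings a nul character can have while lexing.
  enum class NulCharacterKind { BufferEnd, Embedded, CodeCompletion };

  const char *BufferStart;
  const char *BufferEnd;         // address of the terminating nul
  const char *CodeCompletionPtr; // code-completion marker, or nullptr

  NulCharacterKind getNulCharacterKind(const char *Ptr) const {
    // Callers may only ask about a position that really holds a nul.
    assert(Ptr != nullptr && *Ptr == 0);
    if (Ptr == CodeCompletionPtr)
      return NulCharacterKind::CodeCompletion;
    if (Ptr == BufferEnd)
      return NulCharacterKind::BufferEnd;
    // Any other nul sits in the middle of the buffer.
    return NulCharacterKind::Embedded;
  }
};
```

The extra *Ptr == 0 check turns a caller bug (classifying a non-nul byte) into an assertion failure in debug builds instead of a silent misclassification.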
include/swift/Parse/Lexer.h
  /// There are 3 kinds of meaning about nul character in Lexer.
  /// 1. BufferEnd is string buffer terminator.
  /// 2. Embedded is embedded nul character.
  /// 3. CodeCompletion is code completion marker.
  enum class NulCharacterKind { BufferEnd, Embedded, CodeCompletion };
Please make this:
/// Nul character meaning kind.
enum class NulCharacterKind {
/// String buffer terminator.
BufferEnd,
/// Embedded is embedded nul character.
Embedded,
...
A numbered list (1, 2, 3) might be confused with a step-by-step procedure.
Okay, I will fix it.
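The layout the reviewer asks for, with one doc comment per enumerator, can be written as in the sketch below. The describe helper is hypothetical and not part of the PR; it only illustrates one practical benefit of the enum: because the switch has no default label, Clang and GCC can warn (-Wswitch) when a new enumerator is added but not handled.

```cpp
#include <cassert>
#include <string>

/// Nul character meaning kind.
enum class NulCharacterKind {
  /// String buffer terminator.
  BufferEnd,
  /// Embedded nul character.
  Embedded,
  /// Code completion marker.
  CodeCompletion,
};

// Hypothetical helper: maps each kind to a readable name. No default label,
// so a missing enumerator shows up as a -Wswitch warning at compile time.
std::string describe(NulCharacterKind Kind) {
  switch (Kind) {
  case NulCharacterKind::BufferEnd:
    return "buffer end";
  case NulCharacterKind::Embedded:
    return "embedded";
  case NulCharacterKind::CodeCompletion:
    return "code completion";
  }
  assert(false && "unhandled NulCharacterKind");
  return "unknown";
}
```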
5dc1f47 to 0f0fbfa
Ah, I mistakenly applied the changes to the wrong commit.
0f0fbfa to 71a13b9
OK, please review it.
lib/Parse/Lexer.cpp
  diagnoseEmbeddedNul(Diags, CurPtr-1);
  LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
  goto LoopStart;
Can this be replaced with a continue?
Oh, thanks. I overlooked that a continue inside a switch is bound to the enclosing while loop, not to the switch, which makes it different from break. Interesting...
I will fix it accordingly.
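The difference between continue and break inside a switch that sits in a loop can be seen in this small, self-contained sketch. The countNonNul function is hypothetical and only illustrates the language rule: continue starts the next iteration of the enclosing loop, while break only leaves the switch.

```cpp
#include <cassert>

// Hypothetical example: counts the non-nul bytes in [Ptr, End).
int countNonNul(const char *Ptr, const char *End) {
  assert(Ptr <= End);
  int Count = 0;
  while (Ptr != End) {
    char C = *Ptr++;
    switch (C) {
    case '\0':
      continue; // restarts the while loop; ++Count below is skipped
    default:
      break;    // leaves only the switch; execution resumes at ++Count
    }
    ++Count;
  }
  return Count;
}
```

This is why continue can replace a goto back to the top of the loop: both skip the remaining statements of the loop body.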
lib/Parse/Lexer.cpp
  diagnoseEmbeddedNul(Diags, CurPtr - 1);
  LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
  goto LoopStart;
Same as above
OK
71a13b9 to ecaa097
I updated it in response to the review.
lib/Parse/Lexer.cpp
@@ -353,8 +353,13 @@ void Lexer::skipToEndOfLine(bool EatNewline) {
case 0:
  // If this is a random nul character in the middle of a buffer, skip it as
  // whitespace.
This comment should be moved into the case NulCharacterKind::Embedded: branch.
lib/Parse/Lexer.cpp
  LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
  continue;
case NulCharacterKind::BufferEnd:
  break;
Could you move the handling for the NulCharacterKind::BufferEnd case into this branch? That is:
case NulCharacterKind::BufferEnd:
// The last line of the file does not have a newline.
--CurPtr;
return;
}
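A simplified, self-contained sketch of a skip-to-end-of-line loop in which every nul kind is handled inside the switch, with the BufferEnd work moved into its own branch as suggested above. LineSkipper, its fields, and EmbeddedNulCount (a stand-in for diagnoseEmbeddedNul) are hypothetical; newline handling is reduced to "stop after the first \n or \r", which is an assumption and not the real EatNewline logic.

```cpp
#include <cassert>

enum class NulCharacterKind { BufferEnd, Embedded, CodeCompletion };

// Hypothetical, simplified lexer state; not swift::Lexer.
struct LineSkipper {
  const char *BufferEnd;         // address of the terminating nul
  const char *CodeCompletionPtr; // code-completion marker, or nullptr
  const char *CurPtr;            // current lexing position
  int EmbeddedNulCount;          // stand-in for diagnoseEmbeddedNul()

  NulCharacterKind getNulCharacterKind(const char *Ptr) const {
    assert(Ptr != nullptr && *Ptr == 0);
    if (Ptr == CodeCompletionPtr)
      return NulCharacterKind::CodeCompletion;
    if (Ptr == BufferEnd)
      return NulCharacterKind::BufferEnd;
    return NulCharacterKind::Embedded;
  }

  // Advances CurPtr past the end of the current line.
  void skipToEndOfLine() {
    while (true) {
      switch (*CurPtr++) {
      case '\n':
      case '\r':
        return; // simplified: the newline is always consumed
      case 0:
        switch (getNulCharacterKind(CurPtr - 1)) {
        case NulCharacterKind::Embedded:
          // A random nul in the middle of the buffer is skipped as whitespace.
          ++EmbeddedNulCount;
          continue; // next iteration of the while loop
        case NulCharacterKind::CodeCompletion:
          continue;
        case NulCharacterKind::BufferEnd:
          // The last line of the file does not have a newline.
          --CurPtr;
          return;
        }
        break;
      default:
        break;
      }
    }
  }
};
```

Keeping the BufferEnd work inside its branch means nothing after the inner switch depends on which kind was seen.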
lib/Parse/Lexer.cpp
  LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
  continue;
case NulCharacterKind::BufferEnd:
  break;
ditto
lib/Parse/Lexer.cpp
  diagnoseEmbeddedNul(Diags, CurPtr-1);
  goto Restart;
case NulCharacterKind::BufferEnd:
  break;
ditto
ecaa097 to 7d3aab1
I updated it.
lib/Parse/Lexer.cpp
  LLVM_FALLTHROUGH;
case NulCharacterKind::CodeCompletion:
  continue;
case NulCharacterKind::BufferEnd:
One last request from me.
Please enclose this particular branch in braces: case NulCharacterKind::BufferEnd: { ... }. Without braces, it is difficult to add another case (or reorder the cases).
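One common reason for this convention is a C++ scoping rule: jumping to a case label past the initialization of a local variable declared in an earlier case is ill-formed. Braces give the case its own scope, so later labels can be added or reordered freely. The stepBackAfterNul function below is a hypothetical illustration, not code from the PR.

```cpp
#include <cassert>

enum class NulCharacterKind { BufferEnd, Embedded, CodeCompletion };

// Hypothetical helper: how far a caller should move back after consuming
// the nul. Only the buffer terminator is "un-consumed".
int stepBackAfterNul(NulCharacterKind Kind) {
  switch (Kind) {
  case NulCharacterKind::BufferEnd: {
    // StepBack is scoped to these braces. Without them, the case labels
    // below would jump past its initialization, which does not compile.
    const int StepBack = 1;
    return StepBack;
  }
  case NulCharacterKind::Embedded:
  case NulCharacterKind::CodeCompletion:
    return 0;
  }
  assert(false && "unhandled NulCharacterKind");
  return 0;
}
```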
7d3aab1 to 190af6c
Updated 🙂
@swift-ci Please smoke test
Previously, the Lexer did not handle nul characters in the lexTrivia function, so libSyntax lost information about them and could not achieve round-trip translation.
With this PR, I update lexTrivia so that libSyntax handles nul characters as garbage text trivia. I split my work into 2 commits.
In the first commit, I introduce the Lexer::NulCharacterKind enum to refactor the Lexer code. A nul character can have 3 meanings in the Lexer, and each of these cases must be handled correctly and carefully whenever the Lexer sees a nul character. This enum makes that task explicit and helps prevent any case from being forgotten. What do you think?
In the second commit, I update lexTrivia. This commit achieves the main purpose of this PR.
It also adds test cases that check the round trip of a source file containing nul characters, the trivia construction, and the diagnostics for embedded nul characters.
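The round-trip idea can be sketched as follows: if an embedded nul becomes a garbage-text trivia piece instead of being dropped, concatenating the text of all trivia pieces reproduces the input exactly. TriviaKind, TriviaPiece, and this lexTrivia are hypothetical simplifications, not libSyntax's actual types or the Lexer's real function. The sketch also assumes End excludes the buffer terminator, so every nul in range is embedded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified trivia model; not libSyntax's actual types.
enum class TriviaKind { Space, Newline, GarbageText };

struct TriviaPiece {
  TriviaKind Kind;
  std::string Text; // exact source bytes, including any nul
};

// Lexes trivia from [Ptr, End) and stops at the first non-trivia byte.
// Embedded nuls are kept as GarbageText so no source bytes are lost.
std::vector<TriviaPiece> lexTrivia(const char *&Ptr, const char *End) {
  assert(Ptr <= End);
  std::vector<TriviaPiece> Pieces;
  while (Ptr != End) {
    char C = *Ptr;
    TriviaKind Kind;
    if (C == ' ')
      Kind = TriviaKind::Space;
    else if (C == '\n')
      Kind = TriviaKind::Newline;
    else if (C == '\0')
      Kind = TriviaKind::GarbageText;
    else
      break; // start of a real token
    // Merge consecutive bytes of the same kind into one piece.
    if (!Pieces.empty() && Pieces.back().Kind == Kind)
      Pieces.back().Text += C;
    else
      Pieces.push_back({Kind, std::string(1, C)});
    ++Ptr;
  }
  return Pieces;
}
```

A round-trip test then only needs to concatenate every piece's Text and compare it with the consumed input.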
A new test case, Syntax/tokens_nul.swift, covers everything that Syntax/lexer_invalid_nul.swift checked (I added that file in my previous PR #14954), so I removed the latter.