Fix `buffer_position()` after resuming parsing after `IllFormed` errors #689
Dead code analysis should be able to remove this implementation from the final binary
…d::unmatched_end_tag2` in reader-errors.rs
…ltaneously I found that tests are easier to debug that way
Codecov Report
@@            Coverage Diff             @@
##           master     #689      +/-   ##
==========================================
- Coverage   65.47%   65.06%   -0.42%
==========================================
  Files          38       38
  Lines       18025    17942      -83
==========================================
- Hits        11802    11674     -128
- Misses       6223     6268      +45
Ignore that, I hit the button early by mistake.
tests/reader-errors.rs
Outdated
@@ -14,23 +14,27 @@ macro_rules! ok {
             fn borrowed() {
                 let mut reader = Reader::from_str($xml);
                 reader.config_mut().enable_all_checks(true);
-                assert_eq!(reader.read_event().unwrap(), $event);
+                assert_eq!(reader.read_event().unwrap(), $event, "Reader");
Why not just split it up into individual tests under a `mod reader` and `mod ns_reader`?
Yes, I should do that. Will rewrite
I still have one more commit to review
…module
These tests would panic after the next commit, which will fix the behavior for new tests that will be added soon, because they call an internal method in a state that the Reader is never in under normal circumstances
…cial meaning in a markup
failures (18):
    syntax::doctype::unclosed04::async_tokio
    syntax::doctype::unclosed04::borrowed
    syntax::doctype::unclosed04::buffered
    syntax::doctype::unclosed07::async_tokio
    syntax::doctype::unclosed07::borrowed
    syntax::doctype::unclosed07::buffered
    syntax::doctype::unclosed10::async_tokio
    syntax::doctype::unclosed10::borrowed
    syntax::doctype::unclosed10::buffered
    syntax::doctype::unclosed13::async_tokio
    syntax::doctype::unclosed13::borrowed
    syntax::doctype::unclosed13::buffered
    syntax::doctype::unclosed16::async_tokio
    syntax::doctype::unclosed16::borrowed
    syntax::doctype::unclosed16::buffered
    syntax::doctype::unclosed19::async_tokio
    syntax::doctype::unclosed19::borrowed
    syntax::doctype::unclosed19::buffered
…<!DOCTYPE` sequences
All tests fixed
failures (66):
    ill_formed::double_hyphen_in_comment2::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment2::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment2::ns_reader::buffered
    ill_formed::double_hyphen_in_comment2::reader::async_tokio
    ill_formed::double_hyphen_in_comment2::reader::borrowed
    ill_formed::double_hyphen_in_comment2::reader::buffered
    ill_formed::double_hyphen_in_comment3::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment3::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment3::ns_reader::buffered
    ill_formed::double_hyphen_in_comment3::reader::async_tokio
    ill_formed::double_hyphen_in_comment3::reader::borrowed
    ill_formed::double_hyphen_in_comment3::reader::buffered
    ill_formed::double_hyphen_in_comment4::ns_reader::async_tokio
    ill_formed::double_hyphen_in_comment4::ns_reader::borrowed
    ill_formed::double_hyphen_in_comment4::ns_reader::buffered
    ill_formed::double_hyphen_in_comment4::reader::async_tokio
    ill_formed::double_hyphen_in_comment4::reader::borrowed
    ill_formed::double_hyphen_in_comment4::reader::buffered
    ill_formed::mismatched_end_tag2::ns_reader::async_tokio
    ill_formed::mismatched_end_tag2::ns_reader::borrowed
    ill_formed::mismatched_end_tag2::ns_reader::buffered
    ill_formed::mismatched_end_tag2::reader::async_tokio
    ill_formed::mismatched_end_tag2::reader::borrowed
    ill_formed::mismatched_end_tag2::reader::buffered
    ill_formed::mismatched_end_tag3::ns_reader::async_tokio
    ill_formed::mismatched_end_tag3::ns_reader::borrowed
    ill_formed::mismatched_end_tag3::ns_reader::buffered
    ill_formed::mismatched_end_tag3::reader::async_tokio
    ill_formed::mismatched_end_tag3::reader::borrowed
    ill_formed::mismatched_end_tag3::reader::buffered
    ill_formed::mismatched_end_tag4::ns_reader::async_tokio
    ill_formed::mismatched_end_tag4::ns_reader::borrowed
    ill_formed::mismatched_end_tag4::ns_reader::buffered
    ill_formed::mismatched_end_tag4::reader::async_tokio
    ill_formed::mismatched_end_tag4::reader::borrowed
    ill_formed::mismatched_end_tag4::reader::buffered
    ill_formed::missing_doctype_name1::ns_reader::async_tokio
    ill_formed::missing_doctype_name1::ns_reader::borrowed
    ill_formed::missing_doctype_name1::ns_reader::buffered
    ill_formed::missing_doctype_name1::reader::async_tokio
    ill_formed::missing_doctype_name1::reader::borrowed
    ill_formed::missing_doctype_name1::reader::buffered
    ill_formed::missing_doctype_name2::ns_reader::async_tokio
    ill_formed::missing_doctype_name2::ns_reader::borrowed
    ill_formed::missing_doctype_name2::ns_reader::buffered
    ill_formed::missing_doctype_name2::reader::async_tokio
    ill_formed::missing_doctype_name2::reader::borrowed
    ill_formed::missing_doctype_name2::reader::buffered
    ill_formed::unmatched_end_tag1::ns_reader::async_tokio
    ill_formed::unmatched_end_tag1::ns_reader::borrowed
    ill_formed::unmatched_end_tag1::ns_reader::buffered
    ill_formed::unmatched_end_tag1::reader::async_tokio
    ill_formed::unmatched_end_tag1::reader::borrowed
    ill_formed::unmatched_end_tag1::reader::buffered
    ill_formed::unmatched_end_tag2::ns_reader::async_tokio
    ill_formed::unmatched_end_tag2::ns_reader::borrowed
    ill_formed::unmatched_end_tag2::ns_reader::buffered
    ill_formed::unmatched_end_tag2::reader::async_tokio
    ill_formed::unmatched_end_tag2::reader::borrowed
    ill_formed::unmatched_end_tag2::reader::buffered
    ill_formed::unmatched_end_tag3::ns_reader::async_tokio
    ill_formed::unmatched_end_tag3::ns_reader::borrowed
    ill_formed::unmatched_end_tag3::ns_reader::buffered
    ill_formed::unmatched_end_tag3::reader::async_tokio
    ill_formed::unmatched_end_tag3::reader::borrowed
    ill_formed::unmatched_end_tag3::reader::buffered
@@ -95,7 +100,7 @@ impl ReaderState {
                         off += p + 1;
                         // if next byte after `-` is also `-`, return an error
                         if buf[3 + off] == b'-' {
-                            self.offset -= len - 2 - p;
+                            self.last_error_offset = self.offset - len + 2 + p;
You mentioned that the constants are being cleaned up in the rewrite, yes?
No, with the current approach some constants will always be there, because events don't include `<` and `>`, but the parser reports offsets that include those characters. I should just add an explanation to all such constants, but in the new parser they will be more obvious. In this case, for example, `self.offset` is just after `>`, `buf` contains `!-- con--tent --`, and `p` is counted from the byte after `<!--`:
<!-- con--tent -->:
 ~~~~~~~~~~~~~~~~ : - buf
  : |---p         : - p is counted from | (| is 0)
  : :   :         ^ - self.offset
  ^ :   :           - self.offset - len
    ^   :           - self.offset - len + 2
        ^           - self.offset - len + 2 + p
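The arithmetic in the diagram can be checked with a small standalone sketch. It is not the quick-xml implementation: `double_hyphen_position` is a hypothetical helper, and it assumes, as in the explanation above, that `offset` is the position just after `>`, that `buf` holds the markup between `<` and `>`, and that `p` is counted from the byte after `<!--`.

```rust
/// Hypothetical helper, not part of quick-xml: returns the absolute position
/// of the first `--` inside a comment body, or `None` if there is none.
///
/// Assumptions (taken from the explanation above):
/// - `buf` is the markup between `<` and `>`, e.g. `!-- con--tent --`;
/// - `offset` is the absolute position just after the closing `>`.
fn double_hyphen_position(buf: &[u8], offset: usize) -> Option<usize> {
    let len = buf.len();
    // `!--` opens the comment and `--` closes it, so the body is `buf[3..len - 2]`.
    if len < 5 || !buf.starts_with(b"!--") || !buf.ends_with(b"--") {
        return None;
    }
    let body = &buf[3..len - 2];
    // `p` is the index of the first `--`, counted from the byte after `<!--`.
    let p = body.windows(2).position(|w| w == b"--")?;
    // `offset - len + 2` is the position of the byte after `<!--`;
    // adding `p` moves to the offending `--`.
    // `checked_sub` returns `None` instead of panicking on inconsistent input.
    let start = offset.checked_sub(len)?;
    Some(start + 2 + p)
}
```

For `<!-- con--tent -->`, `buf` has length 16 and `offset` is 18, so the helper yields 18 - 16 + 2 + 4 = 8, which is the index of the first `-` of the inner `--` in the input.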
In a perfect world we would add the offset to the `Error` type, but right now this is impractical, because it would significantly change some API that will be removed soon.
Co-authored-by: Daniel Alley <[email protected]>
Since #677 we can continue parsing after `Error::IllFormed`, so we can no longer change `buffer_position()` so that it points to a useful position of the error. The best solution would be to add the position to the `Error` itself, but that would require massive changes, including in code that I plan to remove soon during the rewrite. To keep the changes relatively small and avoid touching that code, I decided to add a new `error_position()` getter, which reports the position of the error. After the rewrite I'll try to implement the best solution.

The `reader-errors.rs` tests were updated to check the `buffer_position()` result after resuming parsing after an error. A couple of new cases were also added, which allowed me to find a bug in one case.