-
Notifications
You must be signed in to change notification settings - Fork 253
Treat broken reference links as links #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I worked around the
|
@marcusklaas I'm not really sure where to go from here. I know what I want to do, which is:
However I'm not really sure how to implement that, I'm not understanding the current code very well. |
Just as I said that I figured out how to do it actually 😆 |
No, that just causes more test failures (from https://github.com/jyn514/pulldown-cmark/tree/reference-link-dup):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay in reviewing this, and thanks again for getting started.
If I were to try this, I'd try skip parsing until the end of the sequence. So if we find that [label][ref]
is not a valid reference link, we extend the node of the first ]
to ][ref]
and continue after this. This would effectively skip the parsing of [ref]
and hence prevent the double call to the broken link function. This would probably be done here. That line makes the ]
simple text. You would need to add some editing of the node end, and manipulation of cur
/ cur_ix
to make it work, I think.
I fully acknowledge that the link parsing logic is one of the most complex pieces of pulldown, so please don't feel bad if the above suggestion is daunting.
@@ -3124,4 +3125,19 @@ mod test { | |||
} | |||
assert_eq!(found, 1); | |||
} | |||
|
|||
#[test] | |||
fn broken_links_called_only_once() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good test!
src/parse.rs
Outdated
|
||
if tos.ty == LinkStackTy::Link { | ||
self.link_stack.disable_all_links(); | ||
} | ||
} else { | ||
self.tree[cur_ix].item.body = ItemBody::Text; | ||
} | ||
// assume the reference link was meant to be valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't completely understand what this change does. Is it reordering some operations, and if so, for what reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea, which didn't work, was to treat [a][b]
as a link even if b
didn't resolve. Before it would only be treated as a link if the reference was valid. Now it causes the following markdown to be parsed incorrectly:
[foo][bar][baz]
[baz]: /url
It should be parsed as [foo]
and then a link to /url
with text bar
. Instead it's now parsed as a self-contained reference link with text baz
. I think this is because I took [bar]
completely out of the tree so it isn't being considered as a label.
I think that would run into the same issues I mentioned in https://github.com/raphlinus/pulldown-cmark/pull/445/files#r435628484: If it's removed from the tree altogether, it's not considered as a label for future links. |
Ok, I had an idea that almost worked: I said if |
I changed it to only skip the callback and not any other processing. It passes all the test cases, but I'm no longer sure this is implemented correctly. |
This seems like a reasonable solution, although I wonder if the reset of [brokenlink1] some other node [brokenlink2] |
Good catch, I added a test for that. I tried fixing it by resetting $ diff --git a/src/parse.rs b/src/parse.rs
index d43b81f..34a60d9 100644
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -1997,6 +1997,8 @@ impl<'a> Parser<'a> {
let block_text = &self.text[..block_end];
while let Some(mut cur_ix) = cur {
+ let last_broken = saw_broken;
+ saw_broken = false;
match self.tree[cur_ix].item.body {
ItemBody::MaybeHtml => {
let next = self.tree[cur_ix].next;
@@ -2200,8 +2202,7 @@ impl<'a> Parser<'a> {
(link_type, url, title)
})
.or_else(|| {
- if saw_broken {
- saw_broken = false;
+ if last_broken {
|
Yeah, it looks like it executes three times on a single reference link:
Do you know why it has that behavior? |
Closing in favor of #469. |
This ensures that broken_link_callback will only see broken links once.
Closes #444
Outdated: trouble with FnMut
I tried to add the following test, but it fails because callbacks are only allowed to be
Fn
, notFnMut
:I tried allowing FnMut closures, but that had all sorts of other things that broke, including needing to pass in
Some(&mut closure)
instead ofSome(&closure)
and removing theClone
impl forParser
. Let me know if you want me to follow up with that, I'm not sure what the best approach is there.