Skip to content

Treat broken reference links as links #445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

jyn514
Copy link
Contributor

@jyn514 jyn514 commented May 31, 2020

This ensures that broken_link_callback will only see broken links once.

Closes #444

Outdated: trouble with FnMut

I tried to add the following test, but it fails because callbacks are only allowed to be Fn, not FnMut:

    #[test]
    fn broken_links_called_only_once() {
        let markdown = "See also [`g()`][crate::g].";
        let mut times_called = 0;
        let mut callback = |_: &str, _: &str| {
            times_called += 1;
            None
        };
        let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
        for _ in parser {}
        assert_eq!(times_called, 1);
    }
error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
    --> src/parse.rs:3133:28
     |
3133 |         let mut callback = |_: &str, _: &str| {
     |                            ^^^^^^^^^^^^^^^^^^ this closure implements `FnMut`, not `Fn`
3134 |             times_called += 1;
     |             ------------ closure is `FnMut` because it mutates the variable `times_called` here
...
3137 |         let parser = Parser::new_with_broken_link_callback(markdown, Options::empty(), Some(&callback));
     |                                                                                             --------- the requirement to implement `Fn` derives from here

I tried allowing FnMut closures, but that had all sorts of other things that broke, including needing to pass in Some(&mut closure) instead of Some(&closure) and removing the Clone impl for Parser. Let me know if you want me to follow up with that, I'm not sure what the best approach is there.

@jyn514
Copy link
Contributor Author

jyn514 commented May 31, 2020

I worked around the FnMut thing by using RefCell. Now some other tests are failing:

thread 'suite::spec::spec_test_519' panicked at 'assertion failed: `(left == right)`
  left: `"<p><em>foo [bar</em> baz]</p>"`,
 right: `"<p>*foo [bar* baz]</p>"`', tests/lib.rs:32:5

thread 'suite::spec::spec_test_541' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[bar][foo!]</p>"`,
 right: `"<p>[bar][foo\\!]</p>"`', tests/lib.rs:32:5

thread 'suite::spec::spec_test_565' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo]<a href=\"/url\">bar</a></p>"`,
 right: `"<p>[foo][bar]<a href=\"/url\">baz</a></p>"`', tests/lib.rs:32:5

thread 'suite::spec::spec_test_567' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo]<a href=\"/url1\">bar</a></p>"`,
 right: `"<p>[foo][bar]<a href=\"/url1\">baz</a></p>"`', tests/lib.rs:32:5

@jyn514
Copy link
Contributor Author

jyn514 commented May 31, 2020

@marcusklaas I'm not really sure where to go from here. I know what I want to do, which is:

  1. mark the current node as text
  2. mark the failed reference as 'not a reference'
  3. continue the main loop, but in the next iteration never mark the failed reference as a reference

However I'm not really sure how to implement that, I'm not understanding the current code very well.

@jyn514
Copy link
Contributor Author

jyn514 commented May 31, 2020

Just as I said that I figured out how to do it actually 😆

@jyn514
Copy link
Contributor Author

jyn514 commented May 31, 2020

No, that just causes more test failures (from https://github.com/jyn514/pulldown-cmark/tree/reference-link-dup):

---- suite::spec::spec_test_524 stdout ----
thread 'suite::spec::spec_test_524' panicked at 'assertion failed: `(left == right)`
  left: `"<p><a href=\"/uri\">link [foo [bar]]</a></p>"`,
 right: `"<p>[link [foo [bar]]]<a href=\"/uri\">ref</a></p>"`', tests/lib.rs:32:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- suite::spec::spec_test_538 stdout ----
thread 'suite::spec::spec_test_538' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo] <a href=\"/url\" title=\"title\">bar</a></p>"`,
 right: `"<p>[foo] [bar]</p>"`', tests/lib.rs:32:5

---- suite::spec::spec_test_539 stdout ----
thread 'suite::spec::spec_test_539' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo] <a href=\"/url\" title=\"title\">bar</a></p>"`,
 right: `"<p>[foo] [bar]</p>"`', tests/lib.rs:32:5

---- suite::spec::spec_test_565 stdout ----
thread 'suite::spec::spec_test_565' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo]<a href=\"/url\">bar</a></p>"`,
 right: `"<p>[foo][bar]<a href=\"/url\">baz</a></p>"`', tests/lib.rs:32:5

---- suite::spec::spec_test_567 stdout ----
thread 'suite::spec::spec_test_567' panicked at 'assertion failed: `(left == right)`
  left: `"<p>[foo]<a href=\"/url1\">bar</a></p>"`,
 right: `"<p>[foo][bar]<a href=\"/url1\">baz</a></p>"`', tests/lib.rs:32:5

Copy link
Collaborator

@marcusklaas marcusklaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay in reviewing this, and thanks again for getting started.

If I were to try this, I'd try skip parsing until the end of the sequence. So if we find that [label][ref] is not a valid reference link, we extend the node of the first ] to ][ref] and continue after this. This would effectively skip the parsing of [ref] and hence prevent the double call to the broken link function. This would probably be done here. That line makes the ] simple text. You would need to add some editing of the node end, and manipulation of cur/ cur_ix to make it work, I think.

I fully acknowledge that the link parsing logic is one of the most complex pieces of pulldown, so please don't feel bad if the above suggestion is daunting.

@@ -3124,4 +3125,19 @@ mod test {
}
assert_eq!(found, 1);
}

#[test]
fn broken_links_called_only_once() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test!

src/parse.rs Outdated

if tos.ty == LinkStackTy::Link {
self.link_stack.disable_all_links();
}
} else {
self.tree[cur_ix].item.body = ItemBody::Text;
}
// assume the reference link was meant to be valid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't completely understand what this change does. Is it reordering some operations, and if so, for what reason?

Copy link
Contributor Author

@jyn514 jyn514 Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea, which didn't work, was to treat [a][b] as a link even if b didn't resolve. Before it would only be treated as a link if the reference was valid. Now it causes the following markdown to be parsed incorrectly:

[foo][bar][baz]

[baz]: /url

It should be parsed as [foo] and then a link to /url with text bar. Instead it's now parsed as a self-contained reference link with text baz. I think this is because I took [bar] completely out of the tree so it isn't being considered as a label.

@jyn514
Copy link
Contributor Author

jyn514 commented Jun 5, 2020

So if we find that [label][ref] is not a valid reference link, we extend the node of the first ] to ][ref] and continue after this. This would effectively skip the parsing of [ref] and hence prevent the double call to the broken link function.

I think that would run into the same issues I mentioned in https://github.com/raphlinus/pulldown-cmark/pull/445/files#r435628484: If it's removed from the tree altogether, it's not considered as a label for future links.

@jyn514 jyn514 mentioned this pull request Jun 11, 2020
@jyn514
Copy link
Contributor Author

jyn514 commented Jun 11, 2020

Ok, I had an idea that almost worked: I said if [x][y] fails, then only on the next loop iteration, if you see a shortcut reference link [y], ignore it and treat it as text. This breaks because [x] [y]\ny: https://link fails the first time but succeeds when you get to [y]. Maybe I'm just checking this at the wrong time? I pushed the code in case you have any ideas.

@jyn514
Copy link
Contributor Author

jyn514 commented Jun 11, 2020

I changed it to only skip the callback and not any other processing. It passes all the test cases, but I'm no longer sure this is implemented correctly.

@marcusklaas
Copy link
Collaborator

This seems like a reasonable solution, although I wonder if the reset of saw_broken is always done. It seems it's only done when the callback is called. Would this not cause the following snippet to only call the function once?

[brokenlink1] some other node [brokenlink2]

@jyn514
Copy link
Contributor Author

jyn514 commented Jun 13, 2020

Good catch, I added a test for that. I tried fixing it by resetting saw_broken at the start of every loop and keeping the old value for exactly one iteration, but that didn't seem to work. I think I'm misunderstanding how many times the loop executes on a single reference link?

$ diff --git a/src/parse.rs b/src/parse.rs
index d43b81f..34a60d9 100644
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -1997,6 +1997,8 @@ impl<'a> Parser<'a> {
         let block_text = &self.text[..block_end];
 
         while let Some(mut cur_ix) = cur {
+            let last_broken = saw_broken;
+            saw_broken = false;
             match self.tree[cur_ix].item.body {
                 ItemBody::MaybeHtml => {
                     let next = self.tree[cur_ix].next;
@@ -2200,8 +2202,7 @@ impl<'a> Parser<'a> {
                                         (link_type, url, title)
                                     })
                                     .or_else(|| {
-                                        if saw_broken {
-                                            saw_broken = false;
+                                        if last_broken {
thread 'parse::test::broken_links_called_only_once' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `1`', src/parse.rs:3148:13

@jyn514
Copy link
Contributor Author

jyn514 commented Jun 13, 2020

Yeah, it looks like it executes three times on a single reference link:

[src/parse.rs:2000] saw_broken = false
[src/parse.rs:2000] saw_broken = false
[src/parse.rs:2000] saw_broken = false
[src/parse.rs:2205] last_broken = false
[src/parse.rs:2000] saw_broken = true
[src/parse.rs:2000] saw_broken = false
[src/parse.rs:2000] saw_broken = false
[src/parse.rs:2205] last_broken = false
[src/parse.rs:2000] saw_broken = true

Do you know why it has that behavior?

@jyn514
Copy link
Contributor Author

jyn514 commented Aug 25, 2020

Closing in favor of #469.

@jyn514 jyn514 closed this Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser calls broken_link_callback twice for broken reference links
2 participants