-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Continuously fuzzing tidy-html5 with OSS-Fuzz #788
Comments
I just wanted to add that we're starting seeing a number of crashes from fuzzing tidy-html5, and it would be very helpful if someone could take a look at them. |
@stefanbucur thank you for the comments, and information... I am NOT interested in setting up But I am very interested in any Any that are valid, repeatable bugs should be added to these issues, together with a minimal sample html, setup, and configuration used... thanks... It goes without saying, code fixes, patches or PR's are always welcome... |
I have just CC-ed your GitHub e-mail address on some of the most important issues (for example, here). Some of the crashes are potentially security vulnerabilities, so by default they are not publicly visible, to reduce the chance of them being exploited while we give maintainers a chance to take a look first. Regarding how to report these bugs, I can try filing the existing bugs as issues here on GitHub too (but these will be publicly visible). Let me know if you'd like me to do this for all crashes. I can also try producing some fixes, but unfortunately I'm not familiar with the implementation so I might not be able to fix too many things. |
@stefanbucur thanks for the comment... and the CC-ed emails... which I have started to look at... will continue that... Regarding using the Also see like #588, #622, which got solved, and closed... there are probably like issues open/closed... still checking... any pointers would help... Before we get to I built an Of course, running tidy on that almost random set of bytes yields some 9067 warnings and 149 errors... but no problem... So the most important thing here is to produce a repeatable test... where debuggers can help isolate the precise problem... etc... can do nothing without a test... Any help really appreciated... thanks... |
@stefanbucur went through the emails.. seems there are 3 issues - please confirm?
This is my 3 downloads... copied to respective
To do a bit of testing... quick and dirty... but should do it...
Is there any specific Certainly need help to reproduce the |
Thanks a lot @geoffmcl for looking into these issues! You are right that the test cases in the bug reports don't seem to reproduce directly with the 'tidy' tool, so I spent some time to understand what's going on. I believe the source of differences is that our fuzzer uses a special driver that relies on in-memory buffers to parse the input HTML file. This differs from what the 'tidy' tool does, which reads the data directly from a file. Perhaps reading from a file would mask any memory issues that the sanitizer would catch? In any case, here is how one can reproduce the crashes using the fuzzer driver: $ git clone [email protected]:google/oss-fuzz.git && cd oss-fuzz/
$ python infra/helper.py build_image tidy-html5
$ python infra/helper.py build_fuzzers --sanitizer=address tidy-html5
# You should now be able to reproduce the crash.
$ build/out/tidy-html5/tidy_fuzzer ~/Downloads/clusterfuzz-testcase-minimized-tidy_fuzzer-5639351547985920 That being said, it is good to know that the tidy tool itself is not directly vulnerable, but users of the tidy API may be exposed. |
Oops, there seems a 4th...
@stefanbucur thanks for the information... ok, you run the tests, using your own module... interesting... and exports Just reading the source, a little suspect of the Will try certainly the steps you outlined soonest... looks easy as pie... a python builder, looks interesting for Windows build, but... every fix begins by repeating the bug... somehow... but that will probably be tomorrow now... Agree 100%, Be back soonest... |
file: tidy-sanitize-04.txt @stefanbucur when it is interesting code/coding, I can come back to my computers after dinner... to play a little more... No particular problem following your setup, except some steps required And running
Wow, that looks interesting... but what am I to make of this? In say, a current
You filed this issue as It seems a good lead... getting closer... I hope... |
@geoffmcl thanks for pointing out this issue! It does look like running the fuzzer binary directly prevents ASAN from resolving symbols and showing them in stack traces. I did a bit of research and realized that there is a better way to debug the issue:
I hope this setup will be more effective at pointing out how the execution leads to the crash. |
@stefanbucur thank you for the additional clues, help... I think what you propose for
Assuming that It ensures the generated static Will try to test, try, this... soonest... Including a
Seems no reason to You could even append a Still to test out the results... Some initial trouble, that maybe Thanks for the help, feedback... look forward to more... thanks... |
@stefanbucur - suggested changes push to my fork,
But still some troubles in testing... any help appreciated... thanks... |
tidy-sanitize-05.txt @stefanbucur some steps forward, but some failures... Was unable to get the error output, to show the addresses... do not know why... need more help... but... Using the 4 samples - 12111 12309 12074 12411 - only 1 and 4 showed an error, but it looks like the same error! a 1 byte overrun... Using my fork amended Just to hopefully prove the point, made another test fix of
Note I respect your desire to add a null byte to the input stream... but do not think this adds anything... And please see API NOTE: using Prefer the direct data transfer to the Starting to conclude there may be a bug in the |
@geoffmcl my apologies for the latency (I was away for a while). Thank you for the detailed investigation, let me take a closer look and will get back to you soon! |
@geoffmcl you are right that copying the original buffer to a null-terminated string is unnecessary! My original confusion stemmed from an assumption that tidy would not accept \0 characters in its input, which was not the case. So I ended up simplifying the code even more, and now I made the following changes: diff --git a/projects/tidy-html5/tidy_fuzzer.c b/projects/tidy-html5/tidy_fuzzer.c
index 3cf2732..3de6854 100644
--- a/projects/tidy-html5/tidy_fuzzer.c
+++ b/projects/tidy-html5/tidy_fuzzer.c
@@ -38,19 +38,6 @@ void run_tidy_parser(TidyBuffer* data_buffer,
tidyRelease(tdoc);
}
-void attach_string_to_buffer(const uint8_t* data,
- size_t size,
- TidyBuffer* buffer) {
- // Use a NULL-terminated copy to make it more likely to expose
- // buffer overflows.
- char *data_string = strndup((const char*)data, size);
- if (data_string == NULL) {
- perror("Could not allocate string buffer.");
- abort();
- }
- tidyBufAttach(buffer, (byte*)data_string, strlen(data_string) + 1);
-}
-
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
TidyBuffer data_buffer;
TidyBuffer output_buffer;
@@ -59,11 +46,11 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
tidyBufInit(&output_buffer);
tidyBufInit(&error_buffer);
- attach_string_to_buffer(data, size, &data_buffer);
+ tidyBufAttach(&data_buffer, (byte*)data, size);
run_tidy_parser(&data_buffer, &output_buffer, &error_buffer);
-
+
tidyBufFree(&error_buffer);
tidyBufFree(&output_buffer);
- tidyBufFree(&data_buffer);
+ tidyBufDetach(&data_buffer);
return 0;
} I believe this is in line with what you were suggesting too. Note that I directly attached the original data buffer to the tidy buffer: This is because the original buffer is guaranteed to be allocated in its own block, without any extra padding in memory. This helps ASAN catch any memory operations outside the buffer. The next step was to transform the original crashing input: for some reason, tidy was crashing if that input was considered only until the null terminator, including the null. So I wrote a one liner to copy the test case to a null-terminated string:
You can check that the two files are actually different, so the original file had a null character inside. (On a side note, I found it very interesting that the OSS-Fuzz crash minimizer wasn't able to remove everything after the first null character, and we had to do it manually here.) Now we can rebuilt the image, rebuild the fuzzer, etc. and run the new Let me know if you are now able to reproduce the issue, and thanks a lot for looking into this thus far! |
@stefanbucur thanks for the feedback... yes, definitely, using a Reviewing your patch, using But where is the code? Has it been pushed to the oss-fuzz repo, or somewhere else... can not seem to get the code... Will try to find time to patch my fork accordingly... and test again... including running your You will note issue #798 ... is maybe related to issue 12074 - As indicated, the important issue here is to be able to |
@geoffmcl thanks for the reminder, I had forgotten to submit a PR for the change. This is now done in google/oss-fuzz#2125. Hopefully this time the crash will be reproducible! |
@stefanbucur thanks for the merged PR... updated my Got no error in the In my But, sorry, still no errors... Seem out of options... What to try next? Any help appreciated... Thanks... |
@stefanbucur oops, belay that... Now that I look at the test log, there is an error on the modified But it is the same as previous... Now if we could get address references into that, we may be on the way to somewhere... As stated, any help appreciated... thanks... |
20190203:oss-fuzz-02.txt Yikes, Had done, as mentioned -
Got -
Docker rebuilt all -
Running -
then -
Still digesting all that... Seems a good lead, in that it happens in Investigating, as time permits... |
Windows: Tried Dr. Memory - undated to DrMemory-Windows-1.11.17918-1 - using the tidy
But this does not involve printing to a |
@geoffmcl Did you get the chance to look into this issue more closely? It is indeed curious that this happens when saving the buffer (one would usually think such issues appear at parsing time). So it must be that the parser creates some structure that keeps around pointers to the input, but the structure has the sizes of the tokens wrong in some places. However, I can't seem to pin-point the exact moment when this happens, because parsing is already done by the time the buffer is saved... In any case, I also wanted to ask you if you'd like to be notified of any new such findings in the future. I've realized I can configure the project to list you as the primary maintainer, so you also get access to all the issues by default. Let me know. |
@stefanbucur yes, a week or so back, I did run my tidy-by-buf app, sort of a special version of tidy, only using
But in quite a number of MSVC Debug sessions, on this, have been unable to get to the actual overrun... so no You a very correct, The input stream is then closed. After this, there is the Then in the output phase, Waiting until I can get up steam to attack this again... it is quite frustrating ... As to whether I want to be listed, in the Already this issue of Out of the 4 items raised - 12111/~85920/UTF8Bytes, 12309/~15552/GetSurrogate, 12074/~75712/IsHighSurr, 12411/~84448/pprinttext - only in the first were we able to maybe repeat the results, and even then, not until you worked on it, modified it... including the I certainly do not want to be overloaded with so called But will always do my best to investigate if/when raised, as an issue, here, with a repeatable sample... So we really need to do some sort of triage of Is that person you? or others? Certainly need help testing, finding, fixing... etc... Look forward to further feedback... thanks... |
Feel free to add me. I'll close this. |
Thanks! I opened google/oss-fuzz#6029 |
@balthisar you should now be CC-ed on all tidy-html5 bugs found by fuzzing. There are currently over 25 open issues, which were found over time (but no one has taken a look at them yet): https://bugs.chromium.org/p/oss-fuzz/issues/list?q=label:Proj-tidy-html5 |
@stefanbucur, I'm not really sure how to "take a look at them"; I'll try to understand everything that you and @geoffmcl went through, but in general when I debug with the test cases, I sometimes see the failures, and often don't. I am reading from the file, and not doing anything in a Docker container at this point or attaching things via a buffer. tbd. And if I catch the error in the debugger in a container, I'm not really sure how to attach Xcode's debugger into a process in the container. I have a CLion license; I suppose I could learn that tool. I have found and corrected a couple of stack overflow issues, but I'm not sure how to re-rest the failed cases in order to close them. I'll re-open this issue until I have my end resolved. |
Apologies for the late reply!
Using Docker might be necessary to fully capture the execution environment where the crash happened, but you should not have to deal with it directly. https://google.github.io/oss-fuzz/advanced-topics/reproducing/ has crash repro instructions, if you haven't come across them already.
Unfortunately I don't have experience with Xcode nor CLion, but the Python scripts in the repro instructions also produce binaries in an output directory that you might be able to execute directly (e.g.,
This sounds great - you actually don't have to do anything here; the system will automatically retest periodically and auto-close the bugs if the crash no longer reproduces.
Sounds good, and please let me know if you run into any issues. I can connect you with others who might be able to help. |
FYI, the issue in this comment above is still not fixed, and is tracked here: tidy-html5:tidy_fuzzer: Heap-buffer-overflow in prvTidyEncodeCharToUTF8Bytes I posted PR #1138 to fix it, but there is also a fix in PR #1008. |
OSS-Fuzz is free fuzzing infrastructure for automatically identifying security vulnerabilities and stability bugs in open source projects.
We believe tidy-html5 is an important part of the open source ecosystem (all integrated projects can be found here), and as such we have recently integrated a few fuzz targets that we developed for tidy-html5 into OSS-Fuzz. Once integrated, OSS-Fuzz will continuously fuzz this project, alert when it finds bugs, and verify the fixes.
Would any of the contributors be interested in becoming a contact person for receiving any bug reports?
Since some of these bugs may be security vulnerabilities, we have a disclosure policy where bugs are first reported to the maintainers before being publicly released after a certain deadline (see the link for the complete details).
Ideally, these fuzz targets should also reside in the main project repository, so they are updated together with API changes. Let me know if you're also interested in integrating these targets in this repository (since this is additional work on top of your volunteer time to open-source, we are also offering integration rewards, more details here).
Thank you!
The text was updated successfully, but these errors were encountered: