Workflow OCR destroys PDF files #79
Comments
Hi @kwisatz, and thanks for reporting this. Unfortunately you didn't attach any Nextcloud logfiles to the issue you mentioned, so it's really hard to say what's going wrong under the hood. What I could imagine is that the
Please decrease your loglevel to 2, reproduce the issue and paste some snippets of your
Btw: the app always produces a new file version, so the original files aren't deleted. They can be rolled back by using the file history; see README.md for further details.
Signed-off-by: Robin Windey <[email protected]>
Hi,
The original PDF file is restorable via the history, but the current file is corrupt: it is empty. Currently every scan I upload needs to be restored via the history. Furthermore, I receive lots of errors saying that Imagick is not able to create a preview of the file, which makes sense because the file has 0 bytes.
Thanks for your feedback. Unfortunately it seems like we have an error inside the logging function, so that currently the contents of
My proposal:
Does that make sense to you?
Sounds plausible to me @R0Wi. Sorry I haven't been able to provide any detailed logs so far, we're pretty busy at the moment. I'll try to supply some this afternoon in case I can actually add some value by logs different from those that @StefCGN supplied.
Fix is now available in versions
Hi,
First test produces the following error:
`[workflow_ocr] Warnung: OCRmyPDF succeeded with warning(s): A decompression bomb error was encountered while executing the pipeline. Use the argument --max-image-mpixels to raise the maximum image pixel limit. The above exception was the direct cause of the following exception: Traceback (most recent call last):`
And this one:
Second test:
Okay so it seems like
error message is always the last error message written if the app doesn't receive any output from
So it might be that the default settings are set too low for your file. Could you try to manually execute the
I can also offer to inspect your PDF file if it's possible to send it over.
EDIT: could be related to ocrmypdf/OCRmyPDF#413
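For anyone who wants to try such a manual run with a raised limit, here is a minimal Python sketch, not the app's actual code. It only builds the command line; it assumes `ocrmypdf` is installed and uses the `--max-image-mpixels` argument named in the warning above. The helper names, file names, page size and DPI are hypothetical and only there for illustration.

```python
import shlex


def page_megapixels(width_in: float, height_in: float, dpi: int) -> float:
    """Approximate size of a rasterized page in megapixels."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


def build_ocr_command(src: str, dst: str, max_mpixels: float) -> list[str]:
    """Build (but do not run) an ocrmypdf call with a raised pixel limit."""
    return ["ocrmypdf", "--max-image-mpixels", str(max_mpixels), src, dst]


# Hypothetical example: a large drawing (33.1 x 46.8 in) scanned at 300 dpi.
mpix = page_megapixels(33.1, 46.8, 300)
# Leave some headroom above the estimated page size (assumption: +10 MP).
cmd = build_ocr_command("plan.pdf", "plan_ocr.pdf", max_mpixels=round(mpix) + 10)
print(f"{mpix:.1f} MP ->", shlex.join(cmd))
```

The printed command can be pasted into a shell as the web server user to see whether the warning goes away with the higher limit.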
Here's output from my tests with the latest release. I think the message is pretty obvious.
{"reqId":"AlD9sEoaxJK2iB2vd2xk","level":2,"time":"2021-12-09T08:40:12+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCRmyPDF succeeded with warning(s): sh: 1: ocrmypdf: not found, ","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AlD9sEoaxJK2iB2vd2xk","level":3,"time":"2021-12-09T08:40:12+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /08ece906-d7b8-1035-9c8d-97c38bc36d8d/files/form-experiments.pdf not possible. Message: OCRmyPDF did not produce any output","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AifTKq4NuPN1ARrxKBAx","level":2,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCRmyPDF succeeded with warning(s): sh: 1: ocrmypdf: not found, ","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AifTKq4NuPN1ARrxKBAx","level":3,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /08ece906-d7b8-1035-9c8d-97c38bc36d8d/files/Projects/someProject/CITP88D.PDF not possible. Message: OCRmyPDF did not produce any output","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AlD9sEoaxJK2iB2vd2xk","level":2,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCRmyPDF succeeded with warning(s): sh: 1: ocrmypdf: not found, ","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AlD9sEoaxJK2iB2vd2xk","level":3,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /08ece906-d7b8-1035-9c8d-97c38bc36d8d/files/Projects/someProject/AFF_SECT.PDF not possible. Message: OCRmyPDF did not produce any output","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AifTKq4NuPN1ARrxKBAx","level":2,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCRmyPDF succeeded with warning(s): sh: 1: ocrmypdf: not found, ","userAgent":"--","version":"22.2.2.0"}
{"reqId":"AifTKq4NuPN1ARrxKBAx","level":3,"time":"2021-12-09T08:40:13+00:00","remoteAddr":"","user":"08ece906-d7b8-1035-9c8d-97c38bc36d8d","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /08ece906-d7b8-1035-9c8d-97c38bc36d8d/files/Projects/someProject/CITP88FR.PDF not possible. Message: OCRmyPDF did not produce any output","userAgent":"--","version":"22.2.2.0"}
I know this is a little off-topic but where or when should
In any case it seems to after installing
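The log lines above all point at the same cause, `sh: 1: ocrmypdf: not found`. As a rough sketch (not part of the app), the following Python snippet filters `workflow_ocr` entries out of a log and checks whether `ocrmypdf` is visible on the current PATH. It assumes the Nextcloud log holds one JSON object per line, as in the excerpt. The function names, the abbreviated sample records and the `/x.pdf` path are hypothetical.

```python
import json
import shutil


def workflow_ocr_messages(lines):
    """Yield (level, message) for workflow_ocr entries in JSON log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if isinstance(entry, dict) and entry.get("app") == "workflow_ocr":
            yield entry.get("level"), entry.get("message", "")


def binary_missing(messages):
    """True if any message reports that the shell could not find ocrmypdf."""
    return any("ocrmypdf: not found" in msg for _, msg in messages)


# Abbreviated, hypothetical records shaped like the log excerpt above.
sample = [
    '{"level":2,"app":"workflow_ocr","message":"OCRmyPDF succeeded with warning(s): sh: 1: ocrmypdf: not found, "}',
    '{"level":3,"app":"workflow_ocr","message":"OCR for file /x.pdf not possible. Message: OCRmyPDF did not produce any output"}',
]
found = list(workflow_ocr_messages(sample))
print("binary missing per log:", binary_missing(found))
print("ocrmypdf on PATH:", shutil.which("ocrmypdf"))
```

Note that the PATH of the web server or cron user may differ from that of an interactive shell, so the `shutil.which` check is most useful when run as that user.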
I think the document is "too big", because it's a house plan with drawings. The output is:
I don't know why this is occurring. I am working on a FreeBSD platform inside a TrueNAS system.
The file is there, but I think it is looking in the wrong path. Do you perhaps know how to correct that?
Now a new PDF is generated, but the text is not selectable. The following error shows in the Nextcloud log:
@kwisatz the README mentions that
@StefCGN seems like a problem inside your server setup, but I can't say what's wrong. The only thing I would test is executing the
@R0Wi As a Nextcloud user who can install apps through the Web UI, I would expect a mention of dependencies here:
I see, you're right 👍 Will update the docs in the next release.
Olé olé :) I just downloaded the configs directory from GitHub into /tessdata/ and the PDF is created without any errors :) Will open up an issue with tesseract! Thanks for pointing me in the right direction.
After postCreate ran Workflow OCR on PDF files on our instance, they all have a size of 0kb. I've posted details here: nextcloud/server#30059
After deleting the flow, PDFs are no longer getting corrupted.