Improving Windows with PyInstaller - Ocrmypdf Distribution Not Found #659

Open
gabemorris12 opened this issue Oct 21, 2020 · 15 comments

@gabemorris12

Describe the bug
PyInstaller successfully created an executable file, but the .exe crashes when it is run.

To Reproduce
To reproduce the error, pass the --console flag when running the pyinstaller command, then run the executable from the command prompt. It will raise a pkg_resources.DistributionNotFound error.

Expected behavior
I expected all dependencies to be present.

Screenshots
image

System (please complete the following information):

  • OS: Windows 10
  • Python version: 3.8.3
  • OCRmyPDF version: 10.3.3

Additional context
The .py script runs perfectly fine. There's just a problem with pyinstaller not finding the distribution.
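
For context, pkg_resources.DistributionNotFound is raised when a distribution's installed metadata is looked up at runtime and cannot be located, which can happen in a frozen app if that metadata was not bundled. The following is a minimal sketch of that kind of lookup. It uses the standard library's importlib.metadata as a stand-in for pkg_resources, and the helper name installed_version is hypothetical (it is not part of ocrmypdf or PyInstaller).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if its metadata is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Analogous to pkg_resources.DistributionNotFound: the code may be
        # importable, but the distribution's metadata cannot be located.
        return None


if __name__ == "__main__":
    # Prints a version string if the metadata is found, otherwise None.
    for name in ("ocrmypdf", "pikepdf"):
        print(name, installed_version(name))
```

Running a check like this inside the frozen executable may help confirm whether the metadata made it into the bundle.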

@jbarlow83
Collaborator

I think you'll have to ask for support from the PyInstaller people, since I don't know much about it.

ocrmypdf's use of cffi to access a C library, and its use of third party executables like Tesseract and Ghostscript, may be problematic for packaging.

If you can identify a specific issue with ocrmypdf that is an obstacle to generating this installer, please open a PR.

@gabemorris12
Author

@jbarlow83 I got it to run! I should have gone to PyInstaller first, so sorry about that. The fix can be found here: pyinstaller/pyinstaller#4809

I had to create hook files for both the ocrmypdf and pikepdf modules and point PyInstaller at them with the --additional-hooks-dir HooksFolderPath flag. Thanks for the quick response.

@jbarlow83
Collaborator

Would you mind sharing your full instructions? Maybe I'll be able to automate it one day.

@gabemorris12
Author

gabemorris12 commented Oct 22, 2020

@jbarlow83
You have to make two different .py scripts in a folder within your project. For my example, I made a folder called Hooks, and these are the two scripts that I made:
hook-ocrmypdf.py

from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all('ocrmypdf')

This hook alone lets the app start, but it is not enough. Once your program calls the ocrmypdf.ocr() function, it raises another error. I cannot remember exactly which error it was, but it related to a missing dependency within pikepdf. So the other script that you have to make is this:

hook-pikepdf.py

from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all('pikepdf')

Now when you package the main script with pyinstaller, you must include this flag: --additional-hooks-dir HooksFolderPath.
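
The steps above can be sketched as a small helper that writes both hook files into a folder. This is only an illustration under stated assumptions: the write_hooks function, the HOOK_TEMPLATE name, and main.py are hypothetical, while the hook contents are the ones shown above.

```python
from pathlib import Path

# Same contents as the two hook files shown above, with the package name filled in.
HOOK_TEMPLATE = (
    "from PyInstaller.utils.hooks import collect_all\n"
    "\n"
    "datas, binaries, hiddenimports = collect_all('{package}')\n"
)


def write_hooks(hooks_dir, packages=("ocrmypdf", "pikepdf")):
    """Create hook-<package>.py files in hooks_dir and return their paths."""
    hooks_dir = Path(hooks_dir)
    hooks_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for package in packages:
        hook_path = hooks_dir / f"hook-{package}.py"
        hook_path.write_text(HOOK_TEMPLATE.format(package=package), encoding="utf-8")
        written.append(hook_path)
    return written


if __name__ == "__main__":
    for path in write_hooks("Hooks"):
        print("wrote", path)
    # Then build with: pyinstaller main.py --additional-hooks-dir Hooks
    # (main.py is a placeholder for your own script)
```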

@jbarlow83 changed the title from "Ocrmypdf Distribution Not Found" to "Improving Windows with PyInstaller - Ocrmypdf Distribution Not Found" on Oct 28, 2020
@woaidianqian

pyinstaller pdf识别.py --additional-hooks-dir HooksFolderPath
it works!

@koolkunz

koolkunz commented Apr 10, 2023

Getting this error after following the instructions above:

Traceback (most recent call last):
  File "test.py", line 30, in <module>
  File "ocrmypdf\api.py", line 324, in ocr
    plugin_manager = get_plugin_manager(plugins)
  File "ocrmypdf\_plugin_manager.py", line 104, in get_plugin_manager
    return OcrmypdfPluginManager(
  File "ocrmypdf\_plugin_manager.py", line 45, in __init__
    self.setup_plugins()
  File "ocrmypdf\_plugin_manager.py", line 69, in setup_plugins
    for module in sorted(
TypeError: '<' not supported between instances of 'FrozenImporter' and 'FileFinder'

can anyone help?
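
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis, is that the sequence being sorted holds tuples like those yielded by pkgutil.iter_modules(), whose first element is an importer object. In a frozen app some entries could come from PyInstaller's importer and others from the regular file finder, and those objects define no ordering. The sketch below reproduces that kind of failure with stand-in classes (both hypothetical) and shows that sorting by module name sidesteps it.

```python
import pkgutil


class FrozenImporterStandIn:
    """Stand-in for PyInstaller's importer; defines no ordering."""


class FileFinderStandIn:
    """Stand-in for the regular filesystem finder; defines no ordering."""


# pkgutil.iter_modules() yields ModuleInfo(module_finder, name, ispkg) tuples.
modules = [
    pkgutil.ModuleInfo(FrozenImporterStandIn(), "plugin_b", False),
    pkgutil.ModuleInfo(FileFinderStandIn(), "plugin_a", False),
]

try:
    sorted(modules)  # tuple comparison starts with the finder objects
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print("sorted() failed:", exc)

# Sorting on the module name avoids comparing the finders at all.
names = [m.name for m in sorted(modules, key=lambda m: m.name)]
print(names)
```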

@jbarlow83
Collaborator

@koolkunz Look up how to use PyInstaller with a project that uses pluggy as a plugin manager. pytest is an example of such a project. You may also need to add ocrmypdf/builtin_plugins to the list of dynamic plugins that will be used. ocrmypdf installs some of its own plugins by default so that others can override them.

@koolkunz

koolkunz commented Apr 21, 2023

pyinstaller --copy-metadata pikepdf --copy-metadata ocrmypdf --collect-submodules ocrmypdf --collect-datas ocrmypdf.data sample.py

This is all I had to do to get it working^.
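
As a rough guide to that command: --copy-metadata bundles a distribution's metadata (the thing the original DistributionNotFound error was about), --collect-submodules pulls in every submodule of a package, including ones that are only loaded dynamically, and --collect-datas bundles a package's data files. A minimal sketch of building the same argument list from Python follows. The helper name build_pyinstaller_args is hypothetical, and the commented-out call assumes PyInstaller is installed.

```python
def build_pyinstaller_args(script, metadata_pkgs=("pikepdf", "ocrmypdf")):
    """Return the argument list equivalent to the command line shown above."""
    args = []
    for pkg in metadata_pkgs:
        # Bundle each distribution's metadata so runtime lookups succeed.
        args += ["--copy-metadata", pkg]
    # Include all ocrmypdf submodules, even those imported dynamically.
    args += ["--collect-submodules", "ocrmypdf"]
    # Include the data files shipped in ocrmypdf.data.
    args += ["--collect-datas", "ocrmypdf.data"]
    args.append(script)
    return args


if __name__ == "__main__":
    args = build_pyinstaller_args("sample.py")
    print("pyinstaller", " ".join(args))
    # To run the build from Python instead of the shell (requires PyInstaller):
    # import PyInstaller.__main__
    # PyInstaller.__main__.run(args)
```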

But now I have another small issue: when I run my GUI application in no-console mode, ocrmypdf keeps opening multiple console windows for the tesseract executable (I think one per job being processed) and the ghostscript executable, with no output.

I already have logging set to -1:
configure_logging(verbosity=-1, progress_bar_friendly=False)

Is there any other way to suppress these console windows from showing up at all?

edit: Added Screenshot
Screenshot (203)

@insinfo

insinfo commented Apr 27, 2023

@koolkunz
Could you make this available on Google Drive so I can download and use it on my Windows notebook? I need to make some PDFs searchable. On my notebook I already have the latest version of tesseract (v5.3.0), ghostscript 10.01.1, and python 3.10.8.

@koolkunz
I'm still working on it. Unless you want to distribute the exe, you can just use ocrmypdf from simple Python code, or even directly on the command line.

@zeekias

zeekias commented May 25, 2023

I have the same problem. Could someone help me?

@zeekias

zeekias commented May 25, 2023

How can I disable the tesseract prompt?

@lugi777

lugi777 commented Aug 1, 2024

@koolkunz Have you found a way to prevent the console windows showing up?

@koolkunz

koolkunz commented Aug 1, 2024

Unfortunately no. Ultimately I just went with the console window option when building the exe with PyInstaller.

@Brisk4t

Brisk4t commented Apr 7, 2025

If anyone does stumble into this extremely niche issue (console windows popping up for Tesseract and Ghostscript in a no-console build), a hotfix you can use is to monkey patch subprocess.Popen, which OCRmyPDF uses to launch those external programs, so that they do not open new windows.

import platform
import subprocess

# Patch OCRmyPDF to suppress all consoles
if platform.system() == "Windows":
    # Patch subprocess to prevent window popup on Windows
    class NoWindowPopen(subprocess.Popen):
        def __init__(self, *args, **kwargs):
            # OR in CREATE_NO_WINDOW while keeping any flags the caller already set
            kwargs['creationflags'] = kwargs.get('creationflags', 0) | subprocess.CREATE_NO_WINDOW
            super().__init__(*args, **kwargs)

    subprocess.Popen = NoWindowPopen

import ocrmypdf

Note that it's important that this block runs before importing ocrmypdf.

A more robust solution would be to add a flag to the .ocr function call that then passes on the flag to .Popen()
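
To illustrate the idea of passing the flag through explicitly instead of patching globally, here is a minimal sketch of a wrapper around subprocess.run. The run_hidden helper and the NO_WINDOW constant are hypothetical and not part of OCRmyPDF. Nothing in this thread suggests that .ocr() accepts such a flag today, so this only shows what the receiving end might look like.

```python
import subprocess
import sys

# CREATE_NO_WINDOW only exists on Windows; fall back to 0 (no flags) elsewhere.
NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)


def run_hidden(cmd, **kwargs):
    """Run cmd like subprocess.run, but without opening a console window on Windows."""
    # Preserve any creationflags the caller passed and add the no-window flag.
    kwargs["creationflags"] = kwargs.get("creationflags", 0) | NO_WINDOW
    return subprocess.run(cmd, **kwargs)


if __name__ == "__main__":
    result = run_hidden(
        [sys.executable, "-c", "print('ok')"],
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip())
```

Unlike the monkey patch, this affects only the calls that go through the wrapper, so import order does not matter.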

8 participants