ERROR: [tesseract] read_params_file: Can't open txt #209

Closed
dev-code-davis opened this issue Jan 11, 2018 · 9 comments

Comments

@dev-code-davis

Hi, when OCRing a PDF I get the following error for nearly every page:

ERROR - 56: [tesseract] read_params_file: Can't open txt

I tried searching for it, but found nothing (especially about the txt part). Has anyone run into something similar? What does txt stand for? Is it some text format?

Command called:

nice -19 env LANG=lv_LV.utf8 ocrmypdf -l lav+rus --rotate-pages --force-ocr --pdf-renderer tesseract --output-type pdf --sidecar - /tmp/input.pdf /tmp/output556.pdf

Thanks.

@jbarlow83
Collaborator

I suspect you have an old or unsupported version of Tesseract. What does tesseract --version report?

Your installation could also be missing the file $tessroot/share/tessdata/configs/txt, where $tessroot varies from platform to platform.
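To check whether that file is present, a small script like the one below can help. It is a sketch under assumptions: the error names a Tesseract config file (here txt, looked up under tessdata/configs/), and the candidate directories listed are hypothetical examples, since $tessroot varies by platform and packager. It checks TESSDATA_PREFIX both as the tessdata directory itself and as its parent, because the expected layout can differ between Tesseract releases. find_txt_config is a hypothetical helper name.

```python
import os
from pathlib import Path

# Hypothetical candidate tessdata locations; real paths vary by platform
# and packager, so adjust this list for your system.
CANDIDATE_TESSDATA_DIRS = [
    "/usr/share/tesseract/4/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/local/share/tessdata",
]


def find_txt_config(candidates=CANDIDATE_TESSDATA_DIRS, env=None):
    """Return the first existing <tessdata>/configs/txt path, or None.

    If TESSDATA_PREFIX is set, it is tried first, both as the tessdata
    directory itself and as its parent directory.
    """
    env = os.environ if env is None else env
    dirs = []
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        dirs += [prefix, os.path.join(prefix, "tessdata")]
    dirs += list(candidates)
    for d in dirs:
        path = Path(d) / "configs" / "txt"
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_txt_config()
    print(found if found else "configs/txt not found in any candidate directory")
```

If nothing is found, the installation is likely missing its config files, which matches the "Can't open txt" message.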

@pankajkumar2433

I am running CentOS 8.

The version is tesseract 5.0.0-beta-20210815.
Previously it was tesseract 4.1.1.

The issue occurs with both versions.

What should I do?

@jbarlow83
Collaborator

Does tessdata/configs/txt exist as mentioned above?

@pankajkumar2433

No, it doesn't exist.
Where can I get it?

I installed tesseract from source (tar.gz).

@jbarlow83
Collaborator

I recommend using a version of tesseract provided by your Linux distribution or some third party packager that has solved these issues, rather than installing one from source. A source build is a special case.

@pankajkumar2433

I installed it as follows:

dnf config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_8/
rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
sudo dnf install tesseract

The file tessdata/configs/txt is present in "/usr/share/tesseract/4/tessdata/configs".

The issue still occurs.

@jbarlow83
Collaborator

You may need to set the environment variable TESSDATA_PREFIX=/usr/share/tesseract/4 and ensure that your PATH env var gives priority to the new tesseract.
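The same setup can also be applied per process instead of globally. A minimal sketch, assuming the packaged binary lives in /usr/bin (a hypothetical location; check where your package installed it): build a copy of the environment with TESSDATA_PREFIX set and the chosen directory first on PATH, then pass it to whatever launches ocrmypdf or tesseract. tesseract_env is a hypothetical helper name, not part of OCRmyPDF or Tesseract.

```python
import os
import shutil


def tesseract_env(tessdata_prefix, tesseract_bin_dir=None, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set and,
    optionally, tesseract_bin_dir moved to the front of PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_prefix
    if tesseract_bin_dir:
        parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
        # Drop any existing copy so the chosen directory is searched first.
        parts = [p for p in parts if p != tesseract_bin_dir]
        env["PATH"] = os.pathsep.join([tesseract_bin_dir] + parts)
    return env


if __name__ == "__main__":
    # /usr/bin is an assumed install location for the packaged tesseract.
    env = tesseract_env("/usr/share/tesseract/4", "/usr/bin")
    # Report which tesseract binary a child process with this PATH would find first.
    print(shutil.which("tesseract", path=env["PATH"]))
```

For example, subprocess.run(["ocrmypdf", "-l", "lav+rus", "/tmp/input.pdf", "/tmp/output.pdf"], env=env, check=True) would run OCRmyPDF with that environment without changing the shell's global settings.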

@pankajkumar2433

Awesome!!
It's working.

@surya-1729

Hello,

I am getting a similar error:

pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open tessedit_char_blacklist=,;: Error: Tesseract (legacy) engine requested, but components are not present in external/tesstrain/data/eng_pcb/eng_pcb.traineddata!! Failed loading language 'eng_pcb' Tesseract couldn't load any languages! Could not initialize tesseract.")

Output of tesseract -v:
tesseract 4.1.1
leptonica-1.82.0
libgif 5.1.9 : libjpeg 8d (libjpeg-turbo 2.1.1) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.8

I am using the tessdata_best (float) model files from: https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata

I also tried some of the suggestions in #209.

I am trying to find the source of the issue. Could someone who understands the cause explain it, so that I can continue from there?
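One plausible reading of this error, sketched under assumptions: the Tesseract command line treats trailing positional arguments as config file names, so a bare tessedit_char_blacklist=,;: token would be looked up as a file, which matches "read_params_file: Can't open tessedit_char_blacklist=,;:". Variables are normally passed as -c name=value instead. The second half of the message says the legacy engine was requested but eng_pcb.traineddata lacks its components, so requesting the LSTM engine with --oem 1 may help. The helper below (build_tesseract_config is a hypothetical name) assembles such a config string and assumes it is later split shell-style (POSIX rules) before reaching tesseract.

```python
import shlex


def build_tesseract_config(oem=None, psm=None, variables=None):
    """Build a config string of Tesseract command-line options.

    Each variable becomes "-c name=value". A bare "name=value" token would
    instead be interpreted as the name of a config file.
    """
    tokens = []
    if oem is not None:
        tokens += ["--oem", str(oem)]
    if psm is not None:
        tokens += ["--psm", str(psm)]
    for name, value in (variables or {}).items():
        tokens += ["-c", f"{name}={value}"]
    # Quote each token so characters such as ';' survive shell-style splitting.
    return " ".join(shlex.quote(t) for t in tokens)


if __name__ == "__main__":
    cfg = build_tesseract_config(oem=1, variables={"tessedit_char_blacklist": ",;:"})
    print(cfg)
    print(shlex.split(cfg))
```

With pytesseract this would be passed as pytesseract.image_to_string(img, lang="eng_pcb", config=cfg). Whether --oem 1 alone resolves the legacy-engine part depends on how the traineddata file was built.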


4 participants