Attention-based Extraction of Structured Information from Street View Imagery

A TensorFlow model for real-world image text extraction problems.

This folder contains the code needed to train a new Attention OCR model on the FSNS dataset to transcribe street names in France. You can also train the model on your own data.

More details can be found in our paper:

"Attention-based Extraction of Structured Information from Street View Imagery"

Contacts

Authors: Zbigniew Wojna [email protected], Alexander Gorban [email protected]

Pull requests: alexgorban

Requirements

  1. Install the TensorFlow library (instructions). For example (a quick sanity check of the install is sketched after this list):

     virtualenv --system-site-packages ~/.tensorflow
     source ~/.tensorflow/bin/activate
     pip install --upgrade pip
     pip install --upgrade tensorflow_gpu

  2. At least 158GB of free disk space to download the FSNS dataset:

     cd models/attention_ocr/python/datasets
     aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
     cd ..

  3. 16GB of RAM or more; 32GB is recommended.
  4. train.py works with both CPU and GPU, though using a GPU is preferable. It has been tested with a Titan X and with a GTX 980.
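After installing, you can sanity-check the setup. The following is a minimal sketch, assuming the TensorFlow 1.x API that this code targets; it only prints the installed version and whether a GPU is visible to TensorFlow:

import tensorflow as tf

# Print the installed TensorFlow version and whether a GPU device is visible.
print('TensorFlow version:', tf.__version__)
print('GPU available:', tf.test.is_gpu_available())

If the second line prints False on a machine with a GPU, check that the CUDA-enabled tensorflow_gpu package (not plain tensorflow) was installed into the active virtualenv.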

How to use this code

To run all unit tests:

python -m unittest discover -p '*_test.py'

To train from scratch:

python train.py

To train a model using pre-trained Inception weights as initialization:

wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=inception_v3.ckpt

To fine tune the Attention OCR model using a checkpoint:

wget http://download.tensorflow.org/models/attention_ocr_2017_05_17.tar.gz
tar xf attention_ocr_2017_05_17.tar.gz
python train.py --checkpoint=model.ckpt-399731
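If you want to confirm the checkpoint extracted correctly before starting fine tuning, the sketch below (assuming the standard TF 1.x checkpoint reader) lists a few of the variables stored in model.ckpt-399731:

import tensorflow as tf

# Open the extracted Attention OCR checkpoint and list some stored variables.
reader = tf.train.NewCheckpointReader('model.ckpt-399731')
var_shapes = reader.get_variable_to_shape_map()
for name in sorted(var_shapes)[:10]:
    print(name, var_shapes[name])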

Disclaimer

This code is a modified version of the internal model we used for our paper. Currently it reaches 83.79% full sequence accuracy after 400k steps of training. The main differences from the version used in the paper: for the paper we used distributed training with 50 GPU (K80) workers and asynchronous updates, while the provided checkpoint was created with this code after ~6 days of training on a single GPU (Titan X), reaching 81% after 24 hours of training. The coordinate encoding is also missing TODO(alexgorban@).