pyclstm

Installation

Requirements:
  • A recent version of Eigen (>= 3.3) with development headers
  • A C++ compiler (g++ is recommended)
  • Cython

Install with pip:

$ pip install git+https://github.com/jbaiter/clstm.git@cython

Example Usage

Training:

Refer to run_uw3_500.py in the root directory for a more comprehensive example.

import pyclstm
ocr = pyclstm.ClstmOcr()
ocr.prepare_training(
    graphemes,  # Iterable of the characters the engine should recognize (the lexicon)
)

# line_img can be an image loaded with PIL/Pillow or a numpy array
for line_img, ground_truth in training_data:
    ocr.train(line_img, ground_truth)
ocr.save("my_model.clstm")

Recognition:

import pyclstm
ocr = pyclstm.ClstmOcr()
ocr.load("my_model.clstm")
# line_img can be an image loaded with PIL/Pillow or a numpy array
text = ocr.recognize(line_img)

API Reference

class pyclstm.ClstmOcr

An OCR engine based on CLSTM, operating on line images.

Use this class to either train your own OCR model or to load a pre-trained model from disk.

For training, set your parameters with prepare_training(), and then iteratively supply a line image (PIL.Image.Image or numpy.ndarray) and the ground truth for the line to train(). Once finished with training, call save() to persist the trained model to disk.

For prediction, two methods are available. The simpler one, recognize(), takes a line image (see above) and returns the recognized text as a string. If more information about the recognized text is needed, use recognize_chars(), which returns a generator yielding CharPrediction objects with information about each character (x-offset, confidence and recognized character).

aligned(self)

Get the aligned output of the last trained sample.

Return type: unicode

load(self, str fname)

Load a pre-trained model from disk.

Parameters: fname (str) – Path to the pre-trained model on disk

prepare_training(self, lexicon, int num_hidden=100, float learning_rate=0.0001, float momentum=0.9)

Prepare training by setting the lexicon and hyperparameters.

Parameters:
  • lexicon (iterable of str/unicode) – Iterable of characters that are to be recognized by the OCR model, must not have duplicates
  • num_hidden (int) – Number of hidden units in the LSTM layers. Larger values require more storage/memory and make training and recognition slower, so try to find a good performance/cost tradeoff.
  • learning_rate (float) – Learning rate for the model training
  • momentum (float) – Momentum for the model training
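
Since the lexicon must not contain duplicates, one way to build it is to collect the distinct characters from the ground-truth transcriptions. This is a minimal sketch; the helper name build_lexicon and the sample strings are illustrative and not part of pyclstm.

```python
def build_lexicon(ground_truths):
    """Return the sorted list of distinct characters in the given strings."""
    chars = set()
    for text in ground_truths:
        chars.update(text)  # a set silently drops repeated characters
    return sorted(chars)


lexicon = build_lexicon(["hello", "world"])
print(lexicon)  # prints ['d', 'e', 'h', 'l', 'o', 'r', 'w']
```

The resulting list can then be passed as the lexicon argument of prepare_training().
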
recognize(self, img)

Recognize the text on the line image.

Parameters: img (PIL.Image.Image/numpy.ndarray) – The line image to recognize
Returns: The recognized text for the line
Return type: unicode

recognize_chars(self, img)

Recognize the characters on the line, along with their position and confidence.

Parameters: img (PIL.Image.Image/numpy.ndarray) – The line image to recognize
Returns: The recognized text for the line, represented as information about its constituent characters.
Return type: generator that yields CharPrediction
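
The generator can be consumed to, for example, flag characters the model was unsure about. The sketch below is hedged: this reference only states that a CharPrediction carries an x-offset, a confidence and the recognized character, so the field names char, x_position and confidence are assumptions. A stand-in namedtuple is used so that the example runs without a trained model.

```python
from collections import namedtuple

# Stand-in for pyclstm's CharPrediction. The field names are assumptions
# made for this sketch; check the real attribute names before relying on them.
CharPrediction = namedtuple("CharPrediction", ["char", "x_position", "confidence"])


def low_confidence_chars(predictions, threshold=0.8):
    """Return the predictions whose confidence is below the threshold."""
    return [p for p in predictions if p.confidence < threshold]


# With a real model this would be: predictions = ocr.recognize_chars(line_img)
predictions = iter([
    CharPrediction("c", 4, 0.99),
    CharPrediction("a", 17, 0.42),
    CharPrediction("t", 29, 0.95),
])
for p in low_confidence_chars(predictions):
    print(p.char, p.x_position, p.confidence)
```

The threshold of 0.8 is arbitrary and would need tuning against real model output.
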
save(self, str fname)

Save the model to disk.

Parameters: fname (str) – Path to store the model in

train(self, img, unicode text)

Train the model with a line image and its ground truth.

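Parameters:
  • img (PIL.Image.Image/numpy.ndarray) – The line image to train on
  • text (unicode) – The ground truth for the line
Returns: The recognized text for the line image, which can be used to estimate the error against the ground truth (via levenshtein())
Return type: unicode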
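
Because train() returns the text recognized for the sample, the error can be tracked while training. The following is a rough sketch, not part of pyclstm: the helper train_epoch and its distance parameter are hypothetical names, and the function relies only on the documented behaviour that train(img, text) returns the recognized text.

```python
def train_epoch(ocr, samples, distance):
    """Train on every (image, ground truth) pair and return the error rate.

    `ocr` is expected to offer train(img, text) as documented above, and
    `distance` is an edit-distance function such as pyclstm.levenshtein.
    The result is the summed edit distance divided by the total number of
    ground-truth characters.
    """
    total_error = 0.0
    total_chars = 0
    for line_img, ground_truth in samples:
        predicted = ocr.train(line_img, ground_truth)
        total_error += distance(predicted, ground_truth)
        total_chars += len(ground_truth)
    # Guard against an empty sample list or empty transcriptions.
    return total_error / total_chars if total_chars else 0.0
```

With a prepared engine, a call could look like train_epoch(ocr, training_data, pyclstm.levenshtein), repeated until the error rate stops improving.
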

pyclstm.levenshtein(unicode a, unicode b) → double

Determine the Levenshtein distance between two unicode strings.

Return type: int
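
For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The pure-Python version below illustrates the standard dynamic-programming formulation; it is not pyclstm's implementation, and the name levenshtein_reference is made up for this sketch.

```python
def levenshtein_reference(a, b):
    """Pure-Python edit distance between two strings."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_reference("kitten", "sitting"))  # prints 3
```

Dividing the distance by the length of the ground truth gives a character error rate that is comparable across lines of different length.
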
