pyclstm

Installation

Requirements:

A recent version of Eigen (>= 3.3) with development headers
A C++ compiler (g++ is recommended)
Cython

Installation:

$ pip install git+https://github.com/jbaiter/clstm.git@cython

Example Usage

Training:

See run_uw3_500.py in the repository's root directory for a more complete example.

import pyclstm

# graphemes: a list of the characters the engine is supposed to recognize
# training_data: an iterable of (line_img, ground_truth) pairs that you supply;
# line_img can be an image loaded with PIL/Pillow or a numpy array
ocr = pyclstm.ClstmOcr()
ocr.prepare_training(graphemes)  # passed as the lexicon argument
for line_img, ground_truth in training_data:
    ocr.train(line_img, ground_truth)
ocr.save("my_model.clstm")

Recognition:

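import pyclstm
from PIL import Image

ocr = pyclstm.ClstmOcr()
ocr.load("my_model.clstm")
line_img = Image.open("line.png")  # hypothetical path to a single text-line image
text = ocr.recognize(line_img)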

API Reference

class pyclstm.ClstmOcr

An OCR engine based on CLSTM, operating on line images.

Use this class either to train your own OCR model or to load a pre-trained model from disk.

For training, set the lexicon and hyperparameters with prepare_training(), then call train() repeatedly, passing a line image (PIL.Image.Image or numpy.ndarray) and the ground truth for that line each time. When training is finished, call save() to persist the trained model to disk.

For prediction, two methods are available. The simplest, recognize(), takes a line image (see above) and returns the recognized text as a string. If you need more information about the recognized text, use recognize_chars(), which returns a generator yielding CharPrediction objects that describe each character (x-offset, confidence and recognized character).

aligned(self)

Get the aligned output of the last trained sample.

Return type:	unicode

load(self, str fname)

Load a pre-trained model from disk.

Parameters:	fname (str) – Path to the pre-trained model on disk

prepare_training(self, lexicon, int num_hidden=100, float learning_rate=0.0001, float momentum=0.9)

Prepare training by setting the lexicon and hyperparameters.

Parameters:

lexicon (iterable of str/unicode) – Iterable of the characters the OCR model should recognize; must not contain duplicates
num_hidden (int) – Number of hidden units in the LSTM layers. Larger values require more memory and storage and make both training and recognition slower, so look for a good trade-off between recognition quality and cost.
learning_rate (float) – Learning rate for the model training
momentum (float) – Momentum for the model training
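Because the lexicon must not contain duplicates, one way to build it is to collect the distinct characters of all ground-truth lines. The sketch below assumes that every character appearing in your ground truth (including spaces) should be recognizable; the helper name build_lexicon is hypothetical and not part of pyclstm.

```python
from typing import Iterable, List


def build_lexicon(ground_truths: Iterable[str]) -> List[str]:
    """Collect every distinct character in the ground-truth lines.

    Hypothetical helper, not part of pyclstm. The result is sorted, so it
    contains no duplicates and has a stable order from run to run.
    """
    chars = set()
    for line in ground_truths:
        chars.update(line)  # adds each character of the line
    return sorted(chars)
```

Usage with the training example above might look like lexicon = build_lexicon(text for _, text in training_data) followed by ocr.prepare_training(lexicon, num_hidden=100).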

recognize(self, img)

Recognize the text on the line image.

Parameters:	img (`PIL.Image.Image`/`numpy.ndarray`) – The line image to recognize
Returns:	The recognized text for the line
Return type:	unicode

recognize_chars(self, img)

Recognize the characters on the line, along with their positions and confidence values.

Parameters:	img (`PIL.Image.Image`/`numpy.ndarray`) – The line image to recognize
Returns:	The recognized text for the line, represented as information about each of its characters.
Return type:	generator that yields `CharPrediction` objects
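Since recognize_chars() reports a confidence for every character, its output can be used to flag uncertain characters for manual review. This is an illustrative sketch only: the attribute names of pyclstm's CharPrediction are not documented here, so the code defines a stand-in namedtuple with assumed fields x, confidence and char. Adjust the attribute access to match the real class.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Hypothetical stand-in for pyclstm's CharPrediction; the field names are
# assumptions. The docs only say it carries x-offset, confidence and character.
CharPrediction = namedtuple("CharPrediction", ["x", "confidence", "char"])


def low_confidence_chars(predictions: Iterable[CharPrediction],
                         threshold: float = 0.5) -> List[Tuple[int, str]]:
    """Return (x-offset, character) pairs whose confidence is below threshold."""
    return [(p.x, p.char) for p in predictions if p.confidence < threshold]


def mean_confidence(predictions: Iterable[CharPrediction]) -> float:
    """Average confidence over all predictions (0.0 for an empty line)."""
    scores = [p.confidence for p in predictions]
    return sum(scores) / len(scores) if scores else 0.0
```

With a real model, materialize the generator first if you need more than one pass over it, e.g. preds = list(ocr.recognize_chars(line_img)).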

save(self, str fname)

Save the model to disk.

Parameters:	fname (str) – Path to store the model at

train(self, img, unicode text)

Train the model with a line image and its ground truth.

Parameters:

img (`PIL.Image.Image`/`numpy.ndarray`) – The line image for the ground truth
text (unicode) – The ground truth text for the line image

Returns:	The recognized text for the line image, which can be used to estimate the error against the ground truth (e.g. via `levenshtein()`)
Return type:	unicode
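Because train() returns the recognized text, a training loop can track a running character error rate. This is a minimal sketch: train_epoch is a hypothetical helper that relies only on train() returning the recognized text, as documented above, and it takes the distance function as a parameter so that pyclstm.levenshtein (or any other edit-distance function) can be passed in.

```python
from typing import Any, Callable, Iterable, Tuple


def train_epoch(ocr: Any,
                samples: Iterable[Tuple[Any, str]],
                distance: Callable[[str, str], float]) -> float:
    """Run one training pass over (line_image, ground_truth) pairs.

    Hypothetical helper. Returns the character error rate for the pass:
    total edit distance divided by total ground-truth length.
    """
    total_errors = 0.0
    total_chars = 0
    for img, truth in samples:
        recognized = ocr.train(img, truth)  # documented to return the recognized text
        total_errors += distance(recognized, truth)
        total_chars += len(truth)
    return total_errors / max(total_chars, 1)  # avoid division by zero on empty input
```

A call might look like cer = train_epoch(ocr, training_data, pyclstm.levenshtein). Because the rate is measured on the training samples themselves, treat it as a rough progress indicator rather than a held-out evaluation.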

pyclstm.levenshtein(unicode a, unicode b) → double

Determine the Levenshtein distance between two unicode strings.

Return type:	float
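For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The pure-Python sketch below computes it with the standard two-row dynamic-programming table. It is not pyclstm's implementation, and it returns a plain int.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b.

    Illustrative two-row dynamic-programming sketch, not pyclstm's code.
    """
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, levenshtein("kitten", "sitting") returns 3.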
