# pyclstm

## Installation

Requirements:

- A recent version of Eigen (>= 3.3) with development headers
- A C++ compiler (g++ is recommended)
- Cython

Installation:

    $ pip install git+https://github.com/jbaiter/clstm.git@cython
## Example Usage

Training:

Refer to run_uw3_500.py in the repository root for a more comprehensive example.

    import pyclstm

    ocr = pyclstm.ClstmOcr()
    ocr.prepare_training(
        graphemes=graphemes,  # A list of characters the engine is supposed to recognize
    )
    # line_img can be an image loaded with PIL/Pillow or a numpy array
    for line_img, ground_truth in training_data:
        ocr.train(line_img, ground_truth)
    ocr.save("my_model.clstm")
Recognition:

    import pyclstm

    ocr = pyclstm.ClstmOcr()
    ocr.load("my_model.clstm")
    text = ocr.recognize(line_img)
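For per-character detail, `recognize_chars()` yields `CharPrediction` objects carrying the x-offset, confidence, and recognized character. The attribute names below (`x_position`, `char`, `confidence`) are assumptions, so a namedtuple stand-in is used here to sketch one way to consume the generator, e.g. to drop low-confidence characters:

    from collections import namedtuple

    # Stand-in for pyclstm's CharPrediction; the real attribute names may differ.
    CharPrediction = namedtuple("CharPrediction", ["x_position", "char", "confidence"])

    def filter_confident(predictions, threshold=0.9):
        """Keep only characters recognized with confidence at or above the threshold."""
        return "".join(p.char for p in predictions if p.confidence >= threshold)

    # With a real model this would be: predictions = ocr.recognize_chars(line_img)
    predictions = [
        CharPrediction(0, "f", 0.98),
        CharPrediction(9, "o", 0.95),
        CharPrediction(17, "o", 0.41),  # low-confidence character, dropped
    ]
    print(filter_confident(predictions))  # -> "fo"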
## API Reference

### class pyclstm.ClstmOcr

An OCR engine based on CLSTM, operating on line images.

Use this class to either train your own OCR model or to load a pre-trained model from disk.

For training, set your parameters with `prepare_training()`, then iteratively supply a line image (`PIL.Image.Image` or `numpy.ndarray`) and the ground truth for the line to `train()`. Once finished with training, call `save()` to persist the trained model to disk.

For prediction, two methods are available. The simplest, `recognize()`, takes a line image (see above) and returns the recognized text as a string. If more information about the recognized text is needed, use `recognize_chars()`, which returns a generator that yields `CharPrediction` objects containing information about each character (x-offset, confidence, and recognized character).

#### load(self, str fname)

Load a pre-trained model from disk.

Parameters:

- fname (str) – Path to the pre-trained model on disk
#### prepare_training(self, lexicon, int num_hidden=100, float learning_rate=0.0001, float momentum=0.9)

Prepare training by setting the lexicon and hyperparameters.

Parameters:

- lexicon (iterable of str/unicode) – Iterable of characters that are to be recognized by the OCR model; must not contain duplicates
- num_hidden (int) – Number of hidden units in the LSTM layers; larger values require more storage/memory and make training and recognition slower, so try to find a good performance/cost tradeoff
- learning_rate (float) – Learning rate for the model training
- momentum (float) – Momentum for the model training
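Since the lexicon must not contain duplicates, one way to build it is to collect the distinct characters appearing in the training ground truths. A minimal sketch (the variable names are illustrative, not part of the pyclstm API):

    # Training ground-truth lines (toy data for illustration)
    ground_truths = ["the cat", "the hat"]

    # Collect every distinct character; sorted() guarantees no duplicates
    # and a stable order.
    graphemes = sorted(set("".join(ground_truths)))
    print(graphemes)  # -> [' ', 'a', 'c', 'e', 'h', 't']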
#### recognize(self, img)

Recognize the text on the line image.

Parameters:

- img (PIL.Image.Image / numpy.ndarray) – The line image to recognize

Returns: The recognized text for the line

Return type: unicode
#### recognize_chars(self, img)

Recognize the characters on the line, along with their position and confidence.

Parameters:

- img (PIL.Image.Image / numpy.ndarray) – The line image to recognize

Returns: The recognized text for the line, represented as information about its composing characters

Return type: generator that yields CharPrediction
#### train(self, img, unicode text)

Train the model with a line image and its ground truth.

Parameters:

- img (PIL.Image.Image / numpy.ndarray) – The line image for the ground truth
- text (unicode) – The ground truth text for the line image

Returns: The recognized text for the line image; can be used to estimate the error against the ground truth (via `levenshtein()`)

Return type: unicode
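Because `train()` returns the model's current prediction, the character error rate can be tracked during training by comparing it against the ground truth. The document mentions a `levenshtein()` helper for this; as an illustration of the underlying computation, here is a pure-Python edit-distance sketch (not the library's implementation):

    def levenshtein(a, b):
        """Edit distance between two strings (insertions, deletions, substitutions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    # Character error rate for one line: distance between the model's output
    # (e.g. as returned by ocr.train()) and the ground truth, normalized by length.
    ground_truth = "fnord"
    predicted = "fnrd"
    cer = levenshtein(predicted, ground_truth) / len(ground_truth)
    print(cer)  # -> 0.2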