OCR Toolbox

The tools in this Optical Character Recognition (OCR) plugin allow other apps to detect and read text in images. Built on the integrated deep learning-based EAST text detection model and the open-source Tesseract OCR engine, this easy-to-use plugin brings OCR capabilities to apps that need to detect and recognize text in images.

#ocr #text #tools


  • FindText: Detects text in input images using the EAST text detection model
  • ProcessEastResult: A conversion tool that transforms the output tensor of an EAST model into standard VisionAppster data formats
  • TesseractOcr: Reads text in an input image using the Tesseract engine

Release details:

Current version
Last updated
Component ID
Apr 8, 2020, 11:24 AM
Sep 27, 2020, 10:27 PM
Tool documentation

Tesseract OCR

Reads text in the given image.

This tool uses the Tesseract Optical Character Recognition (OCR) engine, which relies on trained data files for different languages and scripts. The engine can recognize characters and read text in the character sets of the languages that the loaded data files were trained for. The English data file is included in the component and used by default, so by default the output text contains only characters used in English. In ambiguous cases, the engine may also favor English words over words from other languages that share the same character set.

Data files for other languages are available at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. New data files can also be trained and customized with the Tesseract training tools. To add a data file, copy it into the components/com.visionappster.tools.ocr/1/resources directory of the VisionAppster installation. This tool also supports the compressed archive format that Tesseract uses for data files. See the language parameter for selecting which data files are loaded.
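
As a rough illustration of how a language string relates to the installed data files, the Python sketch below splits a string in the [~]<lang>[+[~]<lang>]* form accepted by the language parameter and checks the requested names against the *.traineddata files in a directory. The function names are hypothetical, only the standard .traineddata extension is recognized (compressed archives are ignored), and this is not the tool's own loading code.

```python
from pathlib import Path
from typing import List, Set, Tuple


def parse_language_spec(spec: str) -> Tuple[List[str], List[str]]:
    """Split "[~]<lang>[+[~]<lang>]*" into (requested, suppressed) language names."""
    requested: List[str] = []
    suppressed: List[str] = []
    for token in spec.split("+"):
        token = token.strip()
        is_suppressed = token.startswith("~")
        name = token[1:] if is_suppressed else token
        if not name:
            raise ValueError(f"empty language name in {spec!r}")
        # A leading ~ means "do not load this language", even if another one asks for it.
        if is_suppressed:
            suppressed.append(name)
        else:
            requested.append(name)
    return requested, suppressed


def available_languages(data_dir: Path) -> Set[str]:
    """Names of the *.traineddata files in data_dir, without the extension."""
    return {p.stem for p in data_dir.glob("*.traineddata")}


def missing_languages(spec: str, data_dir: Path) -> List[str]:
    """Requested languages that have no matching data file in data_dir."""
    requested, _ = parse_language_spec(spec)
    found = available_languages(data_dir)
    return [name for name in requested if name not in found]
```

For example, if a directory contains only eng.traineddata, missing_languages("hin+eng", that_directory) returns ["hin"], signaling that the Hindi data file still has to be copied there.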


  • image – The input image.

  • Segmentation mode – The page segmentation mode used with the Tesseract engine.

    • Automatic – Automatic page segmentation, but no OSD or OCR.
    • Automatic With OSD – Automatic page segmentation with orientation and script detection (OSD).
    • Automatic With OCR – Fully automatic page segmentation, but no OSD.
    • Single Column – Assume a single column of text of variable sizes.
    • Single Vertical Block – Assume a single uniform block of vertically aligned text.
    • Single Block – Assume a single uniform block of text. (Default)
    • Single Line – Treat the image as a single text line.
    • Single Word – Treat the image as a single word.
    • Circled Word – Treat the image as a single word in a circle.
    • Single Character – Treat the image as a single character.
    • Sparse Text – Find as much text as possible in no particular order.
    • Sparse Text With OSD – Sparse text with orientation and script detection.
    • Raw Line – Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

  • autoInvertThreshold – The Tesseract engine generally recognizes only dark text on a light background. This tool therefore first tries to read the text from the original image, and if the resulting confidence falls below this value, it makes a second attempt with an inverted image. The result with the higher confidence is returned.

  • language – Defines the language training data files to be loaded for the Tesseract engine. Each language is named by its data file name without the file extension. Multiple languages are given as a string of the form [~]<lang>[+[~]<lang>]*; for example, hin+eng loads Hindi and English. A language may internally specify that it should be loaded together with one or more other languages, and prefixing a language with ~ overrides that. For example, if hin were set to load eng by default, hin+~eng would load only hin. The number of loaded languages is limited only by memory, but each additional language slows recognition and can reduce accuracy: there is more work in deciding which language applies, and a greater chance of hallucinating incorrect words.

  • languagePath – The path where the language training data files are loaded from. The default path is a resource path pointing to the internal resources folder of the installed com.visionappster.tools.ocr component.

  • engine – The engine version that Tesseract uses. Tesseract has two engines: the legacy Tesseract engine and the newer LSTM line recognizer engine. There is rarely a reason to change the default value, but in some edge cases selecting Tesseract Only to force the legacy engine may give better results.

    • Tesseract Only – Use only the legacy Tesseract engine.
    • LSTM Only – Use only the LSTM line recognizer engine.
    • Combined – Run the LSTM line recognizer, but fall back to the legacy Tesseract engine if it fails.
    • Default – Let the language-specific configuration in the data files select the engine; if none is specified, use the default engine.
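
For experimenting outside VisionAppster, the parameters above roughly correspond to options of the standalone tesseract command-line program (-l, --psm, --oem and --tessdata-dir). The Python sketch below builds such a command line. The mapping of option names to Tesseract's numeric --psm and --oem values is an assumption inferred from the matching descriptions, not taken from the tool's source, and the helper name tesseract_argv is hypothetical. Note that a res:// resource path is not a file system path, so tessdata_dir must point to a real directory.

```python
from typing import List, Optional

# Assumed mapping from this tool's segmentation modes to Tesseract --psm values,
# inferred from the option descriptions.
PSM = {
    "Automatic With OSD": 1,
    "Automatic": 2,
    "Automatic With OCR": 3,
    "Single Column": 4,
    "Single Vertical Block": 5,
    "Single Block": 6,
    "Single Line": 7,
    "Single Word": 8,
    "Circled Word": 9,
    "Single Character": 10,
    "Sparse Text": 11,
    "Sparse Text With OSD": 12,
    "Raw Line": 13,
}

# Assumed mapping from the engine options to Tesseract --oem values.
OEM = {"Tesseract Only": 0, "LSTM Only": 1, "Combined": 2, "Default": 3}


def tesseract_argv(
    image_path: str,
    segmentation: str = "Single Block",
    language: str = "eng",
    engine: str = "Default",
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    argv = ["tesseract", image_path, "stdout"]
    if tessdata_dir is not None:
        # Directory containing the *.traineddata files (cf. languagePath).
        argv += ["--tessdata-dir", tessdata_dir]
    argv += ["-l", language, "--psm", str(PSM[segmentation]), "--oem", str(OEM[engine])]
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True). The command line does not replicate this tool's autoInvertThreshold retry.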


  • text – The recognized UTF-8 encoded text.

  • confidence – The average confidence of the recognized words in the returned text, in the range [0, 1].
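
The auto-inversion behavior described for autoInvertThreshold, which decides which text and confidence are returned, can be sketched in Python as follows. The image is modeled as rows of 8-bit grayscale values and ocr is a hypothetical callable returning a (text, confidence) pair; both are assumptions for illustration, not the tool's API.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # assumption: rows of 8-bit grayscale values (0..255)
OcrResult = Tuple[str, float]  # (text, confidence in [0, 1])


def invert(image: Image) -> Image:
    """Invert 8-bit intensities so light-on-dark text becomes dark-on-light."""
    return [[255 - px for px in row] for row in image]


def read_with_auto_invert(
    image: Image, ocr: Callable[[Image], OcrResult], auto_invert_threshold: float
) -> OcrResult:
    """Read the original image; retry with an inverted image if confidence is too low."""
    first = ocr(image)
    if first[1] >= auto_invert_threshold:
        return first  # confident enough, no second attempt
    second = ocr(invert(image))
    # Return whichever attempt is more confident (the original wins ties).
    return second if second[1] > first[1] else first
```

The second OCR pass only happens when the first result is weak, so dark-on-light images, the common case, pay for a single recognition run.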

Process EAST result

Processes the detection results of the EAST text detection ONNX model. The model can be run with the Run ONNX model tool using the model path res://com.visionappster.tools.ocr/1/resources/east_text_detection.onnx.

The tool converts the resulting pixel coordinates to world coordinates. It also filters out detections whose confidence is lower than the confidenceThreshold parameter.
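
A minimal sketch of the thresholding and coordinate conversion, assuming the score tensor has been reduced to a 2-D map in which each cell covers a 4-by-4 pixel block of the input image (the output stride of the original EAST architecture). The pixel-to-world step uses a hypothetical 2-by-3 affine matrix; how VisionAppster actually maps image pixels to world coordinates may differ.

```python
from typing import List, Sequence, Tuple

# Assumption: EAST predicts one score per 4x4 block of input pixels.
EAST_STRIDE = 4


def cells_above_threshold(
    score_map: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[float, float, float]]:
    """Return (x_px, y_px, score) for every cell whose score is at least threshold."""
    kept = []
    for row, scores in enumerate(score_map):
        for col, score in enumerate(scores):
            if score >= threshold:  # lower scores are filtered out
                kept.append((float(col * EAST_STRIDE), float(row * EAST_STRIDE), score))
    return kept


def pixel_to_world(
    point: Tuple[float, float], affine: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Apply a hypothetical 2x3 affine pixel-to-world matrix to (x, y)."""
    x, y = point
    return (
        affine[0][0] * x + affine[0][1] * y + affine[0][2],
        affine[1][0] * x + affine[1][1] * y + affine[1][2],
    )
```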

Note that the outputs of this tool should generally be processed further by combining overlapping detections. This can be done with the Prune Matches tool.
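
Prune Matches is the tool to use in practice. To show the general idea of combining overlapping detections, here is a greedy non-maximum suppression sketch in Python over axis-aligned boxes. This is a swapped-in standard technique, not necessarily what Prune Matches implements, and real EAST detections may be rotated, which this sketch ignores.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.4
) -> List[int]:
    """Return indices of kept boxes, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Visiting boxes in descending confidence order means that, within each cluster of overlapping detections, the most confident one survives.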


  • score – The score output tensor of the EAST text detection model. This is the first output when the model is used with the Run ONNX model tool.

  • confidence – The confidence output tensor of the EAST text detection model. This is the second output when the model is used with the Run ONNX model tool.

  • image – The same image that is fed to the Run ONNX model tool as a tensor. The ONNX model returns pixel coordinates, which this tool converts to world coordinates using this image.

  • confidenceThreshold – The detection results with confidences lower than this threshold are filtered out.


  • frame – A coordinate frame for each detection. Each frame is a 4-by-4 matrix, so N detections yield a 4N-by-4 matrix.

  • size – The size (width, height) of each detection in the corresponding frame. An N-by-2 matrix.

  • confidence – The confidence of each detection, in the range [0, 1]. An N-by-1 matrix.
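
Since the frame output stacks N 4-by-4 matrices into one 4N-by-4 matrix, reading an individual detection means slicing four rows at a time. The Python sketch below does that and maps the corners of each detection to world coordinates. It assumes that each frame transforms column vectors (world point = frame × local point) and that a detection spans [0, width] × [0, height] in its local frame; both are assumptions, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def split_frames(frames: Matrix) -> List[Matrix]:
    """Split a 4N-by-4 matrix into N separate 4-by-4 frames."""
    if len(frames) % 4 != 0:
        raise ValueError("frame matrix must have a multiple of 4 rows")
    return [frames[i:i + 4] for i in range(0, len(frames), 4)]


def transform_point(frame: Sequence[Sequence[float]], x: float, y: float) -> Tuple[float, float]:
    """Map a local point (x, y, 0) to world (x, y) with a homogeneous 4x4 matrix."""
    p = (x, y, 0.0, 1.0)
    wx = sum(frame[0][j] * p[j] for j in range(4))
    wy = sum(frame[1][j] * p[j] for j in range(4))
    return wx, wy


def detection_corners(frame: Matrix, width: float, height: float) -> List[Tuple[float, float]]:
    # Assumption: the detection occupies [0, width] x [0, height] in its local frame.
    local = ((0.0, 0.0), (width, 0.0), (width, height), (0.0, height))
    return [transform_point(frame, x, y) for x, y in local]
```

Row i of the size output pairs with split_frames(frame)[i], so the corners of detection i are detection_corners(frames_i, width_i, height_i).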