Optical Character Recognition


Optical Character Recognition (OCR) automates the extraction of printed or written text from a scanned document or image file and converts it into a machine-readable form that can be used for data processing tasks such as editing or searching.

This OCR API automatically identifies individual words or entire pages of text. Use cases range from data entry for business documents to verification of passports and other forms of identification, license plate reading, traffic sign recognition, and more.

#ocr #text


  • Data entry for business documents such as business cards, checks, invoices, and receipts
  • Automatic verification of IDs like passports
  • Rendering scanned documents searchable
  • Reading license plates
  • Traffic sign recognition

Release details:

Current version
Last updated
Component ID
Apr 8, 2020, 11:12 AM
Sep 27, 2020, 10:27 PM
API Documentation

Read texts

Finds and reads texts in the image.

The API can either find the individual words in the image and read each one separately, or read the whole image as a page of text.

The bounding boxes of the individual words are found with the EAST text detector (documentation available on GitHub).

Text reading uses the Tesseract Optical Character Recognition (OCR) engine, which relies on trained data files for different languages and scripts. The engine can recognize characters and read text only in the languages these data files were trained for. The current API version always uses the English trained data included in the component, so the reading results contain only characters used in English text. In ambiguous cases, it may also tend to return English words rather than words from other languages that share the same character set. Support for trained data for other languages and scripts is planned for future versions.

Position information for the text boxes is also returned for other scripts, even though the text in them cannot be read correctly.

The Tesseract OCR engine generally recognizes only dark text on a light background. However, if the reading confidence for the original image does not meet the acceptance criteria, this API attempts to read the text from the inverted image and returns whichever result has the higher confidence.
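The fallback described above can be sketched roughly as follows. This is only an illustration of the selection logic, not the component's actual implementation. `read_text` stands in for a hypothetical OCR call that returns a (text, confidence) pair, the image is assumed to be a list of rows of 8-bit grayscale values, and the threshold of 0.5 is an invented value, since the real acceptance criteria are not documented here.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]      # assumption: rows of 8-bit grayscale pixels
Reading = Tuple[str, float]  # (text, confidence in [0, 1])


def invert(image: Image) -> Image:
    """Return a copy of the image with every 8-bit pixel value inverted."""
    return [[255 - px for px in row] for row in image]


def read_with_fallback(
    image: Image,
    read_text: Callable[[Image], Reading],  # hypothetical OCR call
    min_confidence: float = 0.5,            # assumed acceptance threshold
) -> Reading:
    """Read the original image; retry on the inverted image if confidence is low."""
    first = read_text(image)
    if first[1] >= min_confidence:
        return first
    second = read_text(invert(image))
    # Keep whichever reading has the higher confidence.
    return second if second[1] > first[1] else first
```

Passing the reader in as a callable keeps the sketch independent of any particular OCR binding.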


  • image – Image containing the texts to be read.

  • target – The text reading target.

    • Value "words" finds the separate words in the image and returns the bounding box and the text reading result for each one. Only horizontally oriented words are found, although they may be rotated at different angles.
    • Value "page" treats the image as a text page and returns the whole text read from it. The text lines are separated by newline characters. This mode requires the text to be more strictly horizontal; only slight rotation is allowed.

    Any other value defaults to "page".
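    The defaulting rule can be expressed as a small client-side helper, for example to predict which mode a request will run in. This is a minimal sketch: the function name is hypothetical, and exact, case-sensitive matching of "words" is an assumption that the text above does not confirm.

```python
def effective_target(target: str) -> str:
    """Return the mode the API is expected to apply for a given target value."""
    # Assumption: only the exact string "words" selects word mode;
    # every other value falls back to "page".
    return "words" if target == "words" else "page"
```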


  • texts – A JSON object containing the detected texts with the following format.

      {
        "objects": [
          {
            "confidence": 0.91,
            "name": "text",
            "polygon": [ ... ],
            "rectangle": {
              "h": 176,
              "w": 800,
              "x": 0,
              "y": 0
            },
            "text": "Hello world!"
          }
        ]
      }

objects : An array of the text detection results. If the target parameter is "page", it contains exactly one detection result for the whole page. If the target parameter is "words", it contains a separate detection result for each detected word, in an unspecified order.

confidence : Confidence of a single text reading result on a scale of [0, 1]; higher is better.

name : Name of the detection. This is always "text".

polygon : The corner points of the text box as integers. The box may be rotated slightly. The order is top-left, top-right, bottom-right, bottom-left. If the target parameter is "page", the corner points of the whole image are returned.

rectangle : A bounding box aligned to the X and Y axes for the text. If the target parameter is "page", the whole image bounding box is returned.

text : The recognized text reading result.
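
Because "words" results arrive in an unspecified order, a client will usually want to filter and sort them before use. The following is a minimal sketch that relies only on the fields described above. The function name and the `min_confidence` parameter are illustrative, and sorting by the top-left corner of `rectangle` is a naive reading-order heuristic that can misorder words whose boxes on the same line start at slightly different heights.

```python
import json
from typing import List


def extract_texts(response_json: str, min_confidence: float = 0.0) -> List[str]:
    """Return recognized texts, filtered by confidence and sorted top-to-bottom, left-to-right."""
    objects = json.loads(response_json).get("objects", [])
    kept = [o for o in objects if o.get("confidence", 0.0) >= min_confidence]
    # Naive reading order: sort by the rectangle's y coordinate, then x.
    kept.sort(key=lambda o: (o["rectangle"]["y"], o["rectangle"]["x"]))
    return [o["text"] for o in kept]


# Made-up sample response for illustration only.
sample = json.dumps({
    "objects": [
        {"confidence": 0.88, "name": "text", "text": "world",
         "rectangle": {"x": 120, "y": 10, "w": 90, "h": 30}},
        {"confidence": 0.93, "name": "text", "text": "Hello",
         "rectangle": {"x": 10, "y": 10, "w": 100, "h": 30}},
        {"confidence": 0.12, "name": "text", "text": "~",
         "rectangle": {"x": 5, "y": 60, "w": 8, "h": 8}},
    ]
})
words = extract_texts(sample, min_confidence=0.5)
```

With the sample above, the low-confidence detection is dropped and the remaining words can be joined with spaces to form a line of text.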