Finds and reads text in an image.
The API supports two modes: finding the individual words in the image
and reading each one separately, or reading the text in the whole
image as a page.
Finding the bounding boxes for the individual words uses the
EAST text detector (documentation available on GitHub).
The text reading uses the Tesseract Optical Character Recognition
(OCR) engine, which utilizes training data files for different
languages and scripts. The engine is able to recognize characters
and read texts in the languages that these data files are trained
for. The current API version always uses the English trained data
included in the component. As a result, the text reading results
contain only characters used in English text. In ambiguous cases,
the engine may also tend to return English words rather than words
from other languages that share the same character set. Support for
trained data for other languages and scripts will be included in
future versions.
The position information of the text boxes is also returned for text
in other scripts, even though that text cannot be read correctly.
The Tesseract OCR engine generally recognizes only dark text on a
light background. However, if the text reading confidence does not
meet the acceptance criteria with the original image, this API
attempts to read the text from the inverted image and returns
whichever result has the higher confidence.
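
The inversion fallback described above can be sketched roughly as follows. This is only an illustration, not the component's actual code: the `ocr` callable, the list-of-rows grayscale image representation, and the `min_confidence` threshold of 0.5 are all assumptions, since the real acceptance criteria are not stated here.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                      # assumed: 8-bit grayscale rows
OcrFn = Callable[[Image], Tuple[str, float]]  # hypothetical: (text, confidence)


def invert(image: Image) -> Image:
    """Invert an 8-bit grayscale image so light text becomes dark text."""
    return [[255 - px for px in row] for row in image]


def read_with_inversion_fallback(
    image: Image, ocr: OcrFn, min_confidence: float = 0.5
) -> Tuple[str, float]:
    """Read the image; retry on the inverted image if confidence is too low."""
    text, conf = ocr(image)
    if conf >= min_confidence:  # assumed acceptance criterion
        return text, conf
    inv_text, inv_conf = ocr(invert(image))
    # Keep whichever reading has the higher confidence.
    if inv_conf > conf:
        return inv_text, inv_conf
    return text, conf
```

Passing the OCR step in as a callable keeps the sketch independent of any particular Tesseract binding.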
image – Image containing the texts to be read.
target – The text reading target.
- Value "words" finds the separate words in the image and returns
the bounding boxes and the text reading results separately for
each one of them. Only words written horizontally are found,
although the words themselves may be rotated at different angles.
- Value "page" treats the image as a text page and returns the
whole text read from it. The text lines are separated by newline
characters. This mode requires the text to be more strictly
horizontally oriented; only slight rotation is allowed.
Any other value defaults to "page".
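
The defaulting rule for the target parameter amounts to the following small sketch. The helper name `normalize_target` is hypothetical, and exact, case-sensitive matching of "words" is an assumption; the description above does not say how case or whitespace is treated.

```python
def normalize_target(target):
    """Return "words" only for the exact value "words"; otherwise "page"."""
    # Assumption: matching is exact and case-sensitive.
    return "words" if target == "words" else "page"
```

A caller who wants per-word results should therefore pass the value exactly as "words", because a misspelled value would silently select page mode.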
texts – A JSON object containing the detected texts with the following format.
"text": "Hello world!"
: Objects An array of the text detection results. If the target
parameter is "page", it contains exactly one detection result
for the whole page. If the target parameter is "words", it
contains separate detection results for each detected word
in an unspecified order.
: Confidence of a single text reading result in scale [0, 1].
The higher the better.
: Name of the detection, this is always "text".
: The corner points of the text box as integers. The box may be
rotated slightly. The order is top-left, top-right, bottom-right,
bottom-left. If the target parameter is "page", the corner
points of the whole image are returned.
: A bounding box aligned to the X and Y axes for the text. If the
target parameter is "page", the whole image bounding box is returned.
: The recognized text reading result.
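
As an illustration of how the corner points relate to the axis-aligned bounding box, and of one way to deal with the unspecified order of word results, consider the sketch below. The data shapes are assumptions made for the example: corner points are taken as `(x, y)` pairs and boxes as `(left, top, width, height)` tuples, which may differ from the actual JSON layout. The function names are hypothetical.

```python
def axis_aligned_box(corners):
    """Smallest axis-aligned box enclosing four (x, y) corner points.

    Returns (left, top, width, height); this tuple layout is an assumption.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


def naive_reading_order(boxes):
    """Sort boxes top-to-bottom, then left-to-right.

    A deliberately naive ordering: words on the same line whose top edges
    differ by a pixel or two may end up out of order.
    """
    return sorted(boxes, key=lambda box: (box[1], box[0]))


# Corner order as documented: top-left, top-right, bottom-right, bottom-left.
slightly_rotated = [(10, 20), (50, 24), (48, 44), (8, 40)]
box = axis_aligned_box(slightly_rotated)  # (8, 20, 42, 24)
```

For a slightly rotated word, the axis-aligned box is a little larger than the rotated text box, because it has to cover all four corners.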