Per Seat: €3.00
Monthly Subscription: €1.00
Usage Based API: €0.00
This is an Optical Character Recognition (OCR) plugin that allows other apps to detect and read text in images. Combining the integrated deep-learning-based EAST text detection model with the open-source Tesseract OCR engine, this easy-to-use plugin brings OCR capabilities to apps that require image text detection and recognition.
Processes the detection results of the EAST text detection ONNX model, which can be run with the Run ONNX model tool using the model path res://com.visionappster.tools.ocr/1/resources/east_text_detection.onnx.
The tool converts the result coordinates to world coordinates. It also filters out detections whose confidence is lower than the value of the confidenceThreshold parameter.
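The thresholding and coordinate conversion described above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the `Detection` type, the `filter_and_convert` function and, in particular, the assumption that the pixel-to-world mapping is a uniform scale plus an offset are all hypothetical. In VisionAppster the actual mapping is derived from the input image.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Detection:
    """A hypothetical axis-aligned text detection."""
    x: float           # center x
    y: float           # center y
    width: float
    height: float
    confidence: float  # in [0, 1]


def filter_and_convert(
    detections: Iterable[Detection],
    confidence_threshold: float,
    scale: float,
    origin: Tuple[float, float],
) -> List[Detection]:
    """Drop low-confidence detections and map pixel coordinates to world coordinates.

    Hypothetical mapping: world = origin + scale * pixel, so sizes are scaled too.
    """
    ox, oy = origin
    result: List[Detection] = []
    for d in detections:
        if d.confidence < confidence_threshold:
            continue  # same rule as the confidenceThreshold parameter
        result.append(
            Detection(
                x=ox + scale * d.x,
                y=oy + scale * d.y,
                width=scale * d.width,
                height=scale * d.height,
                confidence=d.confidence,
            )
        )
    return result
```

Detections whose confidence equals the threshold are kept, since only values lower than the threshold are filtered out.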
Note that the outputs of this tool should generally be processed further by combining overlapping detections. This can be done with the Prune Matches tool.
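Combining overlapping detections is commonly done with non-maximum suppression (NMS). The sketch below shows greedy NMS over axis-aligned boxes given as (x, y, width, height, confidence), with (x, y) as the top-left corner. It only illustrates the general technique; how the Prune Matches tool actually merges results is not described here, and the function names and the default overlap limit of 0.3 are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x, y, width, height, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah, _ = a
    bx, by, bw, bh, _ = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], max_overlap: float = 0.3) -> List[Box]:
    """Keep the most confident box, drop boxes that overlap it too much, repeat."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= max_overlap]
    return kept
```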
score
– The score output tensor of the EAST text detection model. This is the first output when the model is used with the Run ONNX model tool.
confidence
– The confidence output tensor of the EAST text detection model. This is the second output when the model is used with the Run ONNX model tool.
image
– The same image that is fed to the Run ONNX model tool as a tensor. The ONNX model returns pixel coordinates, which this tool converts to world coordinates using this image.
confidenceThreshold
– Detection results with a confidence lower than this threshold are filtered out.
frame
– A coordinate frame for each detection. Each frame is a 4-by-4 matrix, and the frames of N detections are stacked into a 4N-by-4 matrix.
size
– The size (width, height) of each detection in the corresponding frame. An N-by-2 matrix.
confidence
– The confidence of each detection, in the range [0, 1]. An N-by-1 matrix.
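The three outputs are row-aligned: rows 4i to 4i+3 of frame, row i of size and row i of confidence all describe detection i. The sketch below splits the stacked frame matrix and pairs it with the other outputs, using plain Python lists. The function names are hypothetical, and reading the translation from the last column of the first two rows assumes the common homogeneous-transform layout, which the text above does not confirm.

```python
from typing import Dict, List

Matrix = List[List[float]]


def split_frames(frames: Matrix) -> List[Matrix]:
    """Split a 4N-by-4 matrix into N separate 4-by-4 frames."""
    if len(frames) % 4 != 0:
        raise ValueError("frame matrix must have 4N rows")
    return [frames[i:i + 4] for i in range(0, len(frames), 4)]


def describe(frames: Matrix, size: Matrix, confidence: Matrix) -> List[Dict]:
    """Combine the frame, size and confidence outputs detection by detection.

    Assumes each frame is a homogeneous transform whose translation is
    stored in the last column (hypothetical layout).
    """
    split = split_frames(frames)
    if not (len(split) == len(size) == len(confidence)):
        raise ValueError("outputs must describe the same number of detections")
    return [
        {
            "translation": (f[0][3], f[1][3]),
            "size": (size[i][0], size[i][1]),
            "confidence": confidence[i][0],
        }
        for i, f in enumerate(split)
    ]
```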
Reads text in the given image.
This tool uses the Tesseract Optical Character Recognition (OCR) engine, which relies on training data files for different languages and scripts. The engine can recognize characters and read text written in the character sets of the languages that these data files were trained for. The English training data file is included in the component and used by default. This means that the output text contains only characters used in English text. In ambiguous cases, the engine may also favor English words over words of other languages that share the same character set. Data files for other languages are available at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. New data files can also be trained and customized using the Tesseract training tools.
To add a new data file, copy it into the directory components/com.visionappster.tools.ocr/1/resources in the VisionAppster installation. This tool also supports the compressed archive format that Tesseract uses for its data files. See the language parameter for selecting the data files to be loaded.
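Copying a data file into place can be scripted. In the sketch below, only the relative resources path comes from the text above; the `install_traineddata` function is hypothetical, and `install_root` stands for the VisionAppster installation directory, whose location depends on your system.

```python
import shutil
from pathlib import Path

# Relative location of the OCR component's resources inside the installation.
RESOURCES = Path("components/com.visionappster.tools.ocr/1/resources")


def install_traineddata(data_file: Path, install_root: Path) -> Path:
    """Copy a Tesseract data file into the OCR component's resources directory.

    Returns the path of the copied file.
    """
    target_dir = install_root / RESOURCES
    if not target_dir.is_dir():
        raise FileNotFoundError(f"resources directory not found: {target_dir}")
    target = target_dir / data_file.name
    shutil.copy2(data_file, target)  # also preserves file metadata
    return target
```

The file name is kept unchanged, because the language parameter refers to a data file by its name without the extension.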
image
– The input image.
– The segmentation mode used with the Tesseract engine. The available values are:
  Automatic: Automatic page segmentation, but no orientation and script detection (OSD) or OCR.
  Automatic With OSD: Automatic page segmentation with OSD.
  Automatic With OCR: Fully automatic page segmentation, but no OSD.
  Single Column: Assume a single column of text of variable sizes.
  Single Vertical Block: Assume a single uniform block of vertically aligned text.
  Single Block: Assume a single uniform block of text. (Default.)
  Single Line: Treat the image as a single text line.
  Single Word: Treat the image as a single word.
  Circled Word: Treat the image as a single word in a circle.
  Single Character: Treat the image as a single character.
  Sparse Text: Find as much text as possible in no particular order.
  Sparse Text With OSD: Sparse text with OSD.
  Raw Line: Treat the image as a single text line, bypassing Tesseract-specific hacks.
autoInvertThreshold
– The Tesseract engine generally recognizes only dark text on a light background. This tool first attempts to read the text from the original image, and if the resulting confidence falls below this value, it makes a second attempt with an inverted image. The result with the higher confidence is returned.
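The retry logic described for autoInvertThreshold can be sketched as follows. The image is a hypothetical 8-bit grayscale image stored as a list of rows, and `recognize` is a hypothetical callback standing in for a Tesseract call that returns the text and its confidence in [0, 1].

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # 8-bit grayscale pixel values 0..255, row by row
Recognizer = Callable[[Image], Tuple[str, float]]  # (text, confidence in [0, 1])


def invert(image: Image) -> Image:
    """Invert an 8-bit grayscale image so light-on-dark text becomes dark-on-light."""
    return [[255 - p for p in row] for row in image]


def read_with_auto_invert(
    image: Image, recognize: Recognizer, auto_invert_threshold: float
) -> Tuple[str, float]:
    """Read the original image first and retry with the inverted image if needed."""
    text, conf = recognize(image)
    if conf >= auto_invert_threshold:
        return text, conf  # confident enough, no second attempt
    inv_text, inv_conf = recognize(invert(image))
    # Return whichever attempt was more confident.
    return (inv_text, inv_conf) if inv_conf > conf else (text, conf)
```

The second attempt doubles the recognition time for low-confidence images, so the threshold trades speed for robustness against light text on a dark background.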
language
– Defines the language training data files to be loaded for the Tesseract engine. Each data file is referred to by its name without the file extension. Multiple languages are defined with a string of the form
[~]<lang>[+[~]<lang>]*
For example, hin+eng loads Hindi and English. A language may internally specify that it should be loaded together with one or more other languages; the ~ sign overrides that. For example, if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that each additional language slows recognition and can reduce accuracy, because more work is needed to decide which language applies and there is a greater chance of hallucinating incorrect words.
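The effect of the `[~]<lang>[+[~]<lang>]*` syntax can be illustrated with a small resolver. This mimics the semantics described above and is not Tesseract's own parsing code; `resolve_languages` and the `defaults` table, which stands for the languages a data file asks to be loaded with it, are hypothetical.

```python
from typing import Dict, List


def resolve_languages(spec: str, defaults: Dict[str, List[str]]) -> List[str]:
    """Resolve a '[~]<lang>[+[~]<lang>]*' string into the languages to load.

    '~lang' suppresses loading of 'lang', even if another language
    would load it by default.
    """
    requested: List[str] = []
    excluded = set()
    for part in spec.split("+"):
        if not part:
            raise ValueError(f"empty language in {spec!r}")
        if part.startswith("~"):
            excluded.add(part[1:])
        else:
            requested.append(part)

    result: List[str] = []
    for lang in requested:
        # A requested language brings its default companions along.
        for name in [lang] + defaults.get(lang, []):
            if name not in excluded and name not in result:
                result.append(name)
    return result
```

With a table in which hin loads eng by default, "hin" resolves to hin and eng, while "hin+~eng" resolves to hin only.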
languagePath
– The path from which the language training data files are loaded. The default path is a resource path pointing to the internal resources folder of the installed com.visionappster.tools.ocr component.
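To check which values the language parameter can take for a given languagePath, the data files in that directory can be listed. This sketch assumes the files use the usual Tesseract `.traineddata` extension; `available_languages` is a hypothetical helper, and the path would be a regular file system path rather than a VisionAppster resource path.

```python
from pathlib import Path
from typing import List


def available_languages(language_path: Path) -> List[str]:
    """List language names found in a directory of Tesseract data files.

    The language name is the file name without its '.traineddata' extension.
    """
    return sorted(
        p.stem for p in language_path.glob("*.traineddata") if p.is_file()
    )
```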
engine
– The engine that Tesseract uses. Tesseract has two engines: the legacy Tesseract engine and the newer LSTM line recognizer engine. There is rarely a reason to change the default value. In some extreme cases, forcing the old engine with the Tesseract Only value may lead to better results.
  Tesseract Only: Use only the old Tesseract engine.
  LSTM Only: Use only the new LSTM line recognizer engine.
  Combined: Run the LSTM line recognizer but allow fallback to the old Tesseract engine if it fails.
  Default: Let the language-specific configuration in the data files specify the engine, or use the default engine if none is specified.
text
– The recognized UTF-8 encoded text.
confidence
– The average confidence of the recognized words in the returned text, in the range [0, 1].
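Tesseract itself reports word confidences on a 0 to 100 scale and uses -1 for entries that are not words (for example, in its TSV output). Assuming the tool averages such per-word values, the conversion to the [0, 1] range might look like the sketch below; the function name and the handling of an empty result are hypothetical.

```python
from typing import Sequence


def average_confidence(word_confidences: Sequence[float]) -> float:
    """Average per-word confidences on Tesseract's 0..100 scale, mapped to [0, 1].

    Negative values (non-word entries) are ignored. Returns 0.0 when no
    words were recognized.
    """
    valid = [c for c in word_confidences if c >= 0]
    if not valid:
        return 0.0
    return sum(valid) / len(valid) / 100.0
```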