Optical Character Recognition (OCR) Filter
`FilterOpticalCharacterRecognition` is a pluggable filter that extracts text from image frames using Optical Character Recognition (OCR). It supports multiple OCR backends and offers flexible configuration for language support, output, and debug logging.
Features
- Dual OCR Engine Support: Choose between Tesseract (`"tesseract"`) and EasyOCR (`"easyocr"`) via the `ocr_engine` option.
- Multi-language OCR: Use the `ocr_language` option to specify one or more language codes (e.g., `['en', 'fr']`).
- Output to JSON: Extracted text is written to a newline-delimited JSON file at the path specified by `output_json_path`.
- Debug Mode: Setting `debug: true` increases logging verbosity for troubleshooting.
- Frame-level Skipping: Add the metadata flag `skip_ocr: true` to individual frames to bypass OCR processing.
- Custom Tesseract Path: When using the Tesseract engine, you can point `tesseract_cmd` at a custom binary (defaults to a bundled AppImage).
- Safe Streaming Output: Results are flushed to disk immediately after each frame is processed.
  Note: This may lead to heavy I/O. A configurable flushing strategy is planned for future releases.
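
The frame-level skipping and flush-per-frame behaviour listed above can be illustrated with a minimal sketch. It is not the filter's actual implementation: the frame layout (`id`, `meta`), the `process_frames` helper, and the stub recognizer are all hypothetical names chosen for illustration.

```python
import io
import json
from typing import Callable, Iterable, List, TextIO


def process_frames(
    frames: Iterable[dict],
    recognize: Callable[[dict], List[str]],
    out: TextIO,
) -> int:
    """Write one JSON line per processed frame; return the number written."""
    written = 0
    for frame in frames:
        # Hypothetical frame layout: {"id": ..., "meta": {...}}.
        if frame.get("meta", {}).get("skip_ocr", False):
            continue  # flagged frames bypass OCR entirely
        record = {"frame_id": frame["id"], "texts": recognize(frame)}
        out.write(json.dumps(record) + "\n")
        out.flush()  # flush after every frame: durable, but I/O heavy
        written += 1
    return written


# Usage with a stub recognizer standing in for a real OCR engine.
frames = [
    {"id": "a1", "meta": {}},
    {"id": "a2", "meta": {"skip_ocr": True}},
    {"id": "a3", "meta": {}},
]
buffer = io.StringIO()
count = process_frames(frames, lambda f: ["text from " + f["id"]], buffer)
lines = buffer.getvalue().splitlines()
```

Flushing inside the loop is what makes the output safe against interruption; moving the `flush()` call outside the loop would trade that safety for fewer writes.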
Example Output
Each processed frame produces one JSON line similar to the following (pretty-printed here for readability):
{
  "frame_id": "abc123",
  "texts": ["Detected text line 1", "Detected text line 2"]
}
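
Downstream code can consume the output one record at a time. The sketch below assumes each record occupies exactly one line of the file, as newline-delimited JSON implies; `iter_ocr_records` is a hypothetical helper, and the second sample record is made-up data for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_ocr_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample lines standing in for the contents of the output file.
sample = [
    '{"frame_id": "abc123", "texts": ["Detected text line 1", "Detected text line 2"]}\n',
    "\n",
    '{"frame_id": "def456", "texts": []}\n',
]
records = list(iter_ocr_records(sample))
all_text = [text for record in records for text in record["texts"]]
```

Because Python file objects iterate line by line, a file opened with `open(path, encoding="utf-8")` can be passed to `iter_ocr_records` directly.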
When to Use
This filter is ideal for any pipeline that requires reading printed or handwritten text from images, such as:
- Scanned documents
- Signboards or product packaging in photos
- Scene text in videos
Configuration Reference
Key | Type | Default | Description |
---|---|---|---|
ocr_engine | string | "easyocr" | OCR engine to use: "tesseract" or "easyocr" |
ocr_language | string[] | ["en"] | Language codes for OCR |
output_json_path | string | "./output/ocr_results.json" | Path to save output results |
debug | boolean | false | Enable debug logging |
tesseract_cmd | string | Packaged AppImage path | Path to Tesseract binary |
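
As a rough illustration of how the options in the table fit together, the sketch below merges user overrides into the documented defaults and rejects unsupported values. `build_config` is a hypothetical helper, not part of the filter's API, and `None` is used as a placeholder for the packaged AppImage path, which is not spelled out here.

```python
from typing import Any, Dict

# Defaults copied from the configuration table.
DEFAULTS: Dict[str, Any] = {
    "ocr_engine": "easyocr",
    "ocr_language": ["en"],
    "output_json_path": "./output/ocr_results.json",
    "debug": False,
    # Placeholder (assumption): None means "use the bundled binary".
    "tesseract_cmd": None,
}
SUPPORTED_ENGINES = {"tesseract", "easyocr"}


def build_config(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    if config["ocr_engine"] not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported ocr_engine: {config['ocr_engine']!r}")
    if not config["ocr_language"]:
        raise ValueError("ocr_language needs at least one language code")
    return config


# Example: Tesseract with English and French, debug logging on.
config = build_config(ocr_engine="tesseract", ocr_language=["en", "fr"], debug=True)
```

Validating up front keeps a misspelled key or engine name from surfacing later as a confusing runtime failure in the middle of a pipeline.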