
Optical Character Recognition (OCR) Filter

FilterOpticalCharacterRecognition is a pluggable filter that extracts text from image frames using Optical Character Recognition (OCR). It supports two OCR backends, EasyOCR and Tesseract, and can be configured for language, output path, and debug logging.

Features

  • Dual OCR Engine Support
    Use the ocr_engine option to choose between EasyOCR ("easyocr", the default) and Tesseract ("tesseract").

  • Multi-language OCR
    Use the ocr_language option to specify one or more language codes (e.g., ['en', 'fr']).

  • Output to JSON
    Extracted text is written to a newline-delimited JSON file at the path specified by output_json_path.

  • Debug Mode
    Setting debug: true increases logging verbosity, which helps when troubleshooting.

  • Frame-level Skipping
    Add the metadata flag skip_ocr: true to individual frames to bypass OCR processing.

  • Custom Tesseract Path
    When using the Tesseract engine, set tesseract_cmd to the path of a custom Tesseract binary. It defaults to a bundled AppImage.

  • Safe Streaming Output
    Results are flushed to disk immediately after processing each frame.

    Note

    Flushing after every frame can cause heavy disk I/O. A configurable flushing strategy is planned for a future release.
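
The per-frame skipping and flush-after-every-frame behavior described above can be sketched roughly as follows. This is a minimal illustration, not the filter's actual implementation: the frame layout (a dict with "id", "image", and "meta" keys), the run_ocr callable, the function name stream_ocr_results, and the choice of append mode are all assumptions made for the example.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def stream_ocr_results(
    frames: Iterable[Dict[str, Any]],
    run_ocr: Callable[[Any], List[str]],
    output_json_path: str,
) -> int:
    """Write one JSON line per processed frame and flush after each write.

    Hypothetical helper: `frames` are dicts with "id", "image" and "meta"
    keys, and `run_ocr` stands in for whichever OCR engine is configured.
    Returns the number of records written.
    """
    written = 0
    # Append mode is an assumption; the source does not say whether the
    # output file is overwritten or appended to.
    with open(output_json_path, "a", encoding="utf-8") as out:
        for frame in frames:
            # Frames whose metadata carries skip_ocr: true bypass OCR.
            if frame.get("meta", {}).get("skip_ocr", False):
                continue
            record = {"frame_id": frame["id"], "texts": run_ocr(frame["image"])}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            # Flush right away so a crash loses at most the current frame.
            out.flush()
            written += 1
    return written
```

Flushing on every frame trades throughput for durability, which is the I/O cost the note above refers to.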

Example Output

Each processed frame will produce a JSON line similar to:

{
  "frame_id": "abc123",
  "texts": ["Detected text line 1", "Detected text line 2"]
}
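
The record above is shown pretty-printed for readability. Because the output file is described as newline-delimited JSON, a consumer can probably read it one line at a time. The sketch below assumes each record occupies exactly one line and has the frame_id and texts fields shown above; the function names read_ocr_results and texts_by_frame are hypothetical and not part of the filter.

```python
import json
from typing import Any, Dict, Iterator, List


def read_ocr_results(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def texts_by_frame(path: str) -> Dict[str, List[str]]:
    """Map each frame_id to its list of detected text lines."""
    return {rec["frame_id"]: rec["texts"] for rec in read_ocr_results(path)}
```

Reading line by line keeps memory use flat even when the results file grows large.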

When to Use

This filter is ideal for any pipeline that requires reading printed or handwritten text from images, such as:

  • Scanned documents
  • Signboards or product packaging in photos
  • Scene text in videos

Configuration Reference

Key               Type      Default                       Description
ocr_engine        string    "easyocr"                     OCR engine to use: "tesseract" or "easyocr"
ocr_language      string[]  ["en"]                        Language codes for OCR
output_json_path  string    "./output/ocr_results.json"   Path to save output results
debug             boolean   false                         Enable debug logging
tesseract_cmd     string    Packaged AppImage path        Path to Tesseract binary
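
To make the defaults concrete, the options above can be modeled as a small Python dataclass. This is an illustrative sketch only: the class name OCRFilterConfig, the validation rules, and the use of None to mean "use the packaged AppImage" are assumptions, not the filter's real configuration class.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Engine names taken from the ocr_engine row of the reference table.
SUPPORTED_ENGINES = ("easyocr", "tesseract")


@dataclass
class OCRFilterConfig:
    """Hypothetical mirror of the documented options and their defaults."""

    ocr_engine: str = "easyocr"
    ocr_language: List[str] = field(default_factory=lambda: ["en"])
    output_json_path: str = "./output/ocr_results.json"
    debug: bool = False
    # None stands in for "use the packaged AppImage" (an assumption).
    tesseract_cmd: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ocr_engine not in SUPPORTED_ENGINES:
            raise ValueError(
                f"ocr_engine must be one of {SUPPORTED_ENGINES}, "
                f"got {self.ocr_engine!r}"
            )
        if not self.ocr_language:
            raise ValueError("ocr_language must contain at least one code")
```

For example, OCRFilterConfig(ocr_engine="tesseract", ocr_language=["en", "fr"]) would select Tesseract with English and French, leaving every other option at its documented default.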