ocrd_detectron2

OCR-D wrapper for detectron2 based segmentation models

Introduction

This offers OCR-D compliant workspace processors for document layout analysis with models trained on Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.

In trying to cover a broad range of third-party models, a few sacrifices have to be made: deploying a model can be difficult and requires manual configuration, and the class labels (really PAGE-XML region types) must be provided explicitly. The code itself copes with both panoptic and instance segmentation models (with or without masks).

Only meant for (coarse) page segmentation into regions – no text lines, no reading order, no orientation.

Installation

Create and activate a virtual environment as usual.
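
For example, using Python's built-in venv module (a sketch; any virtual environment manager will do):

# create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate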

To install Python dependencies:

make deps

Which is the equivalent of:

pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only

To install this module, then do:

make install

Which is the equivalent of:

pip install .

Usage

OCR-D processor interface ocrd-detectron2-segment

To be used with PAGE-XML documents in an OCR-D annotation workflow.

Usage: ocrd-detectron2-segment [OPTIONS]

  Detect regions with Detectron2 models

  > Use detectron2 to segment each page into regions.

  > Open and deserialize PAGE input files and their respective images.
  > Fetch a raw and a binarized image for the page frame (possibly
  > cropped and deskewed).

  > Feed the raw image into the detectron2 predictor that has been used
  > to load the given model. Then, depending on the model capabilities
  > (whether it can do panoptic segmentation or only instance
  > segmentation, whether the latter can do masks or only bounding
  > boxes), post-process the predictions:

  > - panoptic segmentation: take the provided segment label map, and
  >   apply the segment to class label map,
  > - instance segmentation: find an optimal non-overlapping set (flat
  >   map) of instances via non-maximum suppression,
  > - both: avoid overlapping pre-existing top-level regions (incremental
  >   segmentation).

  > Then extend / shrink the surviving masks to fully include / exclude
  > connected components in the foreground that are on the boundary.

  > (This describes the steps when ``postprocessing`` is `full`. A value
  > of `only-nms` will omit the morphological extension/shrinking, while
  > `only-morph` will omit the non-maximum suppression, and `none` will
  > skip all postprocessing.)

  > Finally, find the convex hull polygon for each region, and map its
  > class id to a new PAGE region type (and subtype).

  > (Does not annotate `ReadingOrder` or `TextLine`s or `@orientation`.)

  > Produce a new output file by serialising the resulting hierarchy.

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  --profile                       Enable profiling
  --profile-file                  Write cProfile stats to this file. Implies --profile
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -D, --dump-module-dir           Output the 'module' directory with resources for this processor
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "operation_level" [string - "page"]
    hierarchy level for which to predict and assign regions
    Possible values: ["page", "table"]
   "categories" [array - REQUIRED]
    maps each category (class index) of the model to a PAGE region
    type (and @type or @custom if separated by colon), e.g.
    ['TextRegion:paragraph', 'TextRegion:heading',
    'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet;
    categories with an empty string will be skipped during prediction
   "model_config" [string - REQUIRED]
    path name of model config
   "model_weights" [string - REQUIRED]
    path name of model weights
   "min_confidence" [number - 0.5]
    confidence threshold for detections
   "postprocessing" [string - "full"]
    which postprocessing steps to enable: by default, applies a custom
    non-maximum suppression (to avoid overlaps) and morphological
    operations (using connected component analysis on the binarized
    input image to shrink or expand regions)
    Possible values: ["full", "only-nms", "only-morph", "none"]
   "debug_img" [string - "none"]
    paint an AlternativeImage which blends the input image
    and all raw decoded region candidates
    Possible values: ["none", "instance_colors", "instance_colors_only", "category_colors"]
   "device" [string - "cuda"]
    select computing device for Torch (e.g. cpu or cuda:0); will fall
    back to CPU if no GPU is available
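
For reference, these parameters can be bundled into a presets file. The following sketch is roughly equivalent to the TableBank settings used in the example below (file and resource names are taken from that example; adapt them to your model):

# write a presets file bundling model_config, model_weights, categories and min_confidence
cat > presets_TableBank_X152.json <<EOF
{
  "model_config": "TableBank_X152.yaml",
  "model_weights": "TableBank_X152.pth",
  "categories": ["TableRegion"],
  "min_confidence": 0.1
}
EOF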

Example:

# download one preconfigured model:
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.yaml
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.pth
# run it (setting model_config, model_weights and categories):
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
# run it (equivalent, with presets file)
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1 
# download all preconfigured models
ocrd resmgr download ocrd-detectron2-segment "*"
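
The same step can also be run as part of a larger OCR-D workflow via ocrd process (a sketch, assuming ocrd_cis is installed to provide the binarization step; task strings omit the ocrd- prefix):

ocrd process \
  "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
  "detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1"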

Debugging

If you mistrust your model, and/or this tool's additional postprocessing, try playing with the runtime parameters postprocessing and debug_img described above.
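
For example, the following sketch (reusing the presets file from the example above) disables all postprocessing and paints the raw region candidates into a debug image:

ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-DEBUG -p presets_TableBank_X152.json -P postprocessing none -P debug_img instance_colors_only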

Models

Some of the following models have already been registered as known file resources, along with parameter presets to use them conveniently.

To get a list of registered models available for download, do:

ocrd resmgr list-available -e ocrd-detectron2-segment

To get a list of already installed models and presets, do:

ocrd resmgr list-installed -e ocrd-detectron2-segment

To download a registered model (i.e. a config file and the respective weights file), do:

ocrd resmgr download ocrd-detectron2-segment NAME.yaml
ocrd resmgr download ocrd-detectron2-segment NAME.pth

To download more models (registered or other), see:

ocrd resmgr download --help

To use a model, do:

ocrd-detectron2-segment -P model_config NAME.yaml -P model_weights NAME.pth -P categories '[...]' ...
ocrd-detectron2-segment -p NAME.json ... # equivalent, with presets file

To add (i.e. register) a new model, you first have to find its config file, its weights file, and the list of categories (classes) it predicts, mapped to PAGE-XML region types.

Assuming you have done so, then proceed as follows:

# from local file path
ocrd resmgr download -n path/to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n path/to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from single file URL
ocrd resmgr download -n https://path.to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from zip file URL
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/weights.pth ocrd-detectron2-segment NAME.pth
# create corresponding preset file
echo '{"model_weights": "NAME.pth", "model_config": "NAME.yml", "categories": [...]}' > NAME.json
# install preset file so it can be used everywhere (not just in CWD):
ocrd resmgr download -n NAME.json ocrd-detectron2-segment NAME.json
# now the new model can be used just like the preregistered models
ocrd-detectron2-segment -p NAME.json ...

What follows is an overview of the preregistered models (i.e. available via resmgr).

Note: These are just examples, no exhaustive search was done yet!

Note: The filename suffix (.pth vs .pkl) of the weight file does matter!

TableBank

X152-FPN config weights ["TableRegion"]

TableBank

X152-FPN config weights ["TableRegion"]

PubLayNet

R50-FPN config weights ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config weights ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
X101-FPN config weights ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]

PubLayNet

R50-FPN config weights ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config weights ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]

LayoutParser

provides different model variants of various depths for multiple datasets.

See here for an overview, and here for the model files. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
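
For instance, if a LayoutParser model predicted the (purely hypothetical) classes text, figure and table, in that index order, the mapping could look like this:

# hypothetical categories mapping for a three-class model
ocrd-detectron2-segment -P model_config NAME.yml -P model_weights NAME.pth -P categories '["TextRegion", "ImageRegion", "TableRegion"]' ...

If a model reserves class index 0 for background, prepend an empty string entry, which makes that index get skipped during prediction.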

PubLayNet finetuning

(pre-trained on PubLayNet, fine-tuned on a custom, non-public GT corpus of 500 pages of 20th-century magazines)

X101-FPN config weights ["TextRegion:caption","ImageRegion","TextRegion:page-number","TableRegion","TextRegion:heading","TextRegion:paragraph"]

DocBank

X101-FPN archive

Proposed mappings:

Testing

To install Python dependencies and download some models:

make deps-test

Which is the equivalent of:

pip install -r requirements-test.txt
make models-test

To run the tests, then do:

make test

You can inspect the results under test/assets/*/data under various new OCR-D-SEG-* fileGrps. (It is recommended to use OCR-D Browser for inspection.)
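
For instance, with OCR-D Browser installed (it provides the browse-ocrd command), one of the test workspaces can be opened like this (pick any workspace directory under test/assets):

browse-ocrd test/assets/<workspace>/data/mets.xml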

Finally, to remove the test data, do:

make clean

Test results

These tests are integrated as a Github Action. Its results can be viewed here.