# OCR-D wrapper for detectron2 based segmentation models
This offers OCR-D compliant workspace processors for document layout analysis with models trained on Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.
In trying to cover a broad range of third-party models, a few sacrifices have to be made: deployment of models may be difficult and needs configuration; class labels (really: PAGE-XML region types) must be provided; the code itself tries to cope with panoptic and instance segmentation models (with or without masks).
Only meant for (coarse) page segmentation into regions – no text lines, no reading order, no orientation.
## Installation

Create and activate a virtual environment as usual.
To install Python dependencies:

```sh
make deps
```
Which is the equivalent of:
```sh
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only
```
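To verify that Torch and Detectron2 ended up consistent with your hardware (the wheel index must match your CUDA version, or be the CPU build), a quick sanity check like the following may help, run inside the activated environment:

```python
import torch, detectron2

print(torch.__version__, torch.version.cuda)  # e.g. 1.10.0 11.3 (None on CPU-only builds)
print(torch.cuda.is_available())              # False is expected for the CPU-only install
print(detectron2.__version__)
```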
To install this module, then do:

```sh
make install
```

Which is the equivalent of:

```sh
pip install .
```
## Usage

### `ocrd-detectron2-segment`
To be used with PAGE-XML documents in an OCR-D annotation workflow.
```
Usage: ocrd-detectron2-segment [OPTIONS]
Detect regions with Detectron2 models
> Use detectron2 to segment each page into regions.
> Open and deserialize PAGE input files and their respective images.
> Fetch a raw and a binarized image for the page frame (possibly
> cropped and deskewed).
> Feed the raw image into the detectron2 predictor that has been used
> to load the given model. Then, depending on the model capabilities
> (whether it can do panoptic segmentation or only instance
> segmentation, whether the latter can do masks or only bounding
> boxes), post-process the predictions:
> - panoptic segmentation: take the provided segment label map, and
> apply the segment to class label map,
> - instance segmentation: find an optimal non-overlapping set (flat
> map) of instances via non-maximum suppression,
> - both: avoid overlapping pre-existing top-level regions (incremental
> segmentation).
> Then extend / shrink the surviving masks to fully include / exclude
> connected components in the foreground that are on the boundary.
> (This describes the steps when ``postprocessing`` is `full`. A value
> of `only-nms` will omit the morphological extension/shrinking, while
> `only-morph` will omit the non-maximum suppression, and `none` will
> skip all postprocessing.)
> Finally, find the convex hull polygon for each region, and map its
> class id to a new PAGE region type (and subtype).
> (Does not annotate `ReadingOrder` or `TextLine`s or `@orientation`.)
> Produce a new output file by serialising the resulting hierarchy.
Options:
-I, --input-file-grp USE File group(s) used as input
-O, --output-file-grp USE File group(s) used as output
-g, --page-id ID Physical page ID(s) to process
--overwrite Remove existing output pages/images
(with --page-id, remove only those)
--profile Enable profiling
--profile-file Write cProfile stats to this file. Implies --profile
-p, --parameter JSON-PATH Parameters, either verbatim JSON string
or JSON file path
-P, --param-override KEY VAL Override a single JSON object key-value pair,
taking precedence over --parameter
-m, --mets URL-PATH URL or file path of METS to process
-w, --working-dir PATH Working directory of local workspace
-l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
Log level
-C, --show-resource RESNAME Dump the content of processor resource RESNAME
-L, --list-resources List names of processor resources
-J, --dump-json Dump tool description as JSON and exit
-D, --dump-module-dir Output the 'module' directory with resources for this processor
-h, --help This help message
-V, --version Show version
Parameters:
"operation_level" [string - "page"]
hierarchy level which to predict and assign regions for
Possible values: ["page", "table"]
"categories" [array - REQUIRED]
maps each category (class index) of the model to a PAGE region
type (and @type or @custom if separated by colon), e.g.
['TextRegion:paragraph', 'TextRegion:heading',
'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet;
categories with an empty string will be skipped during prediction
"model_config" [string - REQUIRED]
path name of model config
"model_weights" [string - REQUIRED]
path name of model weights
"min_confidence" [number - 0.5]
confidence threshold for detections
"postprocessing" [string - "full"]
which postprocessing steps to enable: by default, applies a custom
non-maximum suppression (to avoid overlaps) and morphological
operations (using connected component analysis on the binarized
input image to shrink or expand regions)
Possible values: ["full", "only-nms", "only-morph", "none"]
"debug_img" [string - "none"]
paint an AlternativeImage which blends the input image
and all raw decoded region candidates
   Possible values: ["none", "instance_colors", "instance_colors_only", "category_colors"]
 "device" [string - "cuda"]
   select computing device for Torch (e.g. cpu or cuda:0); will fall
   back to CPU if no GPU is available
```
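To illustrate the `categories` syntax documented above: the array position corresponds to the model's class index, and an optional colon-separated suffix selects the region subtype (annotated as `@type` if it is one of the predefined values, otherwise via `@custom`). A hypothetical sketch of that split (not the processor's actual code):

```python
def parse_category(entry: str):
    """Split a `categories` entry into PAGE element name and subtype."""
    element, _, subtype = entry.partition(":")
    return element, subtype

assert parse_category("TableRegion") == ("TableRegion", "")
assert parse_category("TextRegion:floating") == ("TextRegion", "floating")
```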
Example:

```sh
# download one preconfigured model:
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.yaml
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.pth
# run it (setting model_config, model_weights and categories):
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
# run it (equivalent, with presets file):
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1

# download all preconfigured models:
ocrd resmgr download ocrd-detectron2-segment "*"
```
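For intuition about the first `postprocessing` stage described above (the custom non-maximum suppression), here is a minimal, simplified sketch in Python. The function name, overlap criterion and `max_overlap` threshold are illustrative assumptions, not the processor's actual internals (which, among other things, also recombine masks with connected components of the binarized image):

```python
import numpy as np

def greedy_nms(masks, scores, max_overlap=0.5):
    """Keep a non-overlapping subset of instance masks, best scores first."""
    keep = []
    for i in np.argsort(scores)[::-1]:  # highest confidence first
        area_i = masks[i].sum()
        # reject this candidate if it overlaps any already kept mask too much
        if all((masks[i] & masks[j]).sum() <= max_overlap * min(area_i, masks[j].sum())
               for j in keep):
            keep.append(i)
    return keep

# toy example: two conflicting detections -- the higher-scoring one survives
a = np.zeros((4, 4), bool); a[:2] = True
b = np.zeros((4, 4), bool); b[:3] = True
print(greedy_nms([a, b], [0.9, 0.8]))  # -> [0]
```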
If you mistrust your model, and/or this tool's additional postprocessing, try playing with the runtime parameters:

- Set `debug_img` to some value other than `none`, e.g. `instance_colors_only`. This will generate an image which overlays the raw predictions with the raw image using Detectron2's internal visualiser. The parameter settings correspond to its ColorMode. The `AlternativeImage`s will have `@comments="debug"` and will also be referenced in the METS, which allows convenient browsing with OCR-D Browser. (For example, open the Page View and Image View side by side, and navigate to your output fileGrp on each.)
- Selectively disable postprocessing steps: from the default `full`, via `only-nms` (first stage) or `only-morph` (second stage), to `none`.
- Lower `min_confidence` to get more candidates, or raise it to get fewer.

## Models

Some of the following models have already been registered as known file resources, along with parameter presets to use them conveniently.
To get a list of registered models available for download, do:

```sh
ocrd resmgr list-available -e ocrd-detectron2-segment
```

To get a list of already installed models and presets, do:

```sh
ocrd resmgr list-installed -e ocrd-detectron2-segment
```

To download a registered model (i.e. a config file and the respective weights file), do:

```sh
ocrd resmgr download ocrd-detectron2-segment NAME.yaml
ocrd resmgr download ocrd-detectron2-segment NAME.pth
```

To download more models (registered or other), see:

```sh
ocrd resmgr download --help
```

To use a model, do:

```sh
ocrd-detectron2-segment -P model_config NAME.yaml -P model_weights NAME.pth -P categories '[...]' ...
ocrd-detectron2-segment -p NAME.json ... # equivalent, with presets file
```
To add (i.e. register) a new model, you first have to find its config file, its weights file, and a suitable `categories` mapping for its class labels. Assuming you have done so, then proceed as follows:
```sh
# from local file path
ocrd resmgr download -n path/to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n path/to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from single file URL
ocrd resmgr download -n https://path.to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from zip file URL
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/weights.pth ocrd-detectron2-segment NAME.pth
# create corresponding preset file
echo '{"model_weights": "NAME.pth", "model_config": "NAME.yml", "categories": [...]}' > NAME.json
# install preset file so it can be used everywhere (not just in CWD):
ocrd resmgr download -n NAME.json ocrd-detectron2-segment NAME.json
# now the new model can be used just like the preregistered models
ocrd-detectron2-segment -p NAME.json ...
```
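If the `categories` list is long, writing the presets file from Python can avoid shell quoting mistakes. A sketch; the `NAME.*` resource names and the categories shown are placeholders, just as in the shell commands above:

```python
import json

# hypothetical preset contents -- substitute your own resource names and mapping
presets = {
    "model_config": "NAME.yml",
    "model_weights": "NAME.pth",
    "categories": ["TextRegion:paragraph", "TextRegion:heading", "TableRegion"],
}
with open("NAME.json", "w", encoding="utf-8") as f:
    json.dump(presets, f, indent=2)
```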
What follows is an overview of the preregistered models (i.e. those available via `resmgr`).
Note: These are just examples; no exhaustive search has been done yet!
Note: The filename suffix (`.pth` vs `.pkl`) of the weights file does matter!
| model variant | config | weights | proposed `categories` |
|---|---|---|---|
| X152-FPN | config | weights | `["TableRegion"]` |
| X152-FPN | config | weights | `["TableRegion"]` |
| R50-FPN | config | weights | `["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]` |
| R101-FPN | config | weights | `["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]` |
| X101-FPN | config | weights | `["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]` |
| R50-FPN | config | weights | `["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]` |
| R101-FPN | config | weights | `["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]` |
Another source provides different model variants of various depths for multiple datasets:

`["Background","TextRegion","ImageRegion","TableRegion","MathsRegion","SeparatorRegion","LineDrawingRegion"]`

See here for an overview, and here for the model files. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
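For example, since categories mapped to an empty string are skipped during prediction, the `Background` class of the label map above could simply be masked out. A hypothetical adaptation (assuming the label order shown above):

```python
# hypothetical `categories` value for the 7-class label map above;
# "" skips Background, all other labels happen to be valid PAGE region types already
categories = ["", "TextRegion", "ImageRegion", "TableRegion",
              "MathsRegion", "SeparatorRegion", "LineDrawingRegion"]
```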
(pre-trained on PubLayNet, fine-tuned on a custom, non-public GT corpus of 500 pages of 20th-century magazines)

| model variant | config | weights | proposed `categories` |
|---|---|---|---|
| X101-FPN | config | weights | `["TextRegion:caption","ImageRegion","TextRegion:page-number","TableRegion","TextRegion:heading","TextRegion:paragraph"]` |
X101-FPN archive

Proposed mappings:

- `["TextRegion:header", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"]` (using only predefined `@type`)
- `["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"]` (using `@custom` as well)

## Testing

To install Python dependencies and download some models:
```sh
make deps-test
```

Which is the equivalent of:

```sh
pip install -r requirements-test.txt
make models-test
```

To run the tests, then do:

```sh
make test
```
You can inspect the results under `test/assets/*/data` under the various new `OCR-D-SEG-*` fileGrps. (Again, it is recommended to use OCR-D Browser.)
Finally, to remove the test data, do:

```sh
make clean
```

These tests are integrated as a GitHub Action. Its results can be viewed here.