A template is a JSON configuration file that contains rules, flags, and thresholds that modify the recognition process.
With a template, you can:
Templates use rules with conditions to identify objects based on their properties, such as:
For example:
Each rule may contain properties or set flags to be applied to an object in order to modify the recognition and tagging algorithm. For example:
Templates can define objects that are known before recognition runs. For example, if a document contains an address section in the top-left corner of the first page, the user can predefine a text object (P tag) by its bounding box.
This is a useful feature when using another layout recognition tool that generates partial results or when the application provides methods to define these predefined objects.
The information provided to the template can include, for example, the page number, the bounding box, and the object type (image, table, text, list, etc.).
Templates are usually configured on a small set of documents. Once the configuration is finalized, the template can be applied to a large number of similar documents.
Typical use cases are bank statements, invoices, and any document type created from the same template but containing different data.
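As a rough illustration of the information listed above, the following Python sketch models one predefined object and serializes it to JSON. The class name, field names, 1-based page numbering, and bounding-box order are all hypothetical choices made for this example; they are not the PDFix template schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PredefinedObject:
    """Hypothetical container for an object known before recognition runs."""
    page: int          # page number (1-based here, an assumption)
    bbox: tuple        # (left, bottom, right, top) in points, assumed order
    object_type: str   # e.g. "text", "image", "table", "list"


# An address block in the top-left corner of the first page (made-up coordinates).
address = PredefinedObject(page=1, bbox=(36.0, 700.0, 250.0, 780.0), object_type="text")

# Serialize to JSON; the tuple becomes a JSON array.
print(json.dumps(asdict(address), indent=2))
```

A structure like this could be produced by another layout recognition tool and then translated into whatever form the template expects.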
To create a template, use PDFix Desktop. This application provides an interactive interface to design templates and view instant results.
The following example defines a template rule that identifies an H1 heading by font size, font name, and fill color.
{
    "template": {
        "statement": "$if",
        "heading": "h1",
        "query": {
            "$and": [
                {
                    "$0_font_size": [
                        {
                            "$gte": "36"
                        }
                    ]
                },
                {
                    "$0_font_name": {
                        "$regex": "TimesNewRomanPS-BoldMT"
                    }
                },
                {
                    "$0_fill_color": [
                        "0",
                        "0",
                        "0"
                    ]
                }
            ],
            "param": [
                "pde_text"
            ]
        }
    }
}
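To make the intent of the rule above more concrete, here is a minimal Python sketch of how such a query could be evaluated against the properties of a text object. It is not how PDFix evaluates templates internally. It assumes that the `$0_` prefix refers to the first entry of `param`, that `$gte` compares numerically, that `$regex` is a regular-expression search, and that a list of plain values (the fill color) must match component by component. The `matches` and `check` helpers and the property dictionaries are hypothetical.

```python
import re

# The "$and" part of the query from the example rule.
QUERY = {
    "$and": [
        {"$0_font_size": [{"$gte": "36"}]},
        {"$0_font_name": {"$regex": "TimesNewRomanPS-BoldMT"}},
        {"$0_fill_color": ["0", "0", "0"]},
    ]
}


def check(cond, value):
    """Test one property value against one condition (assumed semantics)."""
    if isinstance(cond, dict):
        for op, operand in cond.items():
            if op == "$gte":
                if float(value) < float(operand):
                    return False
            elif op == "$regex":
                if re.search(operand, str(value)) is None:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True
    if isinstance(cond, list):
        if all(isinstance(c, dict) for c in cond):
            # A list of operator objects: every one must hold.
            return all(check(c, value) for c in cond)
        # A list of plain values: compare component by component as strings.
        return [str(v) for v in value] == [str(c) for c in cond]
    return str(value) == str(cond)


def matches(query, props):
    """Return True if the property dict satisfies the query."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(sub, props) for sub in cond):
                return False
            continue
        # Assumption: strip the "$0_" prefix to get the property name.
        name = key[len("$0_"):] if key.startswith("$0_") else key
        value = props.get(name)
        if value is None or not check(cond, value):
            return False
    return True


heading = {"font_size": 48, "font_name": "TimesNewRomanPS-BoldMT", "fill_color": [0, 0, 0]}
body = {"font_size": 11, "font_name": "TimesNewRomanPSMT", "fill_color": [0, 0, 0]}

print(matches(QUERY, heading))
print(matches(QUERY, body))
```

Under these assumptions, only a text object that satisfies all three conditions (large font, bold Times New Roman, black fill) would be tagged as an H1 heading.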