This page describes use cases and code examples for creating a template configuration.
For more information about the template configuration, see the detailed template documentation.
The complete set of thresholds and triggers is documented on GitHub.
Here is a set of the most common triggers:
Table detection (table_detect_sect), on/off:
{
"template": {
"pagemap": [
{
"table_detect_sect": 0
}
]
}
}
Reading order sorting (rd_sort):
0 - built-in
1 - content-based
2 - x-y sort
{
"template": {
"pagemap": [
{
"rd_sort": 2
}
]
}
}
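When templates are generated by a script instead of written by hand, a small helper keeps the nesting consistent. The sketch below is a hypothetical Python helper (pagemap_template and RD_SORT_MODES are illustrative names, not part of the SDK) that wraps pagemap settings in the same {"template": {"pagemap": [...]}} shape used in the examples above and rejects rd_sort values other than the three listed.

```python
import json

# rd_sort values listed above: 0 = built-in, 1 = content-based, 2 = x-y sort.
RD_SORT_MODES = {"built-in": 0, "content-based": 1, "x-y": 2}


def pagemap_template(**settings) -> dict:
    """Hypothetical helper: wrap pagemap settings in the template structure."""
    if "rd_sort" in settings and settings["rd_sort"] not in RD_SORT_MODES.values():
        raise ValueError(
            f"rd_sort must be one of {sorted(RD_SORT_MODES.values())}"
        )
    # One pagemap entry holding exactly the keys passed in, as in the examples.
    return {"template": {"pagemap": [settings]}}


config = pagemap_template(rd_sort=RD_SORT_MODES["x-y"])
print(json.dumps(config, indent=2))
```

Each example on this page sets a single key per pagemap entry; passing several keywords at once would put them in one entry, which is an assumption this sketch does not verify against the SDK.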
Regular expressions and text functions are used to identify and mark specific text content, assigning properties and flags as needed.
Define a custom bullet: the Unicode character ✸ (\u2738) is added to the set of characters recognized as bullets.
{
"template": {
"pagemap_regex": [
{
"regex_bullet": "^[\\u2022\\u...\\u2738]$"
}
]
}
}
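The doubled backslashes in regex_bullet are JSON string escapes: after the JSON is parsed, the pattern contains the regex escapes \u2022 and \u2738 rather than the characters themselves. The Python sketch below shows this with a shortened, hypothetical pattern that keeps only the two code points named on this page, because the full character list is elided in the source. It uses Python's json and re modules to illustrate the decoding and matching; it does not reproduce the SDK's own regex engine.

```python
import json
import re

# Hypothetical, shortened regex_bullet value: only U+2022 and U+2738 are
# included, since the full character list is elided in the source.
raw_json = r'{"regex_bullet": "^[\\u2022\\u2738]$"}'

# json.loads turns each "\\" into a single backslash, so the pattern holds
# the regex escapes \u2022 and \u2738, not the characters themselves.
pattern = json.loads(raw_json)["regex_bullet"]
print(pattern)  # ^[\u2022\u2738]$

# Python's re module understands \uXXXX escapes in str patterns.
bullet = re.compile(pattern)

# Only a text chunk consisting of exactly one listed character matches.
for candidate in ["\u2022", "\u2738", "*", "\u2738 item"]:
    print(repr(candidate), bool(bullet.match(candidate)))
```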
Mark text as an h1 heading when it is on page 1, has a font size of at least 36, and uses a font whose name matches the regular expression TimesNewRomanPS-BoldMT.
{
"template": {
"text_update": [
{
"heading": "h1",
"query": {
"$and": [
{
"$0_font_size": {
"$gte": "36"
}
},
{
"$0_font_name": {
"$regex": "TimesNewRomanPS-BoldMT"
}
},
{
"$_page_num": {
"$eq": 1
}
}
],
"param": [
"pde_text"
]
}
}
]
}
}
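The query combines its clauses with $and, so a text object becomes an h1 heading only if every clause holds. The Python sketch below is a hypothetical evaluator (check and evaluate are illustrative names) that models only the $and, $gte, $eq, and $regex operators used in this example. It keys the property values by the query keys as written; how the SDK resolves the $0_ and $_ prefixes is not modeled here, and numeric comparison for $gte is an assumption.

```python
import re


def check(value, condition):
    """Evaluate an operator object such as {"$gte": "36"} against one value.

    A list of operator objects is also accepted; every entry must hold.
    """
    if isinstance(condition, list):
        return all(check(value, c) for c in condition)
    for op, expected in condition.items():
        if op == "$gte":
            ok = float(value) >= float(expected)  # assumed numeric comparison
        elif op == "$eq":
            ok = value == expected
        elif op == "$regex":
            ok = re.search(expected, str(value)) is not None
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True


def evaluate(query, props):
    """True if every clause of query["$and"] holds for the given properties."""
    return all(
        key in props and check(props[key], condition)
        for clause in query["$and"]
        for key, condition in clause.items()
    )


# The query from the text_update example above (the param list is omitted).
query = {
    "$and": [
        {"$0_font_size": {"$gte": "36"}},
        {"$0_font_name": {"$regex": "TimesNewRomanPS-BoldMT"}},
        {"$_page_num": {"$eq": 1}},
    ]
}

# Hypothetical property values for two text objects on page 1.
title = {"$0_font_size": 40, "$0_font_name": "TimesNewRomanPS-BoldMT", "$_page_num": 1}
caption = {"$0_font_size": 10, "$0_font_name": "TimesNewRomanPSMT", "$_page_num": 1}

for props in (title, caption):
    heading = "h1" if evaluate(query, props) else None
    print(props["$0_font_size"], heading)
```

Because a missing property makes a clause fail, an object that lacks any of the three values is never promoted to a heading in this sketch.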
Page Object Functions run on low-level PDF page objects, such as paths, images, and text chunks, before any layout recognition takes place. The following example flags every path filled with white (255, 255, 255) as an artifact.
{
"template": {
"path_object_process": [
{
"query": {
"$and": [
{
"$0_fill_color": [
"255",
"255",
"255"
]
}
],
"param": [
"pds_path"
]
},
"flag": "artifact"
}
]
}
}
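The rule above compares a path's fill color with the three values 255, 255, 255. The Python sketch below mimics that selection on hypothetical path dictionaries, assuming 8-bit RGB components and that an array value in the query means every component must match exactly; matches_fill, flag_white_paths, and the dictionary layout are illustrative, not the SDK's data model.

```python
# Fill color from the path_object_process example, as strings, as in the JSON.
WHITE = ["255", "255", "255"]


def matches_fill(path, color):
    """Assumed semantics: every RGB component must equal the query value."""
    return [int(c) for c in path["fill_color"]] == [int(c) for c in color]


def flag_white_paths(paths):
    """Set flag="artifact" on each path whose fill color is white."""
    for path in paths:
        if matches_fill(path, WHITE):
            path["flag"] = "artifact"
    return paths


# Hypothetical low-level path objects on one page.
paths = [
    {"id": "page background", "fill_color": (255, 255, 255)},
    {"id": "table rule", "fill_color": (0, 0, 0)},
]
flag_white_paths(paths)
for path in paths:
    print(path["id"], path.get("flag"))
```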
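The element_update example below flags every page element whose bottom coordinate is at least 698.32 as a header. Assuming standard PDF page coordinates, where y increases upward, this selects elements near the top of the page.
{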
"template": {
"element_update": [
{
"flag": "header",
"query": {
"$and": [
{
"$0_bottom": {
"$gte": "698.32"
}
}
],
"param": [
"pde_element"
]
},
"statement": "$if"
}
]
}
}
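The header threshold is easier to choose when it is read as a distance from the top edge of the page. The Python sketch below applies the same bottom >= 698.32 test to hypothetical element dictionaries and computes the height of the band it leaves at the top, assuming a US Letter page (792 pt tall) with the origin at the bottom-left corner. The page size, flag_headers, and the element layout are assumptions for illustration; the source does not state which page size the threshold was chosen for.

```python
# Threshold from the element_update example above, in PDF points.
HEADER_BOTTOM = 698.32

# Assumed page height: US Letter, 11 in x 72 pt/in.
PAGE_HEIGHT = 792.0


def flag_headers(elements):
    """Set flag="header" on each element whose bottom edge is >= HEADER_BOTTOM."""
    for element in elements:
        if element["bottom"] >= HEADER_BOTTOM:
            element["flag"] = "header"
    return elements


# Hypothetical recognized elements (y increases upward).
elements = [
    {"name": "running title", "bottom": 742.0},
    {"name": "body paragraph", "bottom": 420.5},
]
flag_headers(elements)

# Height of the band at the top of the page that the threshold selects.
band = PAGE_HEIGHT - HEADER_BOTTOM
print(f"header band: {band:.2f} pt ({band / 72:.2f} in)")
```

To target a different page size or margin, the threshold can be derived the other way round as page height minus the desired header band.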