Captured field

Overview

To make response confidence more useful, Ocrolus maps it to statistical estimates, such as precision and recall, that can support client-side automation logic.

Model Name: CapturedField

Details of a particular field, as captured from the submitted document.

Properties

bbox

Type
List of numbers
Required
true
Min. Item Count
4
Max. Item Count
4

The coordinates of the bounding box (i.e. rectangle) that encloses the original page's text data, in units of pixels. If the document was provided as a PDF, we assume a DPI of 300. Coordinates are given in the order of distances from the left, top, bottom, and right edges. The coordinates (0, 0) represent the top-left corner of the document.
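As a minimal sketch of working with these units, the snippet below converts a bbox from pixels to inches using the 300 DPI assumption stated above. The helper name `bbox_to_inches` and the sample values are hypothetical and are not part of the Ocrolus API; the item order is left unchanged.

```python
# Assumption taken from the description above: PDFs are treated as 300 DPI.
ASSUMED_PDF_DPI = 300


def bbox_to_inches(bbox, dpi=ASSUMED_PDF_DPI):
    """Convert a 4-item bbox from pixels to inches, preserving item order."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly 4 numbers")
    return [value / dpi for value in bbox]


# Hypothetical sample value, not taken from a real response.
sample_bbox = [300, 600, 750, 1500]
print(bbox_to_inches(sample_bbox))  # [1.0, 2.0, 2.5, 5.0]
```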

children

Type
Map of arrays
Required
true
Min. Item Count
0

Fields that are nested within this one, if any. This field follows the same structure as the owning CapturedDocument's fields attribute. Used with certain form types that have complicated structures. Will be empty if this field does not have any children.
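One way to read nested fields is a depth-first walk. The sketch below assumes that the map is keyed by field name and that each array holds CapturedField objects already parsed into Python dictionaries; the function name `iter_fields`, the path format, and the sample data are hypothetical.

```python
def iter_fields(fields, prefix=""):
    """Yield (path, field) pairs for every field and its children, depth first."""
    for name, items in fields.items():
        for position, field in enumerate(items):
            path = f"{prefix}{name}[{position}]"
            yield path, field
            # children is documented as required; default to {} defensively.
            yield from iter_fields(field.get("children", {}), prefix=path + ".")


# Hypothetical sample data, shaped like a map of arrays of captured fields.
sample_fields = {
    "line_items": [
        {"value": "", "children": {"amount": [{"value": "12.50", "children": {}}]}}
    ]
}
paths = [path for path, _ in iter_fields(sample_fields)]
print(paths)  # ['line_items[0]', 'line_items[0].amount[0]']
```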

confidence

Type
Double (double)
Required
true
Min. Value
0
Max. Value
1

Our model's confidence in this field's captured data, ranging from 0 to 1. Higher values indicate better predictions.

page_index

Type
Integer
Required
true
Min. Value
0

The zero-indexed page of the original Document from which the captured data was extracted. This index is relative to the uploaded file, not to the Document within the file. This value will come from one of the indexes in the parent object's page_indexes field.

value

Type
String
Required
true

The value of this form field as captured from the original document. Will always be a string, regardless of the semantics of the field. May be empty if the original form field was empty.
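Because the value is always a string and may be empty, client code that expects a number has to convert it explicitly. The following is a minimal sketch of such a conversion; the helper `value_as_decimal` is hypothetical, and it assumes the string is a plain decimal number with no currency symbols or thousands separators, which a real document may not guarantee.

```python
from decimal import Decimal, InvalidOperation


def value_as_decimal(value):
    """Return the captured string as a Decimal, or None if empty or not numeric."""
    text = value.strip()
    if not text:
        return None  # the original form field was empty
    try:
        return Decimal(text)
    except InvalidOperation:
        return None  # not a plain decimal number


print(value_as_decimal("12.50"))  # 12.50
print(value_as_decimal(""))       # None
```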

precision

Type
Double (double)
Required
true
Min. Value
0
Max. Value
1

An estimate that statistically interprets the confidence number output by the model as the likelihood that the provided value is correct. Clients can use this estimate to filter responses, which can increase the system's perceived precision at the cost of discarding some suggestions.

For example, if the desired error rate is less than 5%, a threshold of 0.95 can be set on the precision, and any responses falling below this threshold can be discarded.

In general, it is possible to trade off precision for recall. Relaxing the precision requirement can lead to an increase in recall.
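The threshold example above can be sketched as follows. This assumes captured fields have been parsed into Python dictionaries with a `precision` key; the function name `split_by_precision` and the sample data are hypothetical.

```python
def split_by_precision(fields, min_precision=0.95):
    """Split fields into (kept, discarded) around a precision threshold."""
    kept = [f for f in fields if f["precision"] >= min_precision]
    discarded = [f for f in fields if f["precision"] < min_precision]
    return kept, discarded


# Hypothetical sample data, not taken from a real response.
sample = [
    {"value": "A", "precision": 0.99},
    {"value": "B", "precision": 0.95},
    {"value": "C", "precision": 0.80},
]
kept, discarded = split_by_precision(sample)
print([f["value"] for f in kept])       # ['A', 'B']
print([f["value"] for f in discarded])  # ['C']
```

Lowering `min_precision` keeps more responses, which illustrates the trade of precision for recall.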

recall

Type
Double (double)
Required
true
Min. Value
0
Max. Value
1

An estimate that statistically interprets the confidence number generated by the model as the proportion of responses that are as precise as or more precise than the current response. This can help when filtering for the most accurate responses.

For example, to accept 80% of responses and subject the remaining 20% to further review, we can accept any response with a recall score below 0.80 and review those above the threshold.
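A minimal sketch of this routing is shown below, assuming captured fields have been parsed into Python dictionaries with a `recall` key. The function name `route_by_recall` and the sample data are hypothetical. The text does not say how to treat a recall of exactly 0.80; this sketch sends it to review, which is an assumption.

```python
def route_by_recall(fields, max_recall=0.80):
    """Split fields into (accepted, review) around a recall threshold."""
    accepted = [f for f in fields if f["recall"] < max_recall]
    # Assumption: a recall exactly at the threshold goes to review.
    review = [f for f in fields if f["recall"] >= max_recall]
    return accepted, review


# Hypothetical sample data, not taken from a real response.
sample = [
    {"value": "A", "recall": 0.10},
    {"value": "B", "recall": 0.79},
    {"value": "C", "recall": 0.93},
]
accepted, review = route_by_recall(sample)
print([f["value"] for f in accepted])  # ['A', 'B']
print([f["value"] for f in review])    # ['C']
```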

precision_tags

Type
List of strings
Required
true

Ocrolus adds tags to responses when a static threshold is not applicable or would vary too much across different fields.

  • Optimal_f1_score: This tag indicates that the precision is higher than the value one would choose to balance precision and recall, that is, to optimize the F1 score. This tag can be useful for filtering output when there is no specific error rate goal.

  • Targeted_95_precision_threshold: This tag is deprecated and has been replaced by the precision property. It is present when the precision is higher than 0.95.

  • Targeted_98_precision_threshold: This tag is deprecated and has been replaced by the precision property. It is present when the precision is higher than 0.98.
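Filtering on a tag can be sketched as below, assuming captured fields have been parsed into Python dictionaries with a `precision_tags` list. The helper `has_tag` and the sample data are hypothetical. The exact casing of tag strings in real responses is not confirmed here, so the comparison is case-insensitive.

```python
def has_tag(field, tag):
    """Return True if the field carries the given precision tag (case-insensitive)."""
    wanted = tag.lower()
    return any(t.lower() == wanted for t in field.get("precision_tags", []))


# Hypothetical sample data, not taken from a real response.
sample_field = {"value": "x", "precision_tags": ["optimal_f1_score"]}
print(has_tag(sample_field, "Optimal_f1_score"))                 # True
print(has_tag(sample_field, "Targeted_98_precision_threshold"))  # False
```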