Validation summary
Validation is a set of logic-based rules applied to extracted data. These rules confirm whether the values meet format expectations, fall within acceptable ranges, or align correctly with other fields. Validations are useful for improving trust in automation, reducing manual reviews, and identifying edge cases in document data.
Validation rules are helpful in both Complete and Instant workflows. In the Complete flow, validations act as alerts when there's a potential issue with the data. In the Instant flow, where accuracy may be lower than in the Complete flow, validations are even more critical. These checks serve as an additional signal beyond confidence scores to help you determine whether the response is consumable.
Validation plays a crucial role in Capture by ensuring the accuracy, consistency, and reliability of extracted data. Validations check the extracted information against predefined rules, data types, and specified parameters to confirm its correctness. This helps identify errors so they can be corrected before the data is used in downstream processes. This section is intended for anyone responsible for implementing, managing, or integrating Capture outputs.
Validation results are sent through the following API endpoints:
Note: Validation rules don't apply to the bank statement APIs. The validation summary feature is currently available only through the Ocrolus API; support in the Ocrolus Dashboard is planned for a future release.
```json
{
  "status": 200,
  "response": {
    "pk": 20394203,
    "uuid": "b8723b9e-b40e-4b98-a615-a2e1bcd84ea0",
    "name": "validation 8",
    "created": "2025-02-19T10:44:17Z",
    "created_ts": "2025-02-19T10:44:17Z",
    "verified_pages_count": 13,
    "book_status": "ACTIVE",
    "id": 58246386,
    "forms": [
      {
        "pk": 63365407,
        "uuid": "519b29b2-0431-4d14-b560-b7ca853df7e9",
        "uploaded_doc_pk": 12344343,
        "form_type": "W2",
        "form_config_pk": 23423423,
        "tables": [],
        "attribute_data": null,
        "raw_fields": {
          "box9": {
            "value": "",
            "is_empty": true,
            "alias_used": null,
            "source_filename": "combined-all.pdf",
            "confidence": 0.828,
            "bbox_value": "",
            "formatted_value": null,
            "localized_formatted_value": null,
            "field_validations": [],
            "passes_validation": true
          },
          "year": {
            "value": "2021",
            "is_empty": false,
            "alias_used": null,
            "source_filename": "combined-all.pdf",
            "confidence": 0.986,
            "bbox_value": "2021",
            "formatted_value": "2021",
            "localized_formatted_value": "2021",
            "field_validations": [
              {
                "uuid": "e85b710a-ed33-4c07-a1be-30257c550ee2",
                "message": "Year should be between 2010 and the Current year",
                "passes_validation": true
              },
              {
                "uuid": "f2bda243-662c-4c8b-b3b7-7b060eedf58c",
                "message": "Value should be an integer",
                "passes_validation": true
              }
            ],
            "passes_validation": true
          },
          "doc_validations": {
            "validations": [
              {
                "uuid": "2ba647ec-dc9b-4c92-9959-cba4bb0ca4a3",
                "message": "Employer Name should not be same as Box EF Employee Name",
                "passes_validation": true
              },
              {
                "uuid": "9cd6fce9-74b2-4ddf-bc09-0a6a8fe92822",
                "message": "Medicare Wages And Tips (Box 5) Should Be Greater Than Box 1. Also, Medicare Wages And Tips (Box 5) Should Be Greater Than Box 3",
                "passes_validation": true
              }
            ]
          }
        }
      }
    ]
  }
}
```
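A client can combine the two signals described earlier by walking a response like the sample above. The idea is to collect every rule whose passes_validation is false, together with non-empty fields whose confidence falls below a threshold. The Python sketch below is illustrative only and rests on several assumptions:

- The response has already been parsed into a dict, for example with json.loads.
- It follows the nesting shown in the sample, including doc_validations under raw_fields.
- The function names review_form and review_book are hypothetical.
- The 0.9 threshold is also hypothetical.

```python
from typing import Any

# Hypothetical threshold chosen for illustration; the documentation
# does not prescribe a confidence cutoff.
MIN_CONFIDENCE = 0.9


def review_form(form: dict[str, Any], min_confidence: float = MIN_CONFIDENCE) -> list[str]:
    """Collect failed validations and low-confidence fields for one form."""
    issues: list[str] = []
    raw_fields = form.get("raw_fields") or {}
    for name, field in raw_fields.items():
        if name == "doc_validations":
            # In the sample response, document-level results appear under this key.
            for rule in field.get("validations", []):
                if not rule.get("passes_validation", True):
                    issues.append(f"document rule {rule['uuid']} failed: {rule['message']}")
            continue
        # Field-level rules: report each failure by its stable UUID.
        for rule in field.get("field_validations", []):
            if not rule.get("passes_validation", True):
                issues.append(f"{name}: rule {rule['uuid']} failed: {rule['message']}")
        confidence = field.get("confidence")
        # Empty fields are skipped here; that is a design choice, not an API rule.
        if not field.get("is_empty") and confidence is not None and confidence < min_confidence:
            issues.append(f"{name}: low confidence {confidence:.3f}")
    return issues


def review_book(payload: dict[str, Any]) -> dict[int, list[str]]:
    """Map each form's pk to the issues found in it."""
    forms = payload.get("response", {}).get("forms", [])
    return {form["pk"]: review_form(form) for form in forms}
```

Under these assumptions, an empty issue list for a form suggests it can be consumed without manual review. Failed rules are reported with their UUIDs so that downstream logic never has to parse message text.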
Types of validation
The following are the validation types:
Field validation
Field-level validations include rules that are based on individual fields or a combination of fields. If a field has multiple validations, the system returns validation results for each rule as well as an overall field validation status.
Examples:
- The State field in any document should be one of the 50 U.S. states (if the country is US).
- A Social Security Number should be either 5 or 9 digits.
- In a W2, Box 1 should be greater than 250 and less than 400,000.
- In paystubs, the Pay Date should be between 10 and 42 days after the Pay Period Start Date.
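Each of the example field rules above can be expressed as a simple predicate. The following Python sketch is illustrative only, and the function names are hypothetical. It assumes states arrive as two-letter abbreviations. It also treats the paystub window as inclusive of 10 and 42 days, a detail the rule text does not specify.

```python
from datetime import date

# Two-letter abbreviations for the 50 U.S. states (assumed input format).
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)


def state_is_valid(state: str, country: str = "US") -> bool:
    """State must be one of the 50 U.S. states when the country is US."""
    if country != "US":
        return True  # The rule only applies to U.S. documents.
    return state.strip().upper() in US_STATES


def ssn_is_valid(ssn: str) -> bool:
    """SSN must contain either 5 or 9 ASCII digits (dashes and spaces ignored)."""
    digits = ssn.replace("-", "").replace(" ", "")
    return digits.isascii() and digits.isdigit() and len(digits) in (5, 9)


def w2_box1_is_valid(amount: float) -> bool:
    """W2 Box 1 must be greater than 250 and less than 400,000."""
    return 250 < amount < 400_000


def paystub_pay_date_is_valid(period_start: date, pay_date: date) -> bool:
    """Pay Date must fall 10 to 42 days after the Pay Period Start Date."""
    return 10 <= (pay_date - period_start).days <= 42
```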
Document validation
Document-level validations are applied to a field but require a formula involving two or more fields within the same document.
Examples:
- In Form 1065, the Total Balance Due (Line 27) should equal the sum of Lines 23 through 26.
- In a Social Security Card, the Guardian Name should not be the same as the Beneficiary Name.
- In a Form 1040, the Total Tax (Line 15) should equal the sum of Line 13 and Line 14.
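Document-level rules compare one field against a formula over other fields in the same document. The Python sketch below makes the following assumptions, none of which is documented Ocrolus behavior:

- Line values have already been extracted into a hypothetical dict keyed by line number.
- Values use Decimal, which avoids floating-point rounding in currency sums.
- Sums must match exactly; that tolerance is an assumption.
- Names are compared case-insensitively; that is also an assumption.

```python
from decimal import Decimal


def sums_match(total: Decimal, parts: list[Decimal], tolerance: Decimal = Decimal("0")) -> bool:
    """True if total equals the sum of parts within the given tolerance."""
    return abs(total - sum(parts, Decimal("0"))) <= tolerance


def form_1065_balance_due_is_valid(lines: dict[int, Decimal]) -> bool:
    """Form 1065: Line 27 should equal the sum of Lines 23 through 26."""
    return sums_match(lines[27], [lines[n] for n in range(23, 27)])


def form_1040_total_tax_is_valid(lines: dict[int, Decimal]) -> bool:
    """Form 1040: Line 15 should equal the sum of Line 13 and Line 14."""
    return sums_match(lines[15], [lines[13], lines[14]])


def names_differ(guardian: str, beneficiary: str) -> bool:
    """Social Security Card: Guardian Name should differ from Beneficiary Name."""
    return guardian.strip().casefold() != beneficiary.strip().casefold()
```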
Validation components
Each validation includes three parts:
- UUID: A unique identifier (in development) for each validation rule. Clients can use UUIDs to match and track validations important to them. UUIDs are stable and intended for automation, unlike messages, which may change.
- Message: A human-readable explanation of the validation rule applied. It’s useful for understanding the rule, but is not intended for automation or logic-based scripting.
- Flag (passes_validation): For each validation, the passes_validation key indicates whether the field passed that validation test. The flag behaves as follows:
  - Returns TRUE if the validation passes or if no rules are applied.
  - Returns FALSE if the validation fails.
  - If the field is NULL, passes_validation defaults to TRUE.
In addition to the status of each rule applied to a field, each field also has a combined validation status (the field-level passes_validation key in the response). If any validation rule fails, the overall validation status of that field is FALSE.
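The flag semantics and the combined field status can be summarized in a few lines. This Python sketch uses a hypothetical RuleResult type that mirrors the uuid, message, and passes_validation keys. It matches rules by UUID rather than by message, as recommended above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleResult:
    """Hypothetical container mirroring one entry in field_validations."""
    uuid: str
    message: str  # For humans only; the text may change between releases.
    passes_validation: bool


def field_passes(value: Optional[str], results: list[RuleResult]) -> bool:
    """Combined field status: FALSE if any rule fails, otherwise TRUE."""
    if value is None:
        return True  # NULL fields default to TRUE.
    # all() of an empty list is True, covering the "no rules applied" case.
    return all(r.passes_validation for r in results)


def failed_rules_of_interest(results: list[RuleResult], watched: set[str]) -> list[str]:
    """Return UUIDs of watched rules that failed, ignoring message text."""
    return [r.uuid for r in results if r.uuid in watched and not r.passes_validation]
```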
Normalization
Ocrolus uses a normalization process to standardize data values across the Capture workflow. The purpose is to ensure that the extracted data is stored and presented in a consistent format, compatible with both global standards and client-specific formatting requirements.
Each normalized field includes:
- Raw value (bboxValue): The unmodified value as parsed from the document. For example, 01/01/2023.
- Normalized value (formattedValue): The globally standardized format. For example, 2023-01-01.
- Localized normalized value (localizedFormattedValue): A client-specific representation, aligned with regional preferences or organizational formatting rules. For example, January 1, 2023.
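The three representations can be illustrated with the date example above. This Python sketch is hypothetical in several respects:

- It assumes the raw value is a U.S.-style MM/DD/YYYY string.
- It uses a long-form English date as the localized format. Real localized formats depend on client configuration.
- The NormalizedValue type and normalize_date function are illustrative names, not part of the Ocrolus API.

```python
from datetime import datetime
from typing import NamedTuple


class NormalizedValue(NamedTuple):
    raw: str        # As parsed from the document (bbox value)
    normalized: str  # Global standard: ISO 8601 date
    localized: str   # Example client preference: "Month D, YYYY"


def normalize_date(raw: str, raw_format: str = "%m/%d/%Y") -> NormalizedValue:
    """Produce raw, normalized, and localized forms of a date string."""
    parsed = datetime.strptime(raw.strip(), raw_format).date()
    # %B follows the current locale; the default C locale gives English month names.
    localized = f"{parsed:%B} {parsed.day}, {parsed.year}"
    return NormalizedValue(raw, parsed.isoformat(), localized)
```

Keeping the raw value alongside both formatted values lets a reviewer trace any formatted output back to what appeared on the page.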
For supported data types and formatting rules, see the Normalization documentation.