Authenticity score

In addition to comprehensive fraud signals revealing the context and method of tampering, we provide a single Authenticity score, indicating the likelihood that a document is genuine.

The Authenticity score ranges from 0 to 100. It weighs both the context of what was tampered with and our confidence in the signal.

We categorize scores as follows:

  • 0-29: VERY LOW authenticity
  • 30-49: LOW authenticity
  • 50-79: MEDIUM authenticity
  • 80+: HIGH authenticity
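
The bands above can be expressed as a small lookup. The following is a minimal Python sketch; the function name authenticity_category is hypothetical and is not part of the API.

```python
def authenticity_category(score: int) -> str:
    """Map a numeric Authenticity score to its documented band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score < 30:
        return "VERY LOW"
    if score < 50:
        return "LOW"
    if score < 80:
        return "MEDIUM"
    return "HIGH"


print(authenticity_category(70))  # MEDIUM
```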

Reason codes are returned alongside the score to offer transparency into it. They also double as high-level signals, so simple logical rules can determine the flow of a document. Use the score as a signal to prioritize document flow.

You may find that documents scoring below or above a certain threshold can be rejected or approved without manual review.

Authenticity score in the Dashboard

The Authenticity Score column on the Book List gives the lowest Authenticity Score for any document in that book, allowing users to home in on fraud. The column is sortable, enabling users to streamline their workflow by directing their attention to severe fraud (low scores) that should be rejected promptly, or authentic documents (high scores) that can be approved with minimal review. Likewise, the column on the Book Overview page gives the lowest score of any document in a given upload.
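
The book-level value described above is simply the minimum of the document scores in that book. The sketch below illustrates the idea in Python; the function name book_scores and the sample data are hypothetical and only mirror what the Dashboard column shows.

```python
from typing import Dict, List, Tuple


def book_scores(books: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Return (book name, lowest document score) pairs, lowest score first.

    Books with no scored documents are skipped.
    """
    lowest = [(name, min(scores)) for name, scores in books.items() if scores]
    return sorted(lowest, key=lambda pair: pair[1])


# Hypothetical sample data: document scores grouped by book.
sample = {
    "Book A": [90, 70, 85],
    "Book B": [40, 80],
    "Book C": [20, 90, 90],
}

for name, score in book_scores(sample):
    print(name, score)
```

Sorting in ascending order puts the books most likely to contain fraud at the top, which matches sorting the Dashboard column from low to high.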

Authenticity score is also displayed on the Detect tab on the document detail page with additional details including reason codes.

In the Detect tab of the document detail, we display reason codes associated with the score. With each reason code, we share our confidence in that finding.

Any reason codes that dramatically impact scores are highlighted in bold red text. The reason codes serve as high-level signals and can also help with your decision.

Our confidence that account info has been tampered is high.

Examples

  • VERY LOW

  • LOW

  • MEDIUM

  • HIGH

Choosing a threshold

The score is derived from an assessment of both the severity of identified signals and the corresponding confidence levels.

As a general guideline, scores below 50 are categorized as low authenticity. However, if you are less concerned with balance tampering, for example, you might consider only scores of 35 and under to be low authenticity. Conversely, if any instance of tampering is grounds for rejection, you might treat every score below 70 as low authenticity.
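
One way to make the threshold adjustable is to pass it as a parameter. The following Python sketch is illustrative only: the function name triage, the outcome labels, and the default approve_from value of 80 (the start of the HIGH band) are assumptions. Whether a low score leads to rejection or to manual review is a policy choice for your team.

```python
def triage(score: int, reject_below: int = 50, approve_from: int = 80) -> str:
    """Route a document by score using configurable cutoffs (illustrative)."""
    if reject_below > approve_from:
        raise ValueError("reject_below must not exceed approve_from")
    if score < reject_below:
        return "reject"
    if score >= approve_from:
        return "approve"
    return "review"


print(triage(45))                   # default guideline: below 50 is low
print(triage(45, reject_below=36))  # lenient: only 35 and under is low
print(triage(65, reject_below=70))  # strict: anything below 70 is low
```

With these inputs, the three calls return "reject", "review", and "reject" respectively.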

Authenticity score in the API

Authenticity score is returned at the document level. The response includes the numerical score and reason codes.

"form_authenticity": {
            "version": "1.0",
            "score": 70,
            "reason_codes": [
              {
                "code": "110-H",
                "confidence": "HIGH",
                "description": "bank statement account info tampered"
              },
              {
                "code": "120-M",
                "confidence": "MEDIUM",
                "description": "bank statement balance info tampered"
              }
            ]
          }

Each reason code consists of a distinct identifier (e.g., 110-H), a descriptive label (e.g., bank statement account info tampered), and a confidence level (HIGH/MEDIUM/LOW) indicating how confident we are in that signal. In this example, we have high confidence that account information was tampered with and medium confidence that balance information was tampered with.
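
To show how these fields might be consumed, the Python sketch below reads the score and reason codes from a parsed response. The sample wraps the fragment above in an outer JSON object so that it parses on its own; the real response nests form_authenticity inside a larger document payload, so that wrapper and the helper name summarize are assumptions.

```python
import json
from typing import Set, Tuple

# The fragment from the example above, wrapped in {} so it is valid JSON.
SAMPLE = """
{
  "form_authenticity": {
    "version": "1.0",
    "score": 70,
    "reason_codes": [
      {"code": "110-H", "confidence": "HIGH",
       "description": "bank statement account info tampered"},
      {"code": "120-M", "confidence": "MEDIUM",
       "description": "bank statement balance info tampered"}
    ]
  }
}
"""


def summarize(payload: dict) -> Tuple[int, Set[str], Set[str]]:
    """Return the score, all reason codes, and the HIGH-confidence codes."""
    authenticity = payload["form_authenticity"]
    reasons = authenticity.get("reason_codes", [])
    all_codes = {reason["code"] for reason in reasons}
    high_codes = {r["code"] for r in reasons if r["confidence"] == "HIGH"}
    return authenticity["score"], all_codes, high_codes


score, codes, high = summarize(json.loads(SAMPLE))
print(score, sorted(codes), sorted(high))
```

Collecting the codes into a set makes it easy to test them against groups of codes with set intersection.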

Using Authenticity score to optimize your workflow

Dashboard Users

In only 3 clicks, you can navigate to visualizations and signals outlining your worst fraud:

  1. Sort by Authenticity score in the dashboard to find high-risk documents that need urgent review.
  2. Click on Books with low scores.
  3. Then, click on documents within that book with the lowest scores to review detailed findings.

A similar approach can be used to identify low-risk documents that can be moved forward with minimal review.

A daily or weekly workflow may involve filtering by date range to include documents from the previous day or week, then sorting by score to prioritize which documents need review.

Books are filtered for the current week and then sorted by Authenticity Score in ascending order.

API Users

The numerical reason codes returned in form_authenticity allow very simple knockout rules or workflows to be added.

The score itself can be used to automate workflow based on specified thresholds, e.g., 30 or below (which corresponds to tampering of identifying information) or 40 or below (which also includes tampering of balance/earnings information).

Additionally, specific codes can be designated for automated rejection or for triggering urgent review email notifications.

In the example below, documents with any of the specified AUTO_REJECT_CODES are rejected automatically, while any of the specified URGENT_REVIEW_CODES prompts automatic emails to designated accounts for urgent review. Finally, non-parsable documents (identified by IMAGE_CODES) are sent to analysts for further review.

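# reason_codes is the set of "code" values from form_authenticity, e.g. {"110-H", "120-M"}.
# reject_document and send_to stand in for your own workflow functions.
AUTO_REJECT_CODES = {"100-H", "100-M", "110-H", "110-M", "005-H", "005-M"}
URGENT_REVIEW_CODES = {"100-L", "110-L", "120-H", "120-M"}
IMAGE_CODES = {"007-H", "007-M"}

# A non-empty set intersection means at least one listed code is present.
if reason_codes & AUTO_REJECT_CODES:
    reject_document(doc_uuid)
elif reason_codes & URGENT_REVIEW_CODES:
    send_to(doc_uuid, ["[email protected]", "[email protected]"], priority="High")
elif reason_codes & IMAGE_CODES:
    # Document is an image (non-parsable)
    send_to(doc_uuid, ["[email protected]", "[email protected]"], priority="Medium")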


FAQs

  • How do False Positives affect the score?
    Because the Authenticity score has confidence built in, true positives should receive lower scores than false positives and can therefore be prioritized in the flow.
  • Why do some docs not have scores?
    We will not backfill the score, so the score will only be populated for documents uploaded or rerun on or after November 15, 2023.
  • Why am I not seeing scores of 0 or 100?
    Because we cannot be 100% sure of the authenticity of a document, we do not return scores of 0 or 100. The lowest possible score is 10. The highest possible score on an image is 80, and the highest possible score on an e-pdf is 90.
  • Why are some reason codes in bold on the dashboard?
    Reason codes that warrant special attention, such as “bank statement account number tampered: high confidence”, are shown in bold red font.
  • Why do I see an asterisk next to some documents?
    An asterisk indicates that the document is non-parsable.