Anthropic Document Parser

Parses documents using Anthropic's Claude models to extract structured data that aligns with a JSON schema you provide. This node is helpful when you need to transform unstructured or semi-structured documents into validated, machine-readable JSON without building a bespoke parser.

This parser excels at ad hoc processing or low-volume workflows where you want maximum flexibility and minimal setup. Claude infers the document structure dynamically, so you can iterate quickly when formats vary, though each document may take longer to process as a result.

For high-volume or repeatable formats (invoices, purchase orders, statements, etc.), consider the Azure DI PDF Reader instead. Azure Document Intelligence lets you train custom models or use prebuilt templates, delivering faster parsing and predictable performance at scale.

Revision History

1.0.0.0 Initial Release

Properties

Connection

Type: Connection Input
Supplies the Anthropic API key, Claude model, and response settings shared across all requests.

ApiKey
Type: Password
The Anthropic API key used to authenticate requests.

Model
Type: String
The name of the Claude model that should process the request.
Default: claude-sonnet-4-20250514

MaxOutputTokens
Type: Integer
The maximum number of tokens to return from the Anthropic response.

ContinueOnMaxTokens
Type: Boolean
Determines whether the node should automatically request additional completions when Claude stops due to the maximum token limit.

Action

Type: Enum Input
Controls the node behaviour:

  • CreateSchema – Generate a sample JSON schema that matches the uploaded document.
  • CreateValidationScript – Produce a C# validation script scaffold based on the supplied schema.
  • ParseDocument – Parse the document into structured JSON using the schema and optional validation logic.

Schema

Type: JSON Input
JSON schema that defines the expected output contract. Required for ParseDocument and CreateValidationScript.

ValidationScript

Type: Multiline Text Input
Optional C# code that performs additional validation after schema validation completes.

DocumentName

Type: String Input
Logical name of the document being parsed, including the extension (for example, Statement.pdf).

Document

Type: File Input
The document payload to send to Anthropic. Accepts files, byte arrays, streams, file paths, or raw strings.

Response

Type: JSON Output
Structured JSON returned by Anthropic after parsing completes.

Remarks

Actions

CreateSchema

Requests that Claude infer a schema for the supplied document. Use this as a starting point and refine the schema manually as needed.

CreateValidationScript

Generates a C# script that enforces custom validation rules based on your schema. You can edit the generated code to add business-specific logic.

ParseDocument

Parses the document using the provided Schema, validates the JSON, applies CustomProperties assertions (dot-path matches), and finally runs the optional ValidationScript.

Prompt Customization

Use Custom Properties to override or extend the default prompts:

Key Behaviour
prompt.parse.overwrite Replaces the default parse prompt.
prompt.parse.append Appends instructions to the parse prompt.
prompt.schema.overwrite Replaces the default schema prompt.
prompt.schema.append Appends instructions to the schema prompt.
prompt.script.overwrite Replaces the default validation script prompt.
prompt.script.append Appends to the validation script prompt.
prompt.system.append Appends content to the system prompt.

Validation-specific custom properties should omit these prompt keys to avoid being filtered out.

Example Usage

Use this node in workflows that process uploads such as invoices, claims, or onboarding forms:

  1. Capture a file from storage, email, or a user upload.
  2. Provide the document, schema, and optional validation script to Anthropic Document Parser.
  3. Route the structured JSON into downstream workflow logic for approvals, storage, or analytics.

Comparison: Anthropic Document Parser vs DI Pdf Reader

Feature / Criteria Anthropic Document Parser DI Pdf Reader
Setup Required Minimal – Claude infers structure per document. Requires Azure setup plus either a custom or prebuilt model.
Best For Ad hoc, flexible, low-volume parsing. High-volume, recurring document types.
Performance Slower per document (LLM based). Fast once the model is configured.
Accuracy Strong for loosely structured or varied documents. High for consistent documents with known layouts.
Custom Rules Supports dot-path assertions and C# validation scripts. Enforce via workflow logic or Azure DI settings.
Supported File Types Claude supports PDF, TXT, CSV, DOCX, JSON, and more. Optimized for PDFs and common image formats (JPG, PNG).
Training Required None. Yes (custom training or selecting a prebuilt template).
Scalability Lower – processing time grows with document size. High – tuned for batch and parallel processing.
Pricing Considerations Token-based usage that scales with document length. Predictable Azure usage-based pricing, ideal for volume.
Schema Flexibility Edit or swap schemas instantly. Changing structure requires retraining or a new model.

Links