Anthropic Document Parser
Parses documents using Anthropic's Claude models to extract structured data that aligns with a JSON schema you provide. This node is helpful when you need to transform unstructured or semi-structured documents into validated, machine-readable JSON without building a bespoke parser.
This parser excels at ad hoc processing or low-volume workflows where you want maximum flexibility and minimal setup. Claude infers the document structure dynamically, so you can iterate quickly when formats vary, though each document may take longer to process as a result.
For high-volume or repeatable formats (invoices, purchase orders, statements, etc.), consider the Azure DI PDF Reader instead. Azure Document Intelligence lets you train custom models or use prebuilt templates, delivering faster parsing and predictable performance at scale.
Revision History
1.0.0.0 Initial Release
Properties
Connection
Type: Connection Input
Supplies the Anthropic API key, Claude model, and response settings shared across all requests.
ApiKey
Type: Password
The Anthropic API key used to authenticate requests.
Model
Type: String
The name of the Claude model that should process the request.
Default: claude-sonnet-4-20250514
MaxOutputTokens
Type: Integer
The maximum number of tokens to return from the Anthropic response.
ContinueOnMaxTokens
Type: Boolean
Determines whether the node should automatically request additional completions when Claude stops due to the maximum token limit.
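As a rough illustration of what ContinueOnMaxTokens implies, the sketch below keeps requesting output while the model reports that it stopped at the token limit. It is a minimal sketch, not the node's implementation: `Completion`, `collect_output`, `fake_request`, and `max_rounds` are hypothetical names, and the only detail taken from the Anthropic Messages API is the `max_tokens` stop reason.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Completion:
    """Hypothetical stand-in for one model response."""
    text: str
    stop_reason: str  # "max_tokens" means the output was cut off


def collect_output(
    request: Callable[[str], Completion],
    continue_on_max_tokens: bool,
    max_rounds: int = 5,
) -> str:
    """Concatenate completions until the model stops for another reason."""
    parts: List[str] = []
    for _ in range(max_rounds):
        # Pass the text generated so far so the next round can resume from it.
        completion = request("".join(parts))
        parts.append(completion.text)
        if completion.stop_reason != "max_tokens" or not continue_on_max_tokens:
            break
    return "".join(parts)


def fake_request(so_far: str) -> Completion:
    """Fake model (assumption) that emits a JSON document in 8-character chunks."""
    full = '{"total": 42}'
    chunk = full[len(so_far):len(so_far) + 8]
    done = len(so_far) + len(chunk) >= len(full)
    return Completion(chunk, "end_turn" if done else "max_tokens")


print(collect_output(fake_request, continue_on_max_tokens=True))
print(collect_output(fake_request, continue_on_max_tokens=False))
```

With the setting disabled, the second call returns only the first truncated chunk, which would not be valid JSON. That is the situation this setting is meant to avoid for long documents.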
Action
Type: Enum Input
Controls the node behaviour:
- CreateSchema – Generate a sample JSON schema that matches the uploaded document.
- CreateValidationScript – Produce a C# validation script scaffold based on the supplied schema.
- ParseDocument – Parse the document into structured JSON using the schema and optional validation logic.
Schema
Type: JSON Input
JSON schema that defines the expected output contract. Required for ParseDocument and CreateValidationScript.
ValidationScript
Type: Multiline Text Input
Optional C# code that performs additional validation after schema validation completes.
DocumentName
Type: String Input
Logical name of the document being parsed, including the extension (for example, Statement.pdf).
Document
Type: File Input
The document payload to send to Anthropic. Accepts files, byte arrays, streams, file paths, or raw strings.
Response
Type: JSON Output
Structured JSON returned by Anthropic after parsing completes.
Remarks
Actions
CreateSchema
Requests that Claude infer a schema for the supplied document. Use this as a starting point and refine the schema manually as needed.
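For orientation, a refined schema for a bank statement might look like the sketch below. Every field name is an assumption made for illustration, and the schema is built as a Python dictionary only so that it can be serialized and pasted into the Schema property.

```python
import json

# Hypothetical schema for a bank statement; all field names are illustrative.
statement_schema = {
    "type": "object",
    "required": ["accountNumber", "closingBalance", "transactions"],
    "additionalProperties": False,
    "properties": {
        "accountNumber": {"type": "string"},
        "closingBalance": {"type": "number"},
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["date", "amount"],
                "properties": {
                    "date": {"type": "string", "format": "date"},
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# Serialize to the JSON text that would be supplied as the schema.
schema_json = json.dumps(statement_schema, indent=2)
print(schema_json)
```

Typical manual refinements to an inferred schema include tightening the `required` lists and setting `additionalProperties` to false so that unexpected fields fail validation.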
CreateValidationScript
Generates a C# script that enforces custom validation rules based on your schema. You can edit the generated code to add business-specific logic.
ParseDocument
Parses the document using the provided Schema, then checks the result in three stages, in this order: JSON schema validation, Custom Properties assertions (dot-path matches), and the optional ValidationScript.
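The dot-path assertion stage can be pictured with the sketch below. The node's exact matching rules are not documented here, so this assumes that a path such as `lines.0.amount` walks object keys and list indexes, and that an assertion passes when the resolved value equals the expected one. `resolve_dot_path` and `check_assertions` are hypothetical helpers.

```python
from typing import Any, Dict, List


def resolve_dot_path(data: Any, path: str) -> Any:
    """Walk a nested dict/list structure using a dot-separated path."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]  # numeric segment indexes a list
        else:
            current = current[segment]       # other segments are object keys
    return current


def check_assertions(parsed: Dict[str, Any], assertions: Dict[str, Any]) -> List[str]:
    """Return one message per failed assertion; an empty list means all passed."""
    failures: List[str] = []
    for path, expected in assertions.items():
        try:
            actual = resolve_dot_path(parsed, path)
        except (KeyError, IndexError, ValueError, TypeError):
            failures.append(f"{path}: path not found")
            continue
        if actual != expected:
            failures.append(f"{path}: expected {expected!r}, got {actual!r}")
    return failures


parsed = {"account": {"currency": "USD"}, "lines": [{"amount": 10.0}]}
assertions = {"account.currency": "USD", "lines.0.amount": 12.5, "account.iban": "X"}
for message in check_assertions(parsed, assertions):
    print(message)
```

Collecting every failure before reporting, instead of stopping at the first one, makes it easier to correct a schema or prompt in a single pass.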
Prompt Customization
Use Custom Properties to override or extend the default prompts:
| Key | Behaviour |
|---|---|
| prompt.parse.overwrite | Replaces the default parse prompt. |
| prompt.parse.append | Appends instructions to the parse prompt. |
| prompt.schema.overwrite | Replaces the default schema prompt. |
| prompt.schema.append | Appends instructions to the schema prompt. |
| prompt.script.overwrite | Replaces the default validation script prompt. |
| prompt.script.append | Appends to the validation script prompt. |
| prompt.system.append | Appends content to the system prompt. |
Custom properties intended as validation assertions must not use any of these prompt keys. Properties with a prompt key are filtered out before validation runs.
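One plausible reading of how these keys combine is sketched below. The precedence (overwrite first, then append) and the newline separator are assumptions, and `build_prompt` and `split_properties` are hypothetical helpers, not part of the node.

```python
from typing import Dict, Tuple

# The prompt keys listed in the table above.
PROMPT_KEYS = {
    "prompt.parse.overwrite", "prompt.parse.append",
    "prompt.schema.overwrite", "prompt.schema.append",
    "prompt.script.overwrite", "prompt.script.append",
    "prompt.system.append",
}


def build_prompt(kind: str, default_prompt: str, props: Dict[str, str]) -> str:
    """Apply the overwrite key if present, then add any appended instructions."""
    prompt = props.get(f"prompt.{kind}.overwrite", default_prompt)
    extra = props.get(f"prompt.{kind}.append")
    return f"{prompt}\n{extra}" if extra else prompt


def split_properties(props: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate prompt keys from the properties left over as assertions."""
    prompts = {k: v for k, v in props.items() if k in PROMPT_KEYS}
    assertions = {k: v for k, v in props.items() if k not in PROMPT_KEYS}
    return prompts, assertions


props = {
    "prompt.parse.append": "Dates are formatted DD/MM/YYYY.",
    "account.currency": "USD",
}
prompts, assertions = split_properties(props)
print(build_prompt("parse", "Extract the fields defined by the schema.", prompts))
print(assertions)
```

In this reading, `account.currency` is treated as a dot-path assertion precisely because it is not one of the prompt keys.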
Example Usage
Use this node in workflows that process uploads such as invoices, claims, or onboarding forms:
- Capture a file from storage, email, or a user upload.
- Provide the document, schema, and optional validation script to Anthropic Document Parser.
- Route the structured JSON into downstream workflow logic for approvals, storage, or analytics.
Comparison: Anthropic Document Parser vs DI Pdf Reader
| Feature / Criteria | Anthropic Document Parser | DI Pdf Reader |
|---|---|---|
| Setup Required | Minimal – Claude infers structure per document. | Requires Azure setup plus either a custom or prebuilt model. |
| Best For | Ad hoc, flexible, low-volume parsing. | High-volume, recurring document types. |
| Performance | Slower per document (LLM based). | Fast once the model is configured. |
| Accuracy | Strong for loosely structured or varied documents. | High for consistent documents with known layouts. |
| Custom Rules | Supports dot-path assertions and C# validation scripts. | Enforce via workflow logic or Azure DI settings. |
| Supported File Types | Claude supports PDF, TXT, CSV, DOCX, JSON, and more. | Optimized for PDFs and common image formats (JPG, PNG). |
| Training Required | None. | Yes (custom training or selecting a prebuilt template). |
| Scalability | Lower – processing time grows with document size. | High – tuned for batch and parallel processing. |
| Pricing Considerations | Token-based usage that scales with document length. | Predictable Azure usage-based pricing, ideal for volume. |
| Schema Flexibility | Edit or swap schemas instantly. | Changing structure requires retraining or a new model. |