DI Pdf Reader

Provides integration with Azure Document Intelligence AI models to read and extract structured data from PDF files. This node is ideal for processing large volumes of recurring document types - such as invoices, forms, and statements - using trained models for fast, consistent results.

If you're working with a high volume of documents, or expect to run the same workflow repeatedly with similar layouts, the DI Pdf Reader is the recommended option. It leverages a trained model, which reduces processing time and improves accuracy once set up.

For low-volume, flexible, or one-off parsing, where setting up a custom model may be unnecessary, consider using the OpenAI Document Parser instead. It requires less setup but uses a general-purpose model that may be slower per document.

Revision History

1.0.0.0 Initial Release

Properties

Connection

Type: Connection Input
The connection profile for your Azure Document Intelligence models.

DiModelName
Type: String
The name of the model you intend to use to interpret the PDF. Both custom and prebuilt models are supported.

DiModelEndpoint
Type: String
The endpoint of the Document Intelligence model.

DiModelKey
Type: Password
The key for the Document Intelligence model.

ReturnRawData
Type: Boolean
Toggle to return raw data in the response payload.

PdfFile

Type: Object Input
The PDF file to be analyzed and converted.

Response

Type: JSON Output
The contents of the PDF represented as a structured JSON object.

Setting up a Document Intelligence model

Navigate to Azure AI Foundry and create a new Document Intelligence resource.
Copy the Endpoint and Key into your Flowgear DI Pdf Reader connection.
Enter the custom model name or the ID of one of the prebuilt models.
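
To illustrate how the three connection values fit together, here is a minimal Python sketch that assembles the kind of analyze request Azure's public Document Intelligence REST API accepts. It is not a description of how the node is implemented. The URL path, the `2024-11-30` API version, the `build_analyze_request` helper, and the example endpoint are assumptions for illustration; verify them against the API version your resource supports.

```python
from urllib.parse import quote

# Assumed API version; confirm which version your Azure resource supports.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint: str, key: str, model_name: str) -> dict:
    """Build the URL and headers for an analyze call (hypothetical helper, no network I/O)."""
    base = endpoint.rstrip("/")          # tolerate a trailing slash on the endpoint
    model = quote(model_name, safe="")   # URL-encode the custom or prebuilt model ID
    url = (
        f"{base}/documentintelligence/documentModels/{model}:analyze"
        f"?api-version={API_VERSION}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,   # the resource key (DiModelKey)
        "Content-Type": "application/pdf",  # the PDF bytes would be sent as the body
    }
    return {"url": url, "headers": headers}


# Placeholder values; substitute your own endpoint, key, and model name.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "prebuilt-invoice",
)
print(request["url"])
```

If a request built this way is rejected, the usual causes are a key copied from a different resource or a model name that does not exist on that endpoint.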

Comparison: OpenAI Document Parser vs DI Pdf Reader

Feature / Criteria	OpenAI Document Parser	DI Pdf Reader
Setup Required	Minimal - schema and validation logic can be auto-generated from the document	Requires an Azure resource plus either a trained custom model or a prebuilt model
Best For	Ad hoc, flexible, low-volume parsing	High-volume, recurring document types
Performance	Slower per document (runs via OpenAI LLM)	Fast once model is trained
Accuracy	Good for loosely structured or varied documents	High for consistent, structured documents
Custom Rules	Supports dot-path validation and C# validation scripts	Must be enforced downstream (e.g., in workflow logic)
Supported File Types	Limited - PDF, TXT, CSV, DOCX, JSON, etc.	Primarily PDFs, but also supports image files (JPG, PNG)
Training Required	None	Required for custom models (trained via Azure Document Intelligence); not required for prebuilt models
Scalability	Lower - slower processing time per document	High - optimized for batch operations and parallel workflows
Pricing Considerations	May consume more tokens per run depending on size and structure	Predictable usage-based Azure pricing, better for high-volume scenarios
Schema Flexibility	Schema can be edited or swapped at any time	Changes require retraining or a new model
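
Because the DI Pdf Reader leaves custom rules to downstream logic, a check along the following lines could run on the extracted fields before they are used. This is a sketch only: the `validate_fields` helper, the field names, the 0.8 threshold, and the assumption that each field carries `content` and `confidence` values (as in Azure's analyze results) are all hypothetical, and the node's actual Response shape may differ.

```python
def validate_fields(fields: dict, required: list[str], min_confidence: float = 0.8) -> list[str]:
    """Return a list of problems found; an empty list means the document passed."""
    problems = []
    for name in required:
        field = fields.get(name)
        # Treat an absent field or an empty value as missing.
        if field is None or not field.get("content"):
            problems.append(f"{name}: missing")
            continue
        # Flag values the model was not sure about (assumed 'confidence' key).
        confidence = field.get("confidence", 0.0)
        if confidence < min_confidence:
            problems.append(f"{name}: low confidence ({confidence:.2f})")
    return problems


# Hypothetical extracted fields, for illustration only.
sample = {
    "InvoiceId": {"content": "INV-1001", "confidence": 0.97},
    "InvoiceTotal": {"content": "1,250.00", "confidence": 0.62},
}
print(validate_fields(sample, ["InvoiceId", "InvoiceTotal", "DueDate"]))
# → ['InvoiceTotal: low confidence (0.62)', 'DueDate: missing']
```

Returning a list of problems rather than raising on the first failure lets a workflow route the document to manual review with every issue reported at once.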