DI Pdf Reader
Provides integration with Azure Document Intelligence AI models to read and extract structured data from PDF files. This node is ideal for processing large volumes of recurring document types - such as invoices, forms, and statements - using trained models for fast, consistent results.
If you're working with a high volume of documents, or expect to run the same workflow repeatedly with similar layouts, the DI Pdf Reader is the recommended option. It leverages a trained model, which reduces processing time and improves accuracy once set up.
For low-volume, flexible, or one-off parsing, where setting up a custom model may be unnecessary, consider using the OpenAI Document Parser instead. It requires less setup but uses a general-purpose model that may be slower per document.
Revision History
1.0.0.0 Initial Release
Properties
Connection
Type: Connection Input
The connection profile for your Azure Document Intelligence models.
DiModelName
Type: String
The name of the model you intend to use to interpret the PDF. Both custom and prebuilt models are supported.
DiModelEndpoint
Type: String
The endpoint of the Document Intelligence model.
DiModelKey
Type: Password
The key for the Document Intelligence model.
ReturnRawData
Type: Boolean
Toggle to return raw data in the response payload.
PdfFile
Type: Object Input
The PDF file to be analyzed and converted.
Response
Type: JSON Output
The contents of the PDF represented as a structured JSON object.
Setting up a Document Intelligence model
- Navigate to Azure AI Foundry and create a new Document intelligence resource.
- Copy the Endpoint and Key into your Flowgear DI Pdf Reader connection.
- Enter the custom model name or the id of one of the prebuilt models.
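To sanity-check the Endpoint, Key, and model name outside Flowgear, it can help to see how the three values fit together in a direct call to the Azure Document Intelligence REST API. The sketch below only builds the request and does not send it. The API version, the `build_analyze_request` helper, and the placeholder values are assumptions for illustration; the node itself may call the service differently.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the v4.0 GA REST API version. Older resources may need a
# different version and path.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint: str, key: str, model_name: str,
                          pdf_bytes: bytes) -> Request:
    """Build (but do not send) an analyze request for one PDF.

    endpoint   -> DiModelEndpoint
    key        -> DiModelKey
    model_name -> DiModelName (custom model name or a prebuilt id
                  such as "prebuilt-invoice")
    """
    base = endpoint.rstrip("/")
    model = quote(model_name, safe="")
    url = (
        f"{base}/documentintelligence/documentModels/"
        f"{model}:analyze?api-version={API_VERSION}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return Request(url, data=pdf_bytes, headers=headers, method="POST")


# Placeholder values only; substitute those copied from your Azure resource.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "prebuilt-invoice",
    b"%PDF-1.7",
)
```

Sending the request (for example with `urllib.request.urlopen`) starts an asynchronous analysis; the service replies with an `Operation-Location` header that is polled for the result.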
Comparison: OpenAI Document Parser vs DI Pdf Reader
Feature / Criteria | OpenAI Document Parser | DI Pdf Reader |
---|---|---|
Setup Required | Minimal - schema and validation logic can be auto-generated from the document | Requires Azure setup and a trained custom model (prebuilt models also supported) |
Best For | Ad hoc, flexible, low-volume parsing | High-volume, recurring document types |
Performance | Slower per document (runs via OpenAI LLM) | Fast once the model is trained |
Accuracy | Good for loosely structured or varied documents | High for consistent, structured documents |
Custom Rules | Supports dot-path validation and C# validation scripts | Must be enforced downstream (e.g., in workflow logic) |
Supported File Types | Limited - PDF, TXT, CSV, DOCX, JSON, etc. | Primarily PDFs, but also supports image files (JPG, PNG) |
Training Required | None | Required for custom models (trained via Azure Document Intelligence); not needed for prebuilt models |
Scalability | Lower - slower processing time per document | High - optimized for batch operations and parallel workflows |
Pricing Considerations | May consume more tokens per run depending on size and structure | Predictable usage-based Azure pricing, better for high-volume scenarios |
Schema Flexibility | Schema can be edited or swapped at any time | Changes require retraining or a new model |
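
Because the DI Pdf Reader leaves custom rules to downstream logic, a workflow step typically checks the Response JSON before passing it on. The following is a minimal sketch of such a check using dot-path lookups. The helper names (`get_path`, `validate`) and the field names in the sample (`InvoiceId`, `Items`, `VendorName`, `InvoiceTotal`) are hypothetical; the actual Response shape depends on the model and on the ReturnRawData setting.

```python
from typing import Any, List

_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def get_path(data: Any, path: str) -> Any:
    """Follow a dot-path such as "Items.0.Amount" through dicts and lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif (isinstance(current, list) and part.isdigit()
              and int(part) < len(current)):
            current = current[int(part)]
        else:
            return _MISSING
    return current


def validate(response: dict, required_paths: List[str]) -> List[str]:
    """Return one error message per required path that is absent or empty."""
    errors = []
    for path in required_paths:
        value = get_path(response, path)
        if value is _MISSING or value is None or value == "":
            errors.append(f"missing or empty: {path}")
    return errors


# Hypothetical response; real field names come from your model.
sample = {
    "InvoiceId": "INV-001",
    "Items": [{"Amount": 10.5}],
    "VendorName": "",
}
problems = validate(
    sample, ["InvoiceId", "Items.0.Amount", "VendorName", "InvoiceTotal"]
)
```

An empty list from `validate` means every required field was present; otherwise the messages can be routed to an error branch of the workflow.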