Parquet Convert

The Parquet Convert Node transforms structured data from JSON or XML files into the efficient, column-oriented Parquet format. It automatically analyses the input data to build a schema and lets you choose a compression method for the output file. To learn more about Parquet files and their uses, see the Apache Parquet website.

Revision History

1.0.0.0 - Initial release.
1.0.0.1 - Fixed empty field handling.

Properties

Action

Type: List Input
Selects the input data format to convert from.

JsonToParquet - Converts JSON data into a Parquet file.
XmlToParquet - Converts XML data into a Parquet file.

Input

Type: File Input
The source JSON or XML file you want to convert. This property expects a file provided as a byte array.

CompressionMethod

Type: List Input
Specifies the compression algorithm to use for the output Parquet file.

None (Default)
Snappy
Gzip
Lzo
Brotli
LZ4
Zstd
Lz4Raw

ParquetOutput

Type: File Output
The resulting Parquet file, provided as a byte array.

Remarks

Schema Inference

The Node automatically determines the data types for your Parquet file by scanning the values in your source Input. Note the following behaviours:

If a column in your data contains mixed data types (e.g. both numbers and text), the Node will treat the entire column as a string to ensure no data is lost.
For the Node to interpret nested data as a Struct Field (a nested object), every record must contain an object at that position, and all of those objects must have exactly the same field names. If the field names differ between records, the Node attempts to create a Map Field, which is currently unsupported.
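
The two behaviours above can be sketched in Python. This is an illustration only, not the Node's implementation: the names `scalar_kind` and `infer_field`, the type labels, and the choice to group integers and decimals under a single "number" kind are all assumptions made for the example. The sketch handles scalar values and objects only.

```python
def scalar_kind(value):
    """Classify one scalar value (the labels are hypothetical, for illustration)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def infer_field(values):
    """Infer a type for one column from the values found in every record."""
    if values and all(isinstance(v, dict) for v in values):
        key_sets = {frozenset(v) for v in values}
        if len(key_sets) == 1:
            # Every record holds an object with identical field names: a struct.
            keys = sorted(next(iter(key_sets)))
            return {"struct": {k: infer_field([v[k] for v in values]) for k in keys}}
        # Field names differ between records: this would be a map.
        return "map (unsupported)"
    kinds = {scalar_kind(v) for v in values}
    # A column with more than one kind of value falls back to string.
    return kinds.pop() if len(kinds) == 1 else "string"


records = [
    {"id": 1, "code": 7, "address": {"city": "Oslo", "zip": "0150"}, "tags": {"a": 1}},
    {"id": 2, "code": "A7", "address": {"city": "Bergen", "zip": "5003"}, "tags": {"b": 2}},
]
schema = {name: infer_field([r[name] for r in records]) for name in records[0]}
```

In this example `code` mixes a number with text, so it is inferred as a string; `address` has the same field names in both records, so it is a struct; `tags` has different field names in each record, so it is flagged as a map.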

Known Issues

Unsupported Features

The Node does not currently support List Fields (arrays) or Map Fields (objects with dynamic keys). If your JSON or XML input contains these structures, the Node shows an error message. If you would like support for these structures in a future update, please contact support to let us know.
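
If you want to check a JSON record for arrays before converting it, a small pre-check along the following lines may help. This Python sketch is hypothetical and is not part of the Node; the function name `find_list_fields` and the dotted path format are assumptions made for the example.

```python
import json


def find_list_fields(record, prefix=""):
    """Return the dotted path of every field in one record whose value is an array."""
    paths = []
    for name, value in record.items():
        path = f"{prefix}{name}"
        if isinstance(value, list):
            paths.append(path)
        elif isinstance(value, dict):
            # Descend into nested objects so that nested arrays are reported too.
            paths.extend(find_list_fields(value, prefix=f"{path}."))
    return paths


record = json.loads(
    '{"id": 1, "tags": ["a", "b"], "owner": {"name": "Ann", "phones": ["555-0100"]}}'
)
list_paths = find_list_fields(record)
```

Here `list_paths` contains `tags` and `owner.phones`, the two fields that would need to be removed or flattened before conversion.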