(2024-11-07 15:17)

Parquet Format

Frictionless supports reading and writing Parquet files.

pip install frictionless[parquet]
pip install 'frictionless[parquet]' # for zsh shell

Reading Data

You can read a Parquet file:

from frictionless import Resource

resource = Resource('table.parq')
print(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]

Writing Data

You can write a dataset to Parquet:

from frictionless import Resource

resource = Resource('table.csv')
target = resource.write('table-output.parq')
print(target)
print(target.read_rows())
{'name': 'table-output',
 'type': 'table',
 'path': 'table-output.parq',
 'scheme': 'file',
 'format': 'parq',
 'mediatype': 'application/parquet'}
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]

Reference

formats.ParquetControl (class)

Parquet control representation. A control class that sets parameters for reading and writing Parquet files.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, columns: Optional[List[str]] = None, categories: Optional[Any] = None, filters: Optional[Any] = False) -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • columns (Optional[List[str]])
  • categories (Optional[Any])
  • filters (Optional[Any])

formats.ParquetControl.columns (property)

A list of columns to load. Selecting columns lets the reader access only the parts of the file that are of interest and skip the remaining columns. The default value is None.

Signature

Optional[List[str]]

formats.ParquetControl.categories (property)

A list of columns that should be returned as pandas Category-type columns, for example categories=['col1']. A dict such as categories={'col1': 12} additionally specifies the number of expected labels for that column.

Signature

Optional[Any]

formats.ParquetControl.filters (property)

Specifies the conditions used to filter data (row groups). For example: [('col3', 'in', [1, 2, 3, 4])]

Signature

Optional[Any]

formats.ParquetControl.to_python (method)

Convert the control to options.