Frictionless supports reading and writing Parquet files.
pip install frictionless[parquet]
pip install 'frictionless[parquet]' # for zsh shell
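Before reading Parquet files, it can help to confirm that the optional dependency from the install step is importable. The sketch below assumes the parquet extra installs the fastparquet package. That is an assumption to verify against your installed version, and parquet_backend_available is a hypothetical helper, not part of Frictionless.

```python
import importlib.util


def parquet_backend_available(module: str = "fastparquet") -> bool:
    """Return True if `module` can be imported in the current environment.

    Assumption: the frictionless[parquet] extra pulls in `fastparquet`.
    """
    # find_spec returns None for a missing top-level module instead of raising
    return importlib.util.find_spec(module) is not None


print(parquet_backend_available())
```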
You can read a Parquet file:
from frictionless import Resource
resource = Resource('table.parq')
print(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
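In the example above the format appears to be inferred from the .parq extension. If you receive files with unreliable names, a cheap content check can help: per the Parquet file layout, a file starts with the 4-byte magic "PAR1" and ends with a 4-byte footer length followed by "PAR1" again. The helper looks_like_parquet is a hypothetical sketch, not a Frictionless API, and it only checks the markers, not whether the metadata is valid.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"
# Smallest possible layout: magic + 4-byte footer length + magic
MIN_SIZE = 12


def looks_like_parquet(path: str) -> bool:
    """Cheap check: the file begins and ends with the Parquet magic bytes."""
    if os.path.getsize(path) < MIN_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the trailing magic
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Demo on two throwaway files (neither is a real table)
with tempfile.TemporaryDirectory() as tmp:
    fake_parquet = os.path.join(tmp, "fake.parq")
    with open(fake_parquet, "wb") as f:
        f.write(PARQUET_MAGIC + b"\x00" * 4 + PARQUET_MAGIC)
    csv_file = os.path.join(tmp, "table.csv")
    with open(csv_file, "w", encoding="utf-8") as f:
        f.write("id,name\n1,english\n")
    print(looks_like_parquet(fake_parquet), looks_like_parquet(csv_file))  # True False
```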
You can write a dataset to Parquet:
from frictionless import Resource
resource = Resource('table.csv')
target = resource.write('table-output.parq')
print(target)
print(target.read_rows())
{'name': 'table-output',
'type': 'table',
'path': 'table-output.parq',
'scheme': 'file',
'format': 'parq',
'mediatype': 'application/parquet'}
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
ParquetControl: Parquet control representation. A control class for setting the parameters used when reading and writing Parquet files.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, columns: Optional[List[str]] = None, categories: Optional[Any] = None, filters: Optional[Any] = False) -> None
columns: A list of columns to load. Selecting columns means that only the parts of the file holding those columns are read, and columns that are not of interest are skipped. Defaults to None (all columns are loaded).
Optional[List[str]]
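Column selection pays off because Parquet stores each column separately. The toy column store below is a hypothetical, pure-Python stand-in (not Frictionless or Parquet code): each column lives in its own list, so rebuilding rows for a subset of columns never touches the others. A real reader saves I/O by skipping column chunks on disk; this sketch only mirrors the resulting rows.

```python
from typing import Any, Dict, List, Optional

# Toy column-oriented table: one list per column, loosely mirroring how
# Parquet stores each column in its own chunks.
ColumnStore = Dict[str, List[Any]]


def read_rows(store: ColumnStore, columns: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Rebuild rows from only the requested columns (None means all columns)."""
    names = list(store) if columns is None else columns
    chunks = [store[name] for name in names]  # only these columns are accessed
    return [dict(zip(names, values)) for values in zip(*chunks)]


store = {"id": [1, 2], "name": ["english", "中国人"]}
print(read_rows(store, ["name"]))  # [{'name': 'english'}, {'name': '中国人'}]
print(read_rows(store))            # [{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
```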
categories: A list of columns that should be returned as pandas Category-type columns, for example categories=['col1']. A dict can also be given to specify the number of expected labels for each column, for example categories={'col1': 12}.
Optional[Any]
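A Category-type column stores each distinct value once and replaces the data with small integer codes. The function below is a hypothetical plain-Python analogue of that encoding, not the pandas or Frictionless implementation. Note that pandas typically sorts the categories, whereas this sketch keeps first-appearance order, and it ignores the expected-label hint of the dict form.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def encode_categorical(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Split values into (categories, codes), similar in spirit to a Category column."""
    categories: List[Hashable] = []
    index: Dict[Hashable, int] = {}  # value -> position in `categories`
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return categories, codes


print(encode_categorical(["en", "zh", "en", "en"]))  # (['en', 'zh'], [0, 1, 0, 0])
```

Repetitive, low-cardinality columns benefit most, since each row then carries only an integer code.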
filters: Conditions used to filter the data by row group, for example [('col3', 'in', [1, 2, 3, 4])].
Optional[Any]
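Filtering by row group typically works by comparing each condition against per-group min/max statistics and skipping groups that cannot contain a match. A kept group may still hold non-matching rows. The sketch below illustrates that pruning idea under stated assumptions: the statistics dict and the functions may_match and select_row_groups are hypothetical, conditions in the list are combined with AND, and this is not the code Frictionless or its Parquet backend actually runs.

```python
from typing import Any, Dict, List, Sequence, Tuple

Filter = Tuple[str, str, Any]             # (column, operator, value)
Stats = Dict[str, Tuple[Any, Any]]        # hypothetical: column -> (min, max)


def may_match(stats: Stats, flt: Filter) -> bool:
    """Return False only if the row group certainly has no matching rows."""
    column, op, value = flt
    if column not in stats:
        return True  # no statistics, so the group cannot be ruled out
    lo, hi = stats[column]
    if op in ("=", "=="):
        return lo <= value <= hi
    if op == "!=":
        return not (lo == hi == value)  # excluded only if every value equals `value`
    if op == "in":
        return any(lo <= v <= hi for v in value)
    if op == "not in":
        return not (lo == hi and lo in value)
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    raise ValueError(f"unsupported operator: {op!r}")


def select_row_groups(groups: Sequence[Stats], filters: List[Filter]) -> List[int]:
    """Indices of row groups that may satisfy every filter (AND semantics)."""
    return [i for i, stats in enumerate(groups) if all(may_match(stats, f) for f in filters)]


row_groups = [
    {"col3": (0, 0)},    # group 0: col3 is always 0
    {"col3": (2, 9)},    # group 1: overlaps 1..4
    {"col3": (10, 20)},  # group 2: entirely above 4
]
print(select_row_groups(row_groups, [("col3", "in", [1, 2, 3, 4])]))  # [1]
```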
Convert to options