(2024-11-07 15:17)

Multipart Scheme

You can read and write files split into chunks with Frictionless.

Reading Data

You can read chunked data using a Package or a Resource. Pass the first chunk as path and the remaining chunks as extrapaths, for example:

from pprint import pprint
from frictionless import Resource

resource = Resource(path='chunk1.csv', extrapaths=['chunk2.csv'])
pprint(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
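In this example, only the first chunk carries the header row, so the parts behave like one file cut into pieces. The stdlib sketch below illustrates that idea. It assumes the parts are joined byte-for-byte before parsing. read_multipart_csv is a hypothetical helper, not part of Frictionless. Unlike Frictionless, it does no type inference, so every value stays a string.

```python
import csv
import io
import os
import tempfile


def read_multipart_csv(paths):
    """Hypothetical helper: join the raw bytes of every part, then parse once."""
    chunks = []
    for path in paths:
        with open(path, "rb") as f:
            chunks.append(f.read())
    text = b"".join(chunks).decode("utf-8")
    # DictReader takes the header from the first chunk only
    return list(csv.DictReader(io.StringIO(text)))


# Recreate the two chunks from the example: only chunk1.csv has the header row
folder = tempfile.mkdtemp()
chunk1 = os.path.join(folder, "chunk1.csv")
chunk2 = os.path.join(folder, "chunk2.csv")
with open(chunk1, "w", encoding="utf-8", newline="") as f:
    f.write("id,name\n1,english\n")
with open(chunk2, "w", encoding="utf-8", newline="") as f:
    f.write("2,中国人\n")

rows = read_multipart_csv([chunk1, chunk2])
print(rows)  # [{'id': '1', 'name': 'english'}, {'id': '2', 'name': '中国人'}]
```

The sketch only works if the first chunk ends with a newline. Otherwise, the last row of one part would run into the first row of the next.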

Writing Data

A similar approach can be used for writing. The {number} placeholder in the target path marks where each chunk's number goes:

from frictionless import Resource

resource = Resource(path='table.json')
resource.write('table{number}.json', scheme="multipart", control={"chunkSize": 1000000})
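The {number} template is what turns one output into several files. Below is a rough stdlib sketch of that splitting step, assuming parts are cut at fixed byte offsets and numbered from 1. write_multipart is a hypothetical helper and does not reproduce Frictionless's own implementation.

```python
import os
import tempfile


def write_multipart(data, template, chunk_size, folder="."):
    """Hypothetical helper: write `data` into files of at most `chunk_size` bytes.

    `template` must contain a `{number}` placeholder; parts are numbered from 1.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    paths = []
    for number, start in enumerate(range(0, len(data), chunk_size), start=1):
        # Fill the placeholder first, then join, so braces in `folder` are harmless
        path = os.path.join(folder, template.format(number=number))
        with open(path, "wb") as f:
            f.write(data[start:start + chunk_size])
        paths.append(path)
    return paths


data = '[["id","name"],[1,"english"],[2,"german"]]'.encode("utf-8")  # 42 bytes
paths = write_multipart(data, "table{number}.json", chunk_size=16, folder=tempfile.mkdtemp())
print([os.path.basename(p) for p in paths])  # ['table1.json', 'table2.json', 'table3.json']
```

Because the cut is made purely by size, an individual part is generally not valid JSON on its own. Only the concatenation of all parts, in order, reproduces the original file.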

Configuration

A MultipartControl lets you configure how Frictionless handles files using this scheme. For example, chunk_size sets the size of each chunk when writing:

from frictionless import Resource, schemes

# chunk_size is passed as a keyword argument on the control object
control = schemes.MultipartControl(chunk_size=1000000)
resource = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
resource.write('table{number}.json', scheme="multipart", control=control)
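The setting appears in two spellings on this page. The Writing Data example uses the camelCase key chunkSize in a plain dict, while the control object takes the snake_case keyword chunk_size. The sketch below models that correspondence with a hypothetical ChunkSettings dataclass. It is not the real MultipartControl, and it assumes the dict form simply maps onto the keyword form. The default of 100000000 comes from the signature in the Reference section.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChunkSettings:
    """Hypothetical stand-in mirroring the documented MultipartControl fields."""

    name: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    chunk_size: int = 100000000

    @classmethod
    def from_descriptor(cls, descriptor: Dict[str, Any]) -> "ChunkSettings":
        # Assumed mapping: camelCase descriptor key -> snake_case attribute
        return cls(
            name=descriptor.get("name"),
            title=descriptor.get("title"),
            description=descriptor.get("description"),
            chunk_size=descriptor.get("chunkSize", 100000000),
        )


from_dict = ChunkSettings.from_descriptor({"chunkSize": 1000000})
from_kwargs = ChunkSettings(chunk_size=1000000)
print(from_dict == from_kwargs)  # True
```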

Reference

schemes.MultipartControl (class)


Multipart control representation

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, chunk_size: int = 100000000) -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • chunk_size (int)

schemes.MultipartControl.chunk_size (property)

Specifies the size of each chunk, in bytes, for the multipart file.

Signature

int
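To estimate how many files a write will produce, you can work out the arithmetic. Assume chunk_size is a byte count and every part except the last is exactly chunk_size bytes. The number of parts is then the file size divided by the chunk size, rounded up. count_parts below is a hypothetical helper, not a Frictionless function.

```python
DEFAULT_CHUNK_SIZE = 100000000  # default taken from the signature above


def count_parts(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of parts needed if each part holds at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division, avoiding float rounding on large sizes
    return -(-file_size // chunk_size)


print(count_parts(250_000_000))             # 3
print(count_parts(250_000_000, 1_000_000))  # 250
print(count_parts(100_000_000))             # 1
```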