You can read and write files split into chunks with Frictionless.
You can read chunked files using Package/Resource by passing the first chunk as path and the remaining chunks as extrapaths, for example:
from pprint import pprint
from frictionless import Resource
resource = Resource(path='chunk1.csv', extrapaths=['chunk2.csv'])
pprint(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
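To make the idea behind the read example concrete, here is a minimal standard-library sketch of what reading a file "split into chunks" amounts to conceptually: the chunks are joined back into one byte stream and then parsed as a single table. This is an illustration only, not Frictionless's implementation. The helper name read_multipart_csv is hypothetical, and the sketch assumes the header row appears only in the first chunk. Unlike Frictionless, it does no type inference, so every value stays a string.

```python
import csv
import io
import tempfile
from pathlib import Path


def read_multipart_csv(paths):
    """Hypothetical helper: join chunk files byte-for-byte, then parse as one CSV."""
    data = b"".join(Path(p).read_bytes() for p in paths)
    reader = csv.DictReader(io.StringIO(data.decode("utf-8"), newline=""))
    return list(reader)


with tempfile.TemporaryDirectory() as tmp:
    # Assumption: only the first chunk carries the header row.
    chunk1 = Path(tmp) / "chunk1.csv"
    chunk2 = Path(tmp) / "chunk2.csv"
    chunk1.write_bytes("id,name\n1,english\n".encode("utf-8"))
    chunk2.write_bytes("2,中国人\n".encode("utf-8"))
    rows = read_multipart_csv([chunk1, chunk2])

# No type inference here, so 'id' values are strings rather than integers.
print(rows)
```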
A similar approach can be used for writing:
from frictionless import Resource
resource = Resource(path='table.json')
resource.write('table{number}.json', scheme="multipart", control={"chunkSize": 1000000})
There is a Control to configure how Frictionless reads and writes files using this scheme. For example:
from frictionless import Resource
from frictionless.plugins.multipart import MultipartControl
control = MultipartControl(chunk_size=1000000)
resource = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
resource.write('table{number}.json', scheme="multipart", control=control)
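The role of the chunk size and of the {number} placeholder in the target path can be sketched with the standard library alone. The helper below is hypothetical and is not how Frictionless is implemented. It assumes that the chunk size is measured in bytes and that chunk numbering starts at 1; neither is stated above.

```python
import tempfile
from pathlib import Path


def write_chunks(data: bytes, directory: str, template: str, chunk_size: int) -> list:
    """Hypothetical helper: split data into pieces of at most chunk_size bytes.

    Each piece is written to a file named by filling {number} in the template.
    Numbering from 1 is an assumption made for this sketch.
    """
    paths = []
    for number, start in enumerate(range(0, len(data), chunk_size), start=1):
        path = Path(directory) / template.format(number=number)
        path.write_bytes(data[start:start + chunk_size])
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    payload = b'[{"id": 1, "name": "english"}, {"id": 2, "name": "german"}]'
    paths = write_chunks(payload, tmp, "table{number}.json", chunk_size=20)
    names = [p.name for p in paths]
    sizes = [p.stat().st_size for p in paths]
    # Joining the chunks in order restores the original payload.
    restored = b"".join(p.read_bytes() for p in paths)

print(names, sizes, restored == payload)
```

A smaller chunk size produces more files. Because the split is purely byte-based, an individual chunk is generally not a valid JSON document on its own; only the concatenation is.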
MultipartControl: Multipart control representation.
Signature: (*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, chunk_size: int = 100000000) -> None
chunk_size (int): Specifies chunk size for the multipart file.