The Data Package is a core Frictionless Data concept: a set of resources accompanied by additional metadata. See the Data Package Standard for more information.
Let's create a data package:
from frictionless import Package, Resource
package = Package('table.csv') # from a resource path
package = Package('tables/*') # from a resources glob
package = Package(['tables/chunk1.csv', 'tables/chunk2.csv']) # from a list
package = Package('package/datapackage.json') # from a descriptor path
package = Package({'resources': [{'path': 'table.csv'}]}) # from a descriptor
package = Package(resources=[Resource(path='table.csv')]) # from arguments
As you can see, a package can be created from different kinds of sources, and the type of each source (e.g. a glob or a path) is detected automatically. This step can also be made more explicit:
from frictionless import Package, Resource
package = Package(resources=[Resource(path='table.csv')]) # from arguments
package = Package('datapackage.json') # from a descriptor
The standards support a great deal of package metadata, all of which is available in Frictionless Framework too:
from frictionless import Package, Resource
package = Package(
    name='package',
    title='My Package',
    description='My Package for the Guide',
    resources=[Resource(path='table.csv')],
    # it's possible to provide all the official properties like homepage, version, etc
)
print(package)
{'name': 'package',
'title': 'My Package',
'description': 'My Package for the Guide',
'resources': [{'name': 'table',
'type': 'table',
'path': 'table.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv'}]}
If you have created a package, for example from a descriptor, you can access these properties:
from frictionless import Package
package = Package('datapackage.json')
print(package.name)
# and others
test-tabulator
And edit them:
from frictionless import Package
package = Package('datapackage.json')
package.name = 'new-name'
package.title = 'New Title'
package.description = 'New Description'
# and others
print(package)
{'name': 'new-name',
'title': 'New Title',
'description': 'New Description',
'resources': [{'name': 'first-resource',
'type': 'table',
'path': 'table.xls',
'scheme': 'file',
'format': 'xls',
'mediatype': 'application/vnd.ms-excel',
'schema': {'fields': [{'name': 'id', 'type': 'number'},
{'name': 'name', 'type': 'string'}]}},
{'name': 'number-two',
'type': 'table',
'path': 'table-reverse.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv',
'schema': {'fields': [{'name': 'id', 'type': 'integer'},
{'name': 'name', 'type': 'string'}]}}]}
The core purpose of a package is to hold a set of resources. The Package class provides useful methods for managing them:
from frictionless import Package, Resource
package = Package('datapackage.json')
print(package.resources)
print(package.resource_names)
package.add_resource(Resource(name='new', data=[['key1', 'key2'], ['val1', 'val2']]))
resource = package.get_resource('new')
print(package.has_resource('new'))
package.remove_resource('new')
[{'name': 'first-resource',
'type': 'table',
'path': 'table.xls',
'scheme': 'file',
'format': 'xls',
'mediatype': 'application/vnd.ms-excel',
'schema': {'fields': [{'name': 'id', 'type': 'number'},
{'name': 'name', 'type': 'string'}]}}, {'name': 'number-two',
'type': 'table',
'path': 'table-reverse.csv',
'scheme': 'file',
'format': 'csv',
'mediatype': 'text/csv',
'schema': {'fields': [{'name': 'id', 'type': 'integer'},
{'name': 'name', 'type': 'string'}]}}]
['first-resource', 'number-two']
True
Like any of the Metadata classes, the Package class can be saved as JSON or YAML:
from frictionless import Package
package = Package('tables/*')
package.to_json('datapackage.json') # Save as JSON
package.to_yaml('datapackage.yaml') # Save as YAML
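Because a saved descriptor is plain JSON, it can also be inspected without Frictionless. The sketch below is only an illustration using the standard library, not how `to_json` is implemented: it writes a hand-written descriptor (the values are illustrative) to a temporary file and reads it back. Note that `resources` is a list of resource descriptors.

```python
import json
import tempfile
from pathlib import Path

# Hand-written descriptor for illustration; note that `resources` is a list.
descriptor = {
    "name": "package",
    "title": "My Package",
    "resources": [{"name": "table", "path": "table.csv"}],
}

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "datapackage.json"
    # Write the descriptor as JSON, then read it back.
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    loaded = json.loads(target.read_text(encoding="utf-8"))

print(loaded["resources"][0]["path"])  # table.csv
```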
Package representation

This class is one of the cornerstones of the Frictionless framework. It manages the underlying resources and provides the ability to describe a package.

```python
from frictionless import Package, Resource

package = Package(resources=[Resource(path="data/table.csv")])
package.get_resource('table').read_rows() == [
    {'id': 1, 'name': 'english'},
    {'id': 2, 'name': '中国人'},
]
```
(*, source: Optional[Any] = None, control: Optional[Control] = None, basepath: Optional[str] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, homepage: Optional[str] = None, profile: Optional[str] = None, licenses: List[Dict[str, Any]] = NOTHING, sources: List[Dict[str, Any]] = NOTHING, contributors: List[Dict[str, Any]] = NOTHING, keywords: List[str] = NOTHING, image: Optional[str] = None, version: Optional[str] = None, created: Optional[str] = None, resources: List[Resource] = NOTHING, dataset: Optional[Dataset] = None, dialect: Optional[Dialect] = None, detector: Optional[Detector] = None) -> None
# TODO: add docs
Optional[Any]
# TODO: add docs
Optional[Control]
# TODO: add docs
Optional[str]
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters.
Optional[str]
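The naming rule above can be checked mechanically. The following is a minimal sketch using only the standard library; the helper `is_valid_package_name` is hypothetical and not part of Frictionless, and the pattern assumes that "alphanumeric" means ASCII lower-case letters and digits.

```python
import re

# Assumption: "alphanumeric" is read as ASCII lower-case letters and digits,
# plus the three allowed punctuation characters ".", "_" and "-".
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_valid_package_name(name: str) -> bool:
    """Return True if `name` follows the lower-case, URL-usable naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_package_name("test-tabulator"))  # True
print(is_valid_package_name("My Package"))      # False
```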
Type of the package
ClassVar[Union[str, None]]
A Package title according to the specs. It should be a human-oriented title of the package.
Optional[str]
A Package description according to the specs. It should be a human-oriented description of the package.
Optional[str]
A URL for the home on the web that is related to this package. For example, a GitHub repository or a CKAN dataset address.
Optional[str]
A fully-qualified URL that points directly to a JSON Schema that can be used to validate the descriptor.
Optional[str]
The license(s) under which the package is provided.
List[Dict[str, Any]]
The raw sources for this data package. It MUST be an array of Source objects. Each Source object MUST have a title and MAY have path and/or email properties.
List[Dict[str, Any]]
The people or organizations who contributed to this package. It MUST be an array. Each entry is a Contributor and MUST be an object. A Contributor MUST have a title property and MAY contain path, email, role and organization properties.
List[Dict[str, Any]]
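The structural rules for `sources` and `contributors` (an array of objects, each with a `title`) can be sketched as a small check. This uses only the standard library; `check_entries` is a hypothetical helper, not a Frictionless function, and the sample entries are made up for illustration.

```python
from typing import Any, List


def check_entries(entries: Any, label: str) -> List[str]:
    """Collect problems: the value must be a list of objects, each with a title."""
    if not isinstance(entries, list):
        return [f"{label} must be an array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"{label}[{index}] must be an object")
        elif not entry.get("title"):
            problems.append(f"{label}[{index}] is missing a title")
    return problems


# Illustrative values only.
sources = [{"title": "Example source", "path": "https://example.com/data"}]
contributors = [{"email": "jane@example.com", "role": "author"}]

print(check_entries(sources, "sources"))            # []
print(check_entries(contributors, "contributors"))  # ['contributors[0] is missing a title']
```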
An array of string keywords to assist users searching for the package. For example, ['data', 'fiscal'].
List[str]
An image to use for this data package. For example, when showing the package in a listing.
Optional[str]
A version string identifying the version of the package. It should conform to the Semantic Versioning requirements and should follow the Data Package Version pattern.
Optional[str]
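A quick way to see whether a version string has the Semantic Versioning shape is a pattern check. The sketch below is a simplification, not the full Semantic Versioning grammar: it accepts `MAJOR.MINOR.PATCH` with optional pre-release and build parts. The helper `looks_like_semver` is hypothetical and not part of Frictionless.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH without leading zeros,
# followed by optional "-prerelease" and "+build" parts.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)


def looks_like_semver(version: str) -> bool:
    """Return True if `version` has the basic Semantic Versioning shape."""
    return SEMVER.fullmatch(version) is not None


print(looks_like_semver("1.0.0"))         # True
print(looks_like_semver("2.1.0-beta.1"))  # True
print(looks_like_semver("1.0"))           # False
```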
The datetime on which this was created. The datetime must conform to the string formats for RFC3339 datetime.
Optional[str]
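One way to produce an RFC 3339 string for `created` is with the standard `datetime` module. This is a minimal sketch; `rfc3339_timestamp` is a hypothetical helper, and it assumes a timezone-aware input that is normalized to UTC.

```python
from datetime import datetime, timezone


def rfc3339_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = rfc3339_timestamp(datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc))
print(created)  # 2024-01-31T12:00:00Z
```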
A list of resource descriptors. The items can be dicts or Resource instances.
List[Resource]
A reference to the dataset of the catalog that the package is part of. If the package is not part of any catalog, it is set to None.
Optional[Dataset]
# TODO: add docs
Optional[Dialect]
# TODO: add docs
Optional[Detector]
A basepath of the package. The normpath of a resource is the joined `basepath` and `/path`.
Optional[str]
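The joining of `basepath` and a resource path can be pictured with the standard library. This is only a sketch of the idea for local POSIX-style paths; `resolve_path` is a hypothetical helper, and the actual Frictionless logic may differ (for example, for remote URLs).

```python
import posixpath


def resolve_path(basepath: str, path: str) -> str:
    """Join a package basepath with a resource path and normalize the result."""
    return posixpath.normpath(posixpath.join(basepath, path))


print(resolve_path("data", "table.csv"))            # data/table.csv
print(resolve_path("data", "tables/../table.csv"))  # data/table.csv
```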
Return names of resources
List[str]
Return names of resources
List[str]
Add new resource to the package
(resource: Union[Resource, str]) -> Resource
Analyze the resources of the package. This feature is currently experimental, and its API may change without warning.
(*, detailed: bool = False)
Remove all the resources
Dereference underlying metadata. If some of the underlying metadata is provided as a string, it is replaced by the corresponding metadata object.
Describe the given source as a package
(source: Optional[Any] = None, *, stats: bool = False, **options: Any)
Extract rows
(*, name: Optional[str] = None, filter: Optional[types.IFilterFunction] = None, process: Optional[types.IProcessFunction] = None, limit_rows: Optional[int] = None) -> types.ITabularData
Flatten the package. Parameters: spec (str[]): flatten specification
(spec: List[str] = ['name', 'path'])
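To make the idea of a flatten specification concrete, here is a sketch that works on a plain descriptor dict rather than on a Package instance. It assumes that flattening means picking, for each resource, the properties listed in `spec`; that reading is an assumption, and the standalone `flatten` function below is hypothetical, not the library method.

```python
from typing import Any, Dict, List, Optional


def flatten(descriptor: Dict[str, Any], spec: Optional[List[str]] = None) -> List[List[Any]]:
    """For each resource, pick the properties listed in `spec` (None if absent)."""
    spec = spec or ["name", "path"]
    return [
        [resource.get(prop) for prop in spec]
        for resource in descriptor.get("resources", [])
    ]


# Resource names and paths taken from the package printed earlier in this guide.
descriptor = {
    "resources": [
        {"name": "first-resource", "path": "table.xls", "format": "xls"},
        {"name": "number-two", "path": "table-reverse.csv", "format": "csv"},
    ]
}

print(flatten(descriptor))
print(flatten(descriptor, ["name", "format"]))
```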
Get resource by name
(name: str) -> Resource
Get table resource by name (raise if not table)
(name: str) -> TableResource
Check if a resource is present
(name: str) -> bool
Check if a table resource is present
(name: str) -> bool
Infer metadata
(*, stats: bool = False) -> None
Publish package to any supported data portal
(target: Any = None, *, control: Optional[Control] = None) -> PublishResult
Remove resource by name
(name: str) -> Resource
Set resource by name
(resource: Resource) -> Optional[Resource]
Create a copy of the package
(**options: Any) -> Self
Generate an ERD (Entity Relationship Diagram) from the package resources and export it as a .dot file. Based on: https://github.com/frictionlessdata/frictionless-py/issues/1118
(path: Optional[str] = None) -> str
Transform package
(self: Package, pipeline: Pipeline)
Update resource
(name: str, descriptor: types.IDescriptor) -> Resource
Validate package
(self: Package, checklist: Optional[Checklist] = None, *, name: Optional[str] = None, parallel: bool = False, limit_rows: Optional[int] = None, limit_errors: int = 1000)