The Data Package is a core Frictionless Data concept: a set of resources with additional metadata. See the Data Package Standard for more information.
Let's create a data package:
from frictionless import Package, Resource
package = Package('table.csv') # from a resource path
package = Package('tables/*') # from a resources glob
package = Package(['tables/chunk1.csv', 'tables/chunk2.csv']) # from a list
package = Package('package/datapackage.json') # from a descriptor path
package = Package({'resources': [{'path': 'table.csv'}]}) # from a descriptor
package = Package(resources=[Resource(path='table.csv')]) # from arguments
As you can see, a package can be created from different kinds of sources, and the source type (e.g. whether it's a glob or a path) is detected automatically. It's possible to make this step more explicit:
from frictionless import Package, Resource
package = Package(resources=[Resource(path='table.csv')]) # from arguments
package = Package('datapackage.json') # from a descriptor
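To make the idea of automatic source detection more concrete, here is a minimal standard-library sketch. The `guess_source_kind` helper and its labels are hypothetical and only illustrate the kind of decision involved; this is not the framework's actual detection logic.

```python
def guess_source_kind(source):
    """Hypothetical helper: classify a package source by its type and shape."""
    if isinstance(source, dict):
        return "descriptor"
    if isinstance(source, (list, tuple)):
        return "list of paths"
    if isinstance(source, str):
        # Glob patterns contain wildcard characters
        if any(char in source for char in "*?["):
            return "glob"
        # Assumption: descriptors are stored as JSON or YAML files
        if source.endswith((".json", ".yaml", ".yml")):
            return "descriptor path"
        return "resource path"
    raise TypeError(f"unsupported source type: {type(source).__name__}")


for example in ["table.csv", "tables/*", "package/datapackage.json"]:
    print(example, "->", guess_source_kind(example))
```

Passing `resources=[...]` as a keyword argument avoids this guessing step entirely, which is what makes that form more explicit.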
The standards support a great deal of package metadata, all of which is available in Frictionless Framework too:
from frictionless import Package, Resource
package = Package(
    name='package',
    title='My Package',
    description='My Package for the Guide',
    resources=[Resource(path='table.csv')],
    # it's possible to provide all the official properties like homepage, version, etc
)
print(package)
{'name': 'package',
'title': 'My Package',
'description': 'My Package for the Guide',
'resources': [{'path': 'table.csv'}]}
If you have created a package, for example from a descriptor, you can access these properties:
from frictionless import Package
package = Package('datapackage.json')
print(package.name)
# and others
test-tabulator
And edit them:
from frictionless import Package
package = Package('datapackage.json')
package.name = 'new-name'
package.title = 'New Title'
package.description = 'New Description'
# and others
print(package)
{'name': 'new-name',
'title': 'New Title',
'description': 'New Description',
'resources': [{'name': 'first-resource',
'path': 'table.xls',
'schema': {'fields': [{'name': 'id', 'type': 'number'},
{'name': 'name', 'type': 'string'}]}},
{'name': 'number-two',
'path': 'table-reverse.csv',
'schema': {'fields': [{'name': 'id', 'type': 'integer'},
{'name': 'name', 'type': 'string'}]}}]}
The core purpose of a package is to hold a set of resources. The Package class provides useful methods to manage them:
from frictionless import Package, Resource
package = Package('datapackage.json')
print(package.resources)
print(package.resource_names)
package.add_resource(Resource(name='new', data=[['key1', 'key2'], ['val1', 'val2']]))
resource = package.get_resource('new')
print(package.has_resource('new'))
package.remove_resource('new')
[{'name': 'first-resource',
'path': 'table.xls',
'schema': {'fields': [{'name': 'id', 'type': 'number'},
{'name': 'name', 'type': 'string'}]}}, {'name': 'number-two',
'path': 'table-reverse.csv',
'schema': {'fields': [{'name': 'id', 'type': 'integer'},
{'name': 'name', 'type': 'string'}]}}]
['first-resource', 'number-two']
True
Like any of the Metadata classes, the Package class can be saved as JSON or YAML:
from frictionless import Package
package = Package('tables/*')
package.to_json('datapackage.json') # Save as JSON
package.to_yaml('datapackage.yaml') # Save as YAML
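A saved descriptor is an ordinary JSON (or YAML) document, so it can be inspected with standard tools. The sketch below uses only the standard library and an illustrative descriptor dict (not output produced by the framework) to show a write and read round trip.

```python
import json
import tempfile
from pathlib import Path

# Illustrative descriptor; the property names follow the examples in this guide
descriptor = {
    "name": "package",
    "title": "My Package",
    "resources": [{"name": "table", "path": "table.csv"}],
}

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "datapackage.json"
    # Write the descriptor as indented JSON
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    # Read it back as a plain dict
    restored = json.loads(target.read_text(encoding="utf-8"))

print(restored == descriptor)
```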
Package representation
This class is one of the cornerstones of the Frictionless framework. It manages underlying resources and provides an ability to describe a package.
```python
package = Package(resources=[Resource(path="data/table.csv")])
package.get_resource('table').read_rows() == [
    {'id': 1, 'name': 'english'},
    {'id': 2, 'name': '中国人'},
]
```
(source: Optional[Any] = None, control: Optional[Control] = None, *, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, homepage: Optional[str] = None, profiles: List[Union[IProfile, str]] = [], licenses: List[dict] = [], sources: List[dict] = [], contributors: List[dict] = [], keywords: List[str] = [], image: Optional[str] = None, version: Optional[str] = None, created: Optional[str] = None, resources: List[Union[Resource, str]] = [], basepath: Optional[str] = None, detector: Optional[Detector] = None, dialect: Optional[Dialect] = None, catalog: Optional[Catalog] = None)
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters.
Optional[str]
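The naming rule above can be expressed as a regular expression. This is a minimal sketch; `is_valid_name` is a hypothetical helper written for illustration, not a function provided by the framework.

```python
import re

# Lower-case alphanumerics plus ".", "_" and "-"
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string satisfies the package name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["new-name", "test_tabulator.v2", "My Package"]:
    print(candidate, is_valid_name(candidate))
```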
A Package title according to the specs. It should be a human-oriented title of the package.
Optional[str]
A Package description according to the specs. It should be a human-oriented description of the package.
Optional[str]
A URL for the home on the web that is related to this package. For example, a GitHub repository or a CKAN dataset address.
Optional[str]
Strings identifying the profiles of this descriptor. For example, `fiscal-data-package`.
List[Union[IProfile, str]]
The license(s) under which the package is provided.
List[dict]
The raw sources for this data package. It MUST be an array of Source objects. Each Source object MUST have a title and MAY have path and/or email properties.
List[dict]
The people or organizations who contributed to this package. It MUST be an array. Each entry is a Contributor and MUST be an object. A Contributor MUST have a title property and MAY contain path, email, role and organization properties.
List[dict]
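The structural requirements for sources and contributors (an array of objects, each with a title) can be checked with a few lines of plain Python. The `check_entries` helper below is hypothetical and covers only the MUST rules quoted above.

```python
def check_entries(entries, *, required=("title",)):
    """Hypothetical check: return a list of problems found in the entries."""
    if not isinstance(entries, list):
        return ["value must be an array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index} must be an object")
            continue
        for key in required:
            if key not in entry:
                problems.append(f"entry {index} is missing '{key}'")
    return problems


# Illustrative data only
contributors = [{"title": "Jane Doe", "role": "author"}, {"email": "joe@example.com"}]
print(check_entries(contributors))
```

Optional properties such as path, email, role and organization are deliberately not checked here, since the rules above only say they MAY be present.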
An Array of string keywords to assist users searching. For example, ['data', 'fiscal']
List[str]
An image to use for this data package. For example, when showing the package in a listing.
Optional[str]
A version string identifying the version of the package. It should conform to the Semantic Versioning requirements and should follow the Data Package Version pattern.
Optional[str]
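As a rough illustration of the Semantic Versioning requirement, the sketch below checks for the `MAJOR.MINOR.PATCH` shape with optional pre-release and build suffixes. The pattern is a simplified approximation of the Semantic Versioning grammar, and `looks_like_semver` is a hypothetical helper rather than part of the framework.

```python
import re

# Simplified approximation of the Semantic Versioning grammar
SEMVER_PATTERN = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"  # MAJOR.MINOR.PATCH without leading zeros
    r"(?:-[0-9A-Za-z.-]+)?"  # optional pre-release, e.g. -beta.1
    r"(?:\+[0-9A-Za-z.-]+)?"  # optional build metadata, e.g. +build.5
)


def looks_like_semver(version: str) -> bool:
    """Return True if the whole string has a semantic-version shape."""
    return SEMVER_PATTERN.fullmatch(version) is not None


for candidate in ["1.0.0", "2.1.3-beta.1+build.5", "1.0"]:
    print(candidate, looks_like_semver(candidate))
```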
The datetime on which this was created. The datetime must conform to the string formats for RFC3339 datetime.
Optional[str]
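A value for this property can be sanity-checked with the standard library. The sketch below is an approximation: `datetime.fromisoformat` accepts some strings that RFC 3339 does not, and `is_rfc3339_datetime` is a hypothetical helper, not a framework function.

```python
from datetime import datetime


def is_rfc3339_datetime(value: str) -> bool:
    """Approximate check: parseable as ISO 8601 and carrying a UTC offset."""
    # Older Python versions do not accept a trailing "Z", so normalize it
    normalized = value[:-1] + "+00:00" if value.endswith(("Z", "z")) else value
    try:
        parsed = datetime.fromisoformat(normalized)
    except ValueError:
        return False
    # RFC 3339 datetimes always include an offset
    return parsed.tzinfo is not None


for candidate in ["2023-01-15T10:30:00Z", "2023-01-15T10:30:00", "yesterday"]:
    print(candidate, is_rfc3339_datetime(candidate))
```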
A list of resource descriptors. The items can be dicts or Resource instances.
List[Resource]
A reference to the catalog that the package is part of. If the package is not part of any catalog, it is set to None.
Optional[Catalog]
A basepath of the package. The normpath of the resource is the `basepath` joined with `/path`.
Optional[str]
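The joining described above can be pictured with the standard library's path functions. This is a sketch for local, POSIX-style paths only; `resolve_path` is a hypothetical helper, and the framework may treat remote or absolute paths differently.

```python
import posixpath


def resolve_path(basepath: str, path: str) -> str:
    """Join a resource path onto a basepath and normalize the result."""
    return posixpath.normpath(posixpath.join(basepath, path))


print(resolve_path("data", "table.csv"))
print(resolve_path("data/", "./tables/../table.csv"))
```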
Return names of resources
List[str]
Return names of resources
List[str]
Add new resource to the package
(resource: Union[Resource, str]) -> Resource
Analyze the resources of the package. This feature is currently experimental, and its API may change without warning.
(: Package, *, detailed=False)
Remove all the resources
Describe the given source as a package
(source: Optional[Any] = None, *, stats: bool = False, **options)
Extract package rows
(: Package, *, limit_rows: Optional[int] = None, process: Optional[IProcessFunction] = None, filter: Optional[IFilterFunction] = None, stream: bool = False)
Flatten the package. Parameters: spec (str[]): flatten specification
(spec=[name, path])
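One plausible reading of flattening, assumed here rather than taken from the framework's implementation, is that each resource becomes a row containing the properties named in `spec`. The sketch below works on plain descriptor dicts and uses the resource names and paths from the guide above.

```python
def flatten(resources, spec=("name", "path")):
    """Sketch: one row per resource with the properties listed in spec."""
    # Properties missing from a resource are reported as None
    return [[resource.get(key) for key in spec] for resource in resources]


resources = [
    {"name": "first-resource", "path": "table.xls"},
    {"name": "number-two", "path": "table-reverse.csv"},
]

for row in flatten(resources):
    print(row)
```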
Get resource by name
(name: str) -> Resource
Check if a resource is present
(name: str) -> bool
Infer package's attributes
(*, sample=True, stats=False)
Publish package to any supported data portal
(target: Any = None, *, control: Optional[Control] = None) -> Any
Remove resource by name
(name: str) -> Resource
Set resource by name
(resource: Resource) -> Optional[Resource]
Create a copy of the package
Generate an ERD (Entity Relationship Diagram) from package resources and export it as a .dot file. Based on: https://github.com/frictionlessdata/frictionless-py/issues/1118
(path: Optional[str] = None) -> str
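To give a feel for what a .dot export of an entity relationship diagram can look like, here is a small standard-library sketch that turns resource descriptors into Graphviz DOT text. The `to_dot` function, the node layout, and the foreign key in the sample data are illustrative assumptions; the framework's own output format may differ.

```python
def to_dot(resources):
    """Sketch: render resources and their foreign keys as Graphviz DOT text."""
    lines = ["digraph ERD {", "  node [shape=record];"]
    for resource in resources:
        name = resource["name"]
        schema = resource.get("schema", {})
        fields = "|".join(
            f"{field['name']}: {field.get('type', 'any')}"
            for field in schema.get("fields", [])
        )
        # One record-shaped node per resource, listing its fields
        lines.append(f'  "{name}" [label="{{{name}|{fields}}}"];')
        for key in schema.get("foreignKeys", []):
            # An empty resource reference points back to the same resource
            target = key["reference"]["resource"] or name
            lines.append(f'  "{name}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative descriptors; the foreign key is made up for this example
resources = [
    {"name": "first-resource",
     "schema": {"fields": [{"name": "id", "type": "number"}]}},
    {"name": "number-two",
     "schema": {"fields": [{"name": "id", "type": "integer"}],
                "foreignKeys": [{"fields": ["id"],
                                 "reference": {"resource": "first-resource",
                                               "fields": ["id"]}}]}},
]

print(to_dot(resources))
```

The resulting text can be saved to a file and rendered with any Graphviz-compatible tool.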
Transform package
(: Package, pipeline: Pipeline)
Update resource
(name: str, descriptor: IDescriptor) -> Resource
Validate package
(: Package, checklist: Optional[Checklist] = None, *, limit_errors: int = 1000, limit_rows: Optional[int] = None, parallel: bool = False)