(2024-11-07 15:17)

Package Class

The Data Package is a core Frictionless Data concept: a set of resources with additional metadata. See the Data Package Standard for more information.

Creating Package

Let's create a data package:

from frictionless import Package, Resource

package = Package('table.csv') # from a resource path
package = Package('tables/*') # from a resources glob
package = Package(['tables/chunk1.csv', 'tables/chunk2.csv']) # from a list
package = Package('package/datapackage.json') # from a descriptor path
package = Package({'resources': [{'path': 'table.csv'}]}) # from a descriptor ("resources" is a list)
package = Package(resources=[Resource(path='table.csv')]) # from arguments

As you can see, a package can be created from different kinds of sources, and the source type (e.g. whether it's a glob or a path) is detected automatically. It's possible to make this step more explicit:

from frictionless import Package, Resource

package = Package(resources=[Resource(path='table.csv')]) # from arguments
package = Package('datapackage.json') # from a descriptor
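The following minimal sketch shows how a source might be classified automatically. The helper `detect_source_kind` and its rules are hypothetical simplifications, not Frictionless's actual detection logic:

- dicts are descriptors
- lists are lists of paths
- strings containing `*`, `?` or `[` are globs
- `.json`/`.yaml`/`.yml` files are descriptor paths

```python
from typing import Any

DESCRIPTOR_SUFFIXES = (".json", ".yaml", ".yml")
GLOB_CHARACTERS = "*?["


def detect_source_kind(source: Any) -> str:
    """Hypothetical helper: guess which kind of package source was given.

    It illustrates the idea of automatic detection only; it is not the
    detection logic used by Frictionless itself.
    """
    if isinstance(source, dict):
        return "descriptor"  # an inline descriptor such as {'resources': [...]}
    if isinstance(source, list):
        return "list"  # a list of resource paths
    if isinstance(source, str):
        if any(char in source for char in GLOB_CHARACTERS):
            return "glob"  # e.g. 'tables/*'
        if source.lower().endswith(DESCRIPTOR_SUFFIXES):
            return "descriptor-path"  # e.g. 'package/datapackage.json'
        return "path"  # a single resource path such as 'table.csv'
    raise TypeError(f"unsupported source type: {type(source).__name__}")


for source in ["table.csv", "tables/*", ["a.csv", "b.csv"], "datapackage.json", {"resources": []}]:
    print(repr(source), "->", detect_source_kind(source))
```

A real implementation also has to handle ambiguous cases, such as a JSON file that holds data rather than a descriptor. Passing explicit arguments avoids that guesswork.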

Describing Package

The Data Package Standard supports a great deal of package metadata, and Frictionless Framework supports it too:

from frictionless import Package, Resource

package = Package(
    name='package',
    title='My Package',
    description='My Package for the Guide',
    resources=[Resource(path='table.csv')],
    # it's possible to provide all the official properties like homepage, version, etc
)
print(package)
{'name': 'package',
 'title': 'My Package',
 'description': 'My Package for the Guide',
 'resources': [{'name': 'table',
                'type': 'table',
                'path': 'table.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv'}]}

If you have created a package, for example, from a descriptor, you can access its properties:

from frictionless import Package

package = Package('datapackage.json')
print(package.name)
# and others
test-tabulator

And edit them:

from frictionless import Package

package = Package('datapackage.json')
package.name = 'new-name'
package.title = 'New Title'
package.description = 'New Description'
# and others
print(package)
{'name': 'new-name',
 'title': 'New Title',
 'description': 'New Description',
 'resources': [{'name': 'first-resource',
                'type': 'table',
                'path': 'table.xls',
                'scheme': 'file',
                'format': 'xls',
                'mediatype': 'application/vnd.ms-excel',
                'schema': {'fields': [{'name': 'id', 'type': 'number'},
                                      {'name': 'name', 'type': 'string'}]}},
               {'name': 'number-two',
                'type': 'table',
                'path': 'table-reverse.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv',
                'schema': {'fields': [{'name': 'id', 'type': 'integer'},
                                      {'name': 'name', 'type': 'string'}]}}]}

Resource Management

The core purpose of a package is to hold a set of resources. The Package class provides useful methods to manage them:

from frictionless import Package, Resource

package = Package('datapackage.json')
print(package.resources)
print(package.resource_names)
package.add_resource(Resource(name='new', data=[['key1', 'key2'], ['val1', 'val2']]))
resource = package.get_resource('new')
print(package.has_resource('new'))
package.remove_resource('new')
[{'name': 'first-resource',
 'type': 'table',
 'path': 'table.xls',
 'scheme': 'file',
 'format': 'xls',
 'mediatype': 'application/vnd.ms-excel',
 'schema': {'fields': [{'name': 'id', 'type': 'number'},
                       {'name': 'name', 'type': 'string'}]}}, {'name': 'number-two',
 'type': 'table',
 'path': 'table-reverse.csv',
 'scheme': 'file',
 'format': 'csv',
 'mediatype': 'text/csv',
 'schema': {'fields': [{'name': 'id', 'type': 'integer'},
                       {'name': 'name', 'type': 'string'}]}}]
['first-resource', 'number-two']
True
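All of the methods above address resources by their `name`. The following sketch models that bookkeeping with plain dicts. It shows why resource names should be unique within a package. `ResourceRegistry` is a hypothetical class, not part of Frictionless. Rejecting duplicate names in `add_resource` is a design choice of this sketch, not documented library behavior.

```python
from typing import Any, Dict, List


class ResourceRegistry:
    """Hypothetical stand-in for name-based resource management (not the Frictionless API)."""

    def __init__(self) -> None:
        self.resources: List[Dict[str, Any]] = []

    @property
    def resource_names(self) -> List[str]:
        return [resource["name"] for resource in self.resources]

    def add_resource(self, resource: Dict[str, Any]) -> Dict[str, Any]:
        # Names are the lookup key, so a duplicate would make lookups ambiguous
        if resource["name"] in self.resource_names:
            raise ValueError(f"resource already exists: {resource['name']}")
        self.resources.append(resource)
        return resource

    def has_resource(self, name: str) -> bool:
        return name in self.resource_names

    def get_resource(self, name: str) -> Dict[str, Any]:
        for resource in self.resources:
            if resource["name"] == name:
                return resource
        raise KeyError(name)

    def remove_resource(self, name: str) -> Dict[str, Any]:
        resource = self.get_resource(name)
        # Compare by identity so only this exact resource object is dropped
        self.resources = [item for item in self.resources if item is not resource]
        return resource


registry = ResourceRegistry()
registry.add_resource({"name": "new", "data": [["key1", "key2"], ["val1", "val2"]]})
print(registry.has_resource("new"))  # True
registry.remove_resource("new")
print(registry.resource_names)  # []
```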

Saving Descriptor

Like any of the Metadata classes, the Package class can be saved as JSON or YAML:

from frictionless import Package
package = Package('tables/*')
package.to_json('datapackage.json') # Save as JSON
package.to_yaml('datapackage.yaml') # Save as YAML
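What ends up on disk is the package descriptor. The sketch below illustrates the JSON case with the standard library only. It writes a small descriptor to a temporary `datapackage.json` and reads it back. `save_descriptor` and `load_descriptor` are hypothetical helpers, not Frictionless functions. YAML output would need a third-party library such as PyYAML, so it is left out.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_descriptor(descriptor: Dict[str, Any], path: str) -> None:
    """Hypothetical helper: write a descriptor as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII titles readable in the file
        json.dump(descriptor, file, indent=2, ensure_ascii=False)


def load_descriptor(path: str) -> Dict[str, Any]:
    """Hypothetical helper: read a JSON descriptor back into a dict."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


descriptor = {
    "name": "package",
    "resources": [{"name": "table", "path": "table.csv", "format": "csv"}],
}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "datapackage.json")
    save_descriptor(descriptor, path)
    restored = load_descriptor(path)
print(restored == descriptor)  # True
```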

Reference

Package (class)

Package representation. This class is one of the cornerstones of the Frictionless framework. It manages underlying resources and provides the ability to describe a package.

```python
from frictionless import Package, Resource

package = Package(resources=[Resource(path="data/table.csv")])
package.get_resource('table').read_rows() == [
    {'id': 1, 'name': 'english'},
    {'id': 2, 'name': '中国人'},
]
```

Signature

(*, source: Optional[Any] = None, control: Optional[Control] = None, basepath: Optional[str] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, homepage: Optional[str] = None, profile: Optional[str] = None, licenses: List[Dict[str, Any]] = NOTHING, sources: List[Dict[str, Any]] = NOTHING, contributors: List[Dict[str, Any]] = NOTHING, keywords: List[str] = NOTHING, image: Optional[str] = None, version: Optional[str] = None, created: Optional[str] = None, resources: List[Resource] = NOTHING, dataset: Optional[Dataset] = None, dialect: Optional[Dialect] = None, detector: Optional[Detector] = None) -> None

Parameters

  • source (Optional[Any])
  • control (Optional[Control])
  • basepath (Optional[str])
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • homepage (Optional[str])
  • profile (Optional[str])
  • licenses (List[Dict[str, Any]])
  • sources (List[Dict[str, Any]])
  • contributors (List[Dict[str, Any]])
  • keywords (List[str])
  • image (Optional[str])
  • version (Optional[str])
  • created (Optional[str])
  • resources (List[Resource])
  • dataset (Optional[Dataset])
  • dialect (Optional[Dialect])
  • detector (Optional[Detector])

package.source (property)

Signature

Optional[Any]

package.control (property)

Signature

Optional[Control]

package._basepath (property)

Signature

Optional[str]

package.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters.

Signature

Optional[str]
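The naming rule above translates directly into a regular expression. This is a minimal sketch, assuming "alphanumeric" means ASCII letters and digits. `is_valid_package_name` is a hypothetical helper, not part of Frictionless.

```python
import re

# Lower-case ASCII letters, digits, ".", "_" and "-", at least one character
PACKAGE_NAME = re.compile(r"[a-z0-9._-]+")


def is_valid_package_name(name: str) -> bool:
    """Hypothetical helper: check a name against the documented naming rule."""
    # fullmatch() requires the whole string to match, not just a prefix
    return PACKAGE_NAME.fullmatch(name) is not None


print(is_valid_package_name("test-tabulator"))  # True
print(is_valid_package_name("My Package"))  # False
```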

package.type (property)

Type of the package

Signature

ClassVar[Union[str, None]]

package.title (property)

A package title according to the specs. It should be a human-oriented title of the package.

Signature

Optional[str]

package.description (property)

A package description according to the specs. It should be a human-oriented description of the package.

Signature

Optional[str]

package.homepage (property)

A URL for the home page on the web that is related to this package, for example, a GitHub repository or a CKAN dataset address.

Signature

Optional[str]

package.profile (property)

A fully-qualified URL that points directly to a JSON Schema that can be used to validate the descriptor.

Signature

Optional[str]

package.licenses (property)

The license(s) under which the package is provided.

Signature

List[Dict[str, Any]]

package.sources (property)

The raw sources for this data package. It MUST be an array of Source objects. Each Source object MUST have a title and MAY have path and/or email properties.

Signature

List[Dict[str, Any]]

package.contributors (property)

The people or organizations who contributed to this package. It MUST be an array. Each entry is a Contributor and MUST be an object. A Contributor MUST have a title property and MAY contain path, email, role and organization properties.

Signature

List[Dict[str, Any]]
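Both `sources` and `contributors` entries must carry a `title`. Here is a quick sketch of checking that rule on plain descriptor data. The entries and the `entries_missing_title` helper are made up for illustration and are not part of Frictionless.

```python
from typing import Any, Dict, List

# Illustrative contributor entries: "title" is required, the rest is optional
contributors: List[Dict[str, Any]] = [
    {"title": "Jane Doe", "email": "jane@example.com", "role": "author"},
    {"title": "Example Organization", "path": "https://example.com"},
]


def entries_missing_title(entries: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical helper: return the indexes of entries without a non-empty title."""
    return [index for index, entry in enumerate(entries) if not entry.get("title")]


print(entries_missing_title(contributors))  # []
print(entries_missing_title([{"email": "anonymous@example.com"}]))  # [0]
```

The same helper works unchanged for `sources`, since the title requirement is identical.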

package.keywords (property)

An array of string keywords to assist users searching, for example, ['data', 'fiscal'].

Signature

List[str]

package.image (property)

An image to use for this data package. For example, when showing the package in a listing.

Signature

Optional[str]

package.version (property)

A version string identifying the version of the package. It should conform to the Semantic Versioning requirements and should follow the Data Package Version pattern.

Signature

Optional[str]
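A rough check for the Semantic Versioning shape MAJOR.MINOR.PATCH can be written as a regular expression. This is a simplified sketch. The pattern accepts optional pre-release and build suffixes but does not implement the full SemVer grammar (for example, it does not reject leading zeros inside numeric pre-release identifiers). `is_semver` is a hypothetical helper.

```python
import re

# MAJOR.MINOR.PATCH without leading zeros, plus optional "-pre.release" and "+build"
SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)


def is_semver(version: str) -> bool:
    """Hypothetical helper: loosely check a version string against SemVer."""
    return SEMVER.fullmatch(version) is not None


for version in ["1.0.0", "2.1.3-beta.1", "1.0", "01.0.0"]:
    print(version, is_semver(version))
```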

package.created (property)

The datetime on which this package was created. The datetime must conform to the string formats for RFC 3339 datetime.

Signature

Optional[str]
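In Python, a suitable `created` value can be produced from a timezone-aware `datetime`. This is a minimal sketch. `rfc3339` is a hypothetical helper. It relies on `isoformat()` output with a `T` separator and a whole-minute UTC offset being a valid RFC 3339 datetime.

```python
from datetime import datetime, timezone


def rfc3339(moment: datetime) -> str:
    """Hypothetical helper: format a timezone-aware datetime as an RFC 3339 string."""
    if moment.tzinfo is None:
        raise ValueError("RFC 3339 timestamps need an explicit UTC offset")
    # e.g. 2024-01-15T09:30:00+00:00 for UTC; seconds precision keeps it short
    return moment.isoformat(timespec="seconds")


print(rfc3339(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)))  # 2024-01-15T09:30:00+00:00
created_now = rfc3339(datetime.now(timezone.utc))
```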

package.resources (property)

A list of resource descriptors. Each item can be a dict or a Resource instance.

Signature

List[Resource]

package.dataset (property)

A reference to the dataset of the catalog that the package is part of. If the package is not part of any catalog, it is set to None.

Signature

Optional[Dataset]

package._dialect (property)

Signature

Optional[Dialect]

package._detector (property)

Signature

Optional[Detector]

package.basepath (property)

The basepath of the package. The normpath of a resource is its `path` joined onto `basepath` (i.e. `basepath/path`).

Signature

Optional[str]
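Resolving a resource path against the basepath amounts to a join followed by normalization. This minimal sketch, assuming POSIX-style local paths, uses `posixpath` so the result is the same on every platform. `resource_normpath` is a hypothetical helper, not the Frictionless implementation.

```python
import posixpath
from typing import Optional


def resource_normpath(path: str, basepath: Optional[str] = None) -> str:
    """Hypothetical helper: resolve a local resource path against a package basepath."""
    if not basepath:
        return posixpath.normpath(path)
    # join() inserts the "/" separator; normpath() collapses redundant "." segments
    return posixpath.normpath(posixpath.join(basepath, path))


print(resource_normpath("tables/table.csv", "data/package"))  # data/package/tables/table.csv
print(resource_normpath("./tables/table.csv", "data/package"))  # data/package/tables/table.csv
```

The sketch handles only local paths. Remote URLs are out of its scope, and so is the Data Package rule that local paths must not be absolute or contain `../`.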

package.resource_names (property)

Return names of resources

Signature

List[str]

package.resource_paths (property)

Return paths of resources

Signature

List[str]

package.add_resource (method)

Add new resource to the package

Signature

(resource: Union[Resource, str]) -> Resource

Parameters

  • resource (Union[Resource, str])

package.analyze (method)

Analyze the resources of the package. This feature is currently experimental, and its API may change without warning.

Signature

(*, detailed: bool = False)

Parameters

  • detailed (bool)

package.clear_resources (method)

Remove all the resources

package.dereference (method)

Dereference underlying metadata. If some of the underlying metadata is provided as a string, it will be replaced by the metadata object.

Package.describe (method) (static)

Describe the given source as a package

Signature

(source: Optional[Any] = None, *, stats: bool = False, **options: Any)

Parameters

  • source (Optional[Any]): data source
  • stats (bool)
  • options (Any)

package.extract (method)

Extract rows

Signature

(*, name: Optional[str] = None, filter: Optional[types.IFilterFunction] = None, process: Optional[types.IProcessFunction] = None, limit_rows: Optional[int] = None) -> types.ITabularData

Parameters

  • name (Optional[str])
  • filter (Optional[types.IFilterFunction]): row filter function
  • process (Optional[types.IProcessFunction]): row processor function
  • limit_rows (Optional[int]): limit amount of rows to this number
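The `filter`, `process` and `limit_rows` parameters compose like a small pipeline. The following sketch shows one way such a pipeline can work on the rows of a single resource, using plain dicts. Note the following about the sketch:

- `extract_rows` is a hypothetical helper.
- It renames the parameters to `filter_fn` and `process_fn` to avoid shadowing the Python built-in `filter`.
- It applies them in the order filter, then limit, then process. That order is an assumption of the sketch, not documented Frictionless behavior.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def extract_rows(
    rows: Iterable[Row],
    *,
    filter_fn: Optional[Callable[[Row], bool]] = None,
    process_fn: Optional[Callable[[Row], Any]] = None,
    limit_rows: Optional[int] = None,
) -> List[Any]:
    """Hypothetical helper: filter rows, cap their number, then transform each one."""
    selected = (row for row in rows if filter_fn is None or filter_fn(row))
    if limit_rows is not None:
        # islice() stops early, so later rows are never filtered or processed
        selected = islice(selected, limit_rows)
    return [process_fn(row) if process_fn else row for row in selected]


rows = [
    {"id": 1, "name": "english"},
    {"id": 2, "name": "german"},
    {"id": 3, "name": "french"},
]
print(extract_rows(rows, filter_fn=lambda row: row["id"] > 1,
                   process_fn=lambda row: row["name"], limit_rows=1))  # ['german']
```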

package.flatten (method)

Flatten the package. Parameters: spec (str[]): flatten specification.

Signature

(spec: List[str] = ['name', 'path'])

Parameters

  • spec (List[str])
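Flattening reduces each resource to a row holding the properties named in `spec`. Here is a sketch of that idea on plain descriptor dicts, assuming missing properties become `None`. `flatten_resources` is hypothetical and only approximates what `package.flatten` returns.

```python
from typing import Any, Dict, List, Optional, Sequence


def flatten_resources(
    resources: List[Dict[str, Any]], spec: Sequence[str] = ("name", "path")
) -> List[List[Optional[Any]]]:
    """Hypothetical helper: one row per resource, one column per property in spec."""
    # A tuple default avoids the shared-mutable-default pitfall of a list
    return [[resource.get(key) for key in spec] for resource in resources]


resources = [
    {"name": "first-resource", "path": "table.xls", "format": "xls"},
    {"name": "number-two", "path": "table-reverse.csv", "format": "csv"},
]
print(flatten_resources(resources))
# [['first-resource', 'table.xls'], ['number-two', 'table-reverse.csv']]
print(flatten_resources(resources, spec=["name", "format", "encoding"]))
# [['first-resource', 'xls', None], ['number-two', 'csv', None]]
```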

package.get_resource (method)

Get resource by name

Signature

(name: str) -> Resource

Parameters

  • name (str)

package.get_table_resource (method)

Get table resource by name (raise if not table)

Signature

(name: str) -> TableResource

Parameters

  • name (str)

package.has_resource (method)

Check if a resource is present

Signature

(name: str) -> bool

Parameters

  • name (str)

package.has_table_resource (method)

Check if a table resource is present

Signature

(name: str) -> bool

Parameters

  • name (str)

package.infer (method)

Infer metadata

Signature

(*, stats: bool = False) -> None

Parameters

  • stats (bool): stream files completely and infer stats

package.publish (method)

Publish package to any supported data portal

Signature

(target: Any = None, *, control: Optional[Control] = None) -> PublishResult

Parameters

  • target (Any): URL of the target portal (CKAN, GitHub, ...), e.g. "https://github.com/frictionlessdata/repository-demo"
  • control (Optional[Control]): Github control

package.remove_resource (method)

Remove resource by name

Signature

(name: str) -> Resource

Parameters

  • name (str)

package.set_resource (method)

Set resource by name

Signature

(resource: Resource) -> Optional[Resource]

Parameters

  • resource (Resource)

package.to_copy (method)

Create a copy of the package

Signature

(**options: Any) -> Self

Parameters

  • options (Any)

package.to_er_diagram (method)

Generate an ERD (Entity Relationship Diagram) from the package resources and export it as a .dot file. Based on https://github.com/frictionlessdata/frictionless-py/issues/1118.

Signature

(path: Optional[str] = None) -> str

Parameters

  • path (Optional[str]): target path
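An ER diagram is built from the relationships between resources, which Table Schema expresses as `foreignKeys`. Below is a minimal sketch of turning those into Graphviz DOT text. `foreign_keys_to_dot` is a hypothetical helper that draws one node per resource and one labeled edge per foreign key. Its output is much simpler than what `to_er_diagram` produces.

```python
from typing import Any, Dict, List


def foreign_keys_to_dot(resources: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: render resource foreign keys as a Graphviz digraph."""
    lines = ["digraph package {"]
    for resource in resources:
        name = resource["name"]
        lines.append(f'  "{name}";')
        for foreign_key in resource.get("schema", {}).get("foreignKeys", []):
            # An empty reference resource ("") points back at the same resource
            target = foreign_key["reference"]["resource"] or name
            fields = foreign_key["fields"]
            label = fields if isinstance(fields, str) else ", ".join(fields)
            lines.append(f'  "{name}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


resources = [
    {"name": "countries", "schema": {"fields": [{"name": "id", "type": "integer"}]}},
    {
        "name": "cities",
        "schema": {
            "fields": [{"name": "id", "type": "integer"}, {"name": "country_id", "type": "integer"}],
            "foreignKeys": [
                {"fields": "country_id", "reference": {"resource": "countries", "fields": "id"}}
            ],
        },
    },
]
print(foreign_keys_to_dot(resources))
```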

package.transform (method)

Transform package

Signature

(: Package, pipeline: Pipeline)

Parameters

  • pipeline (Pipeline)

package.update_resource (method)

Update resource

Signature

(name: str, descriptor: types.IDescriptor) -> Resource

Parameters

  • name (str)
  • descriptor (types.IDescriptor)

package.validate (method)

Validate package

Signature

(: Package, checklist: Optional[Checklist] = None, *, name: Optional[str] = None, parallel: bool = False, limit_rows: Optional[int] = None, limit_errors: int = 1000)

Parameters

  • checklist (Optional[Checklist])
  • name (Optional[str])
  • parallel (bool)
  • limit_rows (Optional[int])
  • limit_errors (int)