(2025-04-11 15:26)

Package Class

The Data Package is a core Frictionless Data concept: a set of resources with additional metadata provided. You can read the Data Package Standard for more information.

Creating Package

Let's create a data package:

Python

from frictionless import Package, Resource

package = Package('table.csv') # from a resource path
package = Package('tables/*') # from a resources glob
package = Package(['tables/chunk1.csv', 'tables/chunk2.csv']) # from a list
package = Package('package/datapackage.json') # from a descriptor path
package = Package({'resources': [{'path': 'table.csv'}]}) # from a descriptor
package = Package(resources=[Resource(path='table.csv')]) # from arguments

As you can see, a package can be created from different kinds of sources, and the source type is detected automatically (e.g. whether it's a glob or a path). It's possible to make this step more explicit:

Python

from frictionless import Package, Resource

package = Package(resources=[Resource(path='table.csv')]) # from arguments
package = Package('datapackage.json') # from a descriptor

Describing Package

The standards support a great deal of package metadata, all of which can be set with Frictionless Framework too:

Python

from frictionless import Package, Resource

package = Package(
    name='package',
    title='My Package',
    description='My Package for the Guide',
    resources=[Resource(path='table.csv')],
    # it's possible to provide all the official properties like homepage, version, etc
)
print(package)

{'name': 'package',
 'title': 'My Package',
 'description': 'My Package for the Guide',
 'resources': [{'name': 'table',
                'type': 'table',
                'path': 'table.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv'}]}

If you have created a package, for example, from a descriptor, you can access these properties:

Python

from frictionless import Package

package = Package('datapackage.json')
print(package.name)
# and others

test-tabulator

And edit them:

Python

from frictionless import Package

package = Package('datapackage.json')
package.name = 'new-name'
package.title = 'New Title'
package.description = 'New Description'
# and others
print(package)

{'name': 'new-name',
 'title': 'New Title',
 'description': 'New Description',
 'resources': [{'name': 'first-resource',
                'type': 'table',
                'path': 'table.xls',
                'scheme': 'file',
                'format': 'xls',
                'mediatype': 'application/vnd.ms-excel',
                'schema': {'fields': [{'name': 'id', 'type': 'number'},
                                      {'name': 'name', 'type': 'string'}]}},
               {'name': 'number-two',
                'type': 'table',
                'path': 'table-reverse.csv',
                'scheme': 'file',
                'format': 'csv',
                'mediatype': 'text/csv',
                'schema': {'fields': [{'name': 'id', 'type': 'integer'},
                                      {'name': 'name', 'type': 'string'}]}}]}

Resource Management

The core purpose of a package is to hold a set of resources. The Package class provides useful methods to manage them:

Python

from frictionless import Package, Resource

package = Package('datapackage.json')
print(package.resources)
print(package.resource_names)
package.add_resource(Resource(name='new', data=[['key1', 'key2'], ['val1', 'val2']]))
resource = package.get_resource('new')
print(package.has_resource('new'))
package.remove_resource('new')

[{'name': 'first-resource',
 'type': 'table',
 'path': 'table.xls',
 'scheme': 'file',
 'format': 'xls',
 'mediatype': 'application/vnd.ms-excel',
 'schema': {'fields': [{'name': 'id', 'type': 'number'},
                       {'name': 'name', 'type': 'string'}]}}, {'name': 'number-two',
 'type': 'table',
 'path': 'table-reverse.csv',
 'scheme': 'file',
 'format': 'csv',
 'mediatype': 'text/csv',
 'schema': {'fields': [{'name': 'id', 'type': 'integer'},
                       {'name': 'name', 'type': 'string'}]}}]
['first-resource', 'number-two']
True

Saving Descriptor

Like any of the Metadata classes, the Package class can be saved as JSON or YAML:

Python

from frictionless import Package
package = Package('tables/*')
package.to_json('datapackage.json') # Save as JSON
package.to_yaml('datapackage.yaml') # Save as YAML

Package (class)

Package representation

This class is one of the cornerstones of the Frictionless framework. It manages the underlying resources and provides the ability to describe a package.

Python

package = Package(resources=[Resource(path="data/table.csv")])
package.get_resource('table').read_rows() == [
    {'id': 1, 'name': 'english'},
    {'id': 2, 'name': '中国人'},
]

Signature

(*, source: Optional[Any] = None, control: Optional[Control] = None, basepath: Optional[str] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, homepage: Optional[str] = None, profile: Optional[str] = None, licenses: List[Dict[str, Any]] = NOTHING, sources: List[Dict[str, Any]] = NOTHING, contributors: List[Dict[str, Any]] = NOTHING, keywords: List[str] = NOTHING, image: Optional[str] = None, version: Optional[str] = None, created: Optional[str] = None, resources: List[Resource] = NOTHING, dataset: Optional[Dataset] = None, dialect: Optional[Dialect] = None, detector: Optional[Detector] = None) -> None

Parameters

source (Optional[Any])
control (Optional[Control])
basepath (Optional[str])
name (Optional[str])
title (Optional[str])
description (Optional[str])
homepage (Optional[str])
profile (Optional[str])
licenses (List[Dict[str, Any]])
sources (List[Dict[str, Any]])
contributors (List[Dict[str, Any]])
keywords (List[str])
image (Optional[str])
version (Optional[str])
created (Optional[str])
resources (List[Resource])
dataset (Optional[Dataset])
dialect (Optional[Dialect])
detector (Optional[Detector])

package.source (property)

# TODO: add docs

Signature

Optional[Any]

package.control (property)

# TODO: add docs

Signature

Optional[Control]

package._basepath (property)

# TODO: add docs

Signature

Optional[str]

package.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters.

Signature

Optional[str]
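The naming rule above can be checked with a short regular expression. This is a sketch derived only from the sentence above; `is_valid_package_name` is a hypothetical helper and not part of the Frictionless API.

```python
import re

# Lower-case alphanumerics plus ".", "_" and "-", as stated in the rule above
NAME_PATTERN = re.compile(r'[a-z0-9._-]+')


def is_valid_package_name(name: str) -> bool:
    """Return True if the whole name matches the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_package_name('my-package.v1'))
print(is_valid_package_name('My Package'))
```

The first call prints True and the second prints False, because of the upper-case letters and the space.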

package.type (property)

Type of the package

Signature

ClassVar[Union[str, None]]

package.title (property)

A Package title according to the specs. It should be a human-oriented title of the package.

Signature

Optional[str]

package.description (property)

A Package description according to the specs. It should be a human-oriented description of the package.

Signature

Optional[str]

package.homepage (property)

A URL for the home on the web that is related to this package. For example, a GitHub repository or a CKAN dataset address.

Signature

Optional[str]

package.profile (property)

A fully-qualified URL that points directly to a JSON Schema that can be used to validate the descriptor

Signature

Optional[str]

package.licenses (property)

The license(s) under which the package is provided.

Signature

List[Dict[str, Any]]

package.sources (property)

The raw sources for this data package. It MUST be an array of Source objects. Each Source object MUST have a title and MAY have path and/or email properties.

Signature

List[Dict[str, Any]]

package.contributors (property)

The people or organizations who contributed to this package. It MUST be an array. Each entry is a Contributor and MUST be an object. A Contributor MUST have a title property and MAY contain path, email, role and organization properties.

Signature

List[Dict[str, Any]]
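A contributors list that follows the rule above could look like the sketch below. The people and addresses are invented, and `check_contributors` is a hypothetical helper that only checks the two MUST requirements stated above.

```python
from typing import Any, List

# Invented example entries: title is required, the other keys are optional
contributors = [
    {'title': 'Jane Doe', 'email': 'jane@example.com', 'role': 'author'},
    {'title': 'Example Org', 'path': 'https://example.com', 'role': 'publisher'},
]


def check_contributors(entries: List[Any]) -> List[str]:
    """Return a list of problems; an empty list means the entries look fine."""
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f'entry {index} is not an object')
        elif 'title' not in entry:
            problems.append(f'entry {index} has no title')
    return problems


print(check_contributors(contributors))
print(check_contributors([{'email': 'nobody@example.com'}]))
```

A list shaped like `contributors` can be passed as the `contributors` argument shown in the Package signature.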

package.keywords (property)

An Array of string keywords to assist users searching. For example, ['data', 'fiscal']

Signature

List[str]

package.image (property)

An image to use for this data package. For example, when showing the package in a listing.

Signature

Optional[str]

package.version (property)

A version string identifying the version of the package. It should conform to the Semantic Versioning requirements and should follow the Data Package Version pattern.

Signature

Optional[str]
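As a rough sketch, a version string can be checked against the MAJOR.MINOR.PATCH core of Semantic Versioning. The pattern below is a simplification: it does not cover pre-release or build metadata suffixes, and `looks_like_semver` is a hypothetical helper, not a Frictionless function.

```python
import re

# Core of Semantic Versioning only: three dot-separated numbers without leading zeros
SEMVER_CORE = re.compile(r'(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)')


def looks_like_semver(version: str) -> bool:
    """Return True for plain MAJOR.MINOR.PATCH strings such as '1.0.0'."""
    return SEMVER_CORE.fullmatch(version) is not None


print(looks_like_semver('1.0.0'))
print(looks_like_semver('1.0'))
```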

package.created (property)

The datetime on which this was created. The datetime must conform to the string formats for RFC3339 datetime.

Signature

Optional[str]
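One way to produce such a string is with the standard `datetime` module. This is a sketch: `to_rfc3339` is a hypothetical helper and the date is an arbitrary example.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 string in UTC."""
    text = moment.astimezone(timezone.utc).isoformat(timespec='seconds')
    # isoformat writes the UTC offset as "+00:00"; "Z" is the short RFC 3339 form
    return text.replace('+00:00', 'Z')


created = to_rfc3339(datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc))
print(created)  # 2024-01-31T12:00:00Z
```

The resulting string can be passed as the `created` argument shown in the Package signature.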

package.resources (property)

A list of resource descriptors. Items can be dicts or Resource instances.

Signature

List[Resource]

package.dataset (property)

A reference to the catalog dataset that the package belongs to. If the package is not part of any catalog, it is set to None.

Signature

Optional[Dataset]

package._dialect (property)

# TODO: add docs

Signature

Optional[Dialect]

package._detector (property)

# TODO: add docs

Signature

Optional[Detector]

package.basepath (property)

A basepath of the package. The normpath of a resource is `basepath` joined with the resource's `path`.

Signature

Optional[str]
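The joining rule can be illustrated with the standard library. This is only a sketch of the idea, not the framework's own implementation; `resolve_path` is a hypothetical helper and it uses POSIX-style paths.

```python
import posixpath
from typing import Optional


def resolve_path(basepath: Optional[str], path: str) -> str:
    """Join basepath and path (if a basepath is set) and normalize the result."""
    if basepath:
        path = posixpath.join(basepath, path)
    return posixpath.normpath(path)


print(resolve_path('data', 'table.csv'))  # data/table.csv
print(resolve_path('data', 'tables/../table.csv'))  # data/table.csv
print(resolve_path(None, 'table.csv'))  # table.csv
```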

package.resource_names (property)

Return names of resources

Signature

List[str]

package.resource_paths (property)

Return paths of resources

Signature

List[str]

package.add_resource (method)

Add new resource to the package

Signature

(resource: Union[Resource, str]) -> Resource

Parameters

resource (Union[Resource, str])

package.analyze (method)

Analyze the resources of the package. This feature is currently experimental, and its API may change without warning.

Signature

(*, detailed: bool = False)

Parameters

detailed (bool)

package.clear_resources (method)

Remove all the resources

package.dereference (method)

Dereference underlying metadata. If some of the underlying metadata is provided as a string, it will be replaced by the metadata object.

Package.describe (method) (static)

Describe the given source as a package

Signature

(source: Optional[Any] = None, *, stats: bool = False, **options: Any)

Parameters

source (Optional[Any]): data source
stats (bool)
options (Any)

package.extract (method)

Extract rows

Signature

(*, name: Optional[str] = None, filter: Optional[types.IFilterFunction] = None, process: Optional[types.IProcessFunction] = None, limit_rows: Optional[int] = None) -> types.ITabularData

Parameters

name (Optional[str])
filter (Optional[types.IFilterFunction]): row filter function
process (Optional[types.IProcessFunction]): row processor function
limit_rows (Optional[int]): limit amount of rows to this number

package.flatten (method)

Flatten the package

Signature

(spec: List[str] = ['name', 'path'])

Parameters

spec (List[str]): flatten specification

package.get_resource (method)

Get resource by name

Signature

(name: str) -> Resource

Parameters

name (str)

package.get_table_resource (method)

Get table resource by name (raise if not table)

Signature

(name: str) -> TableResource

Parameters

name (str)

package.has_resource (method)

Check if a resource is present

Signature

(name: str) -> bool

Parameters

name (str)

package.has_table_resource (method)

Check if a table resource is present

Signature

(name: str) -> bool

Parameters

name (str)

package.infer (method)

Infer metadata

Signature

(*, stats: bool = False) -> None

Parameters

stats (bool): stream files completely and infer stats

package.publish (method)

Publish package to any supported data portal

Signature

(target: Any = None, *, control: Optional[Control] = None) -> PublishResult

Parameters

target (Any): URL of the target portal (CKAN, GitHub, ...), e.g. "https://github.com/frictionlessdata/repository-demo"
control (Optional[Control]): GitHub control

package.remove_resource (method)

Remove resource by name

Signature

(name: str) -> Resource

Parameters

name (str)

package.set_resource (method)

Set resource by name

Signature

(resource: Resource) -> Optional[Resource]

Parameters

resource (Resource)

package.to_copy (method)

Create a copy of the package

Signature

(**options: Any) -> Self

Parameters

options (Any)

package.to_er_diagram (method)

Generate an ERD (Entity Relationship Diagram) from package resources and export it as a .dot file. Based on: https://github.com/frictionlessdata/frictionless-py/issues/1118

Signature

(path: Optional[str] = None) -> str

Parameters

path (Optional[str]): target path

package.transform (method)

Transform package

Signature

(pipeline: Pipeline)

Parameters

pipeline (Pipeline)

package.update_resource (method)

Update resource

Signature

(name: str, descriptor: types.IDescriptor) -> Resource

Parameters

name (str)
descriptor (types.IDescriptor)

package.validate (method)

Validate the package and return a validation report (Report)

Signature

(checklist: Optional[Checklist] = None, *, name: Optional[str] = None, parallel: bool = False, limit_rows: Optional[int] = None, limit_errors: int = 1000)

Parameters

checklist (Optional[Checklist]): a Checklist object
name (Optional[str])
parallel (bool): run in parallel if possible. Parallel execution is not possible if foreign keys are used in a resource schema.
limit_rows (Optional[int])
limit_errors (int)
