
Resource Class

The Resource class is arguably the most important class of the whole Frictionless Framework. It's based on the Data Resource Standard and the Tabular Data Resource Standard.

Creating Resource

Let's create a data resource:

from frictionless import Resource

resource = Resource('table.csv') # from a resource path
resource = Resource('resource.json') # from a descriptor path
resource = Resource({'path': 'table.csv'}) # from a descriptor
resource = Resource(path='table.csv') # from arguments

As you can see, it's possible to create a resource by providing different kinds of sources, whose type will be detected automatically (e.g. whether it's a descriptor or a path). It's possible to make this step more explicit:

from frictionless import Resource

resource = Resource(path='data/table.csv') # from a path
resource = Resource('data/resource.json') # from a descriptor

Describing Resource

The standards support a great deal of resource metadata, which can be provided with Frictionless Framework as well:

from frictionless import Resource

resource = Resource(
    name='resource',
    title='My Resource',
    description='My Resource for the Guide',
    path='table.csv',
    # it's possible to provide all the official properties like mediatype, etc
)
print(resource)
{'name': 'resource',
 'title': 'My Resource',
 'description': 'My Resource for the Guide',
 'path': 'table.csv'}

If you have created a resource, for example, from a descriptor, you can access these properties:

from frictionless import Resource

resource = Resource('resource.json')
print(resource.name)
# and others
name

And edit them:

from frictionless import Resource

resource = Resource('resource.json')
resource.name = 'new-name'
resource.title = 'New Title'
resource.description = 'New Description'
# and others
print(resource)
{'name': 'new-name',
 'title': 'New Title',
 'description': 'New Description',
 'path': 'table.csv'}

Saving Descriptor

As with any of the Metadata classes, the Resource class can be saved as JSON or YAML:

from frictionless import Resource
resource = Resource('table.csv')
resource.to_json('resource.json') # Save as JSON
resource.to_yaml('resource.yaml') # Save as YAML
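
The saved descriptor can be loaded back like any other descriptor; a minimal sketch, assuming resource.yaml was created by the previous example:

from frictionless import Resource

resource = Resource('resource.yaml') # the YAML descriptor saved above
print(resource.path)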

Resource Lifecycle

You might have noticed that we had to duplicate the with Resource(...) statement in some examples. The reason is that Resource is a streaming interface: once it's read, you need to open it again. Let's show it in an example:

from pprint import pprint
from frictionless import Resource

resource = Resource('capital-3.csv')
resource.open()
pprint(resource.read_rows())
pprint(resource.read_rows())
# We need to re-open: there is no data left
resource.open()
pprint(resource.read_rows())
# We need to close manually: no context manager is used
resource.close()
[{'id': 1, 'name': 'London'},
 {'id': 2, 'name': 'Berlin'},
 {'id': 3, 'name': 'Paris'},
 {'id': 4, 'name': 'Madrid'},
 {'id': 5, 'name': 'Rome'}]
[]
[{'id': 1, 'name': 'London'},
 {'id': 2, 'name': 'Berlin'},
 {'id': 3, 'name': 'Paris'},
 {'id': 4, 'name': 'Madrid'},
 {'id': 5, 'name': 'Rome'}]
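
To avoid opening and closing manually, a resource can also be used as a context manager, as later examples in this guide do; a minimal sketch:

from pprint import pprint
from frictionless import Resource

with Resource('capital-3.csv') as resource:
    pprint(resource.read_rows())
# the resource is closed automatically when the block ends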

At the same time, you can read data from a resource without opening and closing it explicitly. In this case, Frictionless Framework will open and close the resource for you, so it is basically a one-time operation:

from pprint import pprint
from frictionless import Resource

resource = Resource('capital-3.csv')
pprint(resource.read_rows())
[{'id': 1, 'name': 'London'},
 {'id': 2, 'name': 'Berlin'},
 {'id': 3, 'name': 'Paris'},
 {'id': 4, 'name': 'Madrid'},
 {'id': 5, 'name': 'Rome'}]

Reading Data

The Resource class is also a metadata class which provides various read and stream functions. The extract functions always read rows into memory; Resource can do the same, but it also gives a choice regarding the output data. It can be rows, cells, text, or bytes. Let's try reading all of them:

from pprint import pprint
from frictionless import Resource

resource = Resource('country-3.csv')
pprint(resource.read_bytes())
pprint(resource.read_text())
pprint(resource.read_cells())
pprint(resource.read_rows())
(b'id,capital_id,name,population\n1,1,Britain,67\n2,3,France,67\n3,2,Germany,8'
 b'3\n4,5,Italy,60\n5,4,Spain,47\n')
('id,capital_id,name,population\n'
 '1,1,Britain,67\n'
 '2,3,France,67\n'
 '3,2,Germany,83\n'
 '4,5,Italy,60\n'
 '5,4,Spain,47\n')
[['id', 'capital_id', 'name', 'population'],
 ['1', '1', 'Britain', '67'],
 ['2', '3', 'France', '67'],
 ['3', '2', 'Germany', '83'],
 ['4', '5', 'Italy', '60'],
 ['5', '4', 'Spain', '47']]
[{'id': 1, 'capital_id': 1, 'name': 'Britain', 'population': 67},
 {'id': 2, 'capital_id': 3, 'name': 'France', 'population': 67},
 {'id': 3, 'capital_id': 2, 'name': 'Germany', 'population': 83},
 {'id': 4, 'capital_id': 5, 'name': 'Italy', 'population': 60},
 {'id': 5, 'capital_id': 4, 'name': 'Spain', 'population': 47}]

It's really handy to read all your data into memory but it's not always possible if a file is really big. For such cases, Frictionless provides streaming functions:

from pprint import pprint
from frictionless import Resource

with Resource('country-3.csv') as resource:
    pprint(resource.byte_stream)
    pprint(resource.text_stream)
    pprint(resource.cell_stream)
    pprint(resource.row_stream)
    for row in resource.row_stream:
      print(row)
<frictionless.resource.loader.ByteStreamWithStatsHandling object at 0x7fd73b31f6d0>
<_io.TextIOWrapper name='country-3.csv' encoding='utf-8'>
<itertools.chain object at 0x7fd73b554550>
<generator object Resource.__prepare_row_stream.<locals>.row_stream at 0x7fd73b144900>
{'id': 1, 'capital_id': 1, 'name': 'Britain', 'population': 67}
{'id': 2, 'capital_id': 3, 'name': 'France', 'population': 67}
{'id': 3, 'capital_id': 2, 'name': 'Germany', 'population': 83}
{'id': 4, 'capital_id': 5, 'name': 'Italy', 'population': 60}
{'id': 5, 'capital_id': 4, 'name': 'Spain', 'population': 47}
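
If you only need a portion of a large file, the read functions also accept a size argument (see the reference below); a minimal sketch, assuming size limits the number of items returned:

from pprint import pprint
from frictionless import Resource

resource = Resource('country-3.csv')
pprint(resource.read_rows(size=2))   # only the first two rows (assumption: size limits the count)
pprint(resource.read_bytes(size=10)) # only the first ten bytes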

Scheme

The scheme, also known as protocol, indicates which loader Frictionless should use to read or write data. It can be file (default), text, http, https, s3, and others.

from frictionless import Resource

with Resource(b'header1,header2\nvalue1,value2', format='csv') as resource:
  print(resource.scheme)
  print(resource.to_view())
buffer
+----------+----------+
| header1  | header2  |
+==========+==========+
| 'value1' | 'value2' |
+----------+----------+
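
The scheme can also be set explicitly instead of being detected; a minimal sketch using the default file scheme:

from frictionless import Resource

with Resource('country-3.csv', scheme='file') as resource:
  print(resource.scheme)
  print(resource.to_view())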

Format

The format, also called the extension, helps Frictionless choose a proper parser to handle the file. Popular formats are csv, xlsx, json, and others.

from frictionless import Resource

with Resource(b'header1,header2\nvalue1,value2.csv', format='csv') as resource:
  print(resource.format)
  print(resource.to_view())
csv
+----------+--------------+
| header1  | header2      |
+==========+==============+
| 'value1' | 'value2.csv' |
+----------+--------------+

Encoding

Frictionless automatically detects the encoding of files, but sometimes it can be inaccurate. It's possible to provide an encoding manually:

from frictionless import Resource

with Resource('country-3.csv', encoding='utf-8') as resource:
  print(resource.encoding)
  print(resource.path)
utf-8
country-3.csv

Innerpath

By default, Frictionless uses the first file found in a zip archive. It's possible to adjust this behaviour:

from frictionless import Resource

with Resource('table-multiple-files.zip', innerpath='table-reverse.csv') as resource:
  print(resource.compression)
  print(resource.innerpath)
  print(resource.to_view())
zip
table-reverse.csv
+----+-----------+
| id | name      |
+====+===========+
|  1 | '中国人'     |
+----+-----------+
|  2 | 'english' |
+----+-----------+

Compression

It's possible to adjust compression detection by providing the algorithm explicitly. For the example below it's not required as it would be detected anyway:

from frictionless import Resource

with Resource('table.csv.zip', compression='zip') as resource:
  print(resource.compression)
  print(resource.to_view())
zip
+----+-----------+
| id | name      |
+====+===========+
|  1 | 'english' |
+----+-----------+
|  2 | '中国人'     |
+----+-----------+

Dialect

The Dialect adjusts the way the parsers work. The concept is similar to Control. Let's use the CSV Dialect to adjust the delimiter configuration:

from frictionless import Resource
from frictionless.plugins.csv import CsvDialect

source = b'header1;header2\nvalue1;value2'
dialect = CsvDialect(delimiter=';')
with Resource(source, format='csv', dialect=dialect) as resource:
  print(resource.dialect)
  print(resource.to_view())
{'delimiter': ';'}
+----------+----------+
| header1  | header2  |
+==========+==========+
| 'value1' | 'value2' |
+----------+----------+

There are a great many options available for different dialects; they can be found in the "Formats Reference", along with the properties that can be used with every dialect.

Schema

Please read Schema Guide for more information.
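
As a quick illustration, a schema can be passed when creating a resource; a minimal sketch using a Table Schema descriptor (the exact constructor calls may differ between versions):

from frictionless import Resource, Schema

# illustrative field names; the file used elsewhere in this guide has id/name columns
schema = Schema.from_descriptor({
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'name', 'type': 'string'},
    ]
})
resource = Resource('capital-3.csv', schema=schema)
print(resource.schema)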

Stats

Resource's stats can be accessed with resource.stats:

from frictionless import Resource

resource = Resource('table.csv')
resource.infer(stats=True)
print(resource.stats)
{'md5': '6c2c61dd9b0e9c6876139a449ed87933',
 'sha256': 'a1fd6c5ff3494f697874deeb07f69f8667e903dd94a7bc062dd57550cea26da8',
 'bytes': 30,
 'fields': 2,
 'rows': 2}

Reference

Resource (class)

Loader (class)

Parser (class)

Resource (class)

Resource representation. This class is one of the cornerstones of the Frictionless framework. It loads a data source, and allows you to stream its parsed contents. At the same time, it's a metadata class describing the data.

with Resource("data/table.csv") as resource:
    resource.header == ["id", "name"]
    resource.read_rows() == [
        {'id': 1, 'name': 'english'},
        {'id': 2, 'name': '中国人'},
    ]

Signature

(source: Optional[Any] = None, control: Optional[Control] = None, *, name: Optional[str] = None, type: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, homepage: Optional[str] = None, profiles: List[Union[IProfile, str]] = [], licenses: List[dict] = [], sources: List[dict] = [], path: Optional[str] = None, data: Optional[Any] = None, scheme: Optional[str] = None, format: Optional[str] = None, encoding: Optional[str] = None, mediatype: Optional[str] = None, compression: Optional[str] = None, extrapaths: List[str] = [], innerpath: Optional[str] = None, dialect: Optional[Union[Dialect, str]] = None, schema: Optional[Union[Schema, str]] = None, checklist: Optional[Union[Checklist, str]] = None, pipeline: Optional[Union[Pipeline, str]] = None, stats: Optional[Stats] = None, basepath: Optional[str] = None, detector: Optional[Detector] = None, package: Optional[Package] = None)

Parameters

  • source (Optional[Any])
  • control (Optional[Control])
  • name (Optional[str])
  • type (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • homepage (Optional[str])
  • profiles (List[Union[IProfile, str]])
  • licenses (List[dict])
  • sources (List[dict])
  • path (Optional[str])
  • data (Optional[Any])
  • scheme (Optional[str])
  • format (Optional[str])
  • encoding (Optional[str])
  • mediatype (Optional[str])
  • compression (Optional[str])
  • extrapaths (List[str])
  • innerpath (Optional[str])
  • dialect (Optional[Union[Dialect, str]])
  • schema (Optional[Union[Schema, str]])
  • checklist (Optional[Union[Checklist, str]])
  • pipeline (Optional[Union[Pipeline, str]])
  • stats (Optional[Stats])
  • basepath (Optional[str])
  • detector (Optional[Detector])
  • package (Optional[Package])

resource.name (property)

Resource name according to the specs. It should be a slugified name of the resource.

Signature

Optional[str]

resource.type (property)

Type of the resource.

Signature

Optional[str]

resource.title (property)

A human-readable title of the resource according to the specs.

Signature

Optional[str]

resource.description (property)

A human-readable description of the resource according to the specs.

Signature

Optional[str]

resource.homepage (property)

A URL for the resource's home page.

Signature

Optional[str]

resource.profiles (property)

A list of profiles the resource conforms to.

Signature

List[Union[IProfile, str]]

resource.licenses (property)

A list of licenses for the resource according to the specs.

Signature

List[dict]

resource.sources (property)

A list of sources for the resource according to the specs.

Signature

List[dict]

resource.path (property)

Path to the data source.

Signature

Optional[str]

resource.data (property)

Inline data provided instead of a path.

Signature

Optional[Any]

resource.scheme (property)

Scheme for loading the file (e.g. file, https).

Signature

Optional[str]

resource.format (property)

File format (e.g. csv, xlsx).

Signature

Optional[str]

resource.encoding (property)

File encoding (e.g. utf-8).

Signature

Optional[str]

resource.mediatype (property)

Mediatype/mimetype of the resource (e.g. text/csv).

Signature

Optional[str]

resource.compression (property)

Compression of the source file (e.g. zip, gz).

Signature

Optional[str]

resource.extrapaths (property)

A list of additional paths for a multipart resource.

Signature

List[str]

resource.innerpath (property)

Path within a compressed archive (e.g. a file inside a zip).

Signature

Optional[str]

resource.detector (property)

Detector object to tweak metadata detection. For more information, please check the Detector documentation.

Signature

Detector

resource.package (property)

Parental package of the resource, if any.

Signature

Optional[Package]

resource.basepath (property)

A basepath of the resource. The normpath of the resource is a join of `basepath` and `path`.

Signature

Optional[str]

resource.buffer (property)

File's bytes used as a sample. These buffer bytes are used to infer characteristics of the source file (e.g. encoding, ...).

Signature

IBuffer

resource.byte_stream (property)

Byte stream in form of a generator

Signature

IByteStream

resource.cell_stream (property)

Cell stream in form of a generator

Signature

ICellStream

resource.checklist (property)

Checklist object. For more information, please check the Checklist documentation.

Signature

(Optional[Union[Checklist, str]]) -> Optional[Checklist]

resource.closed (property)

Whether the table is closed

Signature

bool

resource.dialect (property)

File Dialect object. For more information, please check the Dialect documentation.

Signature

(Optional[Union[Dialect, str]]) -> Dialect

resource.fragment (property)

Table's lists used as fragment. These fragment rows are used internally to infer characteristics of the source file (e.g. schema, ...).

Signature

IFragment

resource.header (property)

Signature

Header

resource.labels (property)

Signature

ILabels

resource.lookup (property)

Signature

Lookup

resource.memory (property)

Whether the resource is in-memory (i.e. not path-based)

Signature

bool

resource.multipart (property)

Whether resource is multipart

Signature

bool

resource.normdata (property)

Normalized data; raises if not set

Signature

Any

resource.normpath (property)

Normalized path of the resource; raises if not set

Signature

str

resource.normpaths (property)

Normalized paths of the resource

Signature

List[str]

resource.paths (property)

All paths of the resource

Signature

List[str]

resource.pipeline (property)

Pipeline object. For more information, please check the Pipeline documentation.

Signature

(Optional[Union[Pipeline, str]]) -> Optional[Pipeline]

resource.place (property)

Stringified resource location

Signature

str

resource.remote (property)

Whether resource is remote

Signature

bool

resource.row_stream (property)

Row stream in form of a generator of Row objects

Signature

IRowStream

resource.sample (property)

Table's lists used as sample. These sample rows are used to infer characteristics of the source file (e.g. schema, ...).

Signature

ISample

resource.schema (property)

Table Schema object. For more information, please check the Schema documentation.

Signature

(Optional[Union[Schema, str]]) -> Schema

resource.stats (property)

Stats object. An object with the following possible properties: md5, sha256, bytes, fields, rows.

Signature

(Optional[Union[Stats, str]]) -> Stats

resource.tabular (property)

Whether resource is tabular

Signature

bool

resource.text_stream (property)

Text stream in form of a generator

Signature

ITextStream

resource.analyze (method)

Analyze the resource. This feature is currently experimental, and its API may change without warning.

Signature

(*, detailed=False) -> dict

Parameters

  • detailed

resource.close (method)

Close the resource as "filelike.close" does

Signature

() -> None

Resource.describe (method) (static)

Describe the given source as a resource

Signature

(source: Optional[Any] = None, *, stats: bool = False, **options)

Parameters

  • source (Optional[Any]): data source
  • stats (bool)
  • options
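
A minimal usage sketch:

from frictionless import Resource

resource = Resource.describe('table.csv', stats=True)
print(resource.stats)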

resource.extract (method)

Extract resource rows

Signature

(*, limit_rows: Optional[int] = None, process: Optional[IProcessFunction] = None, filter: Optional[IFilterFunction] = None, stream: bool = False)

Parameters

  • limit_rows (Optional[int])
  • process (Optional[IProcessFunction])
  • filter (Optional[IFilterFunction])
  • stream (bool)

Resource.from_petl (method) (static)

Create a resource from a PETL view

Signature

(view, **options)

Parameters

  • view
  • options

resource.infer (method)

Infer metadata

Signature

(*, sample: bool = True, stats: bool = False) -> None

Parameters

  • sample (bool)
  • stats (bool)

resource.open (method)

Open the resource as "io.open" does

Signature

(*, as_file: bool = False)

Parameters

  • as_file (bool)

resource.read_bytes (method)

Read bytes into memory

Signature

(*, size: Optional[int] = None) -> bytes

Parameters

  • size (Optional[int])

resource.read_cells (method)

Read lists into memory

Signature

(*, size: Optional[int] = None) -> List[List[Any]]

Parameters

  • size (Optional[int])

resource.read_data (method)

Read data into memory

Signature

(*, size: Optional[int] = None) -> Any

Parameters

  • size (Optional[int])

resource.read_rows (method)

Read rows into memory

Signature

(*, size=None) -> List[Row]

Parameters

  • size

resource.read_text (method)

Read text into memory

Signature

(*, size: Optional[int] = None) -> str

Parameters

  • size (Optional[int])

resource.to_copy (method)

Create a copy from the resource

Signature

(**options)

Parameters

  • options

resource.to_inline (method)

Helper to export resource as inline data

Signature

(*, dialect=None)

Parameters

  • dialect

resource.to_pandas (method)

Helper to export resource as a Pandas dataframe

Signature

(*, dialect=None)

Parameters

  • dialect

resource.to_petl (method)

Export resource as a PETL table

Signature

(normalize=False)

Parameters

  • normalize

resource.to_snap (method)

Create a snapshot from the resource

Signature

(*, json=False)

Parameters

  • json : make data types compatible with JSON format

resource.to_view (method)

Create a view from the resource. See PETL's docs for more information: https://platform.petl.readthedocs.io/en/stable/util.html#visualising-tables

Signature

(type='look', **options)

Parameters

  • type : view's type
  • options

resource.transform (method)

Transform resource

Signature

(pipeline: Optional[Pipeline] = None)

Parameters

  • pipeline (Optional[Pipeline])

resource.validate (method)

Validate resource

Signature

(checklist: Optional[Checklist] = None, *, limit_errors: int = 1000, limit_rows: Optional[int] = None)

Parameters

  • checklist (Optional[Checklist])
  • limit_errors (int)
  • limit_rows (Optional[int])
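
A minimal usage sketch, assuming the returned report exposes a boolean valid flag:

from frictionless import Resource

resource = Resource('capital-3.csv')
report = resource.validate()
print(report.valid)  # assumption: the report exposes a boolean valid flag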

resource.write (method)

Write this resource to the target resource

Signature

(target: Optional[Union[Resource, Any]] = None, *, control: Optional[Control] = None, **options) -> Resource

Parameters

  • target (Optional[Union[Resource, Any]]): target or target resource instance
  • control (Optional[Control])
  • options
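
A minimal usage sketch, assuming the target can be given as a path (the output format is detected from its extension):

from frictionless import Resource

source = Resource('table.csv')
target = source.write('table-output.csv')  # 'table-output.csv' is a hypothetical output path
print(target.read_rows())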

Loader (class)

Loader representation

Signature

(resource: Resource)

Parameters

  • resource (Resource): resource

loader.remote (property)

TODO: add docs

Signature

bool

loader.buffer (property)

Signature

IBuffer

loader.byte_stream (property)

Resource byte stream. The stream is available after opening the loader.

Signature

IByteStream

loader.closed (property)

Whether the loader is closed

Signature

bool

loader.resource (property)

Signature

Resource

loader.text_stream (property)

Resource text stream. The stream is available after opening the loader.

Signature

ITextStream

loader.close (method)

Close the loader as "filelike.close" does

Signature

() -> None

loader.open (method)

Open the loader as "io.open" does

loader.read_byte_stream (method)

Read bytes stream

Signature

() -> IByteStream

loader.read_byte_stream_analyze (method)

Detect metadata using sample

Signature

(buffer)

Parameters

  • buffer : byte buffer

loader.read_byte_stream_buffer (method)

Buffer byte stream

Signature

(byte_stream)

Parameters

  • byte_stream : resource byte stream

loader.read_byte_stream_create (method)

Create bytes stream

Signature

() -> IByteStream

loader.read_byte_stream_decompress (method)

Decompress byte stream

Signature

(byte_stream: IByteStream) -> IByteStream

Parameters

  • byte_stream (IByteStream): resource byte stream

loader.read_byte_stream_process (method)

Process byte stream

Signature

(byte_stream: IByteStream) -> ByteStreamWithStatsHandling

Parameters

  • byte_stream (IByteStream): resource byte stream

loader.read_text_stream (method)

Read text stream

loader.write_byte_stream (method)

Write from a temporary file

Signature

(path) -> Any

Parameters

  • path : path to a temporary file

loader.write_byte_stream_create (method)

Create byte stream for writing

Signature

(path) -> IByteStream

Parameters

  • path : path to a temporary file

loader.write_byte_stream_save (method)

Store byte stream

Signature

(byte_stream) -> Any

Parameters

  • byte_stream

Parser (class)

Parser representation

Signature

(resource: Resource)

Parameters

  • resource (Resource): resource

parser.requires_loader (property)

NOTE: add docs

Signature

ClassVar[bool]

parser.supported_types (property)

NOTE: add docs

Signature

ClassVar[List[str]]

parser.cell_stream (property)

Signature

ICellStream

parser.closed (property)

Whether the parser is closed

Signature

bool

parser.loader (property)

Signature

Loader

parser.resource (property)

Signature

Resource

parser.sample (property)

Signature

ISample

parser.close (method)

Close the parser as "filelike.close" does

Signature

() -> None

parser.open (method)

Open the parser as "io.open" does

parser.read_cell_stream (method)

Read list stream

Signature

() -> ICellStream

parser.read_cell_stream_create (method)

Create list stream from loader

Signature

() -> ICellStream

parser.read_cell_stream_handle_errors (method)

Wrap list stream into error handler

Signature

(cell_stream: ICellStream) -> CellStreamWithErrorHandling

Parameters

  • cell_stream (ICellStream)

parser.read_loader (method)

Create and open loader

Signature

() -> Optional[Loader]

parser.write_row_stream (method)

Write row stream from the source resource

Signature

(source: Resource) -> None

Parameters

  • source (Resource): source resource