(2024-11-07 15:17)

Schema Class

Table Schema is a core Frictionless Data concept: metadata that describes a tabular data source. You can read the Table Schema Standard for more information.
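
To make the idea concrete, here is a minimal sketch of what such metadata looks like as plain data. It uses only the standard library rather than Frictionless itself, and the field names and values are illustrative assumptions.

```python
import json

# A Table Schema descriptor is JSON-compatible data: a list of fields
# plus optional schema-level properties (illustrative values only).
descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
    ],
    "missingValues": ["", "na"],
    "primaryKey": ["id"],
}

# Serialize it the way it could appear in a schema.json file.
text = json.dumps(descriptor, indent=2)
print(text)

# The field names can be read straight from the descriptor.
field_names = [field["name"] for field in descriptor["fields"]]
print(field_names)
```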

Creating Schema

Let's create a table schema:

from frictionless import Schema, fields, describe

schema = describe('table.csv', type='schema') # from a resource path
schema = Schema.from_descriptor('schema.json') # from a descriptor path
schema = Schema.from_descriptor({'fields': [{'name': 'id', 'type': 'integer'}]}) # from a descriptor

As you can see, it's possible to create a schema from different kinds of sources, whose type is detected automatically (e.g. whether it's a dict or a path). It's possible to make this step more explicit:

from frictionless import Schema, fields

schema = Schema(fields=[fields.StringField(name='id')]) # from fields
schema = Schema.from_descriptor('schema.json') # from a descriptor
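
The automatic detection described above can be pictured as a dispatch on the source type. The sketch below is a standard-library illustration, not the way Frictionless implements it; `load_descriptor` is a hypothetical helper and only JSON files are handled.

```python
import json
import os
import tempfile
from typing import Any, Dict, Union


def load_descriptor(source: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return a descriptor dict from a dict or a JSON path."""
    if isinstance(source, dict):
        # Already a descriptor: return a shallow copy of it.
        return dict(source)
    if isinstance(source, str):
        # Treat a string as a path to a JSON file.
        with open(source, encoding="utf-8") as file:
            return json.load(file)
    raise TypeError(f"unsupported source type: {type(source).__name__}")


inline = {"fields": [{"name": "id", "type": "integer"}]}

# Write the same descriptor to a temporary file to exercise the path branch.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "schema.json")
    with open(path, "w", encoding="utf-8") as file:
        json.dump(inline, file)
    from_path = load_descriptor(path)

from_dict = load_descriptor(inline)
print(from_dict == from_path)
```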

Describing Schema

The standard supports some additional schema metadata:

from frictionless import Schema, fields

schema = Schema(
    fields=[fields.StringField(name='id')],
    missing_values=['na'],
    primary_key=['id'],
    # foreign_keys
)
print(schema)
{'fields': [{'name': 'id', 'type': 'string'}],
 'missingValues': ['na'],
 'primaryKey': ['id']}

If you have created a schema, for example from a descriptor, you can access these properties:

from frictionless import Schema

schema = Schema.from_descriptor('schema.json')
print(schema.missing_values)
# and others
['']

And edit them:

from frictionless import Schema

schema = Schema.from_descriptor('schema.json')
schema.missing_values.append('-')
# and others
print(schema)
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'}],
 'missingValues': ['', '-']}
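
To show what the missing values list means for the data, here is a plain-Python sketch of the rule that a cell matching any missing value is read as None. `apply_missing_values` is a hypothetical function written for this illustration, not part of Frictionless.

```python
from typing import List, Optional


def apply_missing_values(
    cells: List[str], missing_values: List[str]
) -> List[Optional[str]]:
    """Replace every cell that equals a missing value with None."""
    return [None if cell in missing_values else cell for cell in cells]


missing_values = ["", "-"]
row = ["1", "-", "", "english"]
cleaned = apply_missing_values(row, missing_values)
print(cleaned)  # ['1', None, None, 'english']
```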

Field Management

The Schema class provides useful methods to manage fields:

from frictionless import Schema, fields

schema = Schema.from_descriptor('schema.json')
print(schema.fields)
print(schema.field_names)
schema.add_field(fields.StringField(name='new-name'))
field = schema.get_field('new-name')
print(schema.has_field('new-name'))
schema.remove_field('new-name')
[{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
['id', 'name']
True

Saving Descriptor

Like any of the Metadata classes, the Schema class can be saved as JSON or YAML:

from frictionless import Schema, fields
schema = Schema(fields=[fields.IntegerField(name='id')])
schema.to_json('schema.json') # Save as JSON
schema.to_yaml('schema.yaml') # Save as YAML

Reading Cells

During the process of data reading a resource uses a schema to convert data:

from frictionless import Schema, fields

schema = Schema(fields=[fields.IntegerField(name='integer'), fields.StringField(name='string')])
cells, notes = schema.read_cells(['3', 'value'])
print(cells)
[3, 'value']
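
The conversion step can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the `CASTERS` table and this `read_cells` function are hypothetical, only two field types are covered, and notes are plain strings.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical casters keyed by field type; only two types are covered here.
CASTERS: Dict[str, Callable[[str], Any]] = {
    "integer": int,
    "string": str,
}


def read_cells(
    fields: List[Dict[str, str]], cells: List[str]
) -> Tuple[List[Any], List[Optional[str]]]:
    """Cast each cell by its field type and return (cells, notes)."""
    result: List[Any] = []
    notes: List[Optional[str]] = []
    for field, cell in zip(fields, cells):
        try:
            result.append(CASTERS[field["type"]](cell))
            notes.append(None)
        except ValueError:
            # Keep the position filled with None and record why casting failed.
            result.append(None)
            notes.append(f'cannot cast "{cell}" to {field["type"]}')
    return result, notes


schema_fields = [
    {"name": "integer", "type": "integer"},
    {"name": "string", "type": "string"},
]
print(read_cells(schema_fields, ["3", "value"]))
print(read_cells(schema_fields, ["bad", "value"]))
```

Returning the notes alongside the cells lets a caller decide whether a failed cast is an error or something to report and skip.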

Writing Cells

During the process of data writing a resource uses a schema to convert data:

from frictionless import Schema, fields

schema = Schema(fields=[fields.IntegerField(name='integer'), fields.StringField(name='string')])
cells, notes = schema.write_cells([3, 'value'])
print(cells)
[3, 'value']

Creating Field

Let's create a field:

from frictionless import fields

field = fields.IntegerField(name='name')
print(field)
{'name': 'name', 'type': 'integer'}

Usually we work with fields which were already created by a schema:

from frictionless import describe

resource = describe('table.csv')
field = resource.schema.get_field('id')
print(field)
{'name': 'id', 'type': 'integer'}

Field Types

Frictionless Framework supports all the Table Schema Standard field types along with the ability to create custom types.

For some types there are additional properties available:

from frictionless import describe

resource = describe('table.csv')
field = resource.schema.get_field('id') # it's an integer
print(field.bare_number)
True

See the complete reference at Tabular Fields.

Reading Cell

During the process of data reading, a schema uses a field internally. If needed, a user can convert their data using this interface:

from frictionless import fields

field = fields.IntegerField(name='name')
cell, note = field.read_cell('3')
print(cell)
3

Writing Cell

During the process of data writing, a schema uses a field internally. As with reading, a user can convert their data using this interface:

from frictionless import fields

field = fields.IntegerField(name='name')
cell, note = field.write_cell(3)
print(cell)
3

Reference

Schema (class)

Field (class)

Schema (class)

Schema representation

This class is one of the cornerstones of the Frictionless framework. It allows you to work with Table Schema and its fields.

```python
from frictionless import Schema, fields

schema = Schema.from_descriptor('schema.json')
schema.add_field(fields.StringField(name='name'))
```

Signature

(*, descriptor: Optional[Union[types.IDescriptor, str]] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, fields: List[Field] = NOTHING, missing_values: List[str] = NOTHING, primary_key: List[str] = NOTHING, foreign_keys: List[Dict[str, Any]] = NOTHING) -> None

Parameters

  • descriptor (Optional[Union[types.IDescriptor, str]])
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • fields (List[Field])
  • missing_values (List[str])
  • primary_key (List[str])
  • foreign_keys (List[Dict[str, Any]])

schema.descriptor (property)

# TODO: add docs

Signature

Optional[Union[types.IDescriptor, str]]

schema.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.

Signature

Optional[str]
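
The naming rule stated for schema.name can be checked with a regular expression. The sketch below is an illustration based only on the wording above; `is_valid_name` is a hypothetical helper, not a Frictionless function.

```python
import re

# Lower-case alphanumeric characters plus "_" and "-", per the rule above.
NAME_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_name(name: str) -> bool:
    """Return True when the whole name uses only the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("my-schema_1"))  # True
print(is_valid_name("My Schema"))    # False
```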

schema.type (property)

Type of the object

Signature

ClassVar[Union[str, None]]

schema.title (property)

A human-oriented title for the Schema.

Signature

Optional[str]

schema.description (property)

A brief description of the Schema.

Signature

Optional[str]

schema.fields (property)

A list of fields in the schema.

Signature

List[Field]

schema.missing_values (property)

List of string values to be treated as missing values in the schema fields. If a field value matches any of these strings, it is set to None.

Signature

List[str]

schema.primary_key (property)

Specifies primary key for the schema.

Signature

List[str]
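
In the Table Schema standard, the values of the primary key fields identify a row, so they are expected to be unique across the table. A minimal plain-Python sketch of that check, where `find_duplicate_keys` is a hypothetical helper and rows are assumed to be dicts keyed by field name:

```python
from typing import Any, Dict, List, Tuple


def find_duplicate_keys(
    rows: List[Dict[str, Any]], primary_key: List[str]
) -> List[Tuple[Any, ...]]:
    """Return every primary-key value that appears more than once."""
    seen = set()
    duplicates = []
    for row in rows:
        # A composite key is compared as a tuple of its field values.
        key = tuple(row[name] for name in primary_key)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


rows = [
    {"id": 1, "name": "english"},
    {"id": 2, "name": "chinese"},
    {"id": 1, "name": "german"},
]
print(find_duplicate_keys(rows, ["id"]))  # [(1,)]
```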

schema.foreign_keys (property)

Specifies the foreign keys for the schema.

Signature

List[Dict[str, Any]]
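
A foreign key in the Table Schema standard names local fields and a reference to fields in another resource. The sketch below checks such a reference in plain Python; the table contents, the resource name "countries", and the `find_broken_references` helper are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def find_broken_references(
    rows: List[Row], foreign_key: Dict[str, Any], tables: Dict[str, List[Row]]
) -> List[Row]:
    """Return the rows whose foreign-key values have no match in the target."""
    local_fields = foreign_key["fields"]
    reference = foreign_key["reference"]
    target_rows = tables[reference["resource"]]
    # Collect every key that exists in the referenced table.
    known = {tuple(row[name] for name in reference["fields"]) for row in target_rows}
    return [
        row for row in rows
        if tuple(row[name] for name in local_fields) not in known
    ]


foreign_key = {
    "fields": ["country"],
    "reference": {"resource": "countries", "fields": ["code"]},
}
tables = {"countries": [{"code": "fr"}, {"code": "de"}]}
cities = [
    {"city": "paris", "country": "fr"},
    {"city": "rome", "country": "it"},
]
print(find_broken_references(cities, foreign_key, tables))
```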

schema.field_names (property)

List of field names

Signature

List[str]

schema.field_types (property)

List of field types

Signature

List[str]

schema.add_field (method)

Add new field to the schema

Signature

(field: Field, *, position: Optional[int] = None) -> None

Parameters

  • field (Field)
  • position (Optional[int])

schema.clear_fields (method)

Remove all the fields

Signature

() -> None

Schema.describe (method) (static)

Describe the given source as a schema

Signature

(source: Optional[Any] = None, **options: Any) -> Schema

Parameters

  • source (Optional[Any]): data source
  • options (Any)

schema.flatten (method)

Flatten the schema. The spec parameter (a list of strings) is the flatten specification.

Signature

(spec: List[str] = [name, type])

Parameters

  • spec (List[str])
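
As a rough picture of flattening, the sketch below works on a plain descriptor dict and assumes the result is one list per field holding the values of the properties named in spec. That return shape is an assumption, and this `flatten` function is a stand-in written for illustration rather than the library method.

```python
from typing import Any, Dict, List, Sequence


def flatten(
    descriptor: Dict[str, Any], spec: Sequence[str] = ("name", "type")
) -> List[List[Any]]:
    """Return, for each field, the values of the properties listed in spec."""
    # Properties a field does not define come back as None.
    return [[field.get(prop) for prop in spec] for field in descriptor["fields"]]


descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string", "title": "Name"},
    ]
}
print(flatten(descriptor))
print(flatten(descriptor, spec=["name", "title"]))
```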

Schema.from_jsonschema (method) (static)

Create a Schema from JSONSchema profile

Signature

(profile: Union[types.IDescriptor, str]) -> Schema

Parameters

  • profile (Union[types.IDescriptor, str]): path or dict with JSONSchema profile

schema.get_field (method)

Get field by name

Signature

(name: str) -> Field

Parameters

  • name (str)

schema.has_field (method)

Check if a field is present

Signature

(name: str) -> bool

Parameters

  • name (str)

schema.read_cells (method)

Read a list of cells (normalize/cast)

Signature

(cells: List[Any])

Parameters

  • cells (List[Any]): list of cells

schema.remove_field (method)

Remove field by name

Signature

(name: str) -> Field

Parameters

  • name (str)

schema.set_field (method)

Set field by name

Signature

(field: Field) -> Optional[Field]

Parameters

  • field (Field)

schema.set_field_type (method)

Set field type

Signature

(name: str, type: str) -> Field

Parameters

  • name (str)
  • type (str)

schema.to_excel_template (method)

Export schema as an excel template

Signature

(path: str) -> None

Parameters

  • path (str): path of excel file to create with ".xlsx" extension

schema.to_summary (method)

Summary of the schema in table format

Signature

() -> str

schema.update_field (method)

Update field

Signature

(name: str, descriptor: types.IDescriptor) -> Field

Parameters

  • name (str)
  • descriptor (types.IDescriptor)

schema.write_cells (method)

Write a list of cells (normalize/uncast)

Signature

(cells: List[Any], *, types: List[str] = [])

Parameters

  • cells (List[Any]): list of cells
  • types (List[str])

Field (class)

Field representation

Signature

(*, name: str, title: Optional[str] = None, description: Optional[str] = None, format: str = default, missing_values: List[str] = NOTHING, constraints: Dict[str, Any] = NOTHING, rdf_type: Optional[str] = None, example: Optional[str] = None, schema: Optional[Schema] = None) -> None

Parameters

  • name (str)
  • title (Optional[str])
  • description (Optional[str])
  • format (str)
  • missing_values (List[str])
  • constraints (Dict[str, Any])
  • rdf_type (Optional[str])
  • example (Optional[str])
  • schema (Optional[Schema])

field.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.

Signature

str

field.type (property)

Type of the field such as "boolean", "integer" etc.

Signature

ClassVar[str]

field.title (property)

A human-oriented title for the Field.

Signature

Optional[str]

field.description (property)

A brief description of the Field.

Signature

Optional[str]

field.format (property)

Format of the field to specify different value readers for the field type. For example: "default","array" etc.

Signature

str

field.missing_values (property)

List of string values to be treated as missing values in the field. If the field value matches any of these strings, it is set to None.

Signature

List[str]

field.constraints (property)

A dictionary of rules that constrain the data values permitted for a field.

Signature

Dict[str, Any]
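
To illustrate how such rules restrict values, here is a plain-Python sketch covering three constraint names from the Table Schema standard (minimum, maximum, enum). `check_constraints` is a hypothetical helper, and Frictionless's own validation reports errors differently.

```python
from typing import Any, Dict, List


def check_constraints(value: Any, constraints: Dict[str, Any]) -> List[str]:
    """Return the names of the constraints that the value violates."""
    problems: List[str] = []
    if "minimum" in constraints and value < constraints["minimum"]:
        problems.append("minimum")
    if "maximum" in constraints and value > constraints["maximum"]:
        problems.append("maximum")
    if "enum" in constraints and value not in constraints["enum"]:
        problems.append("enum")
    return problems


constraints = {"minimum": 1, "maximum": 10}
print(check_constraints(5, constraints))   # []
print(check_constraints(11, constraints))  # ['maximum']
```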

field.rdf_type (property)

RDF type. Indicates whether the field is of RDF type.

Signature

Optional[str]

field.example (property)

An example of a value for the field.

Signature

Optional[str]

field.schema (property)

The schema that the field is a part of.

Signature

Optional[Schema]

field.builtin (property)

Specifies whether the field is a builtin feature.

Signature

ClassVar[bool]

field.supported_constraints (property)

List of supported constraints for a field.

Signature

ClassVar[List[str]]

field.required (property)

Indicates if field is mandatory.

Signature

(bool) ->