Table Schema is a core Frictionless Data concept: it is metadata describing a tabular data source. See the Table Schema Standard for more information.
Let's create a table schema:
from frictionless import Schema, fields, describe
schema = describe('table.csv', type='schema') # from a resource path
schema = Schema.from_descriptor('schema.json') # from a descriptor path
schema = Schema.from_descriptor({'fields': [{'name': 'id', 'type': 'integer'}]}) # from a descriptor
As you can see, a schema can be created from different kinds of sources, and the source type is detected automatically (e.g. whether it's a dict or a path). It's possible to make this step more explicit:
from frictionless import Schema, fields
schema = Schema(fields=[fields.StringField(name='id')]) # from fields
schema = Schema.from_descriptor('schema.json') # from a descriptor
The standard supports some additional schema metadata:
from frictionless import Schema, fields
schema = Schema(
fields=[fields.StringField(name='id')],
missing_values=['na'],
primary_key=['id'],
# foreign_keys
)
print(schema)
{'fields': [{'name': 'id', 'type': 'string'}],
'missingValues': ['na'],
'primaryKey': ['id']}
If you have created a schema, for example from a descriptor, you can access these properties:
from frictionless import Schema
schema = Schema.from_descriptor('schema.json')
print(schema.missing_values)
# and others
['']
And edit them:
from frictionless import Schema
schema = Schema.from_descriptor('schema.json')
schema.missing_values.append('-')
# and others
print(schema)
{'fields': [{'name': 'id', 'type': 'integer'},
{'name': 'name', 'type': 'string'}],
'missingValues': ['', '-']}
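Several examples here load `schema.json` without showing its contents. As a minimal sketch, assuming the file holds the two fields seen in the output above, such a descriptor could be written with Python's standard `json` module. The temporary directory and the variable names are illustrative and not part of Frictionless.

```python
import json
import tempfile
from pathlib import Path

# Assumed descriptor, inferred from the printed output above
descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
    ],
}

# Write the descriptor to a scratch directory (illustrative location)
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "schema.json"
    path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    # Read it back to confirm the file round-trips unchanged
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["fields"])
```

In real use the file would be written next to the script, so that a path such as `'schema.json'` resolves when it is passed to `Schema.from_descriptor`.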
The Schema class provides useful methods to manage fields:
from frictionless import Schema, fields
schema = Schema.from_descriptor('schema.json')
print(schema.fields)
print(schema.field_names)
schema.add_field(fields.StringField(name='new-name'))
field = schema.get_field('new-name')
print(schema.has_field('new-name'))
schema.remove_field('new-name')
[{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
['id', 'name']
True
Like any of the Metadata classes, the Schema class can be saved as JSON or YAML:
from frictionless import Schema, fields
schema = Schema(fields=[fields.IntegerField(name='id')])
schema.to_json('schema.json') # Save as JSON
schema.to_yaml('schema.yaml') # Save as YAML
During the process of data reading a resource uses a schema to convert data:
from frictionless import Schema, fields
schema = Schema(fields=[fields.IntegerField(name='integer'), fields.StringField(name='string')])
cells, notes = schema.read_cells(['3', 'value'])
print(cells)
[3, 'value']
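To make the conversion above more concrete, here is a plain-Python sketch of the idea: each cell is cast according to the type of its field, and a note is recorded when casting fails. The `read_cells_sketch` function, the `CASTERS` table, and the shape of the notes are hypothetical illustrations, not the library's implementation.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical casters keyed by field type; not the library's implementation
CASTERS = {"integer": int, "string": str}


def read_cells_sketch(
    field_types: List[str], cells: List[Any]
) -> Tuple[List[Any], List[Optional[str]]]:
    """Cast each cell by its field type, returning (cells, notes)."""
    result: List[Any] = []
    notes: List[Optional[str]] = []
    for field_type, cell in zip(field_types, cells):
        try:
            result.append(CASTERS[field_type](cell))
            notes.append(None)
        except (ValueError, TypeError):
            # A cell that cannot be cast becomes None and gets a note
            result.append(None)
            notes.append(f'type is "{field_type}"')
    return result, notes


cells, notes = read_cells_sketch(["integer", "string"], ["3", "value"])
print(cells)

bad_cells, bad_notes = read_cells_sketch(["integer", "string"], ["x", "value"])
print(bad_cells, bad_notes)
```

Returning the notes alongside the cells, rather than raising on the first bad value, lets a caller process the whole row and decide afterwards how to report problems.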
During the process of data writing a resource uses a schema to convert data:
from frictionless import Schema, fields
schema = Schema(fields=[fields.IntegerField(name='integer'), fields.StringField(name='string')])
cells, notes = schema.write_cells([3, 'value'])
print(cells)
[3, 'value']
Let's create a field:
from frictionless import fields
field = fields.IntegerField(name='name')
print(field)
{'name': 'name', 'type': 'integer'}
Usually we work with fields which were already created by a schema:
from frictionless import describe
resource = describe('table.csv')
field = resource.schema.get_field('id')
print(field)
{'name': 'id', 'type': 'integer'}
Frictionless Framework supports all the Table Schema Standard field types along with an ability to create custom types.
For some types there are additional properties available:
from frictionless import describe
resource = describe('table.csv')
field = resource.schema.get_field('id') # it's an integer
print(field.bare_number)
True
See the complete reference at Tabular Fields.
During the process of data reading a schema uses a field internally. If needed a user can convert their data using this interface:
from frictionless import fields
field = fields.IntegerField(name='name')
cell, note = field.read_cell('3')
print(cell)
3
During the process of data writing a schema uses a field internally. As with reading, a user can convert their data using this interface:
from frictionless import fields
field = fields.IntegerField(name='name')
cell, note = field.write_cell(3)
print(cell)
3
Schema representation

This class is one of the cornerstones of the Frictionless framework. It allows you to work with Table Schema and its fields.

```python
from frictionless import Schema, fields

schema = Schema.from_descriptor('schema.json')
schema.add_field(fields.StringField(name='name'))
```
(*, descriptor: Optional[Union[types.IDescriptor, str]] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, fields: List[Field] = NOTHING, missing_values: List[str] = NOTHING, primary_key: List[str] = NOTHING, foreign_keys: List[Dict[str, Any]] = NOTHING) -> None
# TODO: add docs
Optional[Union[types.IDescriptor, str]]
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.
Optional[str]
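The naming rule above can be checked with a regular expression. This is a minimal sketch using only the standard library; `is_valid_name` is a hypothetical helper, and reading "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Pattern derived from the description above: lower-case ASCII letters,
# digits, "_" and "-" (the ASCII restriction is an assumption)
NAME_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("table-schema_1"))
print(is_valid_name("Table Schema"))
```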
Type of the object
ClassVar[Union[str, None]]
A human-oriented title for the Schema.
Optional[str]
A brief description of the Schema.
Optional[str]
A List of fields in the schema.
List[Field]
List of string values to be treated as missing values in the schema fields. If a field value matches any of these strings, it is set to None.
List[str]
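The effect of missing values can be illustrated without the library. The sketch below assumes exact string matching against the list; `apply_missing_values` is a hypothetical helper and not a Frictionless function.

```python
from typing import List, Optional


def apply_missing_values(
    cells: List[str], missing_values: List[str]
) -> List[Optional[str]]:
    """Replace any cell equal to a missing-value marker with None."""
    return [None if cell in missing_values else cell for cell in cells]


row = ["1", "na", "", "english"]
cleaned = apply_missing_values(row, missing_values=["", "na"])
print(cleaned)
```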
Specifies primary key for the schema.
List[str]
Specifies the foreign keys for the schema.
List[Dict[str, Any]]
List of field names
List[str]
List of field types
List[str]
Add new field to the schema
(field: Field, *, position: Optional[int] = None) -> None
Remove all the fields
() -> None
Describe the given source as a schema
(source: Optional[Any] = None, **options: Any) -> Schema
Flatten the schema. Parameters: spec (str[]): flatten specification
(spec: List[str] = [name, type])
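One plausible reading of flattening is that each field is reduced to a row holding only the properties named in the spec. The sketch below works on a plain descriptor dict. `flatten_sketch` is hypothetical, and the returned shape is an assumption rather than a statement about the library's output.

```python
from typing import Any, Dict, List, Sequence


def flatten_sketch(
    descriptor: Dict[str, Any], spec: Sequence[str] = ("name", "type")
) -> List[List[Any]]:
    """Build one row per field, keeping only the properties listed in spec."""
    return [
        [field.get(prop) for prop in spec]
        for field in descriptor.get("fields", [])
    ]


descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
    ]
}
rows = flatten_sketch(descriptor)
print(rows)
```

Properties that a field does not define come back as `None` in this sketch, so every row has the same length as the spec.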
Create a Schema from JSONSchema profile
(profile: Union[types.IDescriptor, str]) -> Schema
Get field by name
(name: str) -> Field
Check if a field is present
(name: str) -> bool
Read a list of cells (normalize/cast)
(cells: List[Any])
Remove field by name
(name: str) -> Field
Set field by name
(field: Field) -> Optional[Field]
Set field type
(name: str, type: str) -> Field
Export schema as an excel template
(path: str) -> None
Summary of the schema in table format
() -> str
Update field
(name: str, descriptor: types.IDescriptor) -> Field
Write a list of cells (normalize/uncast)
(cells: List[Any], *, types: List[str] = [])
Field representation
(*, name: str, title: Optional[str] = None, description: Optional[str] = None, format: str = default, missing_values: List[str] = NOTHING, constraints: Dict[str, Any] = NOTHING, rdf_type: Optional[str] = None, example: Optional[str] = None, schema: Optional[Schema] = None) -> None
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.
str
Type of the field such as "boolean", "integer" etc.
ClassVar[str]
A human-oriented title for the Field.
Optional[str]
A brief description of the Field.
Optional[str]
Format of the field to specify different value readers for the field type. For example: "default","array" etc.
str
List of string values to be treated as missing values in the field. If the field value matches any of these strings, it is set to None.
List[str]
A dictionary of rules that constrain the data values permitted for a field.
Dict[str, Any]
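As an illustration of how such rules might be applied, the sketch below checks a value against a few constraint names defined by the Table Schema Standard (`required`, `minimum`, `maximum`, `enum`). `check_constraints` is a hypothetical helper that covers only these four rules and is not the library's validation code.

```python
from typing import Any, Dict, List


def check_constraints(value: Any, constraints: Dict[str, Any]) -> List[str]:
    """Return the names of the constraints that the value violates."""
    violated: List[str] = []
    if constraints.get("required") and value is None:
        violated.append("required")
    if value is not None:
        # Range and membership rules only apply to present values
        if "minimum" in constraints and value < constraints["minimum"]:
            violated.append("minimum")
        if "maximum" in constraints and value > constraints["maximum"]:
            violated.append("maximum")
        if "enum" in constraints and value not in constraints["enum"]:
            violated.append("enum")
    return violated


constraints = {"required": True, "minimum": 1, "maximum": 10}
print(check_constraints(5, constraints))
print(check_constraints(42, constraints))
print(check_constraints(None, constraints))
```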
RDF type. Indicates whether the field is of RDF type.
Optional[str]
An example of a value for the field.
Optional[str]
The schema that the field is part of.
Optional[Schema]
Specifies whether the field is a built-in feature.
ClassVar[bool]
List of supported constraints for a field.
ClassVar[List[str]]
Indicates if field is mandatory.
(bool) ->