(2024-12-13 12:49)

Dialect Class

The Table Dialect is a core Frictionless Data concept: metadata describing a tabular data source. The Table Dialect lets us manage the table header and any details related to specific formats.

Dialect

Dialect instances are accepted by many classes and functions:

Resource
describe
extract
validate
and more

You just need to create a Dialect instance with the desired options and pass it to the classes and functions listed above. We will demonstrate this on the following example table:

cat capital-3.csv

id,name
1,London
2,Berlin
3,Paris
4,Madrid
5,Rome

Header

It's a boolean flag, True by default, that indicates whether the data has a header row. In the following example, the header row is treated as a data row:

Python

from frictionless import Resource, Dialect

dialect = Dialect(header=False)
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header.labels)
    print(resource.to_view())

[]
+--------+----------+
| field1 | field2   |
+========+==========+
| 'id'   | 'name'   |
+--------+----------+
| '1'    | 'London' |
+--------+----------+
| '2'    | 'Berlin' |
+--------+----------+
| '3'    | 'Paris'  |
+--------+----------+
| '4'    | 'Madrid' |
+--------+----------+
...
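Conceptually, with header=False no labels are read and field names are generated by position, as the field1/field2 columns above show. The following standard-library sketch mimics that behaviour for illustration only; the function read_without_header is hypothetical and is not how frictionless implements it.

```python
import csv
import io


def read_without_header(text):
    """Hypothetical illustration of header=False: every row is data and
    field names are generated as field1, field2, ... (naming assumed from
    the output shown above)."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    labels = []  # no header row is read, so there are no labels
    names = [f"field{i}" for i in range(1, width + 1)]
    return labels, names, rows


labels, names, rows = read_without_header("id,name\n1,London\n2,Berlin\n")
print(labels)   # []
print(names)    # ['field1', 'field2']
print(rows[0])  # ['id', 'name'], the original header is now a data row
```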

Header Rows

If header is True (the default), this parameter specifies where to find the header row, or the header rows of a multiline header. The following example shows how the first two data rows can be treated as part of the header:

Python

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3])
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())

['id 1 2', 'name London Berlin']
+--------+--------------------+
| id 1 2 | name London Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+

Header Join

When there are multiple header rows (set with the header_rows parameter), header_join sets the string used to join the cells of each column into a single label. This is often handy for "fancy" Excel files; for simplicity, we will use a CSV file:

Python

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3], header_join='/')
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())

['id/1/2', 'name/London/Berlin']
+--------+--------------------+
| id/1/2 | name/London/Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+
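To make the interplay of header_rows and header_join concrete, here is a minimal standard-library sketch that reproduces the labels above. It assumes that header_rows holds 1-based row numbers, that the cells of each column are joined in order, and that data starts after the last header row; merge_header is a hypothetical name, not part of frictionless.

```python
import csv
import io


def merge_header(text, header_rows=(1,), header_join=" "):
    """Hypothetical sketch of header_rows/header_join semantics.

    Cells from the listed (1-based) rows are joined column by column with
    header_join; data rows start after the last header row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header_block = [rows[number - 1] for number in header_rows]
    # zip(*...) groups the cells of each column across the header rows
    labels = [header_join.join(cells) for cells in zip(*header_block)]
    data = rows[max(header_rows):]
    return labels, data


text = "id,name\n1,London\n2,Berlin\n3,Paris\n4,Madrid\n5,Rome\n"
labels, data = merge_header(text, header_rows=[1, 2, 3], header_join="/")
print(labels)   # ['id/1/2', 'name/London/Berlin']
print(data[0])  # ['3', 'Paris']
```

With the default join of a single space, the same call yields the labels from the Header Rows example.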

Header Case

By default, a header is validated in case-sensitive mode. To disable this behaviour, set the header_case parameter to False. Like any other Dialect option, it takes effect wherever the dialect is passed, including extract, validate, and other functions. Note that it doesn't change the resulting header; it only affects how the header is validated:

Python

from frictionless import Resource, Schema, Dialect, fields

dialect = Dialect(header_case=False)
schema = Schema(fields=[fields.StringField(name="ID"), fields.StringField(name="NAME")])
with Resource('capital-3.csv', dialect=dialect, schema=schema) as resource:
    print(f'Header: {resource.header}')
    print(f'Valid: {resource.header.valid}')  # without "header_case" it will have 2 errors

Header: ['ID', 'NAME']
Valid: True
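The effect of header_case can be pictured as a choice between exact and caseless comparison of labels against schema field names. The sketch below is a hypothetical illustration (header_errors is not a frictionless function) that reproduces the "2 errors" mentioned in the comment above; it compares labels and names position by position, which is an assumption of this sketch.

```python
def header_errors(labels, field_names, header_case=True):
    """Hypothetical sketch: return (position, label, name) tuples for labels
    that do not match the schema field names. With header_case=False the
    comparison ignores case; the labels themselves are never modified."""
    errors = []
    for position, (label, name) in enumerate(zip(labels, field_names), start=1):
        if header_case:
            matches = label == name
        else:
            # casefold() is a stronger form of lower() meant for caseless matching
            matches = label.casefold() == name.casefold()
        if not matches:
            errors.append((position, label, name))
    return errors


labels = ["id", "name"]       # labels as read from capital-3.csv
field_names = ["ID", "NAME"]  # names declared in the schema
print(len(header_errors(labels, field_names)))                     # 2
print(len(header_errors(labels, field_names, header_case=False)))  # 0
```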

Comment Char

Specifies the character used to mark comment rows; such rows are ignored:

Python

from frictionless import Resource, Dialect

dialect = Dialect(comment_char="#")
with Resource(b'name\n#row1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())

[{'name': 'row2'}]
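A rough standard-library model of comment_char filtering is shown below. It assumes a row is treated as a comment when its first cell starts with the comment character; that rule and the name drop_comment_char_rows are assumptions made for this illustration, not a description of frictionless internals.

```python
import csv
import io


def drop_comment_char_rows(text, comment_char="#"):
    """Hypothetical sketch: skip a row when its first cell starts with
    comment_char (checking only the first cell is an assumption)."""
    kept = []
    for row in csv.reader(io.StringIO(text)):
        if row and row[0].startswith(comment_char):
            continue
        kept.append(row)
    return kept


print(drop_comment_char_rows("name\n#row1\nrow2\n"))  # [['name'], ['row2']]
```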

Comment Rows

A list of row numbers to ignore. Rows are numbered from 1, counting the header row:

Python

from frictionless import Resource, Dialect

dialect = Dialect(comment_rows=[2])
with Resource(b'name\nrow1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())

[{'name': 'row2'}]
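The numbering used by comment_rows can be illustrated with a small sketch: in the example above, comment_rows=[2] removes "row1" because the header line is row 1. The helper drop_comment_rows is hypothetical and only models that numbering.

```python
import csv
import io


def drop_comment_rows(text, comment_rows):
    """Hypothetical sketch: comment_rows holds 1-based row numbers counted
    from the first physical row, so the header is row 1."""
    skip = set(comment_rows)
    rows = csv.reader(io.StringIO(text))
    return [row for number, row in enumerate(rows, start=1) if number not in skip]


print(drop_comment_rows("name\nrow1\nrow2\n", [2]))  # [['name'], ['row2']]
```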

Skip Blank Rows

A boolean flag, False by default, that makes the reader ignore rows that are completely blank:

Python

from frictionless import Resource, Dialect

dialect = Dialect(skip_blank_rows=True)
with Resource(b'name\n\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())

[{'name': 'row2'}]
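As a minimal sketch of what "completely blank" can mean, the hypothetical helper below drops rows that have no cells or whose cells are all empty strings (for example a line containing only ","). This definition of blank is an assumption of the sketch.

```python
import csv
import io


def drop_blank_rows(text):
    """Hypothetical sketch of skip_blank_rows=True: a row is blank when it
    has no cells or every cell is an empty string."""
    kept = []
    for row in csv.reader(io.StringIO(text)):
        if all(cell == "" for cell in row):
            continue  # all() is True for an empty row, so [] is skipped too
        kept.append(row)
    return kept


print(drop_blank_rows("name\n\nrow2\n,\n"))  # [['name'], ['row2']]
```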

Reference

Dialect (class)

Dialect representation

Signature

(*, descriptor: Optional[Union[types.IDescriptor, str]] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, header: bool = True, header_rows: List[int] = NOTHING, header_join: str = ' ', header_case: bool = True, comment_char: Optional[str] = None, comment_rows: List[int] = NOTHING, skip_blank_rows: bool = False, controls: List[Control] = NOTHING) -> None

Parameters

descriptor (Optional[Union[types.IDescriptor, str]])
name (Optional[str])
title (Optional[str])
description (Optional[str])
header (bool)
header_rows (List[int])
header_join (str)
header_case (bool)
comment_char (Optional[str])
comment_rows (List[int])
skip_blank_rows (bool)
controls (List[Control])

dialect.descriptor (property)

Signature

Optional[Union[types.IDescriptor, str]]

dialect.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.

Signature

Optional[str]

dialect.type (property)

Type of the object

Signature

ClassVar[Union[str, None]]

dialect.title (property)

A human-oriented title for the Dialect.

Signature

Optional[str]

dialect.description (property)

A brief description of the Dialect.

Signature

Optional[str]

dialect.header (property)

If True (the default), the header is read; otherwise no header is read and all rows, including the first one, are treated as data.

Signature

bool

dialect.header_rows (property)

Specifies the row numbers for the header. Default is [1].

Signature

List[int]

dialect.header_join (property)

Separator used to join the cells of a multiline header into a single label. The default value is " " (a space); other values could be ":", "-", etc.

Signature

str

dialect.header_case (property)

If set to False, header labels are matched case-insensitively. The default value is True.

Signature

bool

dialect.comment_char (property)

The character used to mark comment rows, for example "#". The default value is None.

Signature

Optional[str]

dialect.comment_rows (property)

A list of row numbers to ignore, for example: [1, 2].

Signature

List[int]

dialect.skip_blank_rows (property)

If True, rows that are completely blank are ignored. The default value is False.

Signature

bool

dialect.controls (property)

A list of controls which define different aspects of reading data.

Signature

List[Control]

dialect.add_control (method)

Add a new control to the dialect

Signature

(control: Control) -> None

Parameters

control (Control)

Dialect.describe (method) (static)

Describe the given source as a dialect

Signature

(source: Optional[Any] = None, **options: Any) -> Dialect

Parameters

source (Optional[Any]): data source
options (Any)
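Describing a dialect means inferring it from a sample of the data. As an illustration of the idea only, and not of frictionless internals, Python's standard csv.Sniffer can guess a CSV delimiter and whether the first row looks like a header. The helper name sniff_dialect and the candidate delimiters are assumptions for this sketch.

```python
import csv


def sniff_dialect(sample, delimiters=",;\t|"):
    """Hypothetical helper: guess the delimiter and header presence of a
    CSV sample using the standard library's csv.Sniffer."""
    sniffer = csv.Sniffer()
    detected = sniffer.sniff(sample, delimiters=delimiters)
    return {"delimiter": detected.delimiter, "header": sniffer.has_header(sample)}


sample = "id;name\n1;London\n2;Berlin\n3;Paris\n"
result = sniff_dialect(sample)
print(result)
```

csv.Sniffer relies on heuristics (consistent character counts per line, type differences between the first row and the rest), so results on small or irregular samples should be checked rather than trusted.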

dialect.get_control (method)

Get control by type

Signature

(type: str) -> Control

Parameters

type (str)

dialect.has_control (method)

Check whether a control of the given type is present

Signature

(type: str)

Parameters

type (str)

dialect.set_control (method)

Set control by type

Signature

(control: Control) -> Optional[Control]

Parameters

control (Control)
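The add/get/has/set methods above amount to a small registry of controls keyed by their type. The following pure-Python model is a hypothetical sketch of that idea: ControlRegistry and CsvLikeControl are invented names, raising KeyError from get_control is an assumed failure mode, and returning the replaced control from set_control is an assumption based on its Optional[Control] return type.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List, Optional


@dataclass
class Control:
    """Hypothetical stand-in for a control; subclasses set a type."""
    type: ClassVar[str] = "base"


@dataclass
class CsvLikeControl(Control):
    type: ClassVar[str] = "csv"
    delimiter: str = ","


@dataclass
class ControlRegistry:
    controls: List[Control] = field(default_factory=list)

    def add_control(self, control: Control) -> None:
        self.controls.append(control)

    def has_control(self, type: str) -> bool:
        return any(control.type == type for control in self.controls)

    def get_control(self, type: str) -> Control:
        for control in self.controls:
            if control.type == type:
                return control
        raise KeyError(f"no control of type {type!r}")  # assumed failure mode

    def set_control(self, control: Control) -> Optional[Control]:
        """Replace the control of the same type and return the previous one,
        or append the control and return None (assumed semantics)."""
        for index, existing in enumerate(self.controls):
            if existing.type == control.type:
                self.controls[index] = control
                return existing
        self.controls.append(control)
        return None


registry = ControlRegistry()
registry.add_control(CsvLikeControl(delimiter=";"))
print(registry.has_control("csv"))  # True
previous = registry.set_control(CsvLikeControl(delimiter="|"))
print(previous.delimiter, registry.get_control("csv").delimiter)  # ; |
```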

Control (class)

Control representation. This is the base class for all control classes, which configure the behaviour of various components.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])

control.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.

Signature

Optional[str]

control.type (property)

Type of the control, for example "csv" for a CSV control or "zenodo" for the Zenodo plugin control.

Signature

ClassVar[str]

control.title (property)

A human-oriented title for the control.

Signature

Optional[str]

control.description (property)

A brief description of the control.

Signature

Optional[str]
