Edit page in Livemark
(2024-09-06 14:05)

Excel Format

Excel is a very popular tabular data format that usually has xlsx (newer) and xls (older) file extensions. Frictionless supports Excel files extensively.

pip install frictionless[excel]
pip install 'frictionless[excel]' # for zsh shell

Reading Data

You can read this format using Package/Resource, for example:

from pprint import pprint
from frictionless import Resource

resource = Resource(path='table.xlsx')
pprint(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]

Writing Data

The same is actual for writing:

from frictionless import Resource

source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = source.write('table-output.xlsx')
print(target)
print(target.to_view())

Configuration

There is a dialect to configure how Frictionless read and write files in this format. For example:

from frictionless import Resource, formats

resource = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
resource.write('table-output-sheet.xls', control=formats.ExcelControl(sheet='My Table'))

Reference

formats.ExcelControl (class)

formats.ExcelControl (class)

Excel control representation. Control class to set params for Excel reader/writer.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, sheet: Union[str, int] = 1, workbook_cache: Optional[Any] = None, fill_merged_cells: bool = False, preserve_formatting: bool = False, adjust_floating_point_error: bool = False, stringified: bool = False) -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • sheet (Union[str, int])
  • workbook_cache (Optional[Any])
  • fill_merged_cells (bool)
  • preserve_formatting (bool)
  • adjust_floating_point_error (bool)
  • stringified (bool)

formats.excelControl.sheet (property)

Name of the sheet from where to read or write data.

Signature

Union[str, int]

formats.excelControl.workbook_cache (property)

An empty dictionary which is used to handle workbook caching for remote workbooks. It stores the path to the temporary file while reading remote workbooks.

Signature

Optional[Any]

formats.excelControl.fill_merged_cells (property)

If True, it will unmerge and fill all merged cells by the visible value. Default value is False.

Signature

bool

formats.excelControl.preserve_formatting (property)

If set to True, it preserves text formatting for numeric and temporal cells. If not set, it will return all cell value as string. Default value is False.

Signature

bool

formats.excelControl.adjust_floating_point_error (property)

If True, it corrects the Excel behavior regarding floating point numbers. For example: 274.65999999999997 -> 274.66 (When True).

Signature

bool

formats.excelControl.stringified (property)

Stringifies all the cell values. Default value is False. Note that a table resource schema will still be applied and types coerced to match the schema (either provided or inferred) _after_ the rows are read as strings. To return all cells as strings then both set `stringified=True` and specify a schema that defines all fields to be of type string (see #1659).

Signature

bool