
Html Format

Frictionless supports parsing the HTML format. Install the extra dependencies first:

pip install frictionless[html]
pip install 'frictionless[html]' # for zsh shell

Reading Data

You can read this file format using Package/Resource, for example:

from pprint import pprint
from frictionless import resources

resource = resources.TableResource(path='table1.html')
pprint(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
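
To try the example above you need an input file. The contents of `table1.html` are not shown on this page, so the following is only a minimal sketch that writes a hypothetical stand-in whose single table mirrors the rows printed above. The markup and the `write_sample` helper are assumptions for illustration, not the project's actual fixture.

```python
from pathlib import Path

# Hypothetical stand-in for table1.html (an assumption, not the real fixture):
# one <table> whose first row holds the header cells and whose remaining
# rows hold the data.
HTML = """<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"></head>
  <body>
    <table>
      <tr><td>id</td><td>name</td></tr>
      <tr><td>1</td><td>english</td></tr>
      <tr><td>2</td><td>中国人</td></tr>
    </table>
  </body>
</html>
"""


def write_sample(path="table1.html"):
    """Write the sample HTML table to `path` as UTF-8 and return the path."""
    target = Path(path)
    target.write_text(HTML, encoding="utf-8")
    return target
```

Calling `write_sample()` in the working directory creates a file that the reading example should be able to open. The exact rows returned depend on how the HTML parser treats the header row.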

Writing Data

The same applies to writing:

from frictionless import Resource, resources

source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = resources.TableResource(path='table-output.html')
source.write(target)
print(target)
print(target.to_view())
{'name': 'table-output',
 'type': 'table',
 'path': 'table-output.html',
 'scheme': 'file',
 'format': 'html',
 'mediatype': 'text/html'}
+----+-----------+
| id | name      |
+====+===========+
|  1 | 'english' |
+----+-----------+
|  2 | 'german'  |
+----+-----------+

Configuration

There is a control class for configuring how HTML is read and written, for example:

from frictionless import Resource, resources, formats

control = formats.HtmlControl(selector='#id')
resource = resources.TableResource(path='table1.html', control=control)
print(resource.read_rows())
[]

Reference

formats.HtmlControl (class)

Html control representation. Control class to set params for Html reader/writer.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, selector: str = 'table') -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • selector (str)

formats.HtmlControl.selector (property)

Any valid CSS selector. The default selector is 'table'. Examples: "table", "#id", ".meme".

Signature

str
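
To illustrate which table each selector form targets, here is a minimal standard-library sketch. It is not how Frictionless implements selection. It handles only the three simple forms listed above (a tag name, `#id`, `.class`), and the names `matches`, `FirstTable`, `read_table`, and `PAGE` are hypothetical.

```python
from html.parser import HTMLParser


def matches(selector, tag, attrs):
    """Check a start tag against a simple selector: 'tag', '#id' or '.class'."""
    attrs = dict(attrs)
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    return tag == selector


class FirstTable(HTMLParser):
    """Collect the cell text of the first <table> matching `selector`."""

    def __init__(self, selector="table"):
        super().__init__()
        self.selector = selector
        self.rows = []       # collected rows, each a list of cell strings
        self.inside = False  # True while inside the matched table
        self.done = False    # True once the matched table has closed
        self.cell = None     # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if not self.inside:
            # Only <table> elements are considered in this sketch.
            self.inside = tag == "table" and matches(self.selector, tag, attrs)
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag in ("td", "th") and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None
        elif tag == "table":
            self.inside, self.done, self.cell = False, True, None


def read_table(html, selector="table"):
    """Return the rows of the first table matching `selector`, or []."""
    parser = FirstTable(selector)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page with two tables, used only for this illustration.
PAGE = """
<table id="first"><tr><td>id</td><td>name</td></tr><tr><td>1</td><td>english</td></tr></table>
<table class="meme"><tr><td>id</td><td>name</td></tr><tr><td>2</td><td>german</td></tr></table>
"""

print(read_table(PAGE))           # default selector: first table on the page
print(read_table(PAGE, ".meme"))  # the table carrying class "meme"
print(read_table(PAGE, "#id"))    # no element on this page has id "id"
```

In this sketch a selector that matches nothing yields an empty list. Full CSS selector syntax (combinators, attribute selectors, and so on) is out of its scope.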