Frictionless supports parsing the HTML format. Install the html extra first:
pip install frictionless[html]
pip install 'frictionless[html]' # for zsh shell
You can read this file format using Package/Resource, for example:
from pprint import pprint
from frictionless import resources
resource = resources.TableResource(path='table1.html')
pprint(resource.read_rows())
[{'id': 1, 'name': 'english'}, {'id': 2, 'name': '中国人'}]
The same applies to writing:
from frictionless import Resource, resources
source = Resource(data=[['id', 'name'], [1, 'english'], [2, 'german']])
target = resources.TableResource(path='table-output.html')
source.write(target)
print(target)
print(target.to_view())
{'name': 'table-output',
'type': 'table',
'path': 'table-output.html',
'scheme': 'file',
'format': 'html',
'mediatype': 'text/html'}
+----+-----------+
| id | name |
+====+===========+
| 1 | 'english' |
+----+-----------+
| 2 | 'german' |
+----+-----------+
There is a control to configure how HTML is read and written, for example:
from frictionless import resources, formats
control = formats.HtmlControl(selector='#id')
resource = resources.TableResource(path='table1.html', control=control)
print(resource.read_rows())
[]
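The result above is empty, presumably because nothing in table1.html matches '#id'. As a rough illustration of how the selector can pick one table out of several, the sketch below writes a hypothetical file, two-tables.html, in which only the second table carries id="data". The file name, the table contents, and the helper names write_sample and read_selected_rows are assumptions made for this example and are not part of Frictionless. Only HtmlControl, TableResource and read_rows come from the examples above.

```python
from pathlib import Path

# Hypothetical sample page: two tables, only the second one has id="data".
SAMPLE_HTML = """<html><body>
<table>
  <tr><th>id</th><th>name</th></tr>
  <tr><td>1</td><td>ignored</td></tr>
</table>
<table id="data">
  <tr><th>id</th><th>name</th></tr>
  <tr><td>1</td><td>english</td></tr>
  <tr><td>2</td><td>german</td></tr>
</table>
</body></html>
"""


def write_sample(path="two-tables.html"):
    """Write the sample page to disk and return its path."""
    Path(path).write_text(SAMPLE_HTML, encoding="utf-8")
    return path


def read_selected_rows(path, selector):
    """Read rows from the table matched by a CSS selector."""
    # Imported lazily so this module loads even without frictionless[html].
    from frictionless import formats, resources

    control = formats.HtmlControl(selector=selector)
    resource = resources.TableResource(path=path, control=control)
    return resource.read_rows()


sample_path = write_sample()
```

With the html extra installed, calling read_selected_rows(sample_path, '#data') should read only from the table whose id is data, whereas the default selector 'table' matches both tables on the page.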
Html control representation. Control class to set params for the Html reader/writer.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, selector: str = 'table') -> None
selector (str): Any valid CSS selector. The default selector is 'table'. For example: "table", "#id", ".meme", etc.