Frictionless Data, Frictionless Development

Andrew Stretton | Friday 10:30 | Room C

A common problem in data engineering is how to build a platform that can both import and export tabular data in numerous formats and maintain a change history of that data while users update and query it.

Tools like Trifacta (Google Cloud Dataprep [1]) provide a turnkey solution to part of the pipeline, but the open-source Frictionless Data [2] tools from OKFN can provide a simpler subset of these features tailored to your requirements.

Just as Pandas [3] is built around the DataFrame, the Frictionless Data approach is built around data packages [4], each consisting of a JSON Table Schema and a data URI. These schemata can be easily generated for any dataset and work well for a number of applications, such as:

Validating new data with tools like Goodtables [5] or tableschema-py
Building a data update interface with tools such as Handsontable [6]
Creating declarative data processing pipelines that a front end can easily interact with, via datapackage-pipelines [7] and Kubernetes [8]
Pushing data into various databases and repository tools such as CKAN datastore [9]
Extending the schema to allow export to linked data formats such as IIIF
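
As a rough illustration of the first use case, the sketch below infers a Table Schema from a small CSV sample, wraps it in a data package descriptor, and checks new rows against it. It uses only the Python standard library and is not the tableschema-py or Goodtables API; the helper names (infer_type, infer_schema, build_descriptor, validate_rows), the sample data and the file name sample.csv are all made up for the example, and only three field types are handled.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest of integer/number/string that fits every non-empty value."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Build a minimal Table Schema dict from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "fields": [
            {"name": name, "type": infer_type([row[name] for row in rows])}
            for name in reader.fieldnames
        ]
    }


def build_descriptor(name, path, csv_text):
    """Wrap the inferred schema in a data package descriptor with one resource."""
    return {
        "name": name,
        "resources": [
            {"name": name, "path": path, "schema": infer_schema(csv_text)}
        ],
    }


def validate_rows(schema, csv_text):
    """Return (row_number, field, value) for every cell that fails to cast."""
    casts = {"integer": int, "number": float, "string": str}
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in schema["fields"]:
            value = row.get(field["name"])
            if value in ("", None):
                continue
            try:
                casts[field["type"]](value)
            except ValueError:
                errors.append((row_number, field["name"], value))
    return errors


# Made-up sample data, for illustration only.
SAMPLE = "id,name,score\n1,alpha,9.5\n2,beta,7\n"
NEW_DATA = "id,name,score\n3,gamma,n/a\n"

descriptor = build_descriptor("sample", "sample.csv", SAMPLE)
schema = descriptor["resources"][0]["schema"]
errors = validate_rows(schema, NEW_DATA)

print(json.dumps(descriptor, indent=2))
print(errors)
```

Because the descriptor is plain JSON, the same object could be handed to a front end, a pipeline step or a database loader without any of them needing to know how it was produced.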

The talk will cover these use cases and compare them with the approaches taken by other open-source data science / BI tools, such as Datashape [10] with odo [11] from Continuum and Superset [12] from Airbnb. I will aim to demonstrate that lightweight web standards like data packages speed up the development process.

References

1. https://cloud.google.com/dataprep/
2. http://frictionlessdata.io/tools/
3. http://pandas.pydata.org/
4. http://frictionlessdata.io/data-packages/
5. http://goodtables.okfnlabs.org/
6. https://github.com/handsontable/handsontable
7. https://github.com/frictionlessdata/datapackage-pipelines
8. https://kubernetes.io/
9. https://github.com/ckan/ckan/tree/master/ckanext/datastore
10. https://github.com/blaze/datashape
11. https://github.com/blaze/odo
12. https://github.com/apache/incubator-superset
