Frictionless Data, Frictionless Development
Andrew Stretton | Friday 10:30 | Room C
A common problem in data engineering is how to build a platform that can import and export tabular data in numerous formats while also maintaining a change history as users update and query that data.
Tools like Trifacta (Google Cloud Dataprep) provide a turnkey solution to part of the pipeline, but the open source Frictionless Data tools from OKFN can provide a simpler subset of these features tailored to your requirements.
Just as pandas is built around the DataFrame, the Frictionless Data approach uses data packages, each consisting of a JSON table schema and a data URI. These schemata can be easily generated for any dataset and work well for a number of applications, such as:
Validating new data with tools like Goodtables or tableschema-py
Building a data update interface with tools such as Handsontable
Creating declarative data processing pipelines that a front end can easily interact with via datapackage-pipelines and Kubernetes
Pushing data into various databases and repository tools such as the CKAN DataStore
Extending the schema to allow export to linked data formats such as IIIF
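To make the data package idea concrete, here is a minimal sketch of a descriptor and a schema-driven validation pass. It uses only the Python standard library rather than Goodtables or tableschema-py, so it illustrates the approach and not those tools' APIs. The package name, the example.com URL, the field names, the sample rows and the `validate_rows` helper are all assumptions made up for illustration.

```python
import csv
import io
import json

# Hypothetical data package descriptor: a table schema plus a data URI.
# The name, path and fields below are illustrative, not from a real dataset.
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "cities",
            "path": "https://example.com/cities.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "city", "type": "string"},
                    {"name": "population", "type": "integer"},
                ]
            },
        }
    ],
}

# Map a few Table Schema type names to Python casting functions.
CASTERS = {"integer": int, "number": float, "string": str}


def validate_rows(schema, csv_text):
    """Return (row_number, field_name, raw_value) for every cell that fails to cast."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start at 2 because the header occupies row 1.
    for row_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            raw = row.get(field["name"])
            try:
                if raw is None:
                    raise ValueError("missing value")
                CASTERS[field["type"]](raw)
            except ValueError:
                errors.append((row_number, field["name"], raw))
    return errors


# Inline sample data stands in for the remote CSV so the sketch runs offline.
sample_csv = "id,city,population\n1,Leeds,790000\n2,York,not-a-number\n"
schema = descriptor["resources"][0]["schema"]
errors = validate_rows(schema, sample_csv)

print(json.dumps(descriptor, indent=2))
print(errors)
```

Because the schema is plain JSON, the same descriptor could in principle drive an editing interface, a pipeline step or a database load without each component redefining the column types.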
The talk will cover these use cases and compare them with the approaches taken by other open-source data science / BI tools, such as Datashape with odo from Continuum and Superset from Airbnb. I will aim to demonstrate that lightweight web standards like data packages speed up the development process.