Language-integrated provenance in Haskell (‹Programming› 2018 - Research Papers)

Who

Jan Stolarek, James Cheney

Track

‹Programming› 2018 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

Use conference time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, ViennaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 11 Apr 2018 11:30 - 12:00 at Baie des Anges A + B - Session 1

Abstract

Scientific progress increasingly depends on data management, particularly to clean and curate data so that it can be systematically analyzed and reused. A wealth of techniques for managing and curating data (and its provenance) have been proposed, largely in the database community. In particular, a number of influential papers have proposed collecting provenance information explaining where a piece of data was copied from, or what other records were used to derive it. Most of these techniques, however, exist only as research prototypes and are not available in mainstream database systems. This means scientists must either implement such techniques themselves or (all too often) go without.

This is essentially a code reuse problem: provenance techniques currently cannot be implemented reusably, only as ad hoc, usually unmaintained extensions to standard databases. An alternative, relatively unexplored approach is to support such techniques at a higher abstraction level, using metaprogramming or reflection techniques. Can advanced programming techniques make it easier to transfer provenance research results into practice?

We build on a recent approach called language-integrated provenance, which extends language-integrated query techniques with source-to-source query translations that record provenance. In previous work, a proof of concept was developed in a research programming language called Links, which supports sophisticated Web and database programming. In this paper, we show how to adapt this approach to work in Haskell building on top of the Database-Supported Haskell (DSH) library.

Even though it seemed clear in principle that Haskell’s rich programming features ought to be sufficient, implementing language-integrated provenance in Haskell required overcoming a number of technical challenges due to interactions between these capabilities. Our implementation serves as a proof of concept showing how this combination of metaprogramming features can, for the first time, make data provenance facilities available to programmers as a library in a widely-used, general-purpose language.

In our work we were successful in implementing forms of provenance known as where-provenance and lineage. We have tested our implementation using a simple database and query set and established that the resulting queries are constructed and executed correctly on the database. Our implementation is publicly available on GitHub.

Our work makes provenance tracking available to users of DSH at little cost. Although Haskell is not widely used for scientific database development, our work also suggests how other languages or libraries might be extended to support provenance. Our work also highlights how combining Haskell’s advanced type programming features can lead to unexpected complications, which may motivate further research.

Link to Publication

http://programming-journal.org/2018/2/11

DOI

https://doi.org/10.22152/programming-journal.org/2018/2/11

Jan Stolarek

University of Edinburgh, UK

Poland

James Cheney