‹Programming› 2018
Mon 9 - Thu 12 April 2018 Nice, France
Wed 11 Apr 2018 11:30 - 12:00 at Baie des Anges A + B - Session 1

Scientific progress increasingly depends on data management, particularly to clean and curate data so that it can be systematically analyzed and reused. A wealth of techniques for managing and curating data (and its provenance) have been proposed, largely in the database community. In particular, a number of influential papers have proposed collecting provenance information explaining where a piece of data was copied from, or what other records were used to derive it. Most of these techniques, however, exist only as research prototypes and are not available in mainstream database systems. This means scientists must either implement such techniques themselves or (all too often) go without.

This is essentially a code reuse problem: provenance techniques currently cannot be implemented reusably, only as ad hoc, usually unmaintained extensions to standard databases. An alternative, relatively unexplored approach is to support such techniques at a higher abstraction level, using metaprogramming or reflection techniques. Can advanced programming techniques make it easier to transfer provenance research results into practice?

We build on a recent approach called language-integrated provenance, which extends language-integrated query techniques with source-to-source query translations that record provenance. In previous work, a proof of concept was developed in a research programming language called Links, which supports sophisticated Web and database programming. In this paper, we show how to adapt this approach to work in Haskell, building on top of the Database-Supported Haskell (DSH) library.

Even though it seemed clear in principle that Haskell’s rich programming features ought to be sufficient, implementing language-integrated provenance in Haskell required overcoming a number of technical challenges due to interactions between these capabilities. Our implementation serves as a proof of concept showing how this combination of metaprogramming features can, for the first time, make data provenance facilities available to programmers as a library in a widely used, general-purpose language.

We successfully implemented two forms of provenance: where-provenance and lineage. We tested our implementation using a simple database and query set and established that the resulting queries are constructed and executed correctly on the database. Our implementation is publicly available on GitHub.
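To make the two forms of provenance concrete, here is a minimal sketch over in-memory lists. It is not the DSH API or the authors' implementation: every type, function, table, and row below is hypothetical and chosen only for illustration. Where-provenance records the source cell a value was copied from; lineage records which input rows were used to derive an output row.

```haskell
-- Illustrative sketch only: all names and data are hypothetical,
-- and plain lists stand in for database tables.

-- | Where-provenance: the table, column, and row a value was copied from.
data Where = Where
  { whereTable  :: String
  , whereColumn :: String
  , whereRow    :: Int
  } deriving (Eq, Show)

-- | A value paired with its where-provenance.
data WhereProv a = WhereProv { wpData :: a, wpProv :: Where }
  deriving (Eq, Show)

-- | A reference to an input row: (table name, row id).
type RowRef = (String, Int)

-- | A result row paired with its lineage, the input rows it was derived from.
data Lineage a = Lineage { linData :: a, linProv :: [RowRef] }
  deriving (Eq, Show)

-- Toy tables: (row id, agency name, phone) and (row id, agency name, destination).
agencies :: [(Int, String, String)]
agencies = [(1, "EdinTours", "412 1200"), (2, "Burns's", "607 3000")]

tours :: [(Int, String, String)]
tours =
  [ (1, "EdinTours", "Edinburgh")
  , (2, "EdinTours", "Loch Ness")
  , (3, "Burns's", "Alloway")
  ]

-- | Project the phone column, annotating each value with its source cell.
phonesWithWhere :: [WhereProv String]
phonesWithWhere =
  [ WhereProv phone (Where "agencies" "phone" rid)
  | (rid, _, phone) <- agencies
  ]

-- | Join agencies with tours, recording the rows from both tables
-- that contributed to each result row.
destinationsWithLineage :: [Lineage (String, String)]
destinationsWithLineage =
  [ Lineage (phone, dest) [("agencies", aid), ("tours", tid)]
  | (aid, name, phone) <- agencies
  , (tid, agency, dest) <- tours
  , name == agency
  ]

main :: IO ()
main = do
  mapM_ print phonesWithWhere
  mapM_ print destinationsWithLineage
```

In this sketch the annotations are written by hand inside each query. The approach described in the abstract instead obtains them by translating queries, so that programmers do not have to thread provenance through their own code.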

Our work makes provenance tracking available to users of DSH at little cost. Although Haskell is not widely used for scientific database development, our work also suggests how other languages or libraries might be extended to support provenance. Our work also highlights how combining Haskell’s advanced type programming features can lead to unexpected complications, which may motivate further research.

Wed 11 Apr

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:00
10:30
30m
Talk
Scoped Extension Methods in Dynamically-Typed Languages
Research Papers
Guillermo Polito CNRS, Camille Teruel INRIA, Stéphane Ducasse INRIA Lille, Luc Fabresse Mines Douai
11:00
30m
Talk
Towards Zero-Overhead Disambiguation of Deep Priority Conflicts
Research Papers
Luis Eduardo de Souza Amorim Delft University of Technology, Netherlands, Michael J. Steindorfer Delft University of Technology, Eelco Visser Delft University of Technology
11:30
30m
Talk
Language-integrated provenance in Haskell
Research Papers
Jan Stolarek University of Edinburgh, UK, James Cheney University of Edinburgh, UK