Verifying claims on experiments

The ultimate goal of running experiments is to draw conclusions out of the results they produce. For example, if you compare sorting algorithms on arrays of various sizes, one conclusion could be that "Gnome Sort is the slowest of all methods". Such an assertion on the contents of a lab is called a claim.

But how do you check that such a claim holds? In the case of sorting algorithms, there could be multiple ways. One could go through all arrays sorted by the algorithms, and check that, for each size, Gnome Sort has the longest sorting time. Alternately, one could try to fit a regression model on the data for each individual algorithm, and compare the parameters of these models. A cruder, but perhaps equally efficient technique, could be to simply sum up the running times of each algorithm and compare them.

Two issues can arise with such a claim. First, if you don't describe precisely how you evaluate the assertion, somebody else could use a different technique and disagree with your conclusion. Second, it doesn't take long before a manual check becomes impractical. What if a claim involves computations over dozens of cells spread across many tables? What if you make many such claims over your lab?

To this end, LabPal provides facilities to:

  • write your own claims over the contents of a lab

  • evaluate these claims automatically

  • optionally, if a claim does not hold, point to elements of the lab (table cells, experiments) that cause it to be false

The Claim object

Claims in LabPal are represented by objects that inherit from a class, prosaically called Claim. A Claim object has one method, holds, which can return one of three values:

  • OK: indicates that the claim holds on the current contents of the lab

  • WARNING: signals that the claim may or may not hold, and that some elements of the lab require attention (for example, missing data points)

  • FAIL: indicates that the lab's current state does not support the claim

To create your own claim, simply create a new descendent of Claim, and implement its holds method to do whatever you want. Here is an example:

class GnomeClaim extends Claim {
    public Result holds(Laboratory lab) {
        int sum_quick = 0, sum_gnome = 0;
        // Compute sum over each algorithm
        for (Experiment e : lab.filter(new Region().add("ALGORITHM", "Quick Sort")))
          sum_quick += e.readInt("TIME");
        for (Experiment e : lab.filter(new Region().add("ALGORITHM", "Gnome Sort")))
          sum_gnome  += e.readInt("TIME");
        if (sum_quick < sum_gnome)
          return Result.OK;
        return Result.FAIL;
    }
}

Let's see what this class does. First, it iterates over all Quick Sort experiments, and sums the running time of each; it then does the same thing for Gnome Sort experiments. If the cumulative time of Quick Sort experiments is smaller than that of Gnome Sort, it returns OK; otherwise, it returns FAIL. This corresponds to one of the ways we mentioned of checking the assertion that "Gnome Sort is the slowest algorithm".

This is pretty standard code that one could have written anywhere, even without using LabPal. However as with many things in LabPal, there are advantages to encapsulating this simple bit of code into a Claim object.

Viewing claims

First, multiple claims can be collected into a lab, and the state of each of these claims can be viewed through the web interface. A lab's method add can accept any number of claims:

my_lab.add(new GnomeClaim().setName("Gnome Sort is slowest"));

Note the use of another method of Claim, setName, which is used to associate a short descriptive name to a claim.

When starting the lab, the list of all claims that have been added to it appears at the bottom of the Status page of the web interface. Clicking on any of these claims leads the user to a page specific to that claim. There, the user can see more information about that claim: its current state, and an optional longer description, specified by method setDescription. Additionally, if a claim does not hold, elements of the lab responsible for the violation can be shown in a list (more on that later).

In the case of our lab for sorting algorithms, let us run the experiments, and come back to the Status page. Claims are not evaluated automatically; therefore, at the start of a lab, all claims are in the special UNKNOWN state, represented by a white icon. This process can be triggered at any moment by the user through the Re-evaluate button at the top of the list. This causes the holds method of every claim to be called, and the state of each claim to be updated based on the lab's current contents. In the present case, the state of our claim should switch from UNKNOWN to OK, as Gnome Sort is indeed slower than Quick Sort (if that is not the case, there is a paper to write about this).

We can see how our original, informal assertion ("Gnome Sort is slowest") has been automatically (and programmatically) verified on the lab's data, reducing the need for manual computations. Using claim objects, the question is no longer to check whether the lab's data supports an assertion; rather, it has shifted into checking that the claim object's computation actually does what it pretends. In our example, this amounts to examining less than ten lines of code, instead of sifting through cells dispersed across data tables. If one accepts the claim's code, one should normally accept the conclusion it comes to.

Explaining violations

Just for fun, let us look at what happens when a claim is false. Let us reverse the previous claim, and this time assert that Quick Sort is slower than Gnome Sort. We can simply reverse the < comparison in the previous code, and change the textual description accordingly. Re-running the lab, we should now see that re-evaluating the claim leads to the FAIL result, represented by a red icon in the Status page.

Arguably, our claim is a bit coarse: it merely compares the total computing time of both algorithms on all arrays. We could refine our claim by comparing each algorithm array by array. Our holds method could look like this:

public Result holds(Laboratory lab) {
    Region reg = new Region()
        .addRange("SIZE", 1000, 10000, 1000);
    for (Region r : reg.all("SIZE")) {
        Collection<Experiment> exps = lab.filter(r);
        Experiment e_gnome = Laboratory.filterAny(exps,
            new Region().add("ALGORITHM", "Gnome Sort"));
        Experiment e_quick = Laboratory.filterAny(exps,
            new Region().add("ALGORITHM", "Quick Sort"));
        if (e_gnome.readInt("TIME") > e_quick.readInt("TIME"))
          return Result.FAIL;
    }
    return Result.OK;
}

When evaluated, this claim would declare FAIL if any of the arrays is sorted faster by Quick Sort than by Gnome Sort (note that we are still working with the "inverted" claim, which is expected to be false).

However, if our lab contains results for sorting 1,000 different arrays, merely receiving FAIL is not very helpful. There is a problem, sure --but where? Going through all the results to find the cause of the violation is out of the question. Rather, it would be nice if the Claim could not only raise an alarm, but also give us information about the causes of a violation.

Fortunately, this is possible. When a claim does not hold, it can be given one or more explanations for the fact that it is false. This is done by creating and adding an Explanation object to the claim. An explanation is just a container: you can give it a textual description, and store into it any number of objects that you deem relevant for explaining the violation. Typically, objects will be experiments, tables, or, for a finer-grained access to data elements, ProvenanceNodes from the Petit Poucet library (see the chapter Data Provenance and Traceability for more information).

Let us improve our previous claim by adding explanations for found violations. The modified code would look like this:

public Result holds(Laboratory lab) {
    Result res = Result.OK;
    Region reg = new Region()
        .addRange("SIZE", 1000, 10000, 1000);
    for (Region r : reg.all("SIZE")) {
        Collection<Experiment> exps = lab.filter(r);
        Experiment e_gnome = Laboratory.filterAny(exps,
            new Region().add("ALGORITHM", "Gnome Sort"));
        Experiment e_quick = Laboratory.filterAny(exps,
            new Region().add("ALGORITHM", "Quick Sort"));
        if (e_gnome.readInt("TIME") < e_quick.readInt("TIME")) {
            Explanation e = new Explanation("Gnome Sort is faster than Quick Sort");
            e.add(e_gnome).add(e_quick);
            add(e);
            res = Result.FAIL;
        }
    }
    return res;
}

We made two modifications to the previous version:

  1. When a violation is found, an Explanation is created. The objects associated to the explanation are the two experiments in cause (Gnome Sort and Quick Sort for the same array size).

  2. The claim no longer stops after finding the first violation. If there are multiple violations of the claim, they will all be reported.

Claims as integrity checks

TODO

Last updated