Creating regions
Very often, the principle behind a set of experiments is to execute the same basic process multiple times, varying the input parameters each time. For example, suppose an experiment Foo has two integer input parameters, A and B. In a lab, we might want to run Foo for all combinations of A and B between 1 and 10. Adding these experiments to the lab is straightforward, using two nested for loops:
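As a minimal sketch of this pattern, the plain-Java snippet below collects one Foo per combination of A and B. The Foo class and the list are stand-ins invented for illustration; in an actual lab, each experiment would presumably be handed to the lab instead of being stored in a list.

```java
import java.util.ArrayList;
import java.util.List;

class NestedLoopsSketch {
    // Hypothetical stand-in for experiment Foo: it only stores its two parameters.
    static class Foo {
        final int a;
        final int b;

        Foo(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Builds one Foo for every combination of A and B between 1 and 10.
    static List<Foo> allCombinations() {
        List<Foo> experiments = new ArrayList<>();
        for (int a = 1; a <= 10; a++) {
            for (int b = 1; b <= 10; b++) {
                experiments.add(new Foo(a, b));
            }
        }
        return experiments;
    }

    public static void main(String[] args) {
        System.out.println(allCombinations().size() + " experiments created");
    }
}
```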
This works well for simple iterations including all combinations of all parameters between their respective bounds. Graphically, such combinations of n parameters form an n-dimensional "rectangle"; each discrete n-dimensional point within its boundaries represents a valid parameter assignment.
The situation becomes more complicated if one wants to consider combinations of parameters that do not form a rectangle. To simplify this task, LabPal provides an object called a Region.
Creating regions
A region is just that: a representation of a set of parameter assignments. Defining a region is done by instantiating an empty Region object:
Each distinct parameter in a region is called a dimension. For each dimension, the set of possible values for that dimension must be specified. One possible way is through method addRange:
This adds two dimensions to the region, A and B, each ranging between 1 and 10 by increments of 1. This can be represented by a two-dimensional graph such as this one:
Alternatively, one can create dimensions using the add method, which takes a dimension name, followed by any number of values. Values for a dimension can be numbers, character strings, or any object of type JsonElement (including lists and maps).
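To make the notion of dimension concrete, here is a minimal plain-Java stand-in that stores each dimension as a named, ordered list of values. It is not LabPal's Region class: the class name DimensionsSketch, the method signatures and the "Mode" dimension are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DimensionsSketch {
    // Dimension name -> ordered list of allowed values; insertion order is preserved.
    final Map<String, List<Object>> dimensions = new LinkedHashMap<>();

    // Adds an integer dimension going from min to max (inclusive) by increments of 1.
    DimensionsSketch addRange(String name, int min, int max) {
        List<Object> values = new ArrayList<>();
        for (int v = min; v <= max; v++) {
            values.add(v);
        }
        dimensions.put(name, values);
        return this;
    }

    // Adds a dimension from an explicit list of values.
    DimensionsSketch add(String name, Object... values) {
        dimensions.put(name, new ArrayList<>(Arrays.asList(values)));
        return this;
    }

    public static void main(String[] args) {
        DimensionsSketch r = new DimensionsSketch()
            .addRange("A", 1, 10)
            .addRange("B", 1, 10)
            .add("Mode", "fast", "slow"); // "Mode" is a made-up dimension
        System.out.println(r.dimensions);
    }
}
```

Keeping the dimensions in a LinkedHashMap records the order in which they were added, which matters when the points of a region are enumerated.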
Iterating on regions
Iterating over all points of a region can be done with method all(), which enumerates them. Each point is another Region object, this time with a single value for each dimension. This value can be obtained with a method called get(); casts to type XXX can be obtained with getXXX. For example, to create the same set of experiments as in our very first code sample, we can write:
By default, method all enumerates points in lexicographical order. This order is defined by the order in which the dimensions have been added to the region, and the order in which values have been added to each dimension. In our example, A is the first dimension to be added, and its values are 1 to 10 in increasing order. The region iterator will start by setting A to 1, and then enumerate all regions where B=1, B=2, etc. Then it will set A to 2, and enumerate all regions where B=1, B=2, and so on.
The order in which dimensions are used can be changed by specifying them as arguments in all. For example, writing region.all("B", "A") will enumerate regions by first fixing B=1 and enumerating the As, then B=2, etc.
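The enumeration order described here can be sketched in plain Java as follows. This is a stand-in written for illustration, not LabPal's implementation: the class EnumerationSketch, its static all method and the range helper are all hypothetical. The first dimension listed is held fixed the longest, and the last one varies fastest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EnumerationSketch {
    // Enumerates every point in lexicographical order of the dimensions listed in `order`.
    static List<Map<String, Integer>> all(Map<String, List<Integer>> dims, String... order) {
        List<Map<String, Integer>> points = new ArrayList<>();
        points.add(new LinkedHashMap<>()); // start from the single empty assignment
        for (String name : order) {
            List<Map<String, Integer>> extended = new ArrayList<>();
            for (Map<String, Integer> partial : points) {
                for (Integer value : dims.get(name)) {
                    Map<String, Integer> point = new LinkedHashMap<>(partial);
                    point.put(name, value);
                    extended.add(point);
                }
            }
            points = extended;
        }
        return points;
    }

    // Helper: the integers from min to max, inclusive.
    static List<Integer> range(int min, int max) {
        List<Integer> values = new ArrayList<>();
        for (int v = min; v <= max; v++) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> dims = new LinkedHashMap<>();
        dims.put("A", range(1, 10));
        dims.put("B", range(1, 10));
        // First three points with A fixed first, then with B fixed first
        System.out.println(all(dims, "A", "B").subList(0, 3));
        System.out.println(all(dims, "B", "A").subList(0, 3));
    }
}
```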
The arguments to all do not have to list all the dimensions. When fewer dimensions are specified, the resulting enumeration will consist of regions where some dimensions have a single value, and others are still ranges of values. For example, region.all("A") will first enumerate the region where A=1 and B is the range 1-10, then the region where A=2 and B is the range 1-10, and so on. Since the elements of this enumeration are themselves regions, one can still iterate over them using all to get regions of even smaller size.
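A partial enumeration of this kind can be pictured with the hedged plain-Java sketch below, where a region is simply a map from dimension names to lists of values. The class PartialEnumerationSketch and its methods are hypothetical stand-ins, not LabPal's API: fixing one dimension yields one sub-region per value, and the other dimensions keep their full range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartialEnumerationSketch {
    // Returns one sub-region per value of `fixed`; all other dimensions are left untouched.
    static List<Map<String, List<Integer>>> all(Map<String, List<Integer>> dims, String fixed) {
        List<Map<String, List<Integer>>> subRegions = new ArrayList<>();
        for (Integer value : dims.get(fixed)) {
            Map<String, List<Integer>> sub = new LinkedHashMap<>(dims);
            sub.put(fixed, Collections.singletonList(value));
            subRegions.add(sub);
        }
        return subRegions;
    }

    // Helper: the integers from min to max, inclusive.
    static List<Integer> range(int min, int max) {
        List<Integer> values = new ArrayList<>();
        for (int v = min; v <= max; v++) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> dims = new LinkedHashMap<>();
        dims.put("A", range(1, 10));
        dims.put("B", range(1, 10));
        for (Map<String, List<Integer>> sub : all(dims, "A")) {
            System.out.println(sub); // A has a single value, B is still the whole range
        }
    }
}
```

Because each element is itself a region in this representation, the same method can be applied again to a sub-region, for instance on "B", to obtain single points.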
Irregular regions
So far, regions do not bring much of an advantage over our nested for loops (actually, they bring one: they are objects instead of instructions). Things change when considering regions of irregular shapes.
For example, suppose you'd like to keep in the region only elements where A is smaller than B. Graphically, this means that the region is no longer a rectangle, but rather a triangle like this:
To create such a region, we can "filter" points from an existing region using method where. This method expects a Condition: an object with a single method, in, that should return true if a given point belongs to the target region. The triangle above can be obtained by cutting out a portion of the original rectangle as follows:
Note how an anonymous Condition object is passed to the where method, whose in method checks that A < B.
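The idea of filtering with an anonymous condition can be sketched in plain Java as below. The Condition interface here is a local stand-in whose in method receives the two coordinates directly, which is a simplifying assumption; it is not LabPal's own Condition type, and the where method is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WhereSketch {
    // Local stand-in for a condition: a single method, in.
    interface Condition {
        boolean in(int a, int b);
    }

    // Keeps only the points of the 1..10 x 1..10 rectangle accepted by the condition.
    static List<int[]> where(Condition c) {
        List<int[]> kept = new ArrayList<>();
        for (int a = 1; a <= 10; a++) {
            for (int b = 1; b <= 10; b++) {
                if (c.in(a, b)) {
                    kept.add(new int[] {a, b});
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Anonymous condition keeping the triangle where A < B
        List<int[]> triangle = where(new Condition() {
            @Override
            public boolean in(int a, int b) {
                return a < b;
            }
        });
        System.out.println(triangle.size() + " points kept");
    }
}
```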
These conditions can be chained, and progressively cut out pieces of an original region in increasingly irregular shapes. For example, to obtain the region corresponding to the following picture:
one can write:
(Of course here, the two conditions could have been put into a single Condition object combined with &&, but the point here is that using where, you can cut any given region without knowing how it was made.)
Method where can accept multiple conditions; in such a case, it creates the region made of points that satisfy all of them. Therefore, an alternate syntax for the above is:
The dual of where is or. This method accepts multiple conditions, and creates the region made of points that satisfy at least one of them. The following region:
can be created by the union of two subregions as follows:
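The difference between requiring all conditions and requiring at least one can be illustrated with standard Java predicates, as in the sketch below. It does not use LabPal's classes; the second condition (A + B at most 8) is a made-up example, since the shapes of the original figures are not reproduced here.

```java
import java.util.function.BiPredicate;

class CombineSketch {
    // Counts the points of the 1..10 x 1..10 rectangle accepted by the predicate.
    static int count(BiPredicate<Integer, Integer> p) {
        int n = 0;
        for (int a = 1; a <= 10; a++) {
            for (int b = 1; b <= 10; b++) {
                if (p.test(a, b)) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        BiPredicate<Integer, Integer> below = (a, b) -> a < b;      // condition from the text
        BiPredicate<Integer, Integer> small = (a, b) -> a + b <= 8; // hypothetical second condition
        // Intersection: points that satisfy both conditions
        System.out.println(count(below.and(small)));
        // Union: points that satisfy at least one condition
        System.out.println(count(below.or(small)));
    }
}
```

The intersection can only shrink a region, while the union of two sub-regions can produce shapes that neither condition describes on its own.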
Regions are objects
This may seem obvious, but the fact that regions are objects, rather than instructions, has interesting side effects. Let us illustrate this with an example.
Suppose that you created a method addToTable that is expected to create a table out of a set of experiments. More precisely, it loops over all values of parameter A, and creates one table each for all experiments with the same value of A. If the combinations of parameters form a square region, this can easily be achieved by providing ranges for A and B to the method:
But what if the region is not a square, but is rather composed only of the points where A < 2B? This could be worked around:
But now the shape of the region becomes hard-coded into the method; for a different shape, you would need a different version of addToTable. Notice also that we have to check if we added any experiments to a table, as there are values of A for which no B fulfills the condition.
Another workaround would be to create the list of experiments outside of addToTable, and to give this method only the collection of experiments:
But now we run into another problem. We have to iterate first on all values of A, and then find all experiments in the collection that have this value, before adding them to the corresponding table. A reverse solution would be to create tables for all values of A first, then iterate through the collection and add each experiment to the right table; but then again, this is starting to look like a hack. Our method, instead of getting simpler, is actually becoming more and more complicated.
But since regions are objects, this means they can be passed directly as an argument to a method, which simply iterates over points of the region without caring about how they are computed. A much more reusable version of addToTable would hence be:
This method is much simpler, its meaning is easy to grasp, and it can work with whatever set of points one can build.
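As an illustration of why passing the set of points as an argument helps, the plain-Java sketch below groups points by their value of A without knowing how they were selected. It is a stand-in, not LabPal code: a "table" is reduced to the list of B values sharing the same A, and the helper pointsWhereALessThan2B is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AddToTableSketch {
    // Creates one "table" per value of A; the shape of the region is irrelevant here.
    static Map<Integer, List<Integer>> addToTable(List<int[]> region) {
        Map<Integer, List<Integer>> tables = new TreeMap<>();
        for (int[] point : region) {
            tables.computeIfAbsent(point[0], a -> new ArrayList<>()).add(point[1]);
        }
        return tables;
    }

    // Hypothetical helper: points with A in 1..maxA and B in 1..maxB such that A < 2B.
    static List<int[]> pointsWhereALessThan2B(int maxA, int maxB) {
        List<int[]> region = new ArrayList<>();
        for (int a = 1; a <= maxA; a++) {
            for (int b = 1; b <= maxB; b++) {
                if (a < 2 * b) {
                    region.add(new int[] {a, b});
                }
            }
        }
        return region;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> tables = addToTable(pointsWhereALessThan2B(10, 10));
        System.out.println(tables);
    }
}
```

In this sketch, a value of A for which no point exists simply never gets a table, so no explicit emptiness check is needed.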
Filtering experiments
Among the various uses of regions, one is to filter sets of experiments in a lab. The Laboratory class provides a method filter that takes as input an arbitrary region, and returns the set of experiments in the lab that lie within that region. For example, to get all experiments where A is between 1 and 5:
Note that if an experiment has parameters other than A, their values are ignored. For an experiment to lie within a region, every parameter it has that is also one of the region's dimensions must take one of the values specified for that dimension.
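The membership rule just stated can be sketched in plain Java as follows. Experiments and regions are modeled as simple maps, which is an assumption made for illustration; the class FilterSketch and its methods are hypothetical and do not reproduce the Laboratory class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FilterSketch {
    // An experiment lies in the region if each of its parameters that is also a
    // region dimension has one of the values allowed for that dimension.
    static boolean liesIn(Map<String, Object> experiment, Map<String, Set<Object>> region) {
        for (Map.Entry<String, Object> param : experiment.entrySet()) {
            Set<Object> allowed = region.get(param.getKey());
            if (allowed != null && !allowed.contains(param.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Returns the experiments of the lab that lie within the region.
    static List<Map<String, Object>> filter(List<Map<String, Object>> lab,
                                            Map<String, Set<Object>> region) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> e : lab) {
            if (liesIn(e, region)) {
                kept.add(e);
            }
        }
        return kept;
    }

    // Helper: an experiment with two integer parameters A and B.
    static Map<String, Object> experiment(int a, int b) {
        Map<String, Object> e = new HashMap<>();
        e.put("A", a);
        e.put("B", b);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> lab = new ArrayList<>();
        for (int a = 1; a <= 10; a++) {
            for (int b = 1; b <= 2; b++) {
                lab.add(experiment(a, b));
            }
        }
        Map<String, Set<Object>> region = new HashMap<>();
        region.put("A", new HashSet<Object>(Arrays.asList(1, 2, 3, 4, 5)));
        System.out.println(filter(lab, region).size() + " experiments lie in the region");
    }
}
```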
Coupled with the possibility of shaping complex regions, this makes it possible to filter experiments in a lab in a very flexible way.
Counting points
An interesting perk of using regions is that counting points inside is easy, through method size(). This can also be used to count points that satisfy a condition, without having to iterate over them; it suffices to insert a where call before size: