Script Generation for Grid Searches
Scientific discovery sometimes requires many experiments before you can reach a reasonable conclusion. These experiments typically differ only by minor variations in their configuration and submission, possibly to an SGE-enabled facility for processing.
This guide explains how to use the script jgen, which helps you generate
multiple experiment configurations for your grid searches. The system assumes
that a single experiment is defined in a single file, while multiple
experiments can be run by executing sequences of these individual
configuration files.
The script jgen takes, in its simplest form, 3 parameters:
A YAML file listing the “combinations” of variables to scan in the search
A Jinja2 template file that describes the setup of each experiment
An output filename template that describes where to save each experiment configuration generated by combining the parameters in your YAML file with the template
Let’s go through each of these inputs.
YAML Input
The YAML input file describes all possible combinations of parameters you want to scan. All root keys that hold lists are combined in all possible ways, and each combination produces a “configuration set”.
A configuration set holds values for all variables in the input template that
need replacing. For example, if your template mentions the variables name and
version, then each configuration set should yield values for both name and
version.
For example:
name: [john, lisa]
version: [v1, v2]
This should yield the following configuration sets:
[
{'name': 'john', 'version': 'v1'},
{'name': 'john', 'version': 'v2'},
{'name': 'lisa', 'version': 'v1'},
{'name': 'lisa', 'version': 'v2'},
]
Each key in the input file should correspond to either an object or a YAML list. If the value is a list, jgen iterates over it to build every possible combination with the elements of the other lists. If the value is not a list, it is considered unique and is repeated in each generated configuration set. Example:
name: [john, lisa]
version: [v1, v2]
text: >
  hello,
  world!
This should yield the following configuration sets:
[
{'name': 'john', 'version': 'v1', 'text': 'hello, world!'},
{'name': 'john', 'version': 'v2', 'text': 'hello, world!'},
{'name': 'lisa', 'version': 'v1', 'text': 'hello, world!'},
{'name': 'lisa', 'version': 'v2', 'text': 'hello, world!'},
]
Keys starting with an underscore (_) are treated as “unique” objects as well, even when they hold lists. Example:
name: [john, lisa]
version: [v1, v2]
_unique: [i1, i2]
This should yield the following configuration sets:
[
{'name': 'john', 'version': 'v1', '_unique': ['i1', 'i2']},
{'name': 'john', 'version': 'v2', '_unique': ['i1', 'i2']},
{'name': 'lisa', 'version': 'v1', '_unique': ['i1', 'i2']},
{'name': 'lisa', 'version': 'v2', '_unique': ['i1', 'i2']},
]
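The expansion rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this section, not jgen's actual implementation: the function name expand is hypothetical, and the YAML file is assumed to be already parsed into a plain dictionary.

```python
from itertools import product


def expand(variables):
    """Return one dict per combination of the list-valued keys.

    Hypothetical helper: keys starting with an underscore, and keys whose
    value is not a list, are copied unchanged into every configuration set.
    """
    # Keys that take part in the combinations: list values, no leading "_"
    scanned = {
        key: value
        for key, value in variables.items()
        if isinstance(value, list) and not key.startswith("_")
    }
    # Everything else is "unique" and repeated in each configuration set
    unique = {key: value for key, value in variables.items() if key not in scanned}

    cfgsets = []
    for combination in product(*scanned.values()):
        cfgset = dict(zip(scanned.keys(), combination))
        cfgset.update(unique)
        cfgsets.append(cfgset)
    return cfgsets


# The parsed content of the last YAML example, written as a plain dict
variables = {
    "name": ["john", "lisa"],
    "version": ["v1", "v2"],
    "_unique": ["i1", "i2"],
}

for cfgset in expand(variables):
    print(cfgset)
```

The loop prints four dictionaries, one per combination of name and version, each carrying the full _unique list.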
Jinja2 Template
This is a file in which variables are replaced for each of the configuration sets generated from your YAML file. For example, suppose your template is a Python file that uses the variables this way:
#!/usr/bin/env python
print('My name is {{ name }}')
print('This is {{ version }}')
Then jgen will generate 4 output files, one for each combination of name and
version, as explained above.
Output filename template
This follows the same build rules as the Jinja2 template, but it is just a string describing the path where each configuration, once rendered through the template, will be saved. Considering our example above, it may look like this:
output-dir/{{ name }}-{{ version }}.py
With all those inputs, the jgen command will look like this:
$ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py'
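To make the relationship between the three inputs concrete, the following sketch maps each configuration set to an output path and its rendered content. It is an assumption-laden illustration: the render function is a hypothetical stand-in that only handles simple {{ key }} placeholders with a regular expression, whereas jgen itself relies on Jinja2; nothing is written to disk.

```python
import re

# Matches simple placeholders such as "{{ name }}" (stand-in for Jinja2)
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, cfgset):
    """Replace every "{{ key }}" in template with str(cfgset[key])."""
    return PLACEHOLDER.sub(lambda match: str(cfgset[match.group(1)]), template)


# Body of the example template and the output filename template
template = "print('My name is {{ name }}')\nprint('This is {{ version }}')\n"
output = "output-dir/{{ name }}-{{ version }}.py"

cfgsets = [
    {"name": "john", "version": "v1"},
    {"name": "john", "version": "v2"},
    {"name": "lisa", "version": "v1"},
    {"name": "lisa", "version": "v2"},
]

# Both templates are rendered with the same configuration set
files = {render(output, cfgset): render(template, cfgset) for cfgset in cfgsets}

for path, content in files.items():
    print(path)
    print(content)
```

Rendering the filename and the file content from the same configuration set is what keeps each generated file's name in step with its content.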
Generating Aggregations
When you generate many files that you need to run, it is sometimes practical
to also generate a script that makes running all configurations easy. For
example, one could think of a bash script that runs all of the Python scripts
generated above. We call those “aggregations”. When aggregating, you iterate
over a specific variable called cfgset, which contains the dictionaries for
each configuration set. For example, an aggregation template would look like
this:
#!/usr/bin/env bash
{% for k in cfgset %}
python output-dir/{{ k.name }}-{{ k.version }}.py
{% endfor %}
Which would then generate:
#!/usr/bin/env bash
python output-dir/john-v1.py
python output-dir/john-v2.py
python output-dir/lisa-v1.py
python output-dir/lisa-v2.py
With this generated bash script, you could run all configuration sets from a single command line.
The final command line for jgen, including the generation of the individual
configuration files and the aggregation, would look like the following:
$ jgen variables.yaml template.py 'output-dir/{{ name }}-{{ version }}.py' run.sh 'output-dir/run.sh'
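For readers less familiar with Jinja2 loops, the aggregation template above behaves roughly like the plain Python loop below. This is a sketch for illustration only: the function name aggregate is hypothetical, and str.format stands in for the Jinja2 rendering that jgen performs.

```python
def aggregate(cfgset):
    """Build the text of a bash script that runs every generated file."""
    lines = ["#!/usr/bin/env bash"]
    # Same role as "{% for k in cfgset %}" in the aggregation template
    for k in cfgset:
        lines.append("python output-dir/{name}-{version}.py".format(**k))
    return "\n".join(lines) + "\n"


# cfgset holds one dictionary per configuration set
cfgset = [
    {"name": "john", "version": "v1"},
    {"name": "john", "version": "v2"},
    {"name": "lisa", "version": "v1"},
    {"name": "lisa", "version": "v2"},
]

script = aggregate(cfgset)
print(script)
```

Unlike the per-experiment template, which is rendered once per configuration set, the aggregation is rendered a single time with the whole list available.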
Automatic injection of variables
Sometimes you want to use variables that are user-specific in your Jinja2
templates, for example a temporary directory that can differ between users.
To allow this, jgen automatically injects bob.extension.rc (see Global
Configuration System) into your variables. Then, in your Jinja2 templates,
you can access variables from bob.extension.rc using something like:
rc.variable_name
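The effect of this injection can be pictured as adding one extra key, rc, to every configuration set before rendering. The sketch below illustrates that idea under several assumptions: the dictionary rc and its temp_dir entry are hypothetical stand-ins for bob.extension.rc, and render is a small regular-expression substitute for Jinja2 that resolves dotted names through dictionary lookups.

```python
import re

# Matches placeholders with optional dotted access, e.g. "{{ rc.temp_dir }}"
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, dotted):
    """Resolve "a.b" as context["a"]["b"]."""
    value = context
    for part in dotted.split("."):
        value = value[part]
    return value


def render(template, context):
    """Replace every placeholder with the value it names in context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


# Hypothetical user-specific settings, standing in for bob.extension.rc
rc = {"temp_dir": "/scratch/alice"}

cfgset = {"name": "john", "version": "v1"}

# The injected settings sit next to the scanned variables under the key "rc"
context = dict(cfgset, rc=rc)

line = render("{{ rc.temp_dir }}/{{ name }}-{{ version }}.log", context)
print(line)
```

Since rc is added under its own key, a root key named rc in the YAML file would presumably collide with it, so it seems safer to avoid that name.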