Software for Liquid Argon time projection chambers
These instructions guide you through rerunning a job when its input file was lost but its output file is still available.
Let’s define:
myjob.fcl
: the file with the original FHiCL configuration for the job
MyJob
: the name of the process defined in there (found in a line like process_name: MyJob)
myjob_output_SOMETIMESTAMP.root
: the old output file produced by that job

Rerunning the job is actually quite simple, with one caveat: the process name must be different. art will not allow a job to process a file that already contains data from a process with the same name.
Therefore, a myjob_again.fcl file with content:
#include "myjob.fcl"
process_name: MyJobAgain
is all that’s needed[1]:
lar -c myjob_again.fcl -s myjob_output_SOMETIMESTAMP.root
will produce an output file myjob_output_SOMEOTHERTIMESTAMP.root with all the content of myjob_output_SOMETIMESTAMP.root plus the new data products. Even better, subsequent jobs that specify input tags as “moduleName:instanceName” will still work: there may be two data products matching “moduleName:instanceName”, one produced by MyJob and another by MyJobAgain, but unless the process name is explicitly set (as in “moduleName:instanceName:MyJob”… but who does that?) art will silently select the most recent one.
In general, additional configuration can be specified after those lines to change the details of the job, which may be quite useful. Note that no PROLOG can be opened after the inclusion of myjob.fcl; for the same reason, additional #include directives are typically valid only before that inclusion.
The configuration above will produce an output file with both MyJob and MyJobAgain output data products. While this does no harm in common cases[2], the old products have the potential to create confusion, take disk space and serve no purpose. In short: we should get rid of them:
#include "myjob.fcl"
process_name: MyJobAgain
source.inputCommands: [ "keep *", "drop *_*_*_MyJob" ]
The file still remembers that MyJob was run and which data products it produced, but at least jobs will no longer be able to use that old data.
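Writing this wrapper by hand is easy, but when several jobs need rerunning it may be convenient to generate it. The following is a minimal shell sketch that emits exactly the three lines above. The function name write_rerun_fcl and its argument order are hypothetical, not part of LArSoft or art.

```shell
#!/bin/sh
# Hypothetical helper (not part of LArSoft/art): write a FHiCL wrapper that
# reruns a job under a new process name and drops the old process's products.
# Usage: write_rerun_fcl ORIGINAL_FCL OLD_PROCESS NEW_PROCESS WRAPPER_FCL
write_rerun_fcl() {
  orig_fcl=$1
  old_process=$2
  new_process=$3
  wrapper=$4

  # Reusing the old process name is pointless: art would reject the job.
  if [ "$old_process" = "$new_process" ]; then
    echo "write_rerun_fcl: new process name must differ from $old_process" >&2
    return 1
  fi

  # The here-document is not globbed, so the "*" patterns are written verbatim.
  cat > "$wrapper" <<EOF
#include "$orig_fcl"
process_name: $new_process
source.inputCommands: [ "keep *", "drop *_*_*_$old_process" ]
EOF
}
```

For example, write_rerun_fcl myjob.fcl MyJob MyJobAgain myjob_again.fcl followed by lar -c myjob_again.fcl -s myjob_output_SOMETIMESTAMP.root reproduces the procedure above. Any further overrides can be appended to the generated file.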
myjob.fcl does not ask for a timestamp to be added to the output file name
If the output file name does not include a timestamp (e.g. myjob_output.root), rerunning the job will attempt to overwrite the input file. We definitely don’t want that.
A one-line fix is to explicitly specify the output file name on the command line:
lar -c myjob_again.fcl -s myjob_output.root -o myjob_output_again.root
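To avoid typing the new name each time and to guard against accidentally overwriting a file, the command can be wrapped in a small shell function. This is a sketch under stated assumptions: the function name rerun_job, the default suffix _again and the LAR override variable are all hypothetical. The function only uses the lar options -c, -s and -o shown above, and it writes the output into the current directory.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of LArSoft): rerun a job with an explicit
# output file name derived from the input one, refusing to overwrite files.
# Usage: rerun_job WRAPPER_FCL INPUT_FILE [SUFFIX]   (default SUFFIX: _again)
# Set LAR=echo to print the command instead of running it (dry run).
rerun_job() {
  fcl=$1
  input=$2
  suffix=${3-_again}   # an explicitly empty suffix is accepted, then rejected below

  # Strip the directory and the ".root" extension from the input name.
  base=$(basename "$input" .root)
  output="${base}${suffix}.root"

  # Refuse if the output would take the input's name, or if it already exists.
  if [ "$output" = "$(basename "$input")" ] || [ -e "$output" ]; then
    echo "rerun_job: refusing to write '$output'" >&2
    return 1
  fi

  "${LAR:-lar}" -c "$fcl" -s "$input" -o "$output"
}
```

For example, rerun_job myjob_again.fcl myjob_output.root runs the same command as the one-liner above.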
A more systematic approach is to put the new file name in the new FHiCL configuration:
#include "myjob.fcl"
process_name: MyJobAgain
outputs.out1.fileName: "%ifb_again.root"
(%ifb is replaced with the base name of the input file, myjob_output; another popular option is %ifb_%p-%tc.root, which adds the process name and the timestamp at file closure).
Note that instead of out1 you will have to put the name of the RootOutput module instance in myjob.fcl (find it with fhicl-dump myjob.fcl | less).
If the downstream configuration explicitly specifies the process name (MyJob), that is going to cause trouble (typically, the old data is used instead of the new data).
It is quite uncommon for a configuration to specify the process name in an input tag (as in “moduleName:instanceName:MyJob”).
More common is to have filters on the input or output. If a RootInput module specifies inputCommands: [ "drop *", "keep *_*_*_MyJob" ], it will remove from the input anything that is not from MyJob, including the MyJobAgain data. If the MyJob data products were not dropped as recommended above, the job will silently process the old data; if they were dropped, no MyJob data is left at all. Whatever happens then, it’s not what we want. Likewise, the mirror RootOutput configuration outputCommands: [ "drop *", "keep *_*_*_MyJob" ] will not give the desired result.
In these cases, those configuration lines need to be replaced, as in:
#include "downstreamjob.fcl"
source.inputCommands: [ "drop *", "keep *_*_*_MyJobAgain" ]
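To find which downstream lines need such a replacement, the fully expanded configuration can be searched for references to the old process name. The following is a rough shell sketch. The function name find_process_refs is hypothetical. It assumes that string values appear double-quoted in the fhicl-dump output, as they do in the .fcl snippets on this page, and that the process name contains only letters and digits. It flags both product patterns ending in _MyJob and input tags ending in :MyJob.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the lines of a configuration
# read from stdin that pin a process name, either as a product pattern
# ("keep *_*_*_MyJob") or as the last field of an input tag ("label:inst:MyJob").
# Returns 0 if at least one such line is found, 1 otherwise (grep's status).
# Usage: fhicl-dump downstreamjob.fcl | find_process_refs MyJob
find_process_refs() {
  process=$1
  # The closing quote in each alternative avoids matching longer names such
  # as MyJobAgain. The name is assumed to be alphanumeric, so regex-safe.
  grep -nE "_${process}\"|:${process}\""
}
```

A match is a hint, not a verdict: each reported line should be checked and, if needed, overridden in the wrapper configuration as shown above.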
For questions or comments, contact Gianluca Petrillo.