LArSoft

Software for Liquid Argon time projection chambers

Rerun part or all of a job on an output file of that job

Say you have run the configuration myJob.fcl, which has produced the output file myOutput.root.
If you want to run myJob.fcl again but, for some reason, you don’t want to run it on the original input file (because that file is lost, or because you want to rerun only some of the modules in myJob.fcl rather than all of them), you will need to use myOutput.root as the new input file.

The new configuration can be almost a copy of myJob.fcl, but you have to tell art to completely ignore the data products you know you are going to regenerate.
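Two options are described below:
- regenerate all the data products from a job;
- regenerate the data products from selected modules in a job.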

In the following, we assume that the output file you are going to recycle contains all the necessary input data products. If not, you are obviously out of luck.

Note: these procedures do not guarantee that the random number sequences will be the same as in the original job. To achieve that, additional effort might be needed.

Regenerate all the data products from a job

You first need to know the process name of the previous job.
It should be possible to find it in myJob.fcl, which should have a process_name entry (referred to below as <ProcessName>).

Create a new FHiCL file as follows:

#include "myJob.fcl"

# change process name
process_name: <ProcessName>Again

source.inputCommands: [
  "keep *",
  "drop *_*_*_<ProcessName>"
]

This will drop from the input all data products produced with a process name <ProcessName>.
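To make the effect of the keep/drop list concrete, here is a small Python model of how such rules could be evaluated against branch names. It is a simplified sketch, not art’s implementation: it assumes that rules are checked in order with the last matching rule deciding, that a lone * matches everything, that no field contains an underscore, and that a branch matched by no rule is not kept. The function names, the product name recobHits and the process names Reco and Gen are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def rule_matches(pattern: str, branch: str) -> bool:
    """Return True if one rule pattern matches a branch name (no trailing dot)."""
    if pattern == "*":
        return True  # a lone "*" stands for every data product
    pattern_fields = pattern.split("_")
    branch_fields = branch.split("_")
    if len(pattern_fields) != 4 or len(branch_fields) != 4:
        raise ValueError(f"expected four '_'-separated fields: {pattern!r}, {branch!r}")
    # Each field is compared on its own; '*' and '?' act as glob wildcards.
    return all(fnmatchcase(b, p) for p, b in zip(pattern_fields, branch_fields))


def is_kept(commands: list[str], branch: str) -> bool:
    """Return True if the keep/drop commands select this branch."""
    kept = False  # assumption: a branch that no rule matches is not kept
    for command in commands:
        action, pattern = command.split(None, 1)
        if action not in ("keep", "drop"):
            raise ValueError(f"unknown action in {command!r}")
        if rule_matches(pattern.strip(), branch):
            kept = action == "keep"  # the last matching rule decides
    return kept


commands = ["keep *", "drop *_*_*_Reco"]
print(is_kept(commands, "recobHits_module1__Reco"))   # False
print(is_kept(commands, "recobHits_generator__Gen"))  # True
```

In this model the first rule keeps everything and the second one removes whatever was produced by the process Reco, while products from earlier processes survive.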

Note: even so, art will remember that a process named <ProcessName> already ran, and it will not allow the same process name to be used again. For this reason, process_name must be set to a new value, as in the example above.

You will also want to make sure that:

Alternatively, you can write the FHiCL file however you prefer, as long as the source.inputCommands setting is present.

Regenerate the data products from selected modules in a job

This procedure is similar in principle to the previous one, where you regenerate the whole job, but it takes a bit longer to prepare, so read the previous one first.
We assume that we want to rerun the modules with labels module1, module2, up to moduleN.

The key ingredient is an inputCommands line like:

source.inputCommands: [
  "keep *",
  "drop *_module1_*_<ProcessName>",
  "drop *_module2_*_<ProcessName>",
  # ...
  "drop *_moduleN_*_<ProcessName>"
]

This will drop from the input all data products produced by modules with labels module1, module2, etc., with a process name <ProcessName>.
The complication is that you have to also:

The following excerpt may suggest a solution, although the actual details may vary a lot:

#include "myJob.fcl"

# change process name
process_name: <ProcessName>Again

# drop the data products of the modules to be rerun
source.inputCommands: [
  "keep *",
  "drop *_module1_*_<ProcessName>",
  "drop *_module2_*_<ProcessName>",
  # ...
  "drop *_moduleN_*_<ProcessName>"
]

# change which modules are being run
physics.trigger_paths: [ reco ]
physics.reco: [
  "module1",
  "module2",
  # ...
  "moduleN"
]
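If you have to prepare this kind of file for many module lists, a small script can write it for you. The sketch below is a hypothetical Python helper, not part of LArSoft or art: it only reproduces the structure of the excerpt above (one path, named reco by default, holding all the modules to rerun) and does not read myJob.fcl, so other settings inherited from that file may still need manual adjustment.

```python
from __future__ import annotations


def rerun_modules_fcl(job_fcl: str, old_process: str, new_process: str,
                      labels: list[str], path_name: str = "reco") -> str:
    """Return the text of a FHiCL file rerunning only the modules in labels.

    Hypothetical helper mirroring the excerpt above; it does not read job_fcl.
    """
    if not labels:
        raise ValueError("at least one module label is needed")
    if new_process == old_process:
        # art refuses to reuse a process name already present in the input
        raise ValueError("new_process must differ from old_process")
    # "keep *" first, then one "drop" per module to be rerun
    commands = ['  "keep *"'] + [f'  "drop *_{label}_*_{old_process}"' for label in labels]
    modules = ", ".join(f'"{label}"' for label in labels)
    lines = [
        f'#include "{job_fcl}"',
        "",
        f"process_name: {new_process}",
        "",
        "source.inputCommands: [",
        ",\n".join(commands),
        "]",
        "",
        f"physics.trigger_paths: [ {path_name} ]",
        f"physics.{path_name}: [ {modules} ]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rerun_modules_fcl("myJob.fcl", "Reco", "RecoAgain", ["module1", "module2"]))
```

With the hypothetical process name Reco in place of <ProcessName>, the call above prints the excerpt restricted to module1 and module2; saving the returned string to a .fcl file gives the new configuration.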

The art branch name

The art branch specifications used in the inputCommands of RootInput (and also in the outputCommands of RootOutput) have the form:

<ClassName>_<ModuleLabel>_<InstanceLabel>_<ProcessName>

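where:
- ClassName is the ("friendly") class name of the data product;
- ModuleLabel is the label of the module that produced the data product;
- InstanceLabel is the instance name the module gave to the data product (empty if none, which results in two consecutive underscores);
- ProcessName is the process_name of the job that produced the data product.
Each field can be replaced by the wildcard *.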
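Because the four fields are joined with underscores, a branch name can be split back into them. The Python sketch below assumes, as the form above implies, that no field itself contains an underscore, and it strips the trailing dot that appears in the branch names stored in the ROOT file; parse_branch_name, BranchName and the sample name are hypothetical.

```python
from typing import NamedTuple


class BranchName(NamedTuple):
    class_name: str
    module_label: str
    instance_label: str
    process_name: str


def parse_branch_name(branch: str) -> BranchName:
    """Split <ClassName>_<ModuleLabel>_<InstanceLabel>_<ProcessName> into fields."""
    # Branch names in the ROOT file end with '.', inputCommands specs do not.
    if branch.endswith("."):
        branch = branch[:-1]
    fields = branch.split("_")
    if len(fields) != 4:
        raise ValueError(f"not an art branch name: {branch!r}")
    return BranchName(*fields)


print(parse_branch_name("recobHits_module1__Reco."))
```

The empty instance label between the two consecutive underscores comes back as an empty instance_label string.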

If you forget which field is which, a peek at a ROOT file will help:

root -lb myOutput.root
Events->GetListOfBranches()->ls()

shows all data product branches, while

Events->GetListOfBranches()->ls("*_<ProcessName>.")

will show only the data products produced by <ProcessName> (note the dot . at the end of the string).
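The same selection can be reproduced outside ROOT on a list of branch names you have already extracted from the file. This Python sketch uses whole-name glob matching from fnmatch, which is an assumption about how closely it mirrors ROOT’s own wildcard handling; branches_of_process and the sample names are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def branches_of_process(branch_names: list[str], process_name: str) -> list[str]:
    """Return the branch names whose process field is process_name."""
    # Same pattern as the ROOT command above, including the trailing dot.
    pattern = f"*_{process_name}."
    return [name for name in branch_names if fnmatchcase(name, pattern)]


names = [
    "recobHits_module1__Reco.",
    "recobHits_module1__RecoAgain.",
    "simbMCTruths_generator__Gen.",
]
print(branches_of_process(names, "Reco"))  # ['recobHits_module1__Reco.']
```

Because the whole name must match, a process called RecoAgain is not picked up when asking for Reco.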


These instructions haven’t been tested yet.
For comments, write to Gianluca Petrillo.