Software for Liquid Argon time projection chambers
Say you have run the configuration myJob.fcl, which has produced the output file myOutput.root.
If you want to run myJob.fcl again but do not want to rerun on the original input file (because it is lost, or because you want to rerun only some of the modules in myJob.fcl rather than all of them), you will need to use myOutput.root as the new input file.
The new configuration can be almost a copy of myJob.fcl, but you have to tell art to completely ignore the data products you know you are going to regenerate.
Two options:
- rerun all the modules in myJob.fcl
- rerun only some of the modules in myJob.fcl
In the following we assume that the output file you are going to recycle contains all the necessary input data products. If not, you are obviously out of luck.
Note: these procedures do not guarantee that the random number sequences will be the same as in the original job. To achieve that, additional effort might be needed.
You first need to know the process name of the previous job.
It should be possible to learn it from myJob.fcl, which should have a process_name entry (referred to below as <ProcessName>).
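To pick the value out of a long configuration, the file can be scanned for process_name assignments. The following is a minimal Python sketch, not part of LArSoft or art: the function name find_process_names is hypothetical, it only looks at top-level process_name lines in the text it is given, and it does not follow #include directives. Since in FHiCL a later assignment overrides an earlier one, the last match is the one that counts.

```python
import re

# Matches a top-level line such as:  process_name: Reco   or   process_name: "Reco"
# An optional trailing FHiCL comment (# or //) is tolerated.
_PROCESS_NAME = re.compile(
    r'^process_name[ \t]*:[ \t]*"?([A-Za-z0-9]+)"?[ \t]*(?:(?:#|//).*)?$',
    re.MULTILINE,
)

def find_process_names(fcl_text):
    """Return every process_name value assigned in fcl_text, in order of appearance."""
    return _PROCESS_NAME.findall(fcl_text)

# Made-up configuration text, for illustration only.
example = '''
#include "services.fcl"
process_name: Reco
physics: { }
process_name: RecoTest  # later override
'''
names = find_process_names(example)
print(names[-1])  # the last assignment is the effective one
```

If the process name is set only in an included file, the scan has to be repeated on that file.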
Create a new FHiCL file as follows:
#include "myJob.fcl"
# change process name
process_name: <ProcessName>Again
source.inputCommands: [
"keep *",
"drop *_*_*_<ProcessName>"
]
This will drop from the input all data products produced with the process name <ProcessName>.
Note: even so, art will remember that <ProcessName> has already been run, and it will forbid the same process name from being used again. For this reason, a new process_name must be specified.
You will also want to make sure that:
- the new job does not overwrite the input file (use --output or specify the output file name in the new FHiCL file)
- myJob.fcl can be found through FHICL_FILE_PATH (otherwise the #include in the previous configuration will fail)
Alternatively, you can write the FHiCL file however you prefer, as long as the source.inputCommands configuration is present.
This procedure is in principle similar to the previous one where you regenerate the whole job, but it takes a bit longer to prepare.
So, read the previous one first.
We assume we want to rerun the modules with labels module1, module2, up to moduleN.
The key ingredient is an inputCommands line like:
source.inputCommands: [
"keep *",
"drop *_module1_*_<ProcessName>",
"drop *_module2_*_<ProcessName>",
# ...
"drop *_moduleN_*_<ProcessName>"
]
This will drop from the input all data products produced by modules with labels module1, module2, etc., with the process name <ProcessName>.
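When many modules are involved, the list of drop commands can be generated rather than typed by hand. A minimal Python sketch follows; drop_commands and render_input_commands are hypothetical helper names, and Reco, module1 and module2 are placeholder values standing in for your own process name and module labels.

```python
def drop_commands(process_name, module_labels):
    """Build 'keep everything, then drop the listed modules' commands for one process."""
    commands = ["keep *"]
    for label in module_labels:
        # Wildcards for class name and instance label; fixed module label and process name.
        commands.append(f"drop *_{label}_*_{process_name}")
    return commands

def render_input_commands(commands):
    """Format the commands as a FHiCL source.inputCommands assignment."""
    body = ",\n".join(f'  "{c}"' for c in commands)
    return f"source.inputCommands: [\n{body}\n]"

# Placeholder process name and module labels.
print(render_input_commands(drop_commands("Reco", ["module1", "module2"])))
```

The printed text can be pasted into the new FHiCL file after the #include line.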
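The complication is that you also have to:
- set a new process name, as in the previous procedure
- change which modules are run, so that only the ones to be regenerated are executed
The following excerpt should point you toward the solution, but the actual details may vary a lot: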
#include "myJob.fcl"
# change process name
process_name: MyJobAgain
# drop the data products of the modules to be rerun
source.inputCommands: [
"keep *",
"drop *_module1_*_<ProcessName>",
"drop *_module2_*_<ProcessName>",
# ...
"drop *_moduleN_*_<ProcessName>"
]
# change which modules are being run
physics.trigger_paths: [ reco ]
physics.reco: [ "module1", "module2", ... , "moduleN" ]
The specification of art branches used in the inputCommands of RootInput (and also in the outputCommands of RootOutput) is in the form:
<ClassName>_<ModuleLabel>_<InstanceLabel>_<ProcessName>
where:
- <ClassName> is related to the fully qualified C++ class of the stored data product; if it is a collection (which is almost always the case in LArSoft), an s is appended: for example, std::vector<recob::Track> will show a <ClassName> like recob::Tracks; templates are special, and the template types are put before the templated class name: for example, an association set without metadata like art::Assns<simb::MCFlux,simb::MCTruth> becomes simb::MCFluxsimb::MCTruthvoidart::Assns (remember that in that case the metadata is of type void and the actual class is art::Assns<simb::MCFlux,simb::MCTruth,void>)
- <ModuleLabel> is the label used to configure the module producing the product; for example, in a configuration like gaushit: { moduleType: GausHitFinder ... } the module label is gaushit
- <InstanceLabel> is the instance label of the data product; it is decided by the producer module, it is often hard-coded, and in most cases it has the default value of an empty label; in that case, the branch name will show two underscore characters _ one after the other
- <ProcessName> is the name of the job (or process) which created the data product; it is set in the configuration as the process_name parameter.
If you forget what is where, a peek into a ROOT file will help:
root -lb myOutput.root
Events->GetListOfBranches()->ls()
shows all data product branches, while
Events->GetListOfBranches()->ls("*_<ProcessName>.")
will show only the data products produced by <ProcessName> (note the dot . at the end of the string).
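The naming scheme above can also be taken apart programmatically, for instance to check a list of branch names copied from the ls() output. This is a minimal Python sketch under stated assumptions: split_branch_name is a hypothetical helper, the example branch names are made up for illustration, and the split relies on module labels, instance labels and process names not containing underscores.

```python
from collections import namedtuple

BranchName = namedtuple(
    "BranchName", "class_name module_label instance_label process_name"
)

def split_branch_name(branch):
    """Split '<ClassName>_<ModuleLabel>_<InstanceLabel>_<ProcessName>.' into its fields."""
    # Split from the right, so an underscore inside the class name does no harm.
    fields = branch.rstrip(".").rsplit("_", 3)
    if len(fields) != 4:
        raise ValueError(f"not a data product branch name: {branch!r}")
    return BranchName(*fields)

# Made-up branch names, for illustration only.
branches = [
    "recob::Hits_gaushit__Reco.",
    "recob::Tracks_module1__Reco.",
    "simb::MCTruths_generator__Gen.",
]
from_reco = [b for b in branches if split_branch_name(b).process_name == "Reco"]
print(from_reco)
```

An empty instance label shows up as an empty string in the third field, which corresponds to the two consecutive underscores in the branch name.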
These instructions haven’t been tested yet.
For comments, write to Gianluca Petrillo.