Proof-on-demand

Quick-start

This section briefly describes how to set up a proof-on-demand (PoD) system. Follow these steps:
  • Set up a ROOT version, or skip this step and let proof-on-demand set up a ROOT version for you.
  • Set up proof-on-demand via CVMFS:
    export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/
    source /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/user/atlasLocalSetup.sh
    localSetupPoD
    # in a script use
    # source ${ATLAS_LOCAL_ROOT_BASE}/packageSetups/atlasLocalPoDSetup.sh
  • Start a PoD server:
    pod-server start
    # or pod-server restart
  • Submit 16 worker jobs on condor:
    pod-submit -r condor -n 16
  • Now you are ready. Please don't forget to remove your jobs when you no longer need them and to stop the server. Worker jobs are terminated automatically once they have been idle for 30 min.
    # replace <jobnumber> with the ID of your condor job
    condor_rm <jobnumber>
    pod-server stop
More information can be found at http://pod.gsi.de/documentation.html. Some advanced topics and settings are discussed below.
Advanced topics

Here you find a loose collection of advanced topics that you may encounter in your day-to-day use of PoD.
Log-files and clean-up

PoD keeps its working files in ~/.PoD/ in your home directory. The log files are located in ~/.PoD/log and can fill up quite quickly (3 files for each worker job, each time you submit workers), so you should clean up this directory regularly.
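
The clean-up can be scripted. The following is a minimal sketch, not part of PoD: the function name clean_pod_logs and the 7-day threshold are assumptions chosen for illustration. It removes regular files older than a given age and defaults to a dry run so that nothing is deleted by accident.

```python
import os
import time


def clean_pod_logs(log_dir, max_age_days=7.0, dry_run=True):
    """Remove regular files in log_dir that are older than max_age_days.

    Returns the sorted list of paths that were removed
    (or, with dry_run=True, that would be removed).
    """
    log_dir = os.path.expanduser(log_dir)
    if not os.path.isdir(log_dir):
        return []
    cutoff = time.time() - max_age_days * 24 * 3600
    removed = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        # Only touch regular files; subdirectories are left alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return sorted(removed)


if __name__ == "__main__":
    # Dry run: only list what would be deleted.
    for path in clean_pod_logs("~/.PoD/log", max_age_days=7, dry_run=True):
        print("would remove", path)
```

Call it with dry_run=False once the listed files look right. Avoid running it while workers are still active if you still need their log files.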

Proof

This section presents some notes and working examples for a quick start with PROOF. More information about PROOF is available at http://root.cern.ch/drupal/content/using-proof
Quick-start with TChain.Draw

You need to set up a PROOF session in your ROOT script or interactive ROOT session. TChain objects know about this (global) PROOF session and can use it automatically without changes to the code. The "PROOF-aware" functions are TChain::Draw and TChain::Process.

To set up a PROOF session with PoD, just use:
TProof* proof = TProof::Open("pod://");

For local testing you can also use "lite://" as the argument. This starts local worker processes on your machine, one per CPU.

Open a TChain, make it aware of PROOF and draw a histogram:
TChain* t = new TChain("tree");
t->Add("myfiles*");
t->SetProof(kTRUE);
t->Draw("myvariable * myothervariable","somecuts > somevalue");

The ROOT file(s) are processed in parallel on all worker nodes and a GUI pops up (without X, you see a text status instead). Unfortunately, it is not easy to find out whether an error has occurred. The main options are to check in the GUI whether all events were processed and to look at the log files.
Using TSelector

The TSelector is a class that defines the processing steps before and after the loop, and what happens inside the loop (copied from the PROOF manual):
+++ CLIENT Session +++       +++ (n) WORKERS +++
Begin()
                             SlaveBegin()
                             Init()
                                 Notify()
                                 Process()
                                 ...
                                 Process()
                                 ...
                             Init()
                                 Notify()
                                 Process()
                                 ...
                                 Process()
                                 ...
                             SlaveTerminate()
Terminate()
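
The call order in the diagram can be tried out without ROOT. The following toy model is only a sketch: ToySelector and run_toy_session are hypothetical names, the workers run one after another instead of in parallel, and it assumes that Init() and Notify() are called once per file, as the diagram suggests.

```python
class ToySelector:
    """Stand-in for a TSelector that only records which hook was called."""

    def __init__(self, log, worker=None):
        self.log = log  # shared list of (who, hook) tuples
        self.who = "client" if worker is None else "worker%d" % worker

    def _note(self, hook):
        self.log.append((self.who, hook))

    def Begin(self):
        self._note("Begin")

    def SlaveBegin(self):
        self._note("SlaveBegin")

    def Init(self):
        self._note("Init")

    def Notify(self):
        self._note("Notify")

    def Process(self, entry):
        self._note("Process")

    def SlaveTerminate(self):
        self._note("SlaveTerminate")

    def Terminate(self):
        self._note("Terminate")


def run_toy_session(files_per_worker, entries_per_file):
    """Call the hooks in the order of the diagram (workers run sequentially here)."""
    log = []
    client = ToySelector(log)
    client.Begin()
    for worker, files in enumerate(files_per_worker):
        selector = ToySelector(log, worker)  # every worker gets its own instance
        selector.SlaveBegin()
        for _ in files:
            selector.Init()
            selector.Notify()
            for entry in range(entries_per_file):
                selector.Process(entry)
        selector.SlaveTerminate()
    client.Terminate()
    return log


# One worker with one file, one worker with two files, two entries per file.
for who, hook in run_toy_session([["a.root"], ["b.root", "c.root"]], entries_per_file=2):
    print("%-8s %s" % (who, hook))
```

The point to take away is that Begin() and Terminate() run only on the client, while everything in between runs on a separate selector instance in each worker.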

The simplest way to obtain a TSelector is to generate it from a TChain:
t->MakeSelector("mySelector");

Similar to MakeClass, this produces a skeleton in which the branches of the tree are already linked to member variables. You can then fill in the TSelector::Process function as shown here. Note that the entry has to be read explicitly with GetEntry. Also, the entry number refers to the local entry (i.e. for a TChain it is the entry in the current file, not in the whole chain).
Bool_t mySelector::Process(Long64_t entry){   
 fChain->GetTree()->GetEntry(entry);   
 hist->Fill(mc_n);
 return kTRUE;
}
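
The difference between the local entry and the entry in the whole chain can be illustrated with a few lines of plain Python. This is only a sketch of the bookkeeping, not ROOT code: the function to_local_entry and the file sizes in the example are made up for illustration.

```python
import bisect


def to_local_entry(global_entry, entries_per_file):
    """Map an entry number in the whole chain to (file index, local entry)."""
    # offsets[i] is the chain entry number of the first entry stored in file i
    offsets = [0]
    for n in entries_per_file:
        offsets.append(offsets[-1] + n)
    if not 0 <= global_entry < offsets[-1]:
        raise IndexError("entry %d is outside the chain" % global_entry)
    file_index = bisect.bisect_right(offsets, global_entry) - 1
    return file_index, global_entry - offsets[file_index]


# A chain of three files with 100, 50 and 200 entries:
print(to_local_entry(120, [100, 50, 200]))
```

The entry passed to Process corresponds to the second number of the pair, which is why the example above calls GetEntry on fChain->GetTree() (the tree of the current file) rather than on the chain itself.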

Since a new instance of the TSelector is created in each worker job, the histograms, output objects, etc. need to be created for each job in TSelector::SlaveBegin (hist is declared in the .h file as TH1F* hist;):
void mySelector::SlaveBegin(TTree * /*tree*/){
 TString option = GetOption();   
 hist = new TH1F("hist","hist",100,0,100);
}

To collect all the outputs, the member TSelector::fOutput is used:
void mySelector::SlaveTerminate(){
 fOutput->Add(hist);
}

When all the jobs are done, you can retrieve the output and draw it, save it, etc. This is done in TSelector::Terminate:
void mySelector::Terminate(){
 hist=(TH1F*)fOutput->FindObject("hist");   
 hist->Draw();
}
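
Between SlaveTerminate and Terminate, the output lists of all workers are collected and objects with the same name are merged, so Terminate sees a single "hist". The following toy model sketches this idea in plain Python under simplifying assumptions: a histogram is represented by a collections.Counter (bin to count), and merge_outputs is a hypothetical helper, not a PROOF function.

```python
from collections import Counter


def merge_outputs(worker_outputs):
    """Sum the toy histograms of all workers, name by name."""
    merged = {}
    for output in worker_outputs:
        for name, hist in output.items():
            # Counter.update adds the counts bin by bin
            merged.setdefault(name, Counter()).update(hist)
    return merged


# Each worker filled its own "hist" (created in SlaveBegin) with the entries it processed.
worker0 = {"hist": Counter({3: 2, 5: 1})}
worker1 = {"hist": Counter({3: 1, 7: 4})}

merged = merge_outputs([worker0, worker1])
print(sorted(merged["hist"].items()))
```

This is why the object name matters: the name given to the histogram in SlaveBegin is the key used to find the merged result with fOutput->FindObject in Terminate.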

Advanced topics

Here you find a loose collection of advanced topics that you may encounter in your day-to-day use of PROOF.
Using datasets

A dataset is a named object that consists of a number of ROOT files and a set of meta-data. The ROOT files in a dataset can be validated and checked, and they can be processed with PROOF using TSelectors:
TFileCollection* filecollection = new TFileCollection("myDataSet");
filecollection->Add("myfiles*");  // assumption: same file pattern as in the TChain example
filecollection->SetDefaultTreeName("myTree");
proof->RegisterDataSet("myDataSet#myTree",filecollection);
proof->VerifyDataSet("myDataSet");
proof->Process("myDataSet","mySelector.C+");  // assumption: the selector generated above, compiled with ACLiC

Additional complications arise if you want to run over several datasets at the same time (i.e. let PROOF handle the processing of many datasets, rather than calling TProof::Process sequentially for each dataset).
Using Python

The Python binding of TSelector cannot properly handle callbacks to Python functions, so you cannot simply use TSelector in Python. The class TPySelector partially overcomes this problem.
Example python class using datasets


import time  # needed for the timestamps in info/warning/error

from ROOT import TPySelector,TH1F,kTRUE,TDSetElement,TDSet,TH1,TLorentzVector,TObjString,TFile,TString
 
class validationSelectorTest(TPySelector):
 
  # "Begin" runs locally; set some parameters here so that the constructor does not have to be modified
  # for the moment: basefilename is used for the output file names, and writeall controls whether all histograms are also written into one file
  def Begin(self):
    self.basefilename="test"
    self.writeall=True
    self.info("base filename",self.basefilename)
    self.info("write all",self.writeall)
    self.info("Begin")
    pass
 
  # "SlaveBegin" prepares the histogram and input-name bookkeeping and creates the global histograms only
  # local histograms (i.e. local to one dataset) are initialised in the Process loop
  # There is a check for double initialisation, which should not happen
  def SlaveBegin(self,tree):
    if getattr(self,"n",None) is not None:
      self.error("init twice")
      raise RuntimeError("SlaveBegin called twice")
    self.n=0
    self.histograms={}
    self.inputNames=[]
    self.InputName=""
    self.createHistograms(globalHist=True)
    self.info("SlaveBegin")
 
  # "Init" catches the tree
  def Init(self,tree):
    self.fChain=tree
    self.info("Init")
 
  # Add all the histograms to the output
  def SlaveTerminate(self):
    for h in self.histograms:
      self.info("add to output",h,self.histograms[h])
      self.fOutput.Add(self.histograms[h])
    self.info("SlaveTerminate")

  # Get output histograms and datasetnames
  # prepare outputfiles for each dataset and sort histograms for each dataset
  # also write out all histograms into one file
  def Terminate(self):
    hists=[]
    strings=[]
    for o in self.fOutput:
      self.info("Get output object",o)
      if isinstance(o,TH1):
        hists.append(o)
      if isinstance(o,TObjString):
        strings.append(str(o.GetString().Data()))
    histbygroup={}
    histbygroup[""]=[]
    for s in strings:
      histbygroup[s]=[]
    for h in hists:
      name=h.GetName()
      added=False
      for s in strings:
        if name[:len(s)]==s:
          histbygroup[s].append(h)
          added=True
          break
      if not added:
        histbygroup[""].append(h)
    self.info("histograms by group",histbygroup)
    self.info("all histograms",hists)
 
    if self.writeall:
      filename=self.basefilename+"_all.root"
      self.info("write all histograms",hists,"in",filename)
      f=TFile(filename,"RECREATE")
      for h in hists:
        h.Write()
        self.info("write",filename,h)
      f.Close()
 
    for hg in histbygroup:
      filename=self.basefilename+hg+".root"
      self.info("write histograms of group",hg,histbygroup[hg],"in",filename)
      f=TFile(filename,"RECREATE")
      for h in histbygroup[hg]:
        h.Write()
        self.info("write",filename,hg,h)
      f.Close()
 
    #self.fOutput.FindObject("hist"+self.InputName).Draw()
 
  # shared helper: print one line with name, class, timestamp, optional level and the message parts
  def _log(self,level,*msg):
    parts=[self.GetName(),"of",self.ClassName(),time.time()]
    if level:
      parts.append(level)
    parts.extend(msg)
    # a print() call with a single argument behaves the same in Python 2 and Python 3
    print(" ".join(str(p) for p in parts))
 
  def info(self,*msg):
    self._log("",*msg)
 
  def warning(self,*msg):
    self._log("WARNING",*msg)
 
  def error(self,*msg):
    self._log("ERROR",*msg)
 
  # create a single histogram, add dataset name to the start of the name
  def createHistogram(self,histtype,*args):
    if self.InputName+args[0] in self.histograms:
      self.warning("histogram",self.InputName+args[0],"already exists")
    h=histtype(self.InputName+args[0],self.InputName+args[1],*args[2:])
    h.SetDirectory(0)
    self.histograms[self.InputName+args[0]]=h
 
  # create global histograms without adding the dataset name
  def createHistogramGlobal(self,histtype,*args):
    if args[0] in self.histograms:
      self.warning("global histogram",args[0],"already exists")
    h=histtype(args[0],args[1],*args[2:])
    h.SetDirectory(0)
    self.histograms[args[0]]=h
 
  # get histograms, check local, then global
  def getHistogram(self,name):
    if self.InputName+name in self.histograms:
      return self.histograms[self.InputName+name]
    elif  name in self.histograms:
      return self.histograms[name]
 
  # get local histograms, for speed issues
  def getHistogramLocal(self,name):
    return self.histograms[self.InputName+name]
 
  # get global histograms, for speed issues
  def getHistogramGlobal(self,name):
    return self.histograms[name]
 
  # list all the histograms here, that should be created, global and local
  def createHistograms(self,globalHist=False):
    if globalHist:
      pass
    else:
      self.createHistogram(TH1F,"N","N",5000,0,5000)
    self.info("added",self.histograms)
    #elem=self.fInput.FindObject("PROOF_CurrentElement")
    #if elem:
    #  fCurrent=elem.Value()
    #  print fCurrent.TestBit(TDSetElement.kNewRun),fCurrent.TestBit(TDSetElement.kNewPacket)
 
  # decide which dataset is currently being processed; not sure if it has to be checked for every event
  def setInputName(self):
    returnname=""
    if self.fInput:
      elem=self.fInput.FindObject("PROOF_CurrentElement")
      if elem:
        fCurrent=elem.Value()
        if fCurrent.TestBit(TDSetElement.kNewRun) or True:
          returnname=elem.Value().GetDataSet()
          returnname=returnname.replace(".","_")
          #print self.n, returnname,fCurrent.TestBit(TDSetElement.kNewRun)
        else:
          return
      else:
        return
    else:
      return
    self.InputName=returnname
    if returnname not in self.inputNames:
      self.inputNames.append(returnname)
      self.createHistograms()
      self.fOutput.Add(TObjString(returnname))
 
  # process, need to get the entry, need to set the dataset name (=InputName)
  def Process(self, entry):
    self.n+=1
    self.setInputName()
    self.fChain.GetTree().GetEntry(entry)
    #print self.n,self.histograms,self.histograms[self.InputName+"N"],self.InputName+"N"
    self.getHistogram("N").Fill(self.n)
    return kTRUE
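
The bookkeeping that separates the datasets can be tested without ROOT. The sketch below mirrors two steps of the class above in plain Python: the dataset name is turned into a prefix by replacing "." with "_" (as in setInputName), and in Terminate each histogram is assigned to the first dataset whose name is a prefix of the histogram name, with "" collecting the global histograms. The function names and the example dataset names are made up for illustration.

```python
def sanitize(dataset_name):
    """Turn a dataset name into a histogram-name prefix, as in setInputName."""
    return dataset_name.replace(".", "_")


def group_by_dataset(hist_names, dataset_names):
    """Assign each histogram name to the first dataset whose name it starts with."""
    groups = {"": []}  # "" collects the global histograms
    for ds in dataset_names:
        groups[ds] = []
    for name in hist_names:
        for ds in dataset_names:
            if name.startswith(ds):
                groups[ds].append(name)
                break
        else:
            # no dataset prefix matched: treat it as a global histogram
            groups[""].append(name)
    return groups


datasets = [sanitize("data.periodA"), sanitize("mc.ttbar")]
hists = ["data_periodAN", "mc_ttbarN", "cutflow"]
for group, names in sorted(group_by_dataset(hists, datasets).items()):
    print(repr(group), names)
```

Because the first matching prefix wins, dataset names where one is a prefix of another (e.g. "mc" and "mc_ttbar") can end up in the same group, so it is safer to choose dataset names that do not overlap in this way.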

-- DucBaoTa - 20 Nov 2013
Topic revision: r1 - 20 Nov 2013, DucBaoTa