Slide1
EDELWEISS data structure and analysis framework
Slide2
Motivation to build a new data structure and analysis framework (Kdata)
We had:
Edw-II data analysis dispersed between Ana and Era
2 experts (full-time analysis)
Each with their own code
single (few local) user / single programmer
2010: A. Cox and I struggling to find, to access and to analyze Edw2 data
Coincidence (muon-veto/bolometer) study as diploma work
Benjamin Schmidt
Era: ROOT based, but difficult access, no server with most recent code/data…
Saclay
Ana: Fortran, PAW and C, no PAW support, French comments in code/data…
Lack of documentation
Task: Get the data
J. Cham
Slide3
Motivation to build a new data structure and analysis framework (Kdata)
Short term: facilitate data access
Build flexible event-based data structure
Single combined HLA file: muon-veto and bolometer data
Make code and data easily available
Documentation
Long term: establish a common collaboration-wide analysis and data storage tool
Share tasks (calibration, template creation, …) / remove barriers (documentation)
Allow for upgrade to 100s of detectors: develop automatic processing scheme
Slide4
The general picture – The idea
All software modules:
KDS: data structure
KPTA: pulse trace analysis
Kamping
KQPA
KSamba
ampToHLA
Analysis: KDataPy
Data levels: DAQ, Raw, Amp, HLA
A bit special: standalone code, extensive use of templates
Slide5
Specific known/unknown requirements during Kdata development
Requirements Edw-3:
10 -> 40 detectors
Larger workload for debugging, calibration and analysis
New detector design (channel number/specifics initially unknown)
New electronics (some specifics unknown)
1st time resolved ionization signals (trace length? number of traces?)
Change in analog amplifiers -> signal shape? trace length? sampling?
New efforts to optimize signal treatment needed
Integrate muon-veto in bolo DAQ
Slide6
The idea: Build a data storage and analysis framework
Event-based data storage
Use ROOT for event-based physics data:
Fast I/O
Support for LHC lifetime
Data compression
Statistics tools
Well known
C++ class library for data encapsulation
Keep it modular
Keep it flexible and general
Try to keep it simple
Keep fully split tree (library independent)
Document it
Make it easily accessible
Kdata - implementation
Repository: https://edwdev-ik.fzk.de/SVN_Repository_for_the_KIT_Dark_Matter_Group/KData.html
Kdata event structure in detail
Use ROOT types
No nested arrays
Kdata library not needed to read data
Longevity of data guaranteed
Kdata is coded consistently with the ROOT and Taligent coding style:
Easier to read/collaborate/check code
For example:
classes defined in header .h; implemented in .cxx
variables start with a small f (fChannelName; fAmp; fExtra; …)
functions start with a capital letter: GetChannelName(); GetTrace(); …
Kds completely implemented with Get…() and Set…() methods
Tab completion (ipython, ROOT session)
Slide8
Kdata event structure in detail
ROOT TTree with single event branch
Event with flexible structure:
Variable-sized TClonesArrays for Bolometer-, BoloPulse-, PulseAnalysis-, Samba- and MuonModule information
Allows changes in hardware (number of bolos, number of channels per bolo, …) without code changes in “kds” (data structure source code)!
Requires some effort to get to know, though
Slide9
Kdata event structure
Logic layout:
TTree
KEvent
KBoloPulseRecord = Channel
KPulseAnalysisRecord
KSambaRecord
KMuonModuleRecords
KBolometerRecord
Logic event structure via TRef and TRefArray
Very powerful: can be spread over files, …
A word of caution though:
Requires specific handling in event building: never forget to reset the referenced object count (TProcessID::SetObjectCount), otherwise the file size blows up
Probably most bugs and problems in kds were related to TRef issues
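The caution above about resetting the referenced object count can be illustrated with a short sketch. It is not Kdata's event-building code: it only shows the save-and-restore pattern around ROOT's static `TProcessID::GetObjectCount()` and `TProcessID::SetObjectCount()`. The helper name `preserved_object_count`, the `FakeProcessID` stand-in and `build_events` are assumptions made for this illustration, so that the pattern can be tried without ROOT installed.

```python
from contextlib import contextmanager


@contextmanager
def preserved_object_count(process_id_cls):
    """Restore the referenced-object counter after building one event.

    process_id_cls is expected to expose the static methods
    GetObjectCount() and SetObjectCount(n), as ROOT.TProcessID does.
    """
    saved = process_id_cls.GetObjectCount()  # counter before this event
    try:
        yield saved
    finally:
        # Without this reset the counter keeps growing from event to event.
        process_id_cls.SetObjectCount(saved)


class FakeProcessID:
    """Stand-in for ROOT.TProcessID, for illustration only."""
    _count = 0

    @classmethod
    def GetObjectCount(cls):
        return cls._count

    @classmethod
    def SetObjectCount(cls, number):
        cls._count = number

    @classmethod
    def reference_object(cls):
        # Hypothetical helper: mimics the counter increase that happens
        # when a reference is assigned to a new object.
        cls._count += 1


def build_events(n_events, refs_per_event, process_id_cls=FakeProcessID):
    """Build dummy events and return the counter value afterwards."""
    for _ in range(n_events):
        with preserved_object_count(process_id_cls):
            for _ in range(refs_per_event):
                process_id_cls.reference_object()
    return process_id_cls.GetObjectCount()
```

With PyROOT the same helper could presumably be used as `with preserved_object_count(ROOT.TProcessID): ...` around the code that fills a single event.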
Slide10
Kdata event structure
Logic layout:
TTree
KEvent
KBolometerRecord
KBoloPulseRecord = Channel
KPulseAnalysisRecord
KSambaRecord
KMuonModuleRecords
Looping in python:
    for event in filereader:
        for bolo in event.boloRecords():
            for pulse in bolo.pulseRecords():
                for analysis in pulse.analysisRecords():
                    …
Looping C++ style in python:
    for i in range(f.GetEntries()):
        f.GetEntry(i)
        event = f.GetEvent()
        for ii in range(event.GetNumBolos()):
            bolo = event.GetBolo(ii)
            samba = bolo.GetSambaRecord()
            print(samba.GetNtpDateSec())
            for iii in range(bolo.GetNumPulseRecords()):
                pulse = bolo.GetPulseRecord(iii)
                Trace = pulse.GetTrace()
                …
One KPulseAnalysisRecord per analysis:
Bandpass analysis
Optimal filter
Trapezoidal filter …
Slide11
Kdata event structure in detail
Structure subclassed in:
Raw: KRawEvent, KRawBolometerRecord, …
Amp: KAmpEvent, KAmpBolometerRecord, …
HLA: KHLAEvent, KHLABolometerRecord, …
Raw: with pulse traces! No KPulseAnalysisRecords
Amp and HLA: no pulse traces, but KPulseAnalysisRecord
With a quick calculation: 2.87 * 356/1850 * 2.35
FWHM 1.04 keV
Ana 1.1 keV
< 1/10 raw file size
~ 1/2 samba file size
Slide12
Python and KDataPy
Slide13
simpleEventViewer output:
Slide14
Looping utilities – no need to write the looping/plotting
Use KDataPy.util with plotpulse(), looppulse(), loopbolo() and KDataPy.loop_amp with loopchannel(), plotchan_x(), plotchan_x_files(), plotchan_x_dir()
loop_amp to be completed with plotchannel_xy(), … and loop/plotbolo functions. Note that KDataPy.util loopbolo() also works for Amp and HLA data
Basic usage:
    import ROOT
    import KDataPy.util as ut
    ut.plotpulse("/sps/edelweis/kdata/data/raw/nk23b002_000.root", "chalB FID823")
Documentation
Slide15
Our data acquisition chains revisited
Samba Macs
Muon Veto DAQ
Bolo-Raw data
Automated:
proc0: copy to Lyon
proc1: rootification
proc2: raw->amp
proc3: amp->hla
proc4: merge/skim muon/hla bolo data
spsToHpss: backup on tape drive
Kdata - ROOT on kalinka
Our look up place
Modane
Lyon
Karlsruhe
Radon
Slide16
Using the Kdata pulse processing library
Adam Cox, our benevolent dictator for life
Slide17
The KPulseAnalysisChain
The kpta-chain is applied before your analysis function
Slide18
Ionisation channel after pattern removal:
Slide19
Advantages – Drawbacks (personal opinion)
Advantages:
Flexibility of data structure
Consistency of data structure (over time)
Same data structure for different detector systems -> great for coincidence studies
Same data structure for different processing/analyses (bandpass, optimal filter, …)
Decouple high-level analyses from DAQ/processing changes
Independent kpta library
Has been reused with (flat) data from EURECA test stand
Very versatile
Drawbacks:
Flexibility of data structure comes with some complexity (heaviness)
Especially TTree.Draw() is more complex
Single raw data folder restricts use of ls
Writing kpta with templates is a bit more complex
Slide20
Usage of python
90 % of the time python feels like the right solution
Shorter, more legible code
Vast set of external libraries
Extremely handy for scripting
Basic documentation in python always via '''docstrings'''
Main price – speed:
Circumvent by producing an additional set of data files skimmed by detector
Future use of pypy + ROOT6
Slide21
But 50 x slower
PyPy-JIT compile 1.06 x slower
Slide22
Slide23
CouchDB for everything else and python to glue everything together
Automat database (117 parameters every 20 sec)
dataDB
Samba header information
Useful to find data under conditions (temperature, voltage, run_type, …)
Processing state
History of processing/file location (complete documentation)
Supplementary processing databases
Templates, high-/lowpass filter parameters, cuts
Radon measurements…
Slide24
A more complex example: heat template fitting code
Three python modules (all part of KDataPy!):
templateFitSelection.py (looping over data, select pulses, average parameters; call the other scripts)
pulsetempy.py (perform template fit)
uploadAnalyticalTemplateToDB.py (save fit parameters to DB)
Usage:
    import KDataPy.TemplateFitSelection as tfit
    tfit.templateFitSelection('/sps/kdata/data/raw/nk23b002_000.root')
    tfit.run('chalB FID808')
Note that there are some more options though!
The code itself is commented and should help to discover more options
Sorry – documentation (web) has not been updated yet
Slide25
Basic looping once more
More verbose version:
Use plotPulseEventViewer module in kanacodewok
    import plotPulseEventViewer as plt
    plt.plotPulseEventViewer('/sps/edelweis/kdata/data/raw/nk23b002_000.root', 'chalA FID823')
Slide26
More advanced usage
Hook in an analysis function
Slide27
Processing – some details
Database driven:
Proc0: scp of samba raw data to ccage (Lyon)
Task1: change scp account to keitel (all tests finished, batch-, hpss-, …)
Task2: add md5 checksum test after transfer
Proc1: rootification (Modane), scp to ccage (Lyon)
Task: transfer rootification to Lyon
Proc2: processing and filtering
Template fitting tools with DB access implemented
Adaptation of processing to 8 step function ionization channels
All data from November processed with KFeldbergKampSite (BW bandpass filter, all channels treated separately): sps/edelweis/kdata/data/amp/Run305
Task1: automate using DB and redhook.sh script
Task2: implement KSeebugKampSite (BW bandpass with simultaneous heat-ionization fits)
Task3: (longer term) revive/debug optimal filter KChamonixKAmpSite
Slide28
Processing – some more details
Proc3: calibration of Amp level files
Task1: port Era scripts: perform calibration, store results (calibDB)
Task2: implement Amp->HLA process using calibDB
Proc4/5/6:
Tasks: concat/merge/skim data
What can/should be automated?
Tasks: facilitate access to data:
Implement run list based on datadb (see talks by Cecile/Lukas/Valentin)
Write python utilities to facilitate plotting/looping
KDataPy.util
KDataPy.loop_amp …
spsToHPSS:
Fully working
Task1: nj13b…tar. There is a file that was too big for automatic processing
Task2: implement md5 checksum test after writing
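The md5 checksum test listed as an open task for Proc0 and spsToHPSS could look roughly like the sketch below. It is an illustration using only the Python standard library, not the collaboration's script: the function names `md5_of_file` and `transfer_is_intact` are assumptions, and both files are assumed to be readable from the same machine. For a remote copy, the checksum would instead have to be computed on the remote host and compared.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for large raw data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source_path, copied_path):
    """True if source and copy have the same MD5 checksum."""
    return md5_of_file(source_path) == md5_of_file(copied_path)
```

A processing step could call `transfer_is_intact()` after the copy and only record the new file location in the database when it returns `True`.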
Slide29
Template fitting
The program is rather verbose!
Slide30
Template fitting
Strong dependence on initial parameters
Initial params from last fit (pulstemplates db)
Some tweaking still necessary (larger amplitude…)
Slide31
A useful trick – Quitting your loop
Slide32
Loop-/plotbolo
You need to correlate channels?
skip looping at bolometer level
Slide33
Okay a stupid example, but a quick one
Note the documentation with further examples:
KDataPy Utility functions
Slide34
From theory to practice – Part 2
Working with Amp level data
Structure subclassed in:
Raw: KRawEvent, KRawBolometerRecord, …
Amp: KAmpEvent, KAmpBolometerRecord, …
HLA: KHLAEvent, KHLABolometerRecord, …
Raw: with pulse traces! No KPulseAnalysisRecords
Amp and HLA: no pulse traces, but KPulseAnalysisRecord
With a quick calculation: 2.87 * 356/1850 * 2.35
FWHM 1.04 keV
Ana 1.1 keV
< 1/10 raw file size
~ 1/2 samba file size
Slide35
TTree.Draw() example
With a quick calculation: 2.87 * 356/1850 * 2.35
FWHM 1.04 keV
Ana 1.1 keV
TTree->Draw() command, or rather TChain->Draw() (called from python):
    c.Draw("fPulseAna[].GetAmp()", "fPulseAna[].GetBoloPulseRecord().GetChannelName() == \"slowD FID823\" && fPulseAna[].GetExtra(8)==5")
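Because the Draw() selection above is long and easy to mistype, it can help to assemble the two strings in a small function. The sketch below only does string formatting and reproduces the expression and cut from the slide; the function name `amp_draw_args` and its parameters are assumptions for illustration and are not part of KDataPy.

```python
def amp_draw_args(channel, extra_index=None, extra_value=None):
    """Return (expression, selection) strings for TTree/TChain Draw()."""
    expression = "fPulseAna[].GetAmp()"
    # Select the pulse analysis records belonging to one channel.
    selection = (
        'fPulseAna[].GetBoloPulseRecord().GetChannelName() == "%s"' % channel
    )
    if extra_index is not None:
        # Optional additional cut on one of the "extra" fields.
        selection += " && fPulseAna[].GetExtra(%d)==%d" % (
            extra_index,
            extra_value,
        )
    return expression, selection
```

With the chain `c` from the slide, the call would then read `c.Draw(*amp_draw_args("slowD FID823", 8, 5))`.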
Slide36
Using loop_amp
Or, if the automatic binning is too crude:
Slide37
loop_amp together with file lists/directories
Use loop.plotchan_x_files(["file1.root", "file2.root"], 'channel', …) or use loop.plotchan_x_dir('directory', 'file-pattern', 'channel', …)
[Plots: Entries vs. Amplitude]
Slide38
Plotting a TGraph of two variables – very first example: RMS vs energy
[Plot axis labels: Chi2, Amplitude]
These are just examples
Develop your own “hook-in” functions!
x_some_function()
xy_some_function()
….
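The "hook-in" idea can be sketched in plain Python: a generic loop calls a user-supplied function for every record and collects whatever it returns. This is only an illustration of the pattern. The actual signatures expected by the KDataPy loop functions are not shown on the slides, so `loop_records`, `x_amplitude`, `xy_amplitude_chi2` and the dictionary records used here are all hypothetical stand-ins.

```python
def loop_records(records, hook):
    """Call hook(record) for every record and collect the results.

    Records for which the hook returns None are skipped, so a hook
    can also act as a cut.
    """
    values = []
    for record in records:
        value = hook(record)
        if value is not None:
            values.append(value)
    return values


def x_amplitude(record):
    """Hypothetical x hook: keep positive amplitudes only."""
    amp = record["amp"]
    return amp if amp > 0 else None


def xy_amplitude_chi2(record):
    """Hypothetical xy hook: return an (amplitude, chi2) pair."""
    return record["amp"], record["chi2"]
```

The lists returned by `loop_records()` could then be used to fill a histogram (x hook) or a graph (xy hook).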
Slide39
Calibrated data
ERA calibrated data in Kdata v3.0 format for Run12
Computing Center in Lyon and at KIT
Ana calibrated data in Kdata (dev-version) for Run20
https://edwdev-ik.fzk.de/wsvn/EDELWEISS/analysis/kdata/branches/newhla2/
An initial data set FID804 available at KIT and Lyon
/sps/edelweis/schmidt/AnaToKData/Run20
KData preliminary analysis files of single detectors Run12 – Run20 – Run 304 at KIT
Hole collecting
Hole veto
Electron veto
Electron collecting