
Derivation “heatmaps” - PowerPoint Presentation

Uploaded On 2023-10-04


Doug Benjamin (Duke), Thomas Beermann (CERN), Mario Lassnig (CERN), Attila Krasznahorkay (CERN), Ilija Vukotic (Univ. of Chicago). Data product usage for Aug-2015 (Grid only, user jobs). ID: 1022820


Presentation Transcript

1. Derivation “heatmaps”. Doug Benjamin (Duke), Thomas Beermann (CERN), Mario Lassnig (CERN), Attila Krasznahorkay (CERN), Ilija Vukotic (Univ. of Chicago)

2. Data product usage for Aug-2015 (Grid only, user jobs). AODs: 60% of jobs; derivations: 26%. (Source: Hadoop, Panda Job Archive table)

3.

4. % of possible branches used by at least one user job (Aug-2015, Grid-only jobs)

5. A few numbers

derivation      jobs    used branches   total branches   % used
DAOD_SUSY1      14031   478             2026             24%
DAOD_SUSY10     7461    505             1076             47%
DAOD_SUSY2      5624    410             697              59%
DAOD_SUSY9      5370    487             2044             24%
DAOD_SUSY6      5095    414             896              46%
DAOD_HIGG5D1    4451    375             2228             17%
DAOD_TOPQ1      3254    645             2189             29%
DAOD_HIGG2D1    2973    552             949              58%
DAOD_HIGG4D2    2270    375             1090             34%
DAOD_EXOT2      1702    147             895              16%
DAOD_STDM4      1518    420             914              46%
DAOD_HIGG8D1    1086    516             786              66%
DAOD_HIGG1D1    1050    690             972              71%
DAOD_HIGG4D3    930     408             1079             38%
DAOD_HIGG2D4    908     348             2089             17%
DAOD_TRUTH1     730     95              220              43%
DAOD_EGAM1      606     746             1304             57%
DAOD_SUSY3      588     302             720              42%
DAOD_EXOT13     534     198             1891             10%
DAOD_EXOT11     430     343             2108             16%
DAOD_HIGG5D3    520     354             2317             15%
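
The "% used" figures on slide 5 are consistent with the ratio of used branches to total branches, rounded to the nearest percent. A minimal Python sketch of that arithmetic follows; the function name `percent_used` and the dictionary layout are illustrative assumptions, not part of the original analysis code, and only three rows of the slide are reproduced.

```python
def percent_used(used_branches, total_branches):
    """Return the share of branches read by at least one job, as a whole percent."""
    if total_branches <= 0:
        raise ValueError("total_branches must be positive")
    return round(100 * used_branches / total_branches)


# (used branches, total branches) copied from three rows of slide 5.
rows = {
    "DAOD_SUSY1": (478, 2026),
    "DAOD_HIGG1D1": (690, 972),
    "DAOD_TRUTH1": (95, 220),
}

for name, (used, total) in rows.items():
    print(f"{name}: {percent_used(used, total)}%")
# DAOD_SUSY1: 24%
# DAOD_HIGG1D1: 71%
# DAOD_TRUTH1: 43%
```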

6. Aug-2015 Grid only jobs

7.

8.

9.

10.

11. Aug-2015 Grid only jobs

12. Aug-2015 Grid only jobs

13. Long tail: ~50% of branches are accessed in less than 10% of the jobs. Perhaps derivations should be split further into a systematics derivation (read occasionally) and a primary derivation (fewer branches, but read more often). Most derivations make a good case for remote access. (Aug-2015, Grid-only jobs)
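
The long-tail statement on slide 13 can be illustrated with a small Python sketch that computes, for one derivation, the share of branches read by fewer than 10% of jobs. The input layout (one collection of branch names per job), the function name `rare_branch_share`, and the toy data are assumptions made for illustration; the slides do not show how the original numbers were computed. Branches that no job reads are counted as rarely accessed here.

```python
from collections import Counter


def rare_branch_share(jobs, all_branches, threshold=0.10):
    """Fraction of branches that are read by fewer than `threshold` of the jobs.

    jobs: list of iterables, each holding the branch names one job read.
    all_branches: every branch present in the derivation.
    """
    if not jobs or not all_branches:
        raise ValueError("need at least one job and one branch")
    # Count each branch at most once per job.
    counts = Counter(branch for job in jobs for branch in set(job))
    n_jobs = len(jobs)
    rare = [b for b in all_branches if counts[b] / n_jobs < threshold]
    return len(rare) / len(all_branches)


# Toy example (hypothetical branch names): 20 jobs, 4 branches.
jobs = [["pt"] for _ in range(20)]
for job in jobs[:10]:
    job.append("eta")          # read by 50% of jobs
jobs[0].append("syst_up")      # read by 5% of jobs
branches = ["pt", "eta", "syst_up", "never_read"]

print(rare_branch_share(jobs, branches))  # 0.5
```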

14. Further work: Continue to use the jobs metric (vs. number of times read). Separate data and MC samples. Separate Panda jobs from off-Grid use. Separate the work by month to see changes over time. Do we need to isolate derivations by AMI tag? That would require mapping file names onto dataset and AMI tag, which means mining the Rucio information in the Hadoop cluster. Is this an expensive operation?

15. Conclusions: We now have a monitoring framework to actually see what the user jobs are doing. Users read far fewer branches than are in the derivation files. Are we creating WRITE-ONLY data !!??!! All derivations could have their size reduced. This is the start of a process that involves many different groups and should be a discussion at Software Week.

16. Backup plots

17. Data derived from the Panda jobs table in the CERN analytics Hadoop cluster. Variables used for job filtering: PRODSOURCELABEL=='user'; NOT PRODUSERNAME=='gangarbt'; (INPUTFILEPROJECT=='data15_13TeV' OR INPUTFILEPROJECT=='mc15_13TeV:mc15_13TeV' OR INPUTFILEPROJECT=='mc15_13TeV' OR INPUTFILEPROJECT=='data15_13TeV:data15_13TeV'). We did not attempt to join the information from the Rucio Branch Cache tables, for performance and consistency reasons. Non-Grid jobs are not linkable to the Panda jobs table.
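
The job filter listed on slide 17 can be written as a single predicate. The Python sketch below assumes each job record is a dictionary keyed by the Panda column names given on the slide; that record layout and the function name `is_selected` are assumptions for illustration, and the original selection was presumably run inside the Hadoop cluster rather than in plain Python.

```python
# Project values accepted by the slide-17 filter.
ALLOWED_PROJECTS = {
    "data15_13TeV",
    "mc15_13TeV",
    "data15_13TeV:data15_13TeV",
    "mc15_13TeV:mc15_13TeV",
}


def is_selected(job):
    """True if a job record passes the slide-17 selection.

    job: dict with PRODSOURCELABEL, PRODUSERNAME and INPUTFILEPROJECT keys
    (assumed layout; missing keys make the job fail the selection).
    """
    return (
        job.get("PRODSOURCELABEL") == "user"
        and job.get("PRODUSERNAME") != "gangarbt"
        and job.get("INPUTFILEPROJECT") in ALLOWED_PROJECTS
    )


# Hypothetical records for illustration only.
example_jobs = [
    {"PRODSOURCELABEL": "user", "PRODUSERNAME": "alice", "INPUTFILEPROJECT": "data15_13TeV"},
    {"PRODSOURCELABEL": "user", "PRODUSERNAME": "gangarbt", "INPUTFILEPROJECT": "mc15_13TeV"},
    {"PRODSOURCELABEL": "managed", "PRODUSERNAME": "bob", "INPUTFILEPROJECT": "mc15_13TeV"},
]
selected = [job for job in example_jobs if is_selected(job)]
print(len(selected))  # 1
```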

18.

19. Aug-2015 Grid only jobs

20. Aug-2015 Grid only jobs

21.

22.

23. Aug-2015 Grid only jobs

24.

25.

26.

27. Aug-2015 Grid only jobs

28.

29.

30. Aug-2015 Grid only jobs

31.

32.