Uploaded On 2017-04-05

Presentation Transcript

Slide1

Network awareness and network as a resource in PanDA

Artem Petrosyan (University of Texas at Arlington)

3d ANSE Collaboration Meeting, UTA, 12/6/13

Slide2

PanDA and networking

Goal for PanDA

- Direct integration of networking with PanDA workflow, never attempted before for large-scale automated WMS systems

Why PanDA and networking

- PanDA is a distributed computing workload management system
- Data transfer/access is done asynchronously: by DQ2 in ATLAS, PhEDEx in CMS, pandamover/FAX for special cases…
- Data transfer/access systems can provide first level of network optimizations; PanDA will use these enhancements as available
- PanDA relies on networking for workload data transfer/access
- Higher level of network integration: directly in workflow management
- Networking is assumed in PanDA, not integrated in workflow

Slide3

Concept: network as a resource

PanDA as workload manager

- PanDA automatically chooses job execution site
  - Multi-level decision tree: task brokerage, job brokerage, dispatcher
  - Also manages predictive future workflows: at task definition, PD2P (PanDA Dynamic Data Placement)
  - Site selection is based on processing and storage requirements
  - Can we use network information in this decision?
  - Can we go even further: network provisioning?
  - Further: network knowledge used for all phases of job cycle?

Network as resource

- Optimal site selection should take network capability into account
  - We do this already, but indirectly, using job completion metrics
- Network as a resource should be managed (i.e. provisioning)
  - We also do this crudely, mostly through timeouts and self-throttling

Slide4

Scope of effort

Three parallel efforts to integrate networking in PanDA

- US ATLAS funded
  - Primarily to improve integration with FAX
- ASCR funded: BigPanDA project, taking PanDA beyond LHC
  - Next Generation Workload Management and Analysis System for Big Data, DOE funded (BNL, U Texas Arlington)
- ANSE funded: NSF CC-NIE program
  - Integrate advanced network-aware tools in the mainstream production workflows of ATLAS and CMS

Slide5

PanDA use cases

- Use network information for
  - Cloud selection
  - Site selection
  - FAX brokerage
  - Job assignment
  - Dynamic data placement (PD2P)
- Provision circuits for PD2P transfers
- Provision circuits for input transfers
- Provision circuits for output transfers

Slide6

Site selection plan

Site selection in PanDA is based on the site weight calculation formula:

Weight = ((1 + number of available nodes / (number of active nodes + 1)) * number of running jobs) / number of activated or assigned jobs

https://twiki.cern.ch/twiki/bin/view/PanDA/PandaBrokerage

Site selection based on network info is an extension of the standard PanDA brokerage mechanism: dynamic info is included in the formula, based on configuration parameters.

- Select additional N sites based on network info
- Throughputs > 50 Mb/sec are considered “good” and equated with 50
- Network weights calculation formula: (Throughput/50)*0.5
  - The maximum weight should not exceed 0.5, so that priority goes to sites selected based on configuration parameters
  - Example: (34.5/50)*0.5 = 0.345
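The two formulas on this slide can be written out as a minimal Python sketch. The function and constant names are hypothetical, not PanDA identifiers, and the slide does not say how the network weight is combined with the standard weight, so the sketch computes them separately. The handling of an empty job queue is also an assumption.

```python
GOOD_THROUGHPUT = 50.0    # Mb/sec; throughputs above this are equated with 50 (from the slide)
MAX_NETWORK_WEIGHT = 0.5  # network weight must not exceed this (from the slide)


def site_weight(available_nodes, active_nodes, running_jobs, queued_jobs):
    """Standard brokerage weight, transcribed from the slide's formula.

    queued_jobs stands for the "number of activated or assigned jobs".
    """
    if queued_jobs <= 0:
        # Assumption: the slide does not define the weight for an empty queue,
        # so this sketch refuses to guess.
        raise ValueError("queued_jobs must be positive")
    return (1 + available_nodes / (active_nodes + 1)) * running_jobs / queued_jobs


def network_weight(throughput):
    """Network weight (Throughput/50)*0.5, with throughput capped at 50."""
    capped = min(throughput, GOOD_THROUGHPUT)
    return (capped / GOOD_THROUGHPUT) * MAX_NETWORK_WEIGHT


def pick_additional_sites(throughput_by_site, n):
    """Return the n site names with the highest network weight (hypothetical helper)."""
    ranked = sorted(
        throughput_by_site,
        key=lambda site: network_weight(throughput_by_site[site]),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    # Site names and throughputs below are made up for illustration.
    measured = {"SITE_A": 34.5, "SITE_B": 80.0, "SITE_C": 10.0}
    for site, rate in measured.items():
        print(site, round(network_weight(rate), 3))
    print(pick_additional_sites(measured, 2))
```

Capping the throughput before scaling keeps the network weight at or below 0.5, which is how the slide ensures that sites chosen from configuration parameters keep priority.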

Slide7

Cloud selection plan

Optimize choice of T1-T2 pairings (cloud selection)

- In ATLAS, production tasks are assigned to Tier 1’s
- Tier 2’s are attached to a Tier 1 cloud for data processing
- Any T2 may be attached to multiple T1’s
- Currently, operations team makes this assignment manually
- This could/should be automated using network information
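The slides do not describe the cloud selection algorithm, which is still in development. Purely as an illustration of what automating the pairing could look like, the sketch below ranks candidate Tier 1's for each Tier 2 by measured throughput. The ranking rule, the function name, and the site names are all assumptions.

```python
def pair_tier2_with_tier1(throughput, max_clouds=1):
    """Rank Tier 1 candidates for each Tier 2 by measured throughput.

    throughput maps (tier1, tier2) pairs to a measured rate.
    Returns a dict mapping each tier2 to a list of up to max_clouds
    tier1 names, best first. This ranking rule is an assumption of the
    sketch, not the algorithm used in PanDA.
    """
    candidates = {}
    for (tier1, tier2), rate in throughput.items():
        candidates.setdefault(tier2, []).append((rate, tier1))

    pairing = {}
    for tier2, rated in candidates.items():
        best_first = sorted(rated, reverse=True)
        pairing[tier2] = [tier1 for _, tier1 in best_first[:max_clouds]]
    return pairing


if __name__ == "__main__":
    # Hypothetical sites and rates, for illustration only.
    rates = {
        ("T1_A", "T2_X"): 40.0,
        ("T1_B", "T2_X"): 55.0,
        ("T1_A", "T2_Y"): 12.0,
    }
    print(pair_tier2_with_tier1(rates, max_clouds=2))
```

Allowing `max_clouds` greater than one reflects the point above that a T2 may be attached to multiple T1's.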

Slide8

Sources of network information

- DDM Sonar measurements
  - ATLAS measures transfer rates for files between Tier 1 and Tier 2 sites (information used for site white/blacklisting)
  - Measurements available for small, medium, and large files
- PerfSonar measurements
  - All WLCG sites are being instrumented with PS boxes
  - US sites are already instrumented and fully monitored
- FAX measurements
  - Read times for remote files are measured for pairs of sites
  - Standard PanDA test jobs (HammerCloud jobs) are used

Slide9

Data repositories

- Native data repositories
  - Historical data stored from collectors
  - SSB: site status board for sonar and PS data (currently)
  - HC FAX data is kept independently and uploaded
- AGIS (ATLAS Grid Information System)
  - Most recent/processed data only, updated periodically
  - Pushed via JSON API
- SchedConfigDB
  - Internal Oracle DB used by PanDA for fast access
  - Data updated by extension of standard SchedConfig collector
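The step from historical data to the "most recent data only" view held in AGIS can be sketched as a reduction over measurement records. The record fields (`src`, `dst`, `timestamp`, `rate`) and both function names are hypothetical. The actual AGIS JSON schema is not given in the slides.

```python
import json


def latest_per_link(measurements):
    """Keep only the newest measurement for each (src, dst) link.

    Each measurement is a dict with hypothetical keys:
    "src", "dst", "timestamp" (comparable, e.g. epoch seconds) and "rate".
    """
    latest = {}
    for record in measurements:
        link = (record["src"], record["dst"])
        if link not in latest or record["timestamp"] > latest[link]["timestamp"]:
            latest[link] = record
    return latest


def to_json_payload(latest):
    """Serialize the reduced view as a JSON list, ordered by link."""
    rows = [latest[link] for link in sorted(latest)]
    return json.dumps(rows, sort_keys=True)


if __name__ == "__main__":
    # Made-up history for illustration.
    history = [
        {"src": "SITE_A", "dst": "SITE_B", "timestamp": 100, "rate": 30.0},
        {"src": "SITE_A", "dst": "SITE_B", "timestamp": 200, "rate": 34.5},
        {"src": "SITE_C", "dst": "SITE_B", "timestamp": 150, "rate": 12.0},
    ]
    print(to_json_payload(latest_per_link(history)))
```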

Slide10

Monitoring sources

- SSB
  - http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?view=Sonar
  - Updated by different sources at different frequencies
- AGIS throughput source-destination pairs (dev)
  - http://atlas-agis-dev.cern.ch/agis/close_sites/atlassites_links/
  - Updated every hour
- Intelligent Networking page (dev)
  - http://voatlas142.cern.ch/networking/
  - Data updated every hour, synchronized with update of AGIS

Slide11

Status

- The data delivery chain (network data sources -> SSB -> AGIS -> SchedConfigDB) is in place and runs on production machines
- Site selection plan is implemented
- Cloud selection plan is in development
- Monitoring pages run on an integration machine

Slide12

Plans

- Short term
  - Put site selection into production
  - Implement cloud selection algorithm
  - Implement FAX brokerage algorithm
  - Evaluate algorithms
  - Extend monitoring
- Medium term
  - Reliability of network information
  - Dynamic network information
  - Internal measurements
- Long term
  - Go through PanDA use cases

Slide13

Questions?
