Slide 1
LHCOPN Update
John Shade / CERN IT-CS
September 2010 GDB

Slide 2
Working Groups

LHCOPN Operations and Monitoring WGs
- F2F LHCOPN meetings
  - London 8/9 March (http://indico.cern.ch/conferenceDisplay.py?confId=80755)
  - Barcelona 28/29 June (http://indico.cern.ch/conferenceDisplay.py?confId=88698)
- Operations WG
  - Quarterly phone conferences
  - Track correlation between outages and GGUS tickets
- Monitoring WG
  - Conference calls in May/June, numerous e-mail exchanges
  - perfSONAR MDM setup & deployment
  - LHCOPN Dashboard design
- Mailing list: LHCOPN-Interest@cern.ch

J. Shade/GDB LHCOPN Update, 08-SEP-2010

Slide 3
Monitoring

- Working with DANTE to get a robust MDM solution in place (perfSONAR rollout had stalled)
- Clarified how to access performance data, and defined requirements for a dashboard for visualisation:
  See https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG
- Comments on Requirements document are still welcome!

Slide 4
Monitoring

- Missing a central view of LHCOPN
- Weathermap and e2emon applications restricted to GEANT portal
- HADES data:
  - Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1Gbit/s TCP Bandwidth Control Test)
  - One Way Delay (OWD)
  - One Way Delay Variance / Jitter (OWDV)
  - Packet loss
  - Traceroute (number of hops between two HADES nodes)
  - Duplicate packets
  - Out of order packets

Slide 5
Initial (simple) algorithm

a) Site status is up when OWD is within +/-15% of baseline and packet loss is less than 0.1% per five minutes
b) Site status is down when packet loss = 100% per five minutes
c) Site status is degraded when measurement values are between a) and b).
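
The three rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on the slide, not the dashboard's actual code: the `Sample` type, its field names, and the `site_status` function are hypothetical, and two details are assumptions, namely that the +/-15% band is inclusive and that a positive baseline OWD is already known for each link.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One five-minute measurement window (hypothetical structure)."""
    owd_ms: float       # measured one-way delay
    baseline_ms: float  # baseline one-way delay (assumed known and > 0)
    loss_pct: float     # packet loss over the window, in percent


def site_status(s: Sample) -> str:
    """Classify a site as 'up', 'down' or 'degraded' per rules a) to c)."""
    # Rule b): total packet loss over the window means the site is down.
    if s.loss_pct >= 100.0:
        return "down"
    # Rule a): OWD within +/-15% of baseline (inclusive band is an
    # assumption) and packet loss below 0.1%.
    within_band = abs(s.owd_ms - s.baseline_ms) <= 0.15 * s.baseline_ms
    if within_band and s.loss_pct < 0.1:
        return "up"
    # Rule c): everything between a) and b).
    return "degraded"
```

For example, `site_status(Sample(owd_ms=12.0, baseline_ms=10.0, loss_pct=0.0))` falls outside the 15% band without total loss, so it is classified as degraded.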

Slide 6
Prototype Dashboard

Slide 7
Prototype Dashboard

Slide 8
Where do we go from here?

- DANTE baulked at the idea of developing their prototype further and supporting it
- SARA and CERN have picked up the gauntlet
- SARA developers have tested XML query/responses against the central HADES repository at DFN
- TOM team leader is evaluating how best to develop/integrate the LHCOPN dashboard
- Sites already have local monitoring, but we need to provide a central view!
- Nagios probes for sites are also expected

Slide 9
Upcoming Events

- Next F2F LHCOPN meeting will take place at CERN on 7th-8th October
- Agenda: http://indico.cern.ch/conferenceDisplay.py?confId=102716
- Includes participants from Internet2, DANTE, T1s etc.
- Topics to be covered include:
  - Tier2 Connectivity Requirements
  - Service Level Definition
  - GGUS
  - Monitoring
  - Operations