Slide1
RDMA optimizations on top of 100 Gbps Ethernet for the upgraded data acquisition system of LHCb
Balazs Voneki
CERN/EP/LHCb Online group
TIPP2017, Beijing
23.05.2017
Slide2
LHCb Upgrade
Early results of RDMA optimizations on top of 100 Gbps
Ethernet, TIPP2017, Beijing, 23.05.2017
– Balazs Voneki
To improve detectors and electronics such that the experiment can run at higher instantaneous luminosity
Increase the event rate from 1 MHz to 40 MHz
Selection in software
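As a rough feel for what the jump from 1 MHz to 40 MHz means when every event crosses the network, the sketch below multiplies event rate by event size. The 100 kB event size and the function name are illustrative assumptions, not values taken from these slides.

```python
import math


def required_bandwidth_tbps(event_rate_hz: float, event_size_bytes: float) -> float:
    """Aggregate network bandwidth in Tbps for a given event rate and event size."""
    return event_rate_hz * event_size_bytes * 8 / 1e12


# ASSUMPTION: 100 kB per event is a placeholder, not a number from the slides.
EVENT_SIZE_BYTES = 100_000

before = required_bandwidth_tbps(1e6, EVENT_SIZE_BYTES)   # 1 MHz event rate
after = required_bandwidth_tbps(40e6, EVENT_SIZE_BYTES)   # 40 MHz event rate

# Number of fully loaded 100 Gbps links needed to carry the 40 MHz traffic.
links_100g = math.ceil(after * 1e12 / 100e9)

print(f"1 MHz:  {before:.1f} Tbps")
print(f"40 MHz: {after:.1f} Tbps, at least {links_100g} links of 100 Gbps")
```

Whatever the real event size is, the required bandwidth scales by the same factor of 40, which is why the link technology matters so much here.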
Key challenges:
Relatively large chunks
Everything goes through the network
RU = Readout Unit
BU = Builder Unit
Slide3
Network technologies
Possible 100 Gbps solutions:
Intel® Omni-Path
EDR InfiniBand
100 Gbps Ethernet
Possible 200 Gbps solution:
HDR InfiniBand
Arguments for Ethernet:
Widely used
Old, mature, well-tried
OPA and IB are single vendor technologies, Ethernet is multi vendor
Ethernet is challenging at these speeds
Slide4
Iperf result charts
TCP
Slide5
Iperf result charts
TCP
UDP
Major difference between vendors
High CPU load
Slide6
Linux network stack
Source: http://www.linuxscrew.com/2007/08/13/linux-networking-stack-understanding/
Slide7
What is RDMA?
Remote Direct Memory Access
DMA from the memory of one node into the memory of another node without involving either one’s operating system
The transfer is performed by the network adapter itself; it requires no work from the CPUs or caches and no context switches
Benefits:
High throughput
Low latency
These are especially interesting for High Performance Computing!
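The "no intermediate copies" idea can be loosely illustrated inside a single process. The sketch below is not RDMA and uses no RDMA API; it only contrasts handing out a copy of a buffer with handing out a window onto the same memory, which is the idea that RDMA extends across the network. The buffer contents and variable names are made up for the illustration.

```python
# Illustration only: an in-process analogy, not an RDMA API.
buffer = bytearray(b"event-data-0123456789")

# Copying path: slicing a bytearray allocates a new object and copies the bytes,
# similar in spirit to data being copied between kernel and user buffers.
copied = buffer[0:10]

# Zero-copy path: a memoryview slice refers to the same underlying memory.
window = memoryview(buffer)[0:10]

# Overwrite the first five bytes in place through the view.
window[0:5] = b"EVENT"

print(bytes(buffer))   # the original buffer sees the write made through the view
print(bytes(copied))   # the earlier copy is unaffected
```

The copy costs memory bandwidth and CPU time proportional to the message size; the view costs neither, which is the property a zero-copy network path aims for.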
Source: https://zcopy.wordpress.com/2010/10/08/quick-concepts-part-1-%E2%80%93-introduction-to-rdma/
Slide8
RDMA Technologies
Available solutions:
RoCE (RDMA over Converged Ethernet)
iWARP (internet Wide Area RDMA Protocol)
RoCE needs custom settings on the switch (priority queues to guarantee lossless L2 delivery). iWARP does not need that; only the NICs have to support it.
Tests made using:
Chelsio T62100-LP-CR NIC
Mellanox CX455A ConnectX-4 NIC
Mellanox SN2700 100G Ethernet switch
Source: https://www.theregister.co.uk/2016/04/11/nvme_fabric_speed_messaging/
Slide9
Testbed details
iWARP testbench elements:
4 nodes of:
Dell PowerEdge C6220
2x Intel® Xeon® CPU E5-2670 at 2.60GHz (8 cores, 16 threads)
32 GB DDR3 memory at 1333 MHz
Chelsio 100G T62100-LP-CR NIC
RoCE testbench elements:
4 nodes of:
Intel® S2600KP
2x Intel® Xeon® CPU E5-2650 v3 at 2.30GHz (10 cores, 20 threads)
64 GB DDR4 memory at 2134 MHz
Mellanox CX455A ConnectX-4 100G NIC
Slide10
Result charts
iWARP
RoCE
Slide11
Result charts
iWARP
RoCE
iWARP
Slide12
Result charts
With a single thread
2.5% CPU for all message sizes
iWARP
iWARP
RoCE
Slide13
Result charts
With a single thread
2.5% CPU for all message sizes
30 Gbps at 2.5% CPU
17 Gbps at 2.5% CPU
RoCE
TCP
UDP
Slide14
Result charts
With a single thread
2.5% CPU for all message sizes
94.4 Gbps at 8.6% CPU
98 Gbps at 18% CPU
30 Gbps at 2.5% CPU
17 Gbps at 2.5% CPU
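One way to compare the annotated operating points is throughput delivered per percent of CPU. The sketch below does that arithmetic for the four throughput and CPU pairs quoted on this slide. The extracted text does not preserve which chart each pair belongs to, so the pairs are deliberately left unlabeled; the function and variable names are illustrative.

```python
# (throughput in Gbps, CPU load in percent) pairs as quoted on the slide.
# The slide text does not say which chart each pair belongs to,
# so no protocol labels are attached here.
measurements = [(94.4, 8.6), (98.0, 18.0), (30.0, 2.5), (17.0, 2.5)]


def gbps_per_cpu_percent(gbps: float, cpu_percent: float) -> float:
    """Throughput obtained for each percent of CPU load."""
    return gbps / cpu_percent


ratios = [gbps_per_cpu_percent(gbps, cpu) for gbps, cpu in measurements]

for (gbps, cpu), ratio in zip(measurements, ratios):
    print(f"{gbps:5.1f} Gbps at {cpu:4.1f}% CPU -> {ratio:.2f} Gbps per CPU percent")
```

The ratios come out at roughly 11, 5.4, 12 and 6.8 Gbps per CPU percent, so the efficiency of a given operating point varies by about a factor of two across the quoted numbers.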
RoCE
TCP
UDP
Slide15
MPI result charts with RoCE
RoCE
Slide16
Result charts 2 with RoCE
Purpose of heat maps: to check stability
Slide17
Summary
Promising results
Pure TCP/IP is inefficient
Zero-copy approach is needed
Need to understand why the bidirectional heat map is not homogeneous
Slide18
Thank you!