SECRETS FOR APPROACHING BARE-METAL PERFORMANCE WITH REAL-TIME NFV
Souvik Dey, Principal Software Engineer
Suyash Karmarkar, Principal Software Engineer
OpenStack Summit - Sydney, Nov 7th 2017 - Lightning talk
Agenda
- What is an SBC
- Performance testing of an SBC NFV
- Performance requirements of an SBC NFV
- Performance bottlenecks
- Performance gains by tuning
  - Guest-level tunings
  - OpenStack tunings to address bottlenecks (CPU, memory)
- Networking choices for enterprise and carrier workloads: Virtio, SR-IOV, OVS-DPDK
- Future/roadmap items
What is an SBC: Session Border Controller?
An SBC is a compute-, network-, and I/O-intensive NFV. It sits at the border of networks and acts as an interworking element, demarcation point, centralized routing database, firewall, and traffic cop.
PPS for a Telco NFV

Per-packet byte budget for a minimum-size (64-byte) Ethernet frame:

  Component                Bytes
  IFG (stripped on wire)      12
  Preamble                     8
  Ethernet header             14
  IP header                   20
  Transport header             8
  Packet payload              18
  CRC                          4
  Frame (header to CRC)       64
  Total on the wire           84

Maximum rate: ~1.5 Mpps per 1 Gbps of link bandwidth.
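The byte budget above gives the classic small-packet line-rate arithmetic; a quick sketch:

```python
# Sketch: theoretical line-rate throughput for minimum-size (64-byte) frames.
# On the wire each frame also costs a 12-byte inter-frame gap and an
# 8-byte preamble, so the per-frame budget is 84 bytes.
FRAME_BYTES = 64      # Ethernet header + IP + transport + payload + CRC
IFG_BYTES = 12
PREAMBLE_BYTES = 8

def max_pps(link_bps: float) -> float:
    """Maximum packets/second for minimum-size frames on a link of link_bps."""
    bits_per_frame = (FRAME_BYTES + IFG_BYTES + PREAMBLE_BYTES) * 8
    return link_bps / bits_per_frame

print(f"{max_pps(1e9) / 1e6:.3f} Mpps at 1 Gbps")    # ~1.488 Mpps
print(f"{max_pps(10e9) / 1e6:.2f} Mpps at 10 Gbps")  # ~14.88 Mpps
```

This is why a 10GbE port must sustain roughly 14.88 Mpps to be loss-free at line rate with 64-byte packets.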
Performance Requirements of an SBC NFV
- Guarantee: ensure application response time, with low latency and jitter.
- Deterministic: pre-defined constraints dictate throughput and capacity for a given VM configuration; real-time communication demands predictable performance.
- Optimized: tuning OpenStack parameters to reduce latency has a positive impact on throughput and capacity.
- Packet loss: zero packet loss, so the quality of real-time traffic is maintained.
Performance Bottlenecks in OpenStack
The major attributes that govern performance and deterministic behavior:

- CPU: sharing with variable VNF loads. The virtual CPUs in the guest VM run as QEMU threads on the compute host, where they are treated as normal processes. These threads can be scheduled on any physical core, which increases cache misses and hurts performance. Features like CPU pinning help reduce this hit.
- Memory: small memory pages coming from different sockets. Virtual memory can be allocated from any NUMA node; when the memory and the CPU/NIC are on different NUMA nodes, data must traverse the QPI links, increasing I/O latency. TLB misses caused by small kernel page sizes also add hypervisor overhead. NUMA awareness and hugepages minimize these effects.
- Network: throughput and latency for small packets. Traffic arriving at the compute host's physical NICs must be copied to the tap devices by the emulator threads before being passed to the guest. This increases network latency and induces packet drops. SR-IOV and OVS-DPDK help here.
- Hypervisor/BIOS settings: overhead, interrupt elimination, preemption prevention. Any interrupt raised by the guest to the host results in VM entry and exit calls, increasing hypervisor overhead. Host OS tuning helps reduce it.
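The NUMA bottleneck above can be checked from the host before placing a guest: the kernel reports each PCI NIC's NUMA node under sysfs. A minimal sketch, assuming a Linux compute host (the interface name is a placeholder):

```python
# Sketch: read which NUMA node a NIC's PCI device sits on, so the guest's
# pinned vCPUs and hugepage memory can be placed on the same node.
# "eth0" below is a placeholder interface name.
from pathlib import Path

def nic_numa_node(dev: str) -> int:
    """Return the NIC's NUMA node, or -1 if the kernel does not report one."""
    node_file = Path(f"/sys/class/net/{dev}/device/numa_node")
    if not node_file.exists():
        return -1
    return int(node_file.read_text().strip())

print(nic_numa_node("eth0"))
```

If the NIC and the guest land on different nodes, every packet crosses the QPI link described above.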
Performance Tuning for the VNF (Guest)
- Isolate cores for fast-path traffic, slow-path traffic, and OAM.
- Use poll-mode drivers (DPDK, PF_RING) for network traffic.
- Use hugepages for DPDK threads.
- Size the VNF properly based on its workload.
Performance Gain with Config Changes and an Optimized NFV

CPU tunings:
- Enable CPU pinning.
- Configure libvirt to expose the host CPU features to the guest.
- Enable the ComputeFilter Nova scheduler filter.
- Remove CPU overcommit.
- Set the CPU topology of the guest.
- Segregate real-time and non-real-time workloads onto different computes using host aggregates.
- Isolate host processes from running on the pinned CPUs.

Memory tunings:
- Enable NUMA awareness.
- Enable hugepages on the host for guest memory.
- Extend the Nova scheduler with the NUMA topology filter.
- Remove memory overcommit.
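Most of the CPU and memory tunings above are expressed as Nova flavor extra specs. A minimal sketch, assuming a hypothetical flavor named rt-sbc (the hw:* keys are the standard Nova flavor extra specs; the values are illustrative):

```python
# Sketch: Nova flavor extra specs implementing the CPU/memory tunings above.
# The flavor name "rt-sbc" is a placeholder; the hw:* keys are standard Nova specs.
extra_specs = {
    "hw:cpu_policy": "dedicated",   # CPU pinning: each vCPU gets a dedicated pCPU
    "hw:numa_nodes": "1",           # keep vCPUs and guest RAM on one NUMA node
    "hw:mem_page_size": "large",    # back guest memory with hugepages
}

# Equivalent CLI invocation, assembled from the spec dict:
cmd = "openstack flavor set rt-sbc " + " ".join(
    f"--property {key}={value}" for key, value in extra_specs.items()
)
print(cmd)
```

Pairing this flavor with a real-time host aggregate keeps the pinned guests off the computes that still overcommit.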
Networks in OpenStack: datapath choices

(Diagram: three guest datapaths from the physical NICs, split across kernel space and user space.)
- VNF with Open vSwitch (kernel datapath): up to 50 kpps
- VNF with OVS-DPDK (DPDK datapath): up to 4 Mpps per socket (limited by lack of NUMA awareness)
- VNF with SR-IOV (single-root I/O virtualization): up to 21 Mpps per core
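As a sketch of how the SR-IOV datapath is selected in practice: a Neutron port created with vnic_type "direct" is bound to a virtual function instead of a virtio tap device. The network ID and port name below are placeholders:

```python
# Sketch: Neutron port attributes that request an SR-IOV virtual function.
# binding:vnic_type "direct" (VF passthrough) replaces the default "normal"
# (virtio tap). Network ID and name are placeholders, not real values.
port_request = {
    "port": {
        "network_id": "PROVIDER_NET_UUID",  # placeholder provider network
        "name": "sbc-media-vf0",            # hypothetical port name
        "binding:vnic_type": "direct",      # SR-IOV VF instead of a tap device
    }
}
print(port_request["port"]["binding:vnic_type"])
```

The trade-off: SR-IOV bypasses the host vSwitch entirely, so features like security groups and live migration are constrained.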
Host Tunables for Performance: Kernel Configuration
The "cpu-partitioning" tuned profile will also tune the kernel to:
- remove read-copy-update (RCU) callback work from isolated CPUs;
- reduce the timer tick on isolated CPUs (when busy) from 1000/s to 1/s.
For the best 0-packet-loss performance, also use the "isolcpus" boot parameter and disable KSM (Kernel Samepage Merging).
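A minimal sketch of the boot-parameter side of this tuning; the isolated core range and hugepage count are hypothetical examples, and the cpu-partitioning tuned profile sets much of this for you:

```python
# Sketch: assemble the kernel boot parameters for 0-packet-loss tuning.
# The core range "2-19" and hugepage count are hypothetical; KSM itself is
# disabled at runtime (echo 0 > /sys/kernel/mm/ksm/run), not on the cmdline.
def rt_boot_params(isolated: str, hugepages_1g: int) -> str:
    """Boot parameters isolating `isolated` cores for fast-path work."""
    params = [
        f"isolcpus={isolated}",    # keep the scheduler off the fast-path cores
        f"nohz_full={isolated}",   # drop the timer tick toward 1/s when busy
        f"rcu_nocbs={isolated}",   # offload RCU callback work from these cores
        "default_hugepagesz=1G",   # hugepage backing for DPDK/guest memory
        f"hugepages={hugepages_1g}",
    ]
    return " ".join(params)

print(rt_boot_params("2-19", 64))
```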
Future/Roadmap Items
- Configuring the txqueuelen of tap devices for OVS ML2 plugins: https://blueprints.launchpad.net/neutron/+spec/txqueuelen-configuration-on-tap
- Isolating emulator threads onto different cores than the vCPU-pinned cores: https://blueprints.launchpad.net/nova/+spec/libvirt-emulator-threads-policy
- SR-IOV trusted VFs: https://blueprints.launchpad.net/nova/+spec/sriov-trusted-vfs
- Accelerated devices (GPU/FPGA/QAT) and SmartNICs: https://blueprints.launchpad.net/horizon/+spec/pci-stats-in-horizon and https://blueprints.launchpad.net/nova/+spec/pci-extra-info
- SR-IOV NUMA awareness: https://blueprints.launchpad.net/nova/+spec/reserve-numa-with-pci
Q & A
More Details : https://www.openstack.org/summit/sydney-2017/summit-schedule/events/20538/secrets-for-approaching-bare-metal-performance-with-real-time-virtual-network-functions-in-openstack
Thank You
Contact:
skarmarkar@sonusnet.com
sodey@sonusnet.com