USING BENCHMARKING — California, Electronics Research


Benchmarking research usually treats computing performance as the primary concern. Benchmarks contain executable specifications of applications with corresponding data sets for this purpose. We would like to extend benchmarking to full systems, so that complete platforms become comparable. Given the diversity of system architectures and constantly evolving applications, it is currently difficult to compare platforms quantitatively. In this chapter, we present the principles of a benchmarking methodology that aims to facilitate the realistic evaluation of system architectures. Central to our approach is the separation of functionality, environment, and measurement within the benchmark specification. We pay particular attention to system-level interfaces and a normalizing environment that makes platforms quantitatively comparable. Looking at a set of representative benchmarks, we motivate their selection for network equipment. Finally, we demonstrate our methodology on a network processor.

In the domain of network processing, for instance, transport media protocols, including SONET, have become increasingly flexible to support. Unfortunately, devices differ in their supported interfaces, which complicates choosing the right platform. Performance can be measured as throughput, latency, resource usage, and power consumption, and there are derived metrics as well, such as cost effectiveness (performance/cost). Thus, the space of performance metrics quickly becomes large and differs across architects and system designers. We therefore propose system-level benchmarking that allows benchmark results to be comparable, representative, and indicative of real-world application performance. Comparability of results is achieved through the precise specification of benchmark functionality, environment, and measurement. Functionality captures the important aspects of the benchmark's algorithm; in contrast, the environment supplies the normalizing test-bench in which the functionality resides and allows results to be compared across platforms.
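As a small illustration of the derived metrics mentioned above, raw measurements can be combined into composite figures such as cost effectiveness. This is a minimal sketch; the function name and all numeric values are hypothetical placeholders, not results from the chapter.

```python
# Sketch: deriving composite metrics from raw benchmark measurements.
# All numbers below are hypothetical placeholders.

def derived_metrics(throughput_mpps, power_watts, cost_dollars):
    """Return derived metrics built from raw performance measurements."""
    return {
        "cost_effectiveness": throughput_mpps / cost_dollars,  # performance/cost
        "power_efficiency": throughput_mpps / power_watts,     # performance/watt
    }

m = derived_metrics(throughput_mpps=3.0, power_watts=15.0, cost_dollars=300.0)
print(m["cost_effectiveness"])  # ~0.01 Mpps per dollar
print(m["power_efficiency"])    # ~0.2 Mpps per watt
```

Even this toy example shows why the metric space grows quickly: every ratio of a performance number to a resource number is a candidate metric, and different designers weight them differently.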
Finally, measurement guidelines ensure that quantitative results are consistently measured across platforms for a given benchmark functionality and environment. For benchmarks to be representative of real-world applications, we must select them carefully: our methodology describes how to select a realistic suite of benchmarks based on application-domain analysis, and how to choose the appropriate granularity so that results are indicative of real-world performance. The rest of this chapter is organized as follows. Next, we describe our generalized benchmarking methodology. We then apply it to the network processing domain in our proposed example benchmark, IPv4 packet forwarding, and demonstrate results for the IXP1200 network processor. We summarize previous work relating to benchmarking, and finally the chapter is concluded.

Benchmarking Methodology

The NPF-BWG (see Section ...), for example, implements benchmarks at three different granularities: small (micro-level), medium, and large (application-level). The right benchmark granularity is determined by the architecture. In some cases a particular sub-task (a small application kernel) dominates the larger application; in other cases bottlenecks cannot easily be identified, or they depend on how the architecture is implemented. In such cases, choosing a benchmark granularity that is too small yields performance results that do not reflect real-world application performance. A benchmark must expose the architectural bottlenecks of its domain.

Precise Specification

Central to our methodology is the precise specification of the benchmark. A benchmark specification requires functional, environment, and measurement specifications, each communicated in both a descriptive (e.g. English) form and an executable description. The functional specification captures the algorithmic details and functional parameters required by the benchmark. In contrast, the environment and measurement specifications are not tied to a specific architectural platform; this ensures comparability of results across multiple architectural platforms. The environment specification defines the test-bench around the functionality, which includes the stimuli. The measurement specification defines the applicable performance metrics and how to measure them; a means to determine functional correctness should also be provided. All specifications should take two forms: a descriptive (English) specification and an executable description.
The English description should provide all the necessary information to implement the system. This description also provides implementation guidelines: acceptable memory/speed trade-offs (e.g. it may be unreasonable for a benchmark to fit entirely in on-chip memory), required precision, and any special hardware acceleration units.

Benchmarking Methodology for Network Processors

Environment models overcome many of the difficulties associated with comparing benchmark results of different network processors. Network processors are deployed in core and edge equipment. In these segments, routing and forwarding tasks are performed by systems based on line cards with switched back-plane architectures. The network processor sits on a line card and assists with packet processing, defining the boundary between functionality and environment. Later we illustrate the gateway model, which is more typical of the low-end access segment of network equipment.

Consider a typical router architecture. Line cards manage the physical network interfaces of the router. Typical router architectures, such as the one shown, contain a variable number of cards [52]. Each line card is connected to a back-plane that passes packets to other line cards or to control processors. Line card deployments differ: in one, the network processor is configured in serial fashion with its surrounding components, while in another the network processor is connected to Ethernet MAC units in parallel fashion. Network processors within cards may also have different interface configurations; for example, one line card supports an OC-192 interface, while another supports 10x1 Gigabit Ethernet interfaces.

The functional specification must also define the parameters for each port, as they directly impact functionality. The number and type of active ports of the system must specifically be defined in order to program when and how packets are handled within the network processor. In practice, optimal performance cannot be achieved without customizing the benchmark software to the particular port configuration: the IXP1200, for instance, must treat slow ports significantly differently than fast ports (i.e. Gigabit Ethernet) [108]. Line card configurations from vendors such as Juniper Networks have a wide range of performance requirements (see 2.1).
Hence, we propose two line card port configurations: one that targets lower-performance line card deployment scenarios and one that targets higher-performance scenarios. Each benchmark should therefore specify two valid port configurations: one that models a low-end line card and one that models a high-end line card. While each benchmark may specify its own port configuration, the configuration should limit implementation complexity; standardized port configurations facilitate understanding the overall performance of a network processor and its interfaces. For the low-end line card configuration, we propose Fast Ethernet ports, a relatively low-bandwidth configuration offered by many router vendors. For the high-end line card configuration, we propose a standard configuration of four OC-48 packet-over-SONET ports (approximately 10 Gb/s aggregate bandwidth). These interfaces are representative of current line card configurations.

In the future, a potential application is the network interface card (NIC) that connects a PC or workstation to a LAN. Today the PC network interface is served by low-cost ASIC-based products designed for simple tasks. Evolving applications, such as virtual private networks (VPNs) at large data rates, place greater processing requirements on the PC; offloading work to the NIC may be a necessary result of these growing network processing requirements. Future work will add a NIC environment model for network processors.

Additional considerations for the environment specification are the packet sources and sinks. The IETF Benchmarking Methodology [32] recommends exercising network devices with the following test setup: a tester supplies the device under test with packets, and the device in turn sends its output back to the tester. In this case, the tester serves as both source and sink. Alternately, a packet sender (source) supplies the device, whose output is fed into a separate packet receiver (sink). In our benchmarking approach, traces must meet the benchmark environment specification. For interconnected devices, traces must include packet sizes from the minimum to the maximum allowable packet size according to the particular application or network medium (e.g. Ethernet). Specifically, the IETF recommends using packet sizes of 64, 128, 256, 512, 1024, 1280, and 1518 bytes [32]. Network processor vendors often report packet throughput using minimum-size packets.
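The per-size test traces recommended above can be generated mechanically. The sketch below assumes a toy trace representation (a list of fixed-size dummy packets); the function and variable names are illustrative, not part of the chapter's executable specification.

```python
# Sketch: generating one test trace per IETF-recommended Ethernet frame
# size. The trace format here is a toy stand-in: each trace is just a
# list of zero-filled byte strings of the given size.

IETF_PACKET_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]  # bytes

def make_trace(size, count):
    """Build `count` fixed-size dummy packets of `size` bytes each."""
    if not 64 <= size <= 1518:
        raise ValueError("size outside the allowable Ethernet range")
    return [bytes(size) for _ in range(count)]

# One trace per recommended size, 1000 packets each.
traces = {s: make_trace(s, 1000) for s in IETF_PACKET_SIZES}
print(len(traces[64]), len(traces[64][0]))  # 1000 64
```

A real environment specification would additionally fix header contents, destination address distributions, and inter-packet timing, so that every platform is stimulated identically.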
Since many applications operate primarily on the packet header, more processing must be performed on a large number of minimum-size packets than on the same volume of large packets. Using minimum-size or maximum-size packets can test the corner cases of benchmark performance; for overall performance we defer to the IETF's recommendations and to Newman's observations. Newman observed packet sizes in four Internet samples (the dominant IP packet sizes were 23% and 17% of the total, respectively) and used this proportion of packet sizes to develop a representative traffic pattern. Such mixes remain appropriate for the deployment scenario, which requires a realistic workload for network processor benchmarks. Figure 2.4 illustrates this benchmark.

Functional Specification

This benchmark is based on RFC 1812, Requirements for IP Version 4 Routers [16]. The specification covers the Click implementation, the network interface, the control interface, load distribution, and functional correctness. For each arriving packet, the next-hop location is determined through a longest prefix match (LPM) on the IPv4 destination address field. If no match is found, the packet is flagged and sent to the control plane, as are other exception packets. The packet header's checksum and TTL fields are updated. Tables and buffers can be optimally configured for the processor architecture, but must be large enough to preserve the ability to measure sustained performance. The network processor must keep all non-fixed tables (i.e. the tables used by the LPM) updated with minimal intrusion into the data path. Routing tables for the LPM should be able to match any destination address and should support next-hop information for up to 64,000 destinations simultaneously. Packet sources and sinks model the network interfaces of the low-end configuration.

Performance Measurement

To verify the functional correctness of a benchmark implementation, output packets (including those sent to the control plane) are checked. Overall performance is the aggregate packet-forwarding rate of the network processor such that no valid packets are dropped. The measurement is obtained by successively lowering the rate at which packets are offered to every port simultaneously, using the packet traces and the environment specification, until the network processor drops no valid packets.

Benchmark Suite

Having shown the example, we would now like to define the full suite. Before we introduce our suite of benchmarks for network processors, we first discuss benchmark selection under our methodology.
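The forwarding fast path in the functional specification above can be sketched compactly. This is a minimal sketch under stated assumptions: the table layout (a dict of prefix/length pairs), the helper names, and the incremental checksum handling are illustrative choices, not the chapter's reference implementation.

```python
# Sketch of the IPv4 fast path: longest prefix match on the destination
# address, TTL decrement, and incremental header checksum update.

def lpm_lookup(table, dst_addr):
    """Longest prefix match over a dict of (prefix, prefix_len) -> next hop."""
    best_len, best_hop = -1, None
    for (prefix, plen), hop in table.items():
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        if dst_addr & mask == prefix and plen > best_len:
            best_len, best_hop = plen, hop
    return best_hop  # None means: flag the packet to the control plane

def forward(ttl, checksum, table, dst_addr):
    """One fast-path step; returns (next_hop, new_ttl, new_checksum) or None."""
    if ttl <= 1:
        return None  # TTL expired: hand off to the control plane
    hop = lpm_lookup(table, dst_addr)
    if hop is None:
        return None  # no route: hand off to the control plane
    # Decrementing TTL lowers the TTL/protocol 16-bit word by 0x0100, so the
    # one's-complement checksum rises by 0x0100 (incremental update, RFC 1141).
    new_ck = checksum + 0x0100
    new_ck = ((new_ck & 0xFFFF) + (new_ck >> 16)) & 0xFFFF
    return hop, ttl - 1, new_ck

table = {(0x0A000000, 8): "port1", (0x0A010000, 16): "port2"}
print(lpm_lookup(table, 0x0A010203))          # port2 (the /16 beats the /8)
print(forward(64, 0x1234, table, 0x0A010203))  # ('port2', 63, 4916)
```

A production LPM would of course use a trie or hardware-assisted lookup rather than a linear scan; the linear scan only makes the matching rule explicit.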
Consider IPv4 packet forwarding on the IXP1200: the appropriate benchmark granularity was chosen after careful application-domain analysis of performance bottlenecks. In our experience, benchmarking approaches with relatively small kernels do not capture these bottlenecks.

The suite covers equipment such as integrated access devices, access concentrators, and cable head-ends. From this classification of the major application categories in which network processors are likely to play a role, benchmarks were chosen based on their relative significance in the network:

- Ethernet Bridge
- IPv4 Packet Forwarding
- Network Address Port Translation (NAPT)
- (Layer-7) Switch
- Bridging and Packet over SONET
- QoS: IPv4 Packet Forwarding with QoS and Statistics Gathering
- Security: Packet Forwarding with Tunneling and Encryption

[Figure: packet throughput versus packet trace size in bytes, for sizes from 64 to 512 bytes.]

In theory, as packet size increases, packet throughput should increase. In practice, there is a marked difference in throughput between 64- and 65-byte packets across all three benchmarks. This is due to byte alignment of the receive and transmit transfers on the IXP1200: extra operations are required for non-aligned packet segments. Such information provides insight into the architectural nuances of network processors, and it shows that several different benchmarks representing the application domain are required to analyze a platform for particular architectural effects.

Judiciously Using Benchmarking

Popular benchmarking approaches work well because their intended architectural platform is homogeneous. In other words, these and other related approaches specify the functional aspects but do not emphasize the system-level aspects of the architectural platform. As a result, these approaches do not work well for widely heterogeneous platforms. Benchmarking efforts related to system-level architectures can be found in embedded systems and network processing: CommBench, EEMBC, MiBench, and Intel Corp. In general, these approaches do not emphasize the system-level interfaces. Since the performance of, for instance, a network processor is heavily dependent on system-level interfaces, such as control interfaces, it is essential to account for such differences in the corresponding benchmarking methodology.
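The zero-loss measurement procedure described in the performance measurement discussion can be sketched as a simple search over offered rates. In this sketch, `device_under_test` is a hypothetical stand-in for a real traffic generator plus network processor; its fixed 73% capacity and the function names are assumptions for illustration only.

```python
# Sketch: successively lower the offered rate until the device under
# test drops no valid packets, as in the zero-loss measurement.

def max_lossless_rate(offer, start_rate, step=0.05):
    """Lower the offered rate until `offer(rate)` reports zero drops.

    `offer(rate)` returns the number of dropped packets at that rate
    (rates are fractions of line rate).
    """
    rate = start_rate
    while rate > 0 and offer(rate) > 0:
        rate -= step
    return max(rate, 0.0)

# Toy DUT: drops the traffic offered beyond 73% of line rate.
def device_under_test(rate, capacity=0.73):
    return max(0, int((rate - capacity) * 1000))

r = max_lossless_rate(device_under_test, start_rate=1.0)
print(round(r, 2))  # 0.7
```

A real measurement would use a binary search over rates and re-run each trial long enough to reach steady state, but the termination condition, no valid packet dropped on any port, is the same.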
For example, how does one compare the Freescale C-Port (which contains a particular dedicated hardware unit) to the Intel IXP1200 (which does not)? Without special consideration of these environmental issues, comparisons between the C-Port C-5 and the IXP1200 cannot be drawn. As determined earlier, judiciously using benchmarking also requires a functional specification, an executable description, and performance measurement for the benchmarks. With this in mind, we take a closer look at the more prominent benchmarking efforts.

CommBench [245] specifies network processor benchmarks divided into "header-processing applications" and payload-processing applications. CommBench motivates their selection but provides no methodology for environment specification or consideration of system-level interfaces. In addition, CommBench currently targets general-purpose uniprocessors; it is unclear how its benchmarks can be applied to heterogeneous platforms.

The Embedded Microprocessor Benchmark Consortium (EEMBC [61]) defines benchmark sets for the consumer, networking, and telecommunication domains, among others. Due to this wide scope, the benchmark suites per application domain are not necessarily representative. In the networking domain, for instance, EEMBC defines only three simple benchmarks (Patricia route lookup, Dijkstra's shortest path, and packet flow between queues).

Intel distinguishes benchmarks at the hardware level, function level, and system level. Hardware-level benchmarks test functionality specific to the device (e.g. memory latencies); an example system-level benchmark is IPv4 forwarding on the Intel IXP1200. Intel's executable description and performance measurement for the IXP benchmark support comparability through a benchmarking reference platform. The model wraps the black-box functionality of the network processor and co-processors with a set of control interfaces. If these interfaces are standardized, customers can adapt the reference platform to their particular needs. As argued earlier, these interfaces must be appropriately specified because they are critical for comparability; Intel does not indicate whether they are part of the benchmark specification.

The benchmarks of [141] address the lack of control plane benchmarks with a set spanning three classes, including traffic management and media processing; they also cover the data plane, similarly to CommBench and EEMBC. Nemirovsky likewise recognizes the needs and requirements of network processor benchmarking.
Although Nemirovsky provides insight into quantifying network processor performance, he does not provide precise information about a viable benchmarking approach. The authors of [55] present a theoretical treatment of network processor architectures based on programmable network workload analysis. Their work does not consider current target systems for network processors, and their model does not address environment concerns.

In Table 2.2 we compare the existing approaches to platform benchmarking for network processors. In this domain, the environment specification includes the specification of external network traffic. The granularity column distinguishes between micro-kernel benchmarks, which capture the most compute-intense part of an application, and more complex function benchmarks. As this table shows, none of the existing approaches meets all the requirements; there are a