OpenCAPI™ Forum, SC16, November 16, 2016: Open Coherent Accelerator Processor Interface
Slide1
Industry Collaboration and Innovation
Slide2
OpenCAPI™ Forum
SC16
November 16, 2016
Open Coherent Accelerator Processor Interface
Slide3
Topics
OpenCAPI Overview - Brian Allison - IBM, STSM, CAPI Engagement and Incubation Lead
Industry landscape and approaches to address challenges
Technical Overview
What is OpenCAPI, today and looking forward
Benefits of OpenCAPI
OpenCAPI Consortium - Myron Slota - IBM, Executive Program Manager, OpenCAPI President
What is the OpenCAPI Consortium
Who can join and participate
OpenCAPI Q&A Session - Scott Graham - Micron, General Manager and OpenCAPI Chairperson; Steve Fields - IBM Fellow, Chief Engineer of Power Systems
Slide4
OpenCAPI Overview
Brian Allison
Slide5
Industry Landscape
Two major technology trends will heavily impact the industry
Hardware acceleration will become commonplace as microprocessor technology and design continue to deliver far less than the historical rate of cost/performance improvement per generation
New advanced memory technologies will change the economics of computing
Existing system interfaces are insufficient to address these disruptive forces
Traditional I/O architecture results in very high CPU overhead when applications communicate with I/O or accelerator devices at the necessary performance levels
Systems must be able to integrate multiple memory technologies with different access methods and performance attributes
These challenges must be addressed in an open architecture allowing full industry participation
Establish sufficient volume base to drive cost down
Support broad ecosystem of software and attached devices
Slide6
OpenCAPI Approach
What is OpenCAPI?
OpenCAPI is an Open Interface Architecture that allows any microprocessor to attach to
Coherent user-level accelerators and I/O devices
Advanced memories accessible via read/write or user-level DMA semantics
Agnostic to processor architecture
Key Attributes of OpenCAPI
High-bandwidth, low latency interface optimized to enable streamlined implementation of attached devices
25Gbit/sec signaling and protocol built to enable very low latency interface on CPU and attached device
Complexities of coherence and virtual addressing implemented on host microprocessor to simplify attached devices and facilitate interoperability across multiple CPU architectures
Attached devices operate natively within an application’s user space and coherently with processors
Allows attached device to fully participate in application without kernel involvement/overhead
Supports a wide range of use cases and access semantics
Hardware accelerators
High-performance I/O devices
Advanced memories
100% Open Consortium / All company participants welcome / All ISA participants welcome
Slide7
Addressing the Industry Trend
Strong industry growth in use of various accelerators
Introduction of device coherency requirements
Storage and Memory, Compute, Network
Various form factors including GPUs and FPGAs
Bottom-up Design with Radical New Capabilities is required
Hyperscale datacenters and HPC are driving need for much higher network bandwidth
Deep learning and HPC require more bandwidth between accelerators and memory
New storage technologies are increasing the need for bandwidth and CPU efficiency
Increased industry dependence on hardware acceleration for performance
OpenCAPI addresses these needs by providing higher bandwidth and lower latency
Slide8
Virtual Addressing
An OpenCAPI device operates in the virtual address spaces of the applications that it supports
Eliminates kernel and device driver software overhead
Improves accelerator performance
Allows device to operate directly on application memory without kernel-level data copies or pinned pages
Simplifies programming effort to integrate accelerators into applications
The Virtual-to-Physical Address Translation occurs in the host CPU
Reduces design complexity of OpenCAPI-attached devices
Makes it easier to ensure interoperability between an OpenCAPI device and multiple CPU architectures
Since the OpenCAPI device never has access to a physical address, this eliminates the possibility of a defective or malicious device accessing memory locations belonging to the kernel or other applications that it is not authorized to access
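As a rough illustration of the virtual addressing points above, the following Python sketch models address translation that happens only on the host. It is a toy model, not the OpenCAPI protocol: the HostMMU class, its method names, the 4 KiB page size, and the use of a process ID to select an address space are all assumptions made for this example. The device side only ever supplies a virtual address, and the host either resolves it or returns an error.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TranslationFault(Exception):
    """Error response sent back to the device when no mapping exists."""


class HostMMU:
    """Hypothetical host-side translation model (not an OpenCAPI API)."""

    def __init__(self):
        self.page_tables = {}  # pid -> {virtual page number: physical page number}
        self.memory = {}       # physical address -> byte value

    def map_page(self, pid, vpage, ppage):
        """Host/OS side: give process `pid` a mapping for one page."""
        self.page_tables.setdefault(pid, {})[vpage] = ppage

    def _translate(self, pid, vaddr):
        # Only the host ever computes a physical address.
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        try:
            ppage = self.page_tables[pid][vpage]
        except KeyError:
            raise TranslationFault(f"pid {pid}: no mapping for 0x{vaddr:x}") from None
        return ppage * PAGE_SIZE + offset

    def device_write(self, pid, vaddr, value):
        """Device request: write one byte at a virtual address of `pid`."""
        self.memory[self._translate(pid, vaddr)] = value

    def device_read(self, pid, vaddr):
        """Device request: read one byte at a virtual address of `pid`."""
        return self.memory.get(self._translate(pid, vaddr), 0)


if __name__ == "__main__":
    mmu = HostMMU()
    mmu.map_page(pid=1, vpage=0x10, ppage=0x200)
    mmu.device_write(pid=1, vaddr=0x10 * PAGE_SIZE + 8, value=0xAB)
    print(hex(mmu.device_read(pid=1, vaddr=0x10 * PAGE_SIZE + 8)))
    try:
        mmu.device_read(pid=1, vaddr=0x11 * PAGE_SIZE)  # unmapped page
    except TranslationFault as fault:
        print("error response:", fault)
```

In this sketch a device working on behalf of one process cannot name memory that the process has no mapping for, because it never holds a physical address; the unmapped access yields an error response instead.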
Slide9
OpenCAPI 3.0 Features
Base Accelerator Support
Accelerator Reads with no intent to cache, DMA write using Program Addresses
The accelerator is working in the same address domain as the host application
Pointer chasing and linked lists are all now possible without Device Driver involvement
Address translation on host (processor) with error response back to the accelerator
Very efficient translation latency mechanism using host processor Address Translation Cache (ATC) and MMU
Non-posted writes only
Ability for Partial Read/Write DMAs
Write with byte enables
Translate touch to warm up address translation caches
Allows accelerator to reduce translation latency when using a new page
Wake Up host thread
Very efficient low latency mechanism in lieu of either interrupts or host processor polling mechanism of memory
Atomic Memory Operations (AMO) to Host Processor Memory
Accelerator can now perform atomic operations in the same coherent domain just like any other host processor thread
Slide10
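The "Write with byte enables" item among the OpenCAPI 3.0 base accelerator features can be pictured with a short Python sketch. This shows the general idea of a partial write only: the function name and the convention that bit i of the mask enables byte i are assumptions for illustration, not the encoding defined by the OpenCAPI specification.

```python
def write_with_byte_enables(memory: bytearray, base: int, data: bytes, enables: int) -> None:
    """Copy data[i] to memory[base + i] only where bit i of `enables` is set.

    Bit i <-> byte i is an assumed convention for this sketch.
    """
    for i, byte in enumerate(data):
        if (enables >> i) & 1:
            memory[base + i] = byte


if __name__ == "__main__":
    mem = bytearray(8)            # eight zero bytes
    payload = bytes(range(1, 9))  # 01 02 03 04 05 06 07 08
    # Enable only the low four bytes; the upper four keep their old contents.
    write_with_byte_enables(mem, 0, payload, 0b00001111)
    print(mem.hex())  # 0102030400000000
```

A mask like this lets a device update part of a larger block without first reading the bytes it does not intend to change.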
OpenCAPI 3.0 Features (cont.)
Base Accelerator Support
MMIO slave
Accelerators have BAR space that provides MMIO register capability
Configuration space facility slave
Efficient discovery and enumeration of accelerators
OpenCAPI attached Memory
High bandwidth and low latency memory home agent capability
Load/Store model access to OpenCAPI attached memory
Host Application can access memory attached to OpenCAPI endpoint as part of coherent domain
Data resides very close to the consumer with very low latency
Atomic Memory Operations (AMO) support toward OpenCAPI attached memory
Slide11
OpenCAPI 4.0 Features (future)
Full Feature of the Architecture Specification
Added Accelerator Caching/Coherence support
Enabling application to have a Program Address Cache on accelerator chip with Host Address proxy directory and translation on host processor chip
Caching on accelerators provides latency advantage for frequently addressed/modified data
Added Link Widths of x4, x8, x16, x32 (OpenCAPI 3.0 supports x8 only)
Added Pinned translations in host processor Address Translation Cache (ATC)
Improved latency and ordered write performance from accelerator to host memory
Enhanced wake host thread with rollover to interrupt
Low latency communication method between accelerator and host application
Avoid inefficient interrupts and host processor polling on memory
Enhanced OpenCAPI attached memory
Host/Accelerator sharing of OpenCAPI attached memory
Atomic memory operations executed in attached memory controller on accelerator
Slide12
OpenCAPI protocol stack
TL
DL
*X remote chip (FPGA or ASIC component)
DL/TL not symmetrical with Accelerator DLX/TLX
Accelerator protocol layer optional
Advanced accelerators can interface directly to TLX
Slide13
FPGA Toolbox
Custom application and accelerator development
Operating System Enablement
Hardware to enable coherent acceleration
Architecture Specs
TLx and DLx Reference Designs
Reference Driver that partners can use as a starting point, or they can develop their own
OpenCAPI Simulation Environment (OCSE) to support TLx and DLx Ref Designs
Slide14
FPGA TLX and DLX Reference Designs
TLX and DLX will be provided as reference designs to OpenCAPI consortium members
TLX and DLX are not symmetric with TL and DL that are on the host processor
64B flit flow at 400MHz
TLX presents Accelerator interface
Very efficient thin layer in a packet based format
Manages Credits
Accelerator master <command, address, data>
Accelerator memory slave read and write w/Atomics
Accelerator maintenance
MMIO slave
Configuration, initialization
Error handling
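The figures quoted in this deck (25 Gbit/sec signaling, an x8 link for OpenCAPI 3.0, and a 64B flit flow at 400MHz in the reference design) can be related with simple arithmetic. The Python sketch below is a back-of-the-envelope check only. It assumes the 25 Gbit/sec figure is a per-lane rate in one direction and ignores encoding and protocol overhead; the function and constant names are made up for this example.

```python
LANE_RATE_GBIT = 25   # per-lane signaling rate quoted in the deck (assumed per direction)
LANES = 8             # x8 link width, the only width in OpenCAPI 3.0 per the deck
FLIT_BYTES = 64       # flit size in the FPGA reference design
FLIT_CLOCK_MHZ = 400  # flit clock in the FPGA reference design


def raw_link_gbytes_per_s(lanes=LANES, lane_gbit=LANE_RATE_GBIT):
    """Raw signaling bandwidth of the link in GB/s (8 bits per byte)."""
    return lanes * lane_gbit / 8


def flit_path_gbytes_per_s(flit_bytes=FLIT_BYTES, clock_mhz=FLIT_CLOCK_MHZ):
    """Bandwidth of a datapath moving one flit per clock, in GB/s."""
    return flit_bytes * clock_mhz / 1000  # bytes * MHz = MB/s, then to GB/s


if __name__ == "__main__":
    print(raw_link_gbytes_per_s())   # 25.0
    print(flit_path_gbytes_per_s())  # 25.6
```

Under these assumptions, one 64-byte flit per 400 MHz clock cycle (25.6 GB/s) is enough to keep pace with the raw rate of an x8 link at 25 Gbit/sec per lane (25 GB/s).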
Slide15
Use Cases
Slide16
OpenCAPI Consortium
Myron Slota
Slide17
OpenCAPI Consortium
Mission and incorporation
What is the OpenCAPI Consortium
Who can/should join and participate
Membership levels
Slide18
OpenCAPI Consortium overview
Provide a forum to give the industry ability to innovate the next generation bus protocol
Improve data center economics with accelerators and advanced memory technologies
Enablement collateral including reference designs, simulation environment and specifications
Driving hardware and software innovation, choice and efficiency in data center architecture
Building an ecosystem to provide customers with the flexibility to build servers and data centers best suited for their computational demands
The mission of the OpenCAPI Consortium is to create an open coherent high performance bus interface based on a new bus standard called Open Coherent Accelerator Processor Interface (OpenCAPI) and grow the ecosystem that utilizes this interface
Incorporated September 13, 2016
Announced October 14, 2016
Slide19
Enabling the data center industry to move forward
Companies realizing need for accelerated computing to meet demand and improve performance
Introduction of device coherency requirements
Emergence of complex Storage and Memory solutions
Growing demand for computational and network performance
Various form factors (e.g., GPUs, FPGAs, ASICs)
Bottom-up Design approach needed for higher bandwidth and lower latency
Hyperscale datacenters and HPC are driving need for much higher network bandwidth
Deep learning and HPC require more bandwidth between accelerators and memory
New storage technologies are increasing the need for bandwidth and CPU efficiency
Increased industry dependence on hardware acceleration for performance
Slide20
OpenCAPI Consortium Overview
Open forum to manage the OpenCAPI specification and ecosystem
Founded by AMD, Google, IBM, Mellanox, and Micron
Incorporated on September 13 and announced on October 14, 2016
Consortium board and officers established
Initial deliverables
OpenCAPI 3.0 specification
Enablement including reference designs, documentation, SIM environment, etc.
Workgroups will be formed to evolve architecture and ensure compliance
Technical Steering Committee formed (TSC chairperson and workgroup process being defined)
Proposed workgroups include Specification, Compliance, Enablement, Software and more
Established website at www.opencapi.org
Governing documents (Bylaws, IPR Policy, Membership Agreement)
Specification currently posted is a starting point that was contributed by IBM
Slide21
Who should join the OpenCAPI Consortium
Microprocessor vendors looking for a better way to attach high performance devices
Hardware Accelerator developers looking for a better way to attach to systems
Software developers looking for an easier and more efficient way to integrate hardware accelerators into applications
System Vendors looking for more ability to innovate and differentiate
End users looking for choices to exploit new technologies and improve datacenter cost/performance
Slide22
Cross Industry Collaboration and Innovation
OpenCAPI Protocol
SOC
Accelerator Solutions
Systems and Software
Products and Services
Welcoming new members in all areas of the ecosystem
ISVs
Research & Academic
Slide23
Membership Options
OpenCAPI Consortium is a 501c6 Not-for-profit entity with a Board of Directors and a Technical Steering Committee.
Board Membership today consists of the Founders (AMD, Google, IBM, Mellanox, Micron) and has voting authority
Contributor Level and Academic Level can participate in workgroups, access to early documents and enablement, License
Observing Level has access to final specification and License
The Bylaws detail additional governance by the Board including maximum seats, terms, etc.
Technical Steering Committee and Workgroups to be formed
Membership levels (annual fee in $ USD; Technical Steering Committee; voting position):
Board: 25k; one seat per member not otherwise represented; includes board position; includes TSC position; Board Level voting authority
Contributor: 15k; may be on TSC if workgroup lead; voting at Workgroup Level
Observing: 5k; no TSC entry; no voting entry
Associate and Academic: 0; may be on TSC if workgroup lead; voting at Workgroup Level
Anyone may participate in OpenCAPI. Membership is designed for those that are investing to grow and enhance the OpenCAPI community and its proliferation within the industry.
Slide24
Membership Entitlements
Board level: Vote on new Board Members; Nominate and/or run for election as officer; Draft and Final Specifications and enablement; License for Product development; Prominent listing in appropriate materials; Workgroup participation and voting; TSC participation
Observing level: Final Specifications and enablement; License for Product development
Contributor level: Submit proposals; Draft and Final Specifications and enablement; License for Product development; Workgroup participation and voting; TSC participation
Associate and Academic level: Final Specifications and enablement; Workgroup participation and voting
Slide25
OpenCAPI Q&A Session
Brian Allison - IBM, STSM, CAPI Engagement and Incubation Lead
Myron Slota - IBM, Executive Program Manager, OpenCAPI President
Scott Graham - Micron, General Manager, OpenCAPI Chairperson
Steve Fields - IBM Fellow, Chief Engineer of Power Systems
Slide26
Thank you!