A POWER9 Roadmap

Edit: Now we are doing POWER10 roadmaps. Some links no longer work.

Originally posted March 21, 2017 on AIXchange

I want to point you to Jeff Stuecheli’s POWER9 presentation from January’s AIX Virtual User Group meeting. This information doesn’t involve specific announcements or new models, but it provides an informative look at the capabilities of the chip itself. Download the presentation and/or watch the video.

Some highlights:

  • The slide on page 2 shows a roadmap with POWER9 appearing in the second half of 2017 and into 2018, with POWER10 appearing in the 2020 timeframe.
  • Page 3 covers different workloads that POWER9 has been designed for.
  • This is from page 4:

Optimized for Stronger Thread Performance and Efficiency
• Increased execution bandwidth efficiency for a range of workloads including commercial, cognitive and analytics
• Sophisticated instruction scheduling and branch prediction for unoptimized applications and interpretive languages
• Adaptive features for improved efficiency and performance especially in lower memory bandwidth systems

  • This is from page 5:

Re-factored Core Provides Improved Efficiency & Workload Alignment
• Enhanced pipeline efficiency with modular execution and intelligent pipeline control
• Increased pipeline utilization with symmetric data-type engines: Fixed, Float, 128b, SIMD
• Shared compute resource optimizes data-type interchange

  • From page 8: There will be two ways to attach memory. You can either attach it directly, or use buffered memory in the scale-up systems.
  • Page 10 shows a matrix of what you can expect from two-socket vs. multi-socket systems.
  • Page 11 shows the socket performance you can expect from POWER9 vs. POWER8.
  • Page 13 covers data capacity and throughput.
  • Page 15 covers the bandwidth improvements between CECs on the large systems, and page 17 examines the different accelerators that will be incorporated.
  • This is from page 18:

Extreme Processor/Accelerator Bandwidth and Reduced Latency
• Coherent Memory and Virtual Addressing Capability for all Accelerators
• OpenPOWER Community Enablement – Robust Accelerated Compute Options
• State of the Art I/O and Acceleration Attachment Signaling
– PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
– 25Gb/s Common Link x 48 lanes – 300 GB/s duplex bandwidth
• Robust Accelerated Compute Options with OPEN standards
– On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2
– CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4
– NVLink 2.0 – Next generation of GPU/CPU bandwidth and integration using 25G
– Open CAPI 3.0 – High bandwidth, low latency and open interface using 25G
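
As a quick sanity check (my own back-of-envelope arithmetic, not something from the presentation), the duplex bandwidth figures on page 18 fall right out of the raw per-lane signaling rates once you ignore encoding overhead:

  # Rough check of the page 18 duplex bandwidth numbers (raw signaling rates,
  # encoding overhead ignored); this is my arithmetic, not IBM's.
  def duplex_gb_per_s(lanes, gbit_per_lane):
      one_direction = lanes * gbit_per_lane / 8  # Gb/s -> GB/s, one direction
      return 2 * one_direction                   # duplex = both directions

  print(duplex_gb_per_s(48, 16))  # PCIe Gen 4, 16 GT/s per lane -> 192.0 GB/s
  print(duplex_gb_per_s(48, 25))  # 25 Gb/s common link          -> 300.0 GB/s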

  • This is from page 19:

Seamless CPU/Accelerator Interaction
• Coherent memory sharing
• Enhanced virtual address translation
• Data interaction with reduced SW & HW overhead

Broader Application of Heterogeneous Compute
• Designed for efficient programming models
• Accelerate complex analytic/cognitive applications

  • Page 23 covers OpenCAPI 3.0 features.
  • This is from page 26:

Enhanced Core and Chip Architecture for Emerging Workloads
• New Core Optimized for Emerging Algorithms to Interpret and Reason
• Bandwidth, Scale, and Capacity, to Ingest and Analyze
Processor Family with Scale-Out and Scale-Up Optimized Silicon
• Enabling a Range of Platform Optimizations – from HSDC Clusters to Enterprise Class Systems
• Extreme Virtualization Capabilities for the Cloud
Premier Acceleration Platform
• Heterogeneous Compute Options to Enable New Application Paradigms
• State of the Art I/O
• Engineered to be Open

These are things that stood out to me, but obviously you’ll get more from listening to the replay.

And to further whet your appetite for POWER9, here are two videos from the Open Compute Project Summit: in one, Aaron Sullivan, a Rackspace distinguished engineer, gives a video tour of a system; in the other, Google and Rackspace engineers provide even more details about the systems they are designing.