Edit: The more things change
Originally posted September 25, 2018 on AIXchange
Hot Chips 30 took place in August. A long-running conference for the semiconductor industry, Hot Chips is the place to learn about high-performance microprocessors and related topics like system memory. Here are a couple of interesting articles from the conference that look at memory and where it’s headed in the near future.
This is from HPCwire:
Having launched both the scale-out and scale-up POWER9 [servers], IBM is now working on a third P9 variant with “advanced I/O,” featuring IBM’s 25 GT/s PowerAXON signaling technology with upgraded OpenCAPI and NVLink protocols, and a new open standard for buffered memory….
“The PowerAXON concept gives us a lot of flexibility,” said [IBM Power architect Jeff] Stuecheli. “One chip can be deployed to be a big SMP, it can be deployed to talk to lots of GPUs, it can talk to a mix of FPGAs and GPUs – that’s really our goal here is to build a processor that can then be customized toward these domain specific applications.”
The article concludes with this:
The roadmap shows the anticipated increase in memory bandwidth owing to the new memory system. Where the POWER9 SU chip offers 210 GB/s of memory bandwidth (and Stuecheli says it’s actually closer to 230 GB/s), the next POWER9 derivative chip, with the new memory technology, will be capable of deploying 350 GB/s per socket of bandwidth, according to Stuecheli.
“If you’re in HPC and disappointed in your bytes-per-flop ratio, that’s a pretty big improvement,” he said, adding “we’re taking what was essentially the POWER10 memory subsystem and implementing that in POWER9.” With Power10 bringing in DDR5, IBM expects to surpass 435 GB/s sustained memory bandwidth….
It’s an odd statement of direction, but maybe a visionary one, essentially saying a processor isn’t about computation per se, but rather it’s about feeding data to other computational elements.
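To put the bandwidth figures quoted above in perspective, here is a quick back-of-the-envelope sketch in Python. The numbers come straight from the HPCwire excerpt; the variable names and the `percent_gain` helper are my own, purely for illustration. Keep in mind that the Power10 figure is a floor rather than a measurement, since IBM only says it expects to "surpass" it.

```python
# Per-socket memory bandwidth figures (GB/s) quoted in the HPCwire excerpt.
POWER9_SU_NOMINAL = 210    # stated figure for the POWER9 SU chip
POWER9_SU_ACTUAL = 230     # Stuecheli's "actually closer to" figure
POWER9_ADVANCED_IO = 350   # next POWER9 derivative with the new memory system
POWER10_FLOOR = 435        # IBM expects to "surpass" this, so treat it as a lower bound


def percent_gain(new, old):
    """Return the percentage increase of `new` over `old`."""
    return (new - old) / old * 100


comparisons = [
    ("POWER9 SU (210) -> advanced I/O (350)", POWER9_SU_NOMINAL, POWER9_ADVANCED_IO),
    ("POWER9 SU (230) -> advanced I/O (350)", POWER9_SU_ACTUAL, POWER9_ADVANCED_IO),
    ("advanced I/O (350) -> Power10 (435+)", POWER9_ADVANCED_IO, POWER10_FLOOR),
]

for label, old, new in comparisons:
    print(f"{label}: about {percent_gain(new, old):.0f}% more bandwidth")
# POWER9 SU (210) -> advanced I/O (350): about 67% more bandwidth
# POWER9 SU (230) -> advanced I/O (350): about 52% more bandwidth
# advanced I/O (350) -> Power10 (435+): about 24% more bandwidth
```

So depending on which POWER9 SU number you start from, the new memory system is worth roughly a 50 to 70 percent jump per socket, with Power10 adding at least another quarter on top of that.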
This piece from top500.org says IBM is aiming to take memory in a new direction:
The memory buffering adds about 10ns of latency to memory accesses compared to a direct DDR hookup, but the tradeoff for more bandwidth and capacity is worth it for these extra-fat servers. And although the Centaur buffered memory implementation still uses DDR memory chips as the storage media, this no longer really needs to be the case since the DDR smarts have moved off the chip.
IBM plans to generalize this memory interface, which will be known as OpenCAPI memory, in their next version of the POWER9 processor that is scheduled to be launched in 2019. As far as we can tell, these upcoming POWER9 chips will be suitable for two-socket HPC servers, as well as mainstream systems. IBM is projecting that its next POWER9 chip will support over 350 GB/sec of memory bandwidth per socket, which is more than twice the speed of today’s fastest chips for two-socket servers. The company also intends to reduce the latency penalty to around 5ns in its first go-around.
Perhaps the bigger news here is that OpenCAPI memory will be proposed as an open standard for the entire industry. The use of the OpenCAPI brand is intentional, since IBM wants to do for memory devices, what the original OpenCAPI was designed to do for I/O devices, namely level the playing field. In this case, the idea is to enable any processor to talk to any type of memory via conventional SerDes links. As a result, CPUs, GPUs, or FPGAs would no longer need to be locked into DDR, GDDR, or any other type of memory technology. So, for example, a chip could use the interface to connect to traditional DDR-type DIMMs, storage-class memory based on NAND or 3D XPoint, or some other type of specialized memory.
Often we focus only on what we can buy and deploy right now. But if you want to see where things are headed, read these and other articles from the conference at the Hot Chips site.