The remote procedure call, or RPC, might be the single most important invention in the history of modern computing. The ability for a running program to reach out and activate another piece of code to do something on its behalf, whether fetching data or manipulating it in some fashion, is a powerful and pervasive concept, one that underpins modular programming and gave rise to the microservices architecture.
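The core idea is easy to demonstrate: the caller invokes what looks like a local function, but the work actually executes in another process reached over the network. Here is a minimal sketch using Python's standard-library XML-RPC modules; the `lookup` function, address, and data are purely illustrative, not from the research discussed below:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# A hypothetical procedure the server exposes: fetch a value for a key.
def lookup(key):
    return {"answer": 42}.get(key, -1)

# Bind to an ephemeral port on localhost and serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lookup)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The caller invokes the remote code as if it were an ordinary function;
# the proxy marshals the arguments, sends them over the wire, and
# unmarshals the reply.
port = server.server_address[1]
client = ServerProxy(f"http://127.0.0.1:{port}")
print(client.lookup("answer"))  # prints 42
```

Every round trip in such a call pays for serialization, the network stack, and scheduling on the far end, which is exactly the latency tax the work described below attacks.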

In a world so unlike the monolithic code of days gone by, latency between chunks of code and elements of a system running across a cluster means everything, and reducing that latency has never been harder. But some innovative researchers at Stanford University and Purdue University have come up with a co-designed network interface card and RISC-V processor that provides a fast path into the CPU, one that can significantly reduce the latency of RPCs and make them more deterministic at the same time. This research, presented at the recent USENIX Symposium on Operating Systems Design and Implementation (OSDI '21), shows how their nanoPU hybrid approach might be the way forward for accelerating at least a certain class of RPCs (those with very small message sizes and, usually, very tight latency requirements), while leaving other classes of RPCs on the normal Remote Direct Memory Access (RDMA) path, which has been in use for decades and has been pushed to its lower latency limits.