Electron E1: Efficient Dataflow Architecture

There’s a growing need for CPUs that can live life on the edge: computing for long stretches while embedded in hard-to-reach places, surviving on battery power or on energy they can scrounge from the environment. Frustrated with inherent inefficiencies in the architecture of ultralow-power microprocessors, the founders of startup Efficient Computer decided to reinvent the general-purpose processor from the ground up for energy efficiency.

“We’re doing something that has the capability of a CPU but is one or two orders of magnitude more efficient,” says co-founder Brandon Lucia.

The result, the Electron E1 and its accompanying compiler, is now heading to developers and early partners. According to Lucia, the C-programmable processor is delivering 10- to 100-fold better efficiency than commercial ultralow-power CPUs on typical embedded-systems tasks, such as performing a fast Fourier transform on sensor data or doing convolutions for machine learning.

The key innovation was an architecture that can lay out any program’s instructions spatially on a chip, rather than delivering them one after another from memory as processors that follow the von Neumann architecture do today, says Lucia.

The von Neumann architecture has dominated computing for decades. In it, the processor fetches an instruction from memory that tells it what to do with data (add it to something, flip it around, whatever), executes it, and puts the result in memory. Then it fetches the next instruction, and the next, and so on.

It sounds simple, but it actually comes with a lot of overhead. “Several billion times per second, you’re pulling an instruction in from memory. That operation costs some energy,” says Lucia. Additionally, to prevent the process from stalling, modern CPUs have to guess at what instruction comes next, requiring logic called branch prediction and still more overhead.
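
To make the fetch overhead concrete, here is a minimal C sketch of a von Neumann-style interpreter. The instruction set (OP_ADDI, OP_DEC, OP_JNZ, OP_HALT) and the function run_von_neumann are hypothetical, invented for illustration; they do not model the E1 or any real CPU. The point is the counter: every executed instruction, including each repetition of a loop body, has to be fetched from memory again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, invented for illustration only. */
typedef enum { OP_ADDI, OP_DEC, OP_JNZ, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int reg;   /* register operand, 0..3 */
    int imm;   /* immediate value (OP_ADDI) or jump target (OP_JNZ) */
} Instr;

/* Executes a program held in memory one instruction at a time.
   Every step fetches the instruction at pc, decodes it, executes it,
   and chooses the next pc. Returns the number of instruction fetches. */
long run_von_neumann(const Instr *mem, size_t n, int regs[4]) {
    size_t pc = 0;
    long fetches = 0;
    for (;;) {
        assert(pc < n);
        Instr in = mem[pc];   /* fetch: one memory access on every step */
        fetches++;
        assert(in.reg >= 0 && in.reg < 4);
        switch (in.op) {      /* decode and execute */
        case OP_ADDI: regs[in.reg] += in.imm; pc++; break;
        case OP_DEC:  regs[in.reg] -= 1;      pc++; break;
        case OP_JNZ:  pc = regs[in.reg] != 0 ? (size_t)in.imm : pc + 1; break;
        case OP_HALT: return fetches;
        }
    }
}
```

For a five-instruction program that adds 2 to a register five times (one setup instruction, a three-instruction loop body, and a halt), the interpreter performs 17 fetches: one for setup, three per iteration, and one for the halt. A real CPU would also have to guess the outcome of each OP_JNZ before it is known, which this sketch leaves out.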

Instead, the E1 maps out the sequence of instructions as a spatial pathway through which data moves. Fundamentally, the E1 is an array of “tiles.” Each is like a stripped-down processor core—capable of performing a set of instructions but lacking instruction fetching, branch prediction, and other overhead. The tiles are linked together in a specially designed, programmable network.

The E1’s compiler, effcc, reads the program, which can be written in C or other common languages and platforms, and assigns each of its instructions to a tile. It then sets up the network so that data enters one tile, is processed, and the result becomes the input to the next tile, all in the right sequence to run the program. When the sequence branches, such as when the program encounters an if/then/else statement, the spatial pattern of tiles branches too. “It’s like a switch track in a railroad,” says Lucia.
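
The switch-track idea can be sketched in C as a tiny dataflow simulator. Everything here is an assumption for illustration: the tile operations, the Tile and Token types, and run_dataflow are hypothetical and say nothing about effcc’s actual output. A tile produces a result only when its input tokens have arrived, and a pair of steer tiles forwards a value down only one side of an if/then/else, so the tiles on the untaken side stay idle. The C loop visits tiles one at a time; on a spatial chip, tiles would operate concurrently as data reaches them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tile operations for illustration; not the E1's actual tile ISA. */
typedef enum { T_INPUT, T_CONST, T_GT, T_MUL, T_NEG,
               T_STEER_TRUE, T_STEER_FALSE, T_MERGE } TileOp;

typedef struct {
    TileOp op;
    int a, b;   /* indices of producer tiles wired to inputs, -1 if unused */
    int imm;    /* constant value for T_CONST */
} Tile;

typedef struct { bool valid; int value; } Token;

/* Visits each tile once, in the order a (hypothetical) compiler placed them,
   so every producer comes before its consumers. Steer tiles take a = data,
   b = predicate and forward the data only on the selected side.
   Returns the number of tiles that produced an output token. */
int run_dataflow(const Tile *t, int n, int input, Token *out) {
    int fired = 0;
    for (int i = 0; i < n; i++) {
        assert(t[i].a < i && t[i].b < i);   /* wires only point backward */
        const Token *a = t[i].a >= 0 ? &out[t[i].a] : NULL;
        const Token *b = t[i].b >= 0 ? &out[t[i].b] : NULL;
        Token r = { false, 0 };
        switch (t[i].op) {
        case T_INPUT: r = (Token){ true, input }; break;
        case T_CONST: r = (Token){ true, t[i].imm }; break;
        case T_GT:  if (a->valid && b->valid) r = (Token){ true, a->value > b->value }; break;
        case T_MUL: if (a->valid && b->valid) r = (Token){ true, a->value * b->value }; break;
        case T_NEG: if (a->valid) r = (Token){ true, -a->value }; break;
        case T_STEER_TRUE:  if (a->valid && b->valid && b->value)  r = *a; break;
        case T_STEER_FALSE: if (a->valid && b->valid && !b->value) r = *a; break;
        case T_MERGE: r = a->valid ? *a : *b; break;   /* whichever path delivered */
        }
        out[i] = r;
        if (r.valid) fired++;
    }
    return fired;
}
```

Wiring nine tiles as input, constant 0, compare, two steers, constant 2, multiply, negate, and merge computes y = (x > 0) ? x * 2 : -x. For either sign of x, seven of the nine tiles produce a token, and the multiply or negate tile on the other branch never does any work.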

“There have been other dataflow style architectures,” Lucia notes. Google’s TPUs and Amazon’s Inferentia chips, for example, are designed around a dataflow architecture called a systolic array. But systolic arrays and other dataflow efforts are restricted to a subset of all the possible data paths software might demand, Lucia says.
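
For comparison, here is a minimal C simulation of a generic systolic array doing matrix multiplication. It is a textbook output-stationary design with a hypothetical size limit SA_MAX, not a model of any Google or Amazon chip. Each processing element (PE) repeats one fixed multiply-accumulate step while operands march through in a fixed pattern, which suits matrix math well but leaves no room for arbitrary branches or loops.

```c
#include <assert.h>
#include <string.h>

#define SA_MAX 8   /* hypothetical maximum array size for this sketch */

/* Computes C = A * B for n x n row-major matrices by simulating an
   n x n grid of PEs. A values flow rightward, B values flow downward,
   and each PE accumulates its own element of C. */
void systolic_matmul(int n, const int *A, const int *B, int *C) {
    int a_reg[SA_MAX][SA_MAX] = {{0}};   /* value each PE passes right */
    int b_reg[SA_MAX][SA_MAX] = {{0}};   /* value each PE passes down */
    assert(n > 0 && n <= SA_MAX);
    memset(C, 0, sizeof(int) * (size_t)n * (size_t)n);
    /* Row i of A enters at the left edge i cycles late and column j of B
       enters at the top j cycles late, so A[i][k] and B[k][j] meet at
       PE (i, j) on cycle i + j + k. */
    for (int t = 0; t < 3 * n - 2; t++) {
        /* Update from the bottom-right corner so each PE reads the value its
           left or upper neighbor held during the previous cycle. */
        for (int i = n - 1; i >= 0; i--) {
            for (int j = n - 1; j >= 0; j--) {
                int k_a = t - i, k_b = t - j;
                int a_in = j > 0 ? a_reg[i][j - 1]
                                 : (k_a >= 0 && k_a < n ? A[i * n + k_a] : 0);
                int b_in = i > 0 ? b_reg[i - 1][j]
                                 : (k_b >= 0 && k_b < n ? B[k_b * n + j] : 0);
                a_reg[i][j] = a_in;
                b_reg[i][j] = b_in;
                C[i * n + j] += a_in * b_in;   /* multiply-accumulate in place */
            }
        }
    }
}
```

Multiplying the 3 x 3 matrices with rows (1, 2, 3), (4, 5, 6), (7, 8, 9) and (9, 8, 7), (6, 5, 4), (3, 2, 1) takes 3n - 2 = 7 simulated cycles and yields a first row of 30, 24, 18. Every cycle does the same thing everywhere; that regularity is the efficiency, and also the restriction.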

In contrast, the E1’s network fabric allows any arbitrary path a program could ask for. Critical to that is the fabric’s ability to support so-called arbitrary recurrences, such as the “while loop.” (Think: “while the light is red, depress the brake.”) Such loops require a feedback path. “It turns out that’s harder than it seems when you first look at it,” says Lucia. The E1 fabric can carry values around the feedback paths in a way that allows for general-purpose computing. “A lot of other dataflow architectures don’t do general purpose because they couldn’t crack that nut… It took us years to get it right.”
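
A while loop’s feedback path can be sketched the same way. In this minimal C sketch, merge, steer, body, and while_loop_dataflow are hypothetical names for illustration, not E1 primitives. A merge tile admits the loop’s starting values once and afterward accepts whatever comes back around the feedback path; a steer tile then sends the values either through the loop body again or out of the loop. A real fabric has to handle details this sketch ignores, such as keeping several tokens in flight without mixing them up; here only one token circulates at a time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token carrying the loop-carried values n and sum. */
typedef struct { bool valid; int n; int sum; } LoopToken;

/* Merge tile: admit the entry token once, then whatever returns on the feedback path. */
static LoopToken merge(LoopToken entry, LoopToken feedback) {
    return feedback.valid ? feedback : entry;
}

/* Steer tile: route the token to exactly one output, like a switch track. */
static void steer(LoopToken in, bool pred, LoopToken *taken, LoopToken *exit_path) {
    LoopToken none = { false, 0, 0 };
    *taken = pred ? in : none;
    *exit_path = pred ? none : in;
}

/* Loop-body tiles: sum += n; n -= 1. */
static LoopToken body(LoopToken in) {
    in.sum += in.n;
    in.n -= 1;
    return in;
}

/* Sketch of `while (n != 0) { sum += n; n--; }` as a cycle of tiles.
   Each pass of the C loop models one trip around the feedback path;
   *trips reports how many trips the token made before exiting. */
int while_loop_dataflow(int n0, int *trips) {
    LoopToken entry = { true, n0, 0 };
    LoopToken feedback = { false, 0, 0 };
    LoopToken taken, out;
    assert(n0 >= 0);   /* a negative start would never reach n == 0 */
    *trips = 0;
    for (;;) {
        LoopToken cur = merge(entry, feedback);
        entry.valid = false;                   /* entry token is consumed once */
        steer(cur, cur.n != 0, &taken, &out);
        if (out.valid) return out.sum;         /* token leaves the loop */
        feedback = body(taken);                /* token travels the feedback path */
        (*trips)++;
    }
}
```

Starting from n = 4, the token goes around the feedback path four times and leaves with sum equal to 10; starting from n = 0, the steer tile routes it straight to the exit.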

Chart: According to Efficient Computer, the E1 consumes less energy than two competing ARM processors at three common tasks: matrix multiplication for machine learning, the fast Fourier transform, and convolution for computer vision. (Credit: Efficient Computer)

According to Todd Austin, a professor of computer science and engineering at the University of Michigan, chips like the E1 are a good example of an efficient architecture because they minimize the silicon devoted to anything other than pure computation, such as fetching instructions, temporarily stashing data, and checking whether a network route is in use.

Lucia’s team “is doing a lot of clever work to allow you to get extremely low power for general purpose computing,” says Rakesh Kumar, a computer architect at the University of Illinois Urbana-Champaign. The challenge for the startup will be economics, he predicts. “Ultralow power companies have had a hard time because of strong competition in low-power, very cheap microcontrollers. The key challenge is in identifying a new capability” and getting customers to pay for it.
