Link to the paper: arXiv
In a nutshell: a methodology to decompose sparse tensors and reorder them as non-sparse representations ("cascade of Einsums") for more efficient hardware acceleration.
The process is based on fibertree decomposition and further improves on existing accelerators such as Gamma or OuterSPACE.
Why? ML models, in particular deep NNs, are moving towards sparser representations (due to, e.g., pruned or quantized networks, the emergence of graph models, etc.).
Most existing hardware accelerators have a harder time with sparse models, i.e., HW accelerators prefer "full" matrices.
How? The work abstracts sparse tensors into fibertrees, which are then used to reorder the computation (the tensor ranks)
and build a tree of dependent Einsum operations (which themselves are not sparse) that can be accelerated on existing hardware platforms.
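To make "cascade of Einsums" concrete, here is a minimal NumPy sketch (my own toy example with made-up shapes, not TeAAL's notation or output): one Einsum for a matrix multiplication, then the same result split into two dependent Einsums whose first output feeds the second.

```python
# Toy illustration (not TeAAL's notation): an Einsum is an index-based
# tensor expression, and a "cascade" chains Einsums so that the output
# of one becomes the input of the next.
import numpy as np

A = np.random.rand(4, 5)   # A[m, k]
B = np.random.rand(5, 3)   # B[k, n]

# Single Einsum: Z[m, n] = sum_k A[m, k] * B[k, n]
Z = np.einsum('mk,kn->mn', A, B)

# Cascade of two dependent Einsums computing the same result:
#   T[m, k, n] = A[m, k] * B[k, n]   (elementwise products, no reduction yet)
#   Z[m, n]    = sum_k T[m, k, n]    (reduction done in a second Einsum)
T = np.einsum('mk,kn->mkn', A, B)
Z2 = np.einsum('mkn->mn', T)

assert np.allclose(Z, Z2)
```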
In more detail. Lately, I have been interested in hardware acceleration of sparse Neural Networks (NNs). NNs are becoming sparse for various reasons,
e.g., pruning useless weights, quantizing the model's parameters, and removing (parts of) NN layers to compress the model.
All of these aim to increase energy efficiency and make NNs deployable at the edge.
NN computation can be represented by successive tensor operations involving the inputs, weights, and activations of each neuron and each layer
(see, e.g., HERE).
So accelerating AI essentially means accelerating tensor operations.
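For instance, a fully connected layer is just one Einsum over activations and weights; the shapes and names below are illustrative assumptions of mine, not taken from the paper.

```python
# A fully connected NN layer written as a tensor (Einsum) operation.
import numpy as np

batch, d_in, d_out = 8, 16, 4
X = np.random.rand(batch, d_in)     # activations  X[b, i]
W = np.random.rand(d_in, d_out)     # weights      W[i, o]
bias = np.random.rand(d_out)        # bias         bias[o]

# Y[b, o] = sum_i X[b, i] * W[i, o] + bias[o]
Y = np.einsum('bi,io->bo', X, W) + bias

# Sanity check against the usual matrix product.
assert np.allclose(Y, X @ W + bias)
```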
Yet, typical AI accelerators are efficient when processing NN layers with a high degree of parallelism, i.e.,
full tensors where "everyone connects to everyone". Sparsity introduces zeros, cutting this parallelism and, with it, the accelerator's efficiency.
How do we make sparse tensors non-sparse again without recreating all the computations we tried to avoid in the first place?
An elegant solution in this TeAAL paper is to decompose the successive (sparse) tensors induced by a given NN computation and
reorder them into a tree of non-sparse operations, laid out differently than the original computation. So, we get an intermediate
representation of non-sparse operations, here Einsum operations, arranged in a tree fashion according to their execution order.
The decomposition uses fibertrees, which facilitate reordering the ranks of the tensors and executing them efficiently on hardware.
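Here is a conceptual sketch of what a fibertree looks like, using plain nested dicts (my own simplification, not the paper's fibertree abstraction or any existing library): each rank is a fiber mapping coordinates to payloads, and the same sparse matrix can be stored with its ranks in different orders.

```python
# Conceptual fibertree sketch (a simplification, not the paper's implementation):
# each rank is a fiber that maps coordinates to payloads; leaf payloads are the
# nonzero values of the tensor.
A_dense = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 3, 0],
]

# Rank order M -> K: top fiber over rows, each payload is a fiber over columns.
A_MK = {
    0: {0: 1, 2: 2},
    2: {1: 3},
}

# The same tensor with ranks swapped (K -> M): top fiber over columns.
A_KM = {
    0: {0: 1},
    1: {2: 3},
    2: {0: 2},
}

# Iterating a fibertree only visits nonzeros, e.g. a sparse row sum:
row_sums = {m: sum(fiber_k.values()) for m, fiber_k in A_MK.items()}
print(row_sums)   # {0: 3, 2: 3}
```

Traversal only ever touches nonzeros, and swapping the rank order (M→K vs. K→M) is, as I understand it, what lets the same tensor feed different dataflows (e.g., row-wise vs. outer-product style) efficiently.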
Einsums explained: HERE
OuterSPACE accelerator: HERE