Demo: Data-tiling with multi-device
This write-up demonstrates how data-tiling works when there are multiple devices. It is a follow-up to the write-up How data-tiling works with encoding specialization.
Data-tiling is a technique that transforms input data into a particular layout for better performance. The tiled layout lets the computation access data efficiently through the cache hierarchy and run with much lower latency.
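To make the layout transformation concrete, here is a minimal sketch in plain Python. It is not IREE's implementation. The function name `pack_tiles`, the 2x2 tile size, and the zero padding are assumptions chosen only for illustration. It rewrites a row-major matrix so that each tile's elements sit next to each other in memory, which is the property that makes tiled data cache-friendly.

```python
def pack_tiles(matrix, tile_rows, tile_cols, pad=0):
    """Pack a row-major 2-D list into a flat, tile-major list.

    Hypothetical helper for illustration only; tile sizes and padding
    value are assumptions, not what IREE selects for any real target.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # Round each dimension up to a whole number of tiles.
    tiles_down = -(-rows // tile_rows)
    tiles_across = -(-cols // tile_cols)
    packed = []
    for ti in range(tiles_down):          # outer loops walk over tiles
        for tj in range(tiles_across):
            for i in range(tile_rows):    # inner loops walk inside one tile
                for j in range(tile_cols):
                    r = ti * tile_rows + i
                    c = tj * tile_cols + j
                    in_bounds = r < rows and c < cols
                    packed.append(matrix[r][c] if in_bounds else pad)
    return packed


# A 4x4 matrix holding the values 0..15 in row-major order.
matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
packed = pack_tiles(matrix, 2, 2)
print(packed)
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

After packing, the four elements of each 2x2 tile are contiguous, so a kernel that consumes one tile at a time reads a single dense run of memory instead of striding across rows. The packing step itself is the layout-transformation overhead that a compiler would want to propagate, fuse, or evaluate ahead of time.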
IREE is a whole-program compiler: it sees the entire graph, which opens up many opportunities to remove layout-transformation overheads. The layout transformations may be propagated, fused into other operations, or constant-evaluated for weights. IREE uses encodings to apply data-tiling, and this write-up explores how encodings work in that process.