Expected Attention: KV Cache Compression by Estimating Attention

(arxiv.org)

19 points | by sonabinu a day ago

3 comments