LinkedIn introduces Cognitive Memory Agent (CMA), a generative AI infrastructure layer enabling stateful, context-aware systems ...
Google Research unveiled TurboQuant, a novel quantization algorithm that compresses large language models’ Key-Value caches ...
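The snippets don't spell out how TurboQuant itself works, but the general idea of KV-cache quantization is easy to illustrate. Below is a minimal sketch of generic per-token 4-bit uniform quantization in Python; it is not Google's algorithm, and the names `quantize_kv`/`dequantize_kv` are invented for the example.

```python
# Illustrative only: generic per-token 4-bit uniform quantization of a KV-cache
# tensor. This is NOT TurboQuant; it just shows the family of technique.
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Per-token uniform quantization of a (tokens, head_dim) tensor."""
    levels = 2 ** bits - 1
    lo = kv.min(axis=-1, keepdims=True)          # per-token minimum
    hi = kv.max(axis=-1, keepdims=True)          # per-token maximum
    scale = (hi - lo) / levels                   # quantization step per token
    scale = np.where(scale == 0, 1.0, scale)     # guard against constant rows
    q = np.round((kv - lo) / scale).astype(np.uint8)
    return q, scale, lo                          # integer codes + dequant params

def dequantize_kv(q, scale, lo):
    return q.astype(np.float32) * scale + lo

kv = np.random.randn(8, 128).astype(np.float32)  # 8 cached tokens, head_dim=128
q, scale, lo = quantize_kv(kv)
err = np.abs(dequantize_kv(q, scale, lo) - kv).max()
print(f"max abs reconstruction error: {err:.3f}")
```

Storing the integer codes plus one (scale, offset) pair per token is what cuts the cache's footprint roughly 4x versus fp16; the interesting part of any real algorithm is keeping the reconstruction error from degrading attention quality.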
Large-scale applications, such as generative AI, recommendation systems, big data analytics, and HPC, require large-capacity ...
Adding water to Cache Energy’s cement pellets causes a chemical reaction that releases heat. The reaction is reversible, allowing the system to store heat as well. More than two millennia ...
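The blurb doesn't name the chemistry, but a reversible, water-triggered, heat-releasing reaction in a lime-like pellet suggests the textbook lime cycle. The equations below are that classic cycle, offered as an assumption rather than as Cache Energy's confirmed formulation; the enthalpy values are standard textbook figures.

```latex
% Assumed chemistry (textbook lime cycle), not confirmed by the article.
% Discharge: hydration is exothermic.
\mathrm{CaO(s)} + \mathrm{H_2O(l)} \longrightarrow \mathrm{Ca(OH)_2(s)},
  \qquad \Delta H \approx -65\ \mathrm{kJ\,mol^{-1}}
% Charge: heating drives the water back off, storing heat chemically.
\mathrm{Ca(OH)_2(s)} \longrightarrow \mathrm{CaO(s)} + \mathrm{H_2O(g)},
  \qquad \Delta H \approx +104\ \mathrm{kJ\,mol^{-1}}
```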
For about four years now, AMD has offered special “X3D” variants of its high-end desktop processors with an extra 64MB of L3 cache attached, an addition that disproportionately benefits games. AMD ...
The big picture: Google has developed three AI compression algorithms – TurboQuant, PolarQuant, and Quantized Johnson-Lindenstrauss – designed to significantly reduce the memory footprint of large ...
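Of the three names, only Johnson-Lindenstrauss points at a well-known primitive: random projection that shrinks vector dimension while approximately preserving pairwise distances. The Python sketch below shows that plain JL step with arbitrarily chosen sizes; Google's "Quantized" variant presumably adds a quantization stage the snippet doesn't describe.

```python
# Illustrative only: the plain Johnson-Lindenstrauss random projection that the
# "Quantized JL" name alludes to. Sizes (n, d, k) are arbitrary for the demo.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 512, 64                        # 1000 vectors, 512-d -> 64-d
X = rng.standard_normal((n, d))
P = rng.standard_normal((d, k)) / np.sqrt(k)   # JL projection matrix
Y = X @ P                                      # compressed representation

# The JL lemma says pairwise distances are approximately preserved.
i, j = 3, 7
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(Y[i] - Y[j])
print(f"distance ratio after projection: {proj / orig:.2f}")  # close to 1.0
```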
If Google’s AI researchers had a sense of humor, they would have called TurboQuant, the new, ultra-efficient AI memory compression algorithm announced Tuesday, “Pied Piper” — or at least, that’s what ...
Even if you don’t know much about the inner workings of generative AI models, you probably know they need a lot of memory. Hence, it is currently almost impossible to buy a measly stick of RAM without ...
The scaling of Large Language Models (LLMs) is increasingly constrained by memory communication overhead between High-Bandwidth Memory (HBM) and SRAM. Specifically, the Key-Value (KV) cache size ...
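To see why the KV cache dominates, it helps to do the arithmetic the abstract alludes to: every token in the context stores one key and one value vector per layer per KV head. Here is a back-of-envelope calculation in Python; the model shape (Llama-2-7B-like) is an assumption for illustration.

```python
# Back-of-envelope KV-cache sizing. Per token: one key + one value vector
# per layer per KV head. Model shape below is assumed, not from the paper.
layers, kv_heads, head_dim = 32, 32, 128
bytes_per_elem = 2                      # fp16
seq_len, batch = 4096, 8

kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 16.0 GiB at these settings
```

At these settings the cache alone rivals the roughly 14 GiB of fp16 weights for a 7B model, and unlike the weights it keeps growing with context length and batch size, which is why it is the target of compression work.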
Micron is expected to report 148% revenue growth for the February quarter as average selling prices surge 32% quarter over quarter. The memory provider's stock has soared thanks to a shortage brought ...
Nvidia researchers have introduced a new technique that dramatically reduces how much memory large language models need to track conversation history — by as much as 20x — without modifying the model ...
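The article doesn't say how Nvidia's method works, so the sketch below is not it; it is a deliberately simple sliding-window eviction policy, included only to show the shape of the problem: conversation-history memory stays bounded once old KV entries can be dropped or compressed. The class name and API are invented for the example.

```python
# Illustrative only: a generic sliding-window KV-cache policy, NOT Nvidia's
# technique. It shows how capping retained tokens bounds history memory.
from collections import deque

class SlidingKVCache:
    """Keep at most `window` recent (key, value) pairs."""
    def __init__(self, window: int):
        self.window = window
        self.entries = deque(maxlen=window)   # oldest entries are evicted

    def append(self, key, value):
        self.entries.append((key, value))     # O(1); deque drops overflow

    def __len__(self):
        return len(self.entries)

cache = SlidingKVCache(window=512)
for t in range(10_000):                       # a long conversation...
    cache.append(f"k{t}", f"v{t}")
print(len(cache))                             # 512: memory stays bounded
```

A real 20x reduction would have to be far smarter than blind eviction, since naively dropping tokens discards information the model may still need later in the conversation.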