RecurrentGemma
A family of open models using a novel recurrent architecture for faster processing of long sequences
RecurrentGemma is based on Griffin, a hybrid model architecture that mixes gated linear recurrences with local sliding window attention.
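The two ingredients named above can be illustrated with a minimal sketch. It assumes a simplified scalar recurrence of the form h_t = a_t * h_(t-1) + (1 - a_t) * x_t and a plain boolean attention mask; it is not the exact layer definition used in Griffin, and the function names `gated_linear_recurrence` and `sliding_window_mask` are hypothetical, chosen only for illustration.

```python
def gated_linear_recurrence(xs, gates):
    """Simplified gated linear recurrence (an assumption, not Griffin's exact layer).

    Each step blends the previous state with the new input:
        h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    The state h is a single value, so memory does not grow with sequence length.
    """
    h = 0.0
    outputs = []
    for x, a in zip(xs, gates):
        h = a * h + (1.0 - a) * x
        outputs.append(h)
    return outputs


def sliding_window_mask(seq_len, window):
    """Causal local attention mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and lies within the last `window` positions.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    print(gated_linear_recurrence([1.0, 1.0], [0.5, 0.5]))  # [0.5, 0.75]
    for row in sliding_window_mask(4, 2):
        print(row)
```

The recurrence carries information forward through a fixed-size state, while the mask limits each attention query to a bounded number of recent positions. A hybrid model interleaves layers of both kinds.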
Capabilities
- Reduced memory usage: Lower memory requirements allow the generation of longer samples on devices with limited memory, such as a single GPU or CPU.
- Higher throughput: Supports inference at significantly higher batch sizes and generates substantially more tokens per second, especially on long sequences.
- High performance: RecurrentGemma matches Gemma's performance while requiring less memory and achieving faster inference.
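One way to see where the memory saving comes from is to compare how much key/value cache an attention layer keeps under global attention versus a local sliding window. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the helper `kv_cache_bytes` and every dimension passed to it are hypothetical and are not RecurrentGemma's published configuration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, window=None):
    """Estimate key/value cache size for one sequence.

    With global attention (window=None) every past position is cached.
    With sliding window attention only the last `window` positions are kept,
    so the cache stops growing once seq_len exceeds the window.
    The leading factor of 2 accounts for storing both keys and values.
    """
    cached_positions = seq_len if window is None else min(seq_len, window)
    return 2 * n_layers * cached_positions * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to show the trend.
    config = dict(n_layers=8, n_kv_heads=4, head_dim=64)
    for seq_len in (1024, 8192, 65536):
        full = kv_cache_bytes(seq_len, **config)
        local = kv_cache_bytes(seq_len, window=1024, **config)
        print(seq_len, full, local)
```

Under these assumptions the global cache grows linearly with sequence length while the windowed cache stays constant past the window size. That bounded footprint is what leaves room for longer samples and larger batches on the same device.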