They claim running something like Mixtral 8x7B on their ASIC is about 5-10 times faster than on an H100. Not sure if the model is quantized in their tests or not; they don't give many details, really.
The LPU Inference Engine from Groq is designed to be considerably faster than GPGPUs at LLM inference. To achieve this, the LPU makes better use of the workload's sequential processing and is paired with on-chip SRAM instead of DRAM or HBM.
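For what it's worth, a rough back-of-envelope shows why the SRAM point matters: single-stream autoregressive decoding is largely memory-bandwidth bound, so tokens/sec is capped by how fast you can stream the active weights each token. The sketch below is only illustrative; the active-parameter count, weight precision, and both bandwidth figures are my assumptions (the 80 TB/s number is Groq's quoted aggregate on-chip figure, the 3.35 TB/s is the H100 SXM HBM3 spec), not measured results.

```python
# Back-of-envelope: decode speed ceiling ~ memory bandwidth / bytes of active weights.
# All numbers are assumptions for illustration, not benchmark results.

ACTIVE_PARAMS = 13e9      # Mixtral 8x7B activates roughly ~13B params per token (2 of 8 experts)
BYTES_PER_PARAM = 2       # assuming fp16/bf16 weights; halve for int8, quarter for int4

def tokens_per_sec(mem_bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed if all active weights are re-read every token."""
    return mem_bandwidth_bytes_per_sec / (ACTIVE_PARAMS * BYTES_PER_PARAM)

# H100 SXM HBM3: ~3.35 TB/s (spec-sheet figure)
print(f"HBM-bound estimate:  {tokens_per_sec(3.35e12):.0f} tok/s")

# Groq LPU: ~80 TB/s aggregate on-chip SRAM bandwidth (vendor-quoted figure)
print(f"SRAM-bound estimate: {tokens_per_sec(80e12):.0f} tok/s")
```

It's a crude model (it ignores batching, interconnect between chips, KV-cache traffic, and the fact that one LPU only has ~230 MB of SRAM so the model is sharded across many chips), but it gives a feel for why swapping HBM for SRAM moves the decode-speed ceiling at all.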