Quote from: Toortle on November 26, 2023, 11:52:21
How is it unfair? You said this:
I said that x86 laptops will not be able to run big language models. The M3 Pro can run them, but comparing its GPU _performance_ to the 4080 is unfair, because it has half the GPU.
Quote from: A on November 26, 2023, 11:35:08
Yep, thanks to unified RAM almost all RAM can be used as VRAM.
That MacBook in the vid has 18 GB of shared memory, 6 GB more than the RTX 4080 mobile. Why is it incapable of dedicating all 18 GB (as you claim) to VRAM and easily outperforming that 4080 with its 12 GB?
Because Apple prioritises system responsiveness, so you can't allocate all of the RAM to the GPU, and you can't dedicate all of the RAM bandwidth to any single block. The limits are roughly as follows (back-of-envelope sizing in the sketch below):
32GB MBP - max 24GB VRAM - 34B LLM models
64GB MBP - max 48GB VRAM - 70B LLM models
128GB MBP - give or take 100GB VRAM - 120B models, or 180B models at 3-bit quantization
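Back-of-envelope check that those pairings add up (a rough sketch under my own assumptions about quantization width and runtime overhead, not exact llama.cpp numbers):

[code]
# Rough sizing of quantized LLMs vs. usable "VRAM" on a Mac.
# Assumptions (mine): ~4.5 bits/weight for a typical Q4_K_M GGUF quant,
# plus ~15% for KV cache and compute buffers.

def quantized_size_gb(params_billions, bits_per_weight=4.5, overhead=0.15):
    """Approximate memory footprint (GB) of a quantized model."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return weights_gb * (1 + overhead)

for params, bits in [(34, 4.5), (70, 4.5), (120, 4.5), (180, 3.0)]:
    print(f"{params}B @ {bits}-bit ~= {quantized_size_gb(params, bits):.0f} GB")

# Prints roughly: 34B ~= 22 GB, 70B ~= 45 GB, 120B ~= 78 GB, 180B @ 3-bit ~= 78 GB,
# which lines up with the 24 / 48 / ~100 GB ceilings above.
[/code]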
It's not about _outperforming_, it's about being unable to run big models on the 4080 in that laptop at all. People are building rigs with two 4090s just to run 70B models. So yeah, if the M3 can run it and the 4080 can't, that is outperforming.
Quote from: Toortle on November 26, 2023, 11:16:38
Gets to be slower in all variants including M3 Max than equivalent PC and even slower than M1s unless you go with the M3 Max?
He was using an incorrect way of testing LLMs: some crappy App Store app with a small 7B model that never loaded the M-series SoC enough to show the difference. He should be using llama.cpp with heavy models.
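Something like this (a minimal sketch using llama-cpp-python, the Python bindings for llama.cpp; the model path and prompt are placeholders) actually keeps the GPU busy and gives comparable tokens-per-second numbers:

[code]
import time
from llama_cpp import Llama

# Placeholder path: point at whatever big GGUF quant you downloaded (34B/70B).
llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU (Metal backend on a Mac)
    n_ctx=4096,
)

start = time.time()
out = llm("Explain unified memory on Apple Silicon in one paragraph.",
          max_tokens=256)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s ({generated / elapsed:.1f} tok/s)")
[/code]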