
Topic summary

Posted by CapNemo72
 - February 26, 2025, 16:53:50
Could you also add Ollama as an AI benchmark? Test the DeepSeek R1 1.5B, 8B, and 32B models, and use the verbose option to see the speed in tokens per second.

The usual prompt is "Write me a 1000 word story".
You can repeat it 5 times to get an average.
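As a sketch of what that could look like, assuming ollama is on PATH and that its `--verbose` stats (printed to stderr) keep their current "eval rate: ... tokens/s" line format:

```python
import re
import statistics
import subprocess

EVAL_RATE = re.compile(r"eval rate:\s+([\d.]+)\s+tokens/s")

def parse_eval_rate(verbose_output: str) -> float:
    """Pull the generation speed out of ollama's --verbose stats block."""
    matches = EVAL_RATE.findall(verbose_output)
    if not matches:
        raise ValueError("no 'eval rate' line found")
    # The last match is the generation (eval) rate; an earlier one,
    # if present, belongs to prompt eval.
    return float(matches[-1])

def benchmark(model: str, prompt: str, runs: int = 5) -> float:
    """Average tokens/s over several runs (requires ollama installed)."""
    rates = []
    for _ in range(runs):
        out = subprocess.run(
            ["ollama", "run", "--verbose", model, prompt],
            capture_output=True, text=True, check=True,
        )
        rates.append(parse_eval_rate(out.stderr))
    return statistics.mean(rates)

# Demo on a captured stats block (no model download needed):
sample = """\
prompt eval rate:     91.20 tokens/s
eval rate:            23.45 tokens/s
"""
print(parse_eval_rate(sample))  # 23.45
```

Called as e.g. `benchmark("deepseek-r1:8b", "Write me a 1000 word story")`, this would give the averaged figure the post asks for.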

Posted by rk
 - February 26, 2025, 10:28:28
Quote from: Papajon on February 19, 2025, 11:34:42
If it's for AI users, why is it a laptop chip?

There is the Z13 tablet and an HP laptop with Strix Halo. Then on the desktop we have HP, Framework, and one of the Chinese nettop makers. Right now it's 3:2 for the desktop with this chip.

Both HP offerings are aimed at prosumers (with inflated pricing), and the Z13 is a tablet at extra cost. The Framework desktop is a bit cheaper, but still priced above a classic desktop (which offers better specs/value).
Posted by Aldf
 - February 25, 2025, 08:04:11
Hey, the efficiency per watt for the CB 2024 test does not add up after you added the update.
Posted by CJ
 - February 24, 2025, 22:36:09
Quote from: Papajon on February 19, 2025, 11:34:42
Quote from: Donkey545 on February 18, 2025, 17:01:45
While the standard suite of benchmarks is appreciated in this review, these performance metrics are largely irrelevant to the target audience of a product like this. The AI series chips, with their high-bandwidth unified memory architecture, are targeted at LLM inference users. A valuable benchmark for these users would be running various LLMs in verbose mode to check the tokens/s. The advantage of this product is that it can fit massive models in memory compared to even the highest-end dGPUs. A great comparison for this product segment would be the performance of Llama 3.3 70B using ollama on CPU and GPU (using ROCm) against Apple's M4 series hardware.

If it's for AI users, why is it a laptop chip?

That question makes about as much sense as asking in an RTX 4090 mobile review, "If it's for gamers, why is it a laptop chip?" Sometimes you might want to run an LLM offline. Sure, you could SSH into a server at home over a VPN or whatever, but maybe you just want a laptop that does everything and don't want a desktop. Also, a 128 GB Strix Halo laptop might cost $3,000, while an 80 GB H100 costs tens of thousands of dollars and requires a server to put it in.
Posted by GERMAN_MEN
 - February 24, 2025, 19:18:01
The Asus ROG Flow Z13 GZ302 is a true marvel that brings together what every company needs: mobility and performance.
It will be my next laptop or tablet, I don't care which. The most important thing, and what 99% of users are looking for, is an APU with the perfect balance between CPU and iGPU, and that is exactly what the AMD Ryzen AI Max+ 395, Ryzen AI Max 390, Ryzen AI Max 385, and Ryzen AI Max 380 processors offer.
Posted by davidm
 - February 24, 2025, 16:01:06
People want to use "AI" on their notebooks too, for always-on local assistants.

The review plays a shell game; it should just talk about memory bandwidth rather than shuffling things around in the headline, screenshots, text, etc. As others have said, memory bandwidth is the main thing that matters for LLMs; the NPU is not really used for them, it's GPU cores and memory bandwidth that count. Mixture-of-Experts models may become more popular, and they can run OK on slower memory, but x86 still needs something at least as fast as Apple's Max chips.
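The bandwidth point can be made with a back-of-envelope formula: each generated token streams the full weight set through memory once, so decode speed is roughly capped at bandwidth divided by model size. A minimal sketch, assuming Strix Halo's quoted ~256 GB/s LPDDR5X bandwidth and approximate quantized model sizes:

```python
def decode_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    """Rough upper bound on decode tokens/s: every generated token
    reads all weights from memory once, so speed <= bandwidth / size."""
    return bandwidth_gbs / model_gb

# Approximate Q4-quantized weight sizes (illustrative, not measured):
for name, size_gb in [("Llama 3.3 70B Q4 (~40 GB)", 40.0),
                      ("DeepSeek R1 32B Q4 (~19 GB)", 19.0),
                      ("8B Q4 (~4.7 GB)", 4.7)]:
    print(f"{name}: <= {decode_ceiling(256, size_gb):.1f} tok/s")
```

Real throughput lands below these ceilings, but the ranking tracks bandwidth, not NPU TOPS, which is the comment's point.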
Posted by Udin
 - February 24, 2025, 12:45:09
Have you set the VRAM to 8 GB minimum?
Posted by LL
 - February 20, 2025, 17:27:27
Notebookcheck should test it with Unreal Engine, where GPU memory is crucial, and compare it with Nvidia options.

It would also serve as a practical speed test of dedicated GPU memory vs. system RAM for GPU use. Does it matter?
Posted by Papajon
 - February 19, 2025, 11:34:42
Quote from: Donkey545 on February 18, 2025, 17:01:45
While the standard suite of benchmarks is appreciated in this review, these performance metrics are largely irrelevant to the target audience of a product like this. The AI series chips, with their high-bandwidth unified memory architecture, are targeted at LLM inference users. A valuable benchmark for these users would be running various LLMs in verbose mode to check the tokens/s. The advantage of this product is that it can fit massive models in memory compared to even the highest-end dGPUs. A great comparison for this product segment would be the performance of Llama 3.3 70B using ollama on CPU and GPU (using ROCm) against Apple's M4 series hardware.

If it's for AI users, why is it a laptop chip?
Posted by Papajon
 - February 19, 2025, 11:33:08
"Gets destroyed by last year's QC CPU in MC efficiency and SC performance."

Seems like a great chip in portable devices.
Posted by A
 - February 19, 2025, 05:24:00
Aren't LLMs broken under Windows and AMD due to crappy MS DirectML? So it may not get great LLM results unless you load up Linux, and that is assuming that amdgpu supports it. Then there is the fact that a lot of the software and libraries out there aren't going to be using the NPU to assist.
Posted by Alpha_Lyrae
 - February 19, 2025, 05:15:15
Quote from: Yeshy on February 18, 2025, 23:15:26
For the "Power Consumption / Cyberpunk 2077 ultra Efficiency", do you / could you do a version that combines the CPU and GPU power?

If, let's say, the 4070 at 60 W matches the 8060S at 60 W, that's great, but it ignores that the 4070 has a CPU to power alongside it.

I don't know what a fair way to test would be, besides making curves; comparing a 100 W 4070 is "unfair" since you get diminishing returns as you approach 100 W on it.

Maybe just 1080p Medium with a 60 fps limit? Or test different TDP limits, but that would be arbitrary.

Yeah, total system power should be used when comparing APUs/SoCs against CPU+dGPU setups. You'll find that power consumption is much higher with discrete hardware simply by design: two chips (CPU and GPU), two sets of memory (LPDDR5/DDR5 and GDDR6), and more VRM MOSFETs to provide power.
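The metric being argued for is simple to state: divide fps by the sum of CPU and GPU power rather than by GPU power alone. A minimal sketch with purely hypothetical numbers (nothing here is a measured value from the review):

```python
def system_efficiency(fps: float, cpu_w: float, gpu_w: float) -> float:
    """fps per watt over combined CPU+GPU power, not GPU power alone."""
    return fps / (cpu_w + gpu_w)

# Hypothetical illustration: a 60 W dGPU still needs ~30 W of CPU
# alongside it, while an APU's single package number covers both.
dgpu_rig = system_efficiency(fps=62, cpu_w=30, gpu_w=60)
apu_rig = system_efficiency(fps=60, cpu_w=0, gpu_w=90)
print(f"CPU+dGPU: {dgpu_rig:.2f} fps/W, APU: {apu_rig:.2f} fps/W")
```

With GPU-only power the dGPU would look far more efficient (62 fps / 60 W); counting the whole platform closes most of that gap, which is exactly the distortion the comment describes.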
Posted by Callum
 - February 19, 2025, 04:02:43
I would love to see a CFD test run on all of these, or at least a meshing process. It is a great general test of many things and of the overall performance of motherboards, CPUs, GPUs, and memory. Along with added LLM testing...
Posted by Yeshy
 - February 18, 2025, 23:15:26
For the "Power Consumption / Cyberpunk 2077 ultra Efficiency", do you / could you do a version that combines the CPU and GPU power?

If, let's say, the 4070 at 60 W matches the 8060S at 60 W, that's great, but it ignores that the 4070 has a CPU to power alongside it.

I don't know what a fair way to test would be, besides making curves; comparing a 100 W 4070 is "unfair" since you get diminishing returns as you approach 100 W on it.

Maybe just 1080p Medium with a 60 fps limit? Or test different TDP limits, but that would be arbitrary.
Posted by Kravis
 - February 18, 2025, 23:08:21
I know, right? Give us tok/sec for various LLMs at various levels of quantization. People are not scooping up 4090s and 5090s these days to play games at the highest fps; they are buying them to run local AI.
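Quantization level largely decides whether a model fits at all, which is why it belongs in such a benchmark. A rough sketch of weight memory per quantization level, using approximate effective bits per weight for common GGUF quants (the exact figures vary slightly by format) and a small fudge factor for KV cache and buffers:

```python
def weights_gb(params_b: float, bits_per_weight: float,
               overhead: float = 1.1) -> float:
    """Approximate memory in GB for params_b billion parameters;
    overhead loosely covers KV cache and runtime buffers."""
    return params_b * bits_per_weight / 8 * overhead

# Approximate effective bits per weight (illustrative):
for bits, name in [(16, "FP16"), (8.5, "Q8_0"), (4.8, "Q4_K_M"), (2.6, "Q2_K")]:
    print(f"70B at {name}: ~{weights_gb(70, bits):.0f} GB")
```

The takeaway matches the thread: a 70B model at Q4 fits comfortably in a 128 GB unified-memory machine but not in a 24 GB 4090, regardless of raw tok/sec.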