Apple M4 Max defeats RTX 4070 Desktop GPU in Blender benchmark, trades blows with RTX 4080 Laptop

Started by Redaktion, Yesterday at 00:17:09

Redaktion

Apple's M4 Max is quite the beast of an ARM SoC. Packing up to 16 CPU cores and 40 GPU cores, the M4 Max places the MacBook Pro among the fastest notebooks on the market. According to a recently uncovered Blender benchmark, it appears the mobile SoC challenges high-end desktop GPUs as well.

https://www.notebookcheck.net/Apple-M4-Max-defeats-RTX-4070-Desktop-GPU-in-Blender-benchmark-trades-blows-with-RTX-4080-Laptop.920575.0.html

GeorgeS

Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?

Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.


TruthIsThere

Fake news!

Real-time numbers (not from 3rd-party benches... 😂) from a real-time processing encode from a NATIVE SOURCE (name of file, etc.) that shows its metadata, the software encoding settings used, storage device used, & its final timing... or it NEVER... EVER... HAPPENED! GOT IT?!?!

Toortle

Quote from: TruthIsThere on Yesterday at 02:39:59
Fake news!

Real-time numbers (not from 3rd-party benches... 😂)
Blender's data, which is publicly available on Blender's own website (listed in the source for the article), is...

🥁🥁🥁🥁

...a third-party benchmark? In other words, Blender is a third party of Blender? Ok.

dada_dave

Quote from: GeorgeS on Yesterday at 02:15:37
Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?

Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.



Blender's engines are open source and Nvidia was the original GPU vendor to really optimize its offerings, especially with ray tracing. Apple has indeed been working hard to close that gap, but it's impossible to say if Nvidia or Apple is the "more optimized" at this point. Apple's ray tracing engines, thought to be licensed from ImgTech, are likely comparable to Nvidia's at this point, maybe better, maybe worse, hard to say.

The actual power discrepancy between the 4070 desktop and the M4 Max GPU is about 4x, give or take, but there are caveats and a few things going on here. From what I can tell, Blender renders tend to hit memory bandwidth pretty hard, and Apple tends to match or even beat the bandwidth of comparable GPUs, especially with respect to TFLOPS (as in, Apple GPUs have better bandwidth-to-TFLOP ratios).

Also, even by paper specs a 4080 mobile should be outperformed here by the desktop 4070, but isn't (this can also be a function of the typical overall system the 4080 mobile is in versus a desktop 4070). And its TDP is half that of the 4070 (though still double the likely power draw of the M4 Max).

Further, Apple memory is unified, so on larger renders the extra memory directly accessible by the GPU can help. Then there are node improvements on TSMC N4 vs N3E, if I remember right about 20% perf/W (so a nice boost but not fully explanatory: the M4 Max and the 4080 mobile have similar clock speeds, the 4080 has more potential FP32 units while the Max has more bandwidth).

Finally, Apple has a vastly different cache structure, TBDR to improve rasterization of complex scenes, and a different core structure (Apple chips have much bigger INT32 and FP16 throughput). In fact, even though it lacks the matrix units of the Nvidia chip (which can help with noise reduction in Blender renders), the Apple M4 Max is likely bigger than the 4070/4080 mobile in terms of raw transistor count. Factor all that in, and its competitive performance in Blender becomes a little less surprising.
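For a rough feel for those bandwidth-to-TFLOP and perf-per-watt points, here is a back-of-the-envelope sketch. All the spec numbers are approximations (the M4 Max FP32 and power figures in particular are estimates, not published values), so treat the ratios as illustrative only:

```python
# Back-of-the-envelope comparison of approximate paper specs.
# Every number below is a rough assumption, not a measured value.
specs = {
    # name: (FP32 TFLOPS, memory bandwidth GB/s, typical GPU power W)
    "RTX 4070 desktop": (29.0, 504.0, 200.0),
    "RTX 4080 Laptop":  (33.0, 432.0, 100.0),  # TGP varies by design
    "M4 Max (40-core)": (17.0, 546.0, 50.0),   # FP32/power are estimates
}

for name, (tflops, bandwidth, watts) in specs.items():
    print(f"{name:17s}  {bandwidth / tflops:5.1f} GB/s per TFLOP, "
          f"{tflops / watts:5.3f} TFLOPS per W")
```

On these assumed numbers the M4 Max comes out with the highest bandwidth per TFLOP and the best TFLOPS per watt, which lines up with the bandwidth-heavy behaviour described above, even though its peak FP32 throughput is the lowest of the three.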

TruthIsThere

Quote from: Toortle on Yesterday at 02:43:14
...a third-party benchmark? In other words, Blender is a third party of Blender? Ok.


'Cause it's a bad idea for any so-called "review outlet" to have their own NATIVE content that shows the REAL-TIME testing methodology these outlets do for a living,
😏 ... which AI model(s) were used (plus their parameters), duration of the file, storage medium used, the latest software version used, on and on, right? 😏

Nah, it's just best to take someone's tool that many of today's creators DO NOT USE THAT OFTEN (Blender) over Premiere Pro, Maya, inference workloads, etc. 😏

Toortle

Quote from: TruthIsThere on Yesterday at 02:57:48
Nah, it's just best to take someone's tool that many of today's creators DO NOT USE THAT OFTEN (Blender) over Premiere Pro, Maya, inference workloads, etc. 😏

Read dada_dave's great comment. And rename yourself to ParanoiaIsThere.

TruthIsThere

Quote from: dada_dave on Yesterday at 02:45:32
Blender's engines are open source and Nvidia was the original GPU vendor to really optimize its offerings, especially with ray tracing. [...]

Nonsense.

There are spots that have clearly shown that this M4 slop is around the same performance as the M3.

You shills are working soft tonight. Bahahaha!

And if it is true, it's even worse. The RTX 40 cards are two-plus years old, SMH! NVIDIA is moving on to Blackwell in a bit and Apple still fails to match a 4080 SUPER today, and let's not even go there with the King... the 4090!

NVIDIA has the industry, especially Tim Apple, beat by several years, and it's only going to get even worse after Blackwell. Bahahaha!

LL

It is convenient that someone who uses Blender explains a bit... I use it.

Blender cooperates with the whole industry, and Nvidia, Apple, Intel, and AMD/ATI all have render people in the Blender developer discussions.

This applies to both render engines, Cycles and Eevee (Eevee is the real-time engine, or almost). Note that the benchmark currently only evaluates Cycles.

On Nvidia cards, Blender uses CUDA/OptiX code paths developed with Nvidia's help.
On Mac, Blender uses Metal code paths developed with Apple's help.

Blender is one of the best tools to see the render power of these hardware solutions.
Nvidia has been dominating Blender and is also the most reliable option. Apple has been playing catch-up, and M4 is a sizable improvement.
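To make the backend split concrete, Cycles picks its compute backend per platform through the add-on preferences. A minimal sketch using Blender's Python API (run inside Blender; exact property behaviour can differ between Blender versions, so treat this as an outline):

```python
import bpy

# Cycles stores the compute backend choice in its add-on preferences.
prefs = bpy.context.preferences.addons["cycles"].preferences

# "OPTIX" or "CUDA" on Nvidia, "METAL" on Apple Silicon,
# "HIP" on AMD, "ONEAPI" on Intel.
prefs.compute_device_type = "METAL"

# Detect and enable the available devices for that backend.
prefs.get_devices()
for device in prefs.devices:
    device.use = True
    print(device.name, device.type)

# Render the active scene's Cycles jobs on the GPU.
bpy.context.scene.cycles.device = "GPU"
```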

These were the participants in the last render developers meeting.

2024-11-12 Render & Cycles Meeting
Attendees
Nikita Sirgienko (Intel)
Lukas Stockner (Blender)
Sebastian Herholz (Intel)
Sahar A. Kashi (AMD)
Sergey Sharybin (Blender)
Thomas Dinges (Blender)
Weizhen Huang (Blender)

dada_dave

Quote from: dada_dave on Yesterday at 02:45:32
Also, even by paper specs a 4080 mobile should be outperformed here by the desktop 4070, but isn't (this can also be a function of the typical overall system the 4080 mobile is in versus a desktop 4070). And its TDP is half that of the 4070 (though still double the likely power draw of the M4 Max).

Ah, of course, there is an even simpler explanation. It's true that the desktop 4070 technically has a little more TFLOPs and bandwidth than the 4080 mobile, but it has less L2 cache and, even more importantly, fewer SMs with fewer ray tracing cores. The extra L2 cache likely compensates for the slightly lower bandwidth, while the wider design with more RT cores compensates for the lower clocks. The high clocks on the 4070 compared to the 4080 mobile/M4 Max are why, of course, it draws so much power for its TFLOPs (power draw is very non-linear with clock speed).
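As a toy illustration of that last point: dynamic power scales roughly with C·V²·f, and reaching higher clocks requires higher voltage, so power climbs much faster than frequency. A sketch with made-up voltage/frequency pairs (illustrative only, not a real V/F curve):

```python
# Toy model: dynamic power ~ capacitance * voltage^2 * frequency.
def dynamic_power(freq_ghz, voltage, capacitance=1.0):
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical operating points: higher clocks need disproportionately more voltage.
operating_points = [(1.6, 0.75), (2.0, 0.85), (2.5, 1.00)]

base_freq, base_volt = operating_points[0]
base_power = dynamic_power(base_freq, base_volt)

for freq, volt in operating_points:
    ratio_clock = freq / base_freq
    ratio_power = dynamic_power(freq, volt) / base_power
    print(f"{ratio_clock:.2f}x clock -> {ratio_power:.2f}x power")
```

With these invented numbers, a ~1.56x clock increase costs roughly 2.8x the power, which is the kind of non-linearity being described.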

LL

For the 2024-10-29 Render & Cycles Meeting:

Attendees
Lukas Stockner (Blender)
Nikita Sirgienko (Intel)
Patrick Mours (NVIDIA)
Sergey Sharybin (Blender)
Thomas Dinges (Blender)
Weizhen Huang (Blender)


Some caveats regarding "efficiency":

Spending 100 W to render in 30 seconds is not less efficient than 50 W for 1 minute; the total energy used is the same.
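A quick way to sanity-check that kind of claim is to compare energy per finished render rather than instantaneous power. A small sketch with the two hypothetical cases from the post:

```python
# Energy per render (joules) = average power (watts) * render time (seconds).
cases = {
    "100 W for 30 s": (100, 30),
    "50 W for 60 s":  (50, 60),
}

for label, (watts, seconds) in cases.items():
    energy_j = watts * seconds
    print(f"{label}: {energy_j} J ({energy_j / 3600:.3f} Wh)")

# Both runs land on 3000 J, so neither is more "efficient" in energy terms,
# even though their instantaneous power draws differ by 2x.
```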


Technerd

Quote from: dada_dave on Yesterday at 02:45:32
Blender's engines are open source and Nvidia was the original GPU vendor to really optimize its offerings, especially with ray tracing. [...]


Don't cry. Apple pays companies to flex the results in their favour. And for L1 memory bandwidth, even an RTX 4050 mobile will trash an M4 Ultra. No wonder Apple sucks in FP32 precision, or in deep learning, machine learning and AI stuff.

RobertJasiek

Quote from: Technerd on Yesterday at 07:30:59
For L1 memory bandwidth, even an RTX 4050 mobile will trash an M4 Ultra. No wonder Apple sucks in FP32 precision, or in deep learning, machine learning and AI stuff.

Presumably, M4 Max does suck in FP32 precision, deep learning, machine learning and AI. (So far, TechNotice showed one AI benchmark result for which M4 Max sucks.) However, I do not understand why you say that L1 memory bandwidth would be the cause. Rather, I think it depends on the specific software whether memory bandwidth is a bottleneck at all. Certainly it is not for deep learning workloads that do not need high memory bandwidth because they operate on many small pieces of data.
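Whether memory bandwidth is the bottleneck for a given piece of software can be estimated with a simple roofline-style check: compare the workload's arithmetic intensity (FLOPs per byte moved) against the hardware's compute-to-bandwidth ratio. A rough sketch with illustrative placeholder numbers (assumed, not measured):

```python
# Roofline-style check: is a workload compute-bound or bandwidth-bound?
# Hardware numbers are illustrative placeholders, not exact specs.
peak_tflops = 17.0            # assumed peak FP32 throughput, TFLOP/s
peak_bandwidth_gb_s = 546.0   # assumed memory bandwidth, GB/s

# Ridge point: arithmetic intensity needed to saturate compute before bandwidth.
ridge = (peak_tflops * 1e12) / (peak_bandwidth_gb_s * 1e9)  # FLOPs per byte

def bound_by(flops_per_byte):
    return "compute-bound" if flops_per_byte >= ridge else "bandwidth-bound"

print(f"ridge point: {ridge:.1f} FLOPs/byte")
print("large matrix multiply (~100 FLOPs/byte):", bound_by(100))
print("element-wise / small-batch ops (~2 FLOPs/byte):", bound_by(2))
```

Workloads below the ridge point (here about 31 FLOPs per byte on the assumed numbers) are limited by memory bandwidth; workloads above it are limited by raw compute, so the same hardware can look great on one and poor on the other.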

Jay Mann

This article makes me wonder. I saw a tool called ExoLabs that supposedly lets you use several machines as a single AI cluster. How many M4 Mac Minis would you need to reach the performance of an RTX 4090? How many Mac Minis would consume the same power as a 4090, and how much performance would that have?
