Quote from: Technerd on November 20, 2024, 07:30:59
For L1 memory bandwidth, even an RTX 4050 mobile will trash an M4 Ultra. No wonder Apple sucks at FP32 precision, or at deep learning, machine learning, and AI stuff.
Quote from: dada_dave on November 20, 2024, 02:45:32
Quote from: GeorgeS on November 20, 2024, 02:15:37
Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?
Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.
Blender's engines are open source, and Nvidia was the original GPU vendor to really optimize its offerings there, especially with ray tracing. Apple has indeed been working hard to close that gap, but it's impossible to say whether Nvidia or Apple is the "more optimized" at this point. Apple's ray tracing engines, thought to be licensed from ImgTech, are likely comparable to Nvidia's, maybe better, maybe worse, hard to say.
The actual power discrepancy between the desktop 4070 and the M4 Max GPU is about 4x, give or take, but there are caveats and a few things going on here.

From what I can tell, Blender renders tend to hit memory bandwidth pretty hard, and Apple tends to match or even beat comparable GPUs on bandwidth, especially relative to TFLOPS (as in, Apple GPUs have better bandwidth-to-TFLOP ratios). Also, even by paper specs a 4080 mobile should be outperformed here by the desktop 4070, but isn't (this can also be a function of the typical overall system a 4080 mobile sits in versus a desktop 4070), and the 4080 mobile's TDP is half that of the 4070 (though still double the likely power draw of the M4 Max). Further, Apple's memory is unified, so on larger renders the extra memory directly accessible by the GPU can help.

Then there are node improvements going from TSMC N4 to N3E, if I remember right about 20% perf/W (a nice boost, but not fully explanatory: the M4 Max and the 4080 mobile have similar clock speeds, and the 4080 has more potential FP32 units while the Max has more bandwidth).

Finally, Apple has a vastly different cache structure, TBDR to improve rasterization of complex scenes, and a different core structure (Apple chips have much bigger INT32 and FP16 throughput). In fact, even though it lacks the matrix units of the Nvidia chip (which can help with noise reduction in Blender renders), the M4 Max is likely bigger than the 4070/4080 mobile in terms of raw transistor count. Factor all of that in, and its competitive performance in Blender becomes a little less surprising.
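Since the bandwidth-to-TFLOP point is doing a lot of the work in that argument, here is a minimal back-of-the-envelope sketch in Python. The RTX 4070 figures are its published paper specs; the M4 Max FP32 throughput and GPU power draw are third-party estimates (Apple publishes neither, and the ~50 W figure is just the "roughly 4x" claim above applied to the 4070's 200 W), so treat every number as illustrative rather than measured.

```python
# Back-of-the-envelope bandwidth-per-TFLOP comparison.
# NOTE: RTX 4070 numbers are published paper specs; the M4 Max FP32 TFLOPS
# and ~50 W GPU power are third-party estimates, used purely for illustration.

specs = {
    # name:              (memory bandwidth GB/s, FP32 TFLOPS, approx. GPU power W)
    "RTX 4070 desktop": (504, 29.0, 200),  # 192-bit GDDR6X at 21 Gbps, 200 W TGP
    "M4 Max (40-core)": (546, 18.0, 50),   # unified LPDDR5X; TFLOPS and watts are estimates
}

for name, (bw_gbs, tflops, watts) in specs.items():
    bw_per_tflop = bw_gbs / tflops  # GB/s of DRAM bandwidth per TFLOP of FP32 compute
    print(f"{name:18s}: {bw_per_tflop:5.1f} GB/s per TFLOP at roughly {watts} W")
```

On those rough numbers the M4 Max ends up with nearly twice the bandwidth per TFLOP of the desktop 4070, which fits the idea that a bandwidth-heavy Blender render narrows the gap more than raw FP32 throughput alone would suggest.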
Quote from: TruthIsThere on November 20, 2024, 02:57:48
Nah, it's just best to take someone's tool that many of today's creators DO NOT USE THAT OFTEN (Blender) over Premiere Pro, Maya, inference workloads, etc. 😏
Quote from: Toortle on November 20, 2024, 02:43:14
Quote from: TruthIsThere on November 20, 2024, 02:39:59
Fake news! Blender's data, which is publicly available on Blender's own website (listed in the source for the article), is...
Real-time numbers (not from 3rd-party benches... 😂)
🥁🥁🥁🥁
...a third-party benchmark? In other words, Blender is a third party of Blender? Ok.
Quote from: GeorgeS on November 20, 2024, 02:15:37
Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?
Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.
Quote from: TruthIsThere on November 20, 2024, 02:39:59
Fake news! Blender's data, which is publicly available on Blender's own website (listed in the source for the article), is...
Real-time numbers (not from 3rd-party benches... 😂)