Quote from: Technerd on November 20, 2024, 07:30:59
For L1 memory bandwidth, even an RTX 4050 mobile will trash an M4 Ultra. No wonder Apple sucks at FP32 precision, or at deep learning, machine learning, and AI stuff.
Quote from: dada_dave on November 20, 2024, 02:45:32
Quote from: GeorgeS on November 20, 2024, 02:15:37
Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?
Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.
Blender's engines are open source, and Nvidia was the original GPU vendor to really optimize its offerings there, especially with ray tracing. Apple has indeed been working hard to close that gap, but it's impossible to say whether Nvidia or Apple is the "more optimized" at this point. Apple's ray tracing engines, thought to be licensed from ImgTech, are likely comparable to Nvidia's, maybe better, maybe worse, hard to say.
The actual power discrepancy between the desktop 4070 and the M4 Max GPU is about 4x, give or take, but there are caveats and a few things going on here.

From what I can tell, Blender renders tend to hit memory bandwidth pretty hard, and Apple tends to match or even beat comparable GPUs on bandwidth, especially relative to TFLOPS (as in, Apple GPUs have better bandwidth-to-TFLOP ratios). Also, even by paper specs a 4080 mobile should be outperformed here by the desktop 4070, but isn't (this can also be a function of the typical overall system a 4080 mobile sits in versus a desktop 4070), and the 4080 mobile's TDP is half that of the 4070 (though still double the likely power draw of the M4 Max). Further, Apple's memory is unified, so on larger renders the extra memory directly accessible by the GPU can help.

Then there are node improvements going from TSMC N4 to N3E, if I remember right about 20% perf/W (a nice boost, but not fully explanatory: the M4 Max and the 4080 mobile have similar clock speeds, and the 4080 has more potential FP32 units while the Max has more bandwidth).

Finally, Apple has a vastly different cache structure, TBDR to improve rasterization of complex scenes, and a different core structure (Apple chips have much bigger INT32 and FP16 throughput). In fact, even though it lacks the matrix units of the Nvidia chip (which can help with noise reduction in Blender renders), the M4 Max is likely bigger than the 4070/4080 mobile in terms of raw transistor count. Factor all of that in, and its competitive performance in Blender becomes a little less surprising.
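Since the bandwidth-to-TFLOP point is doing a lot of the work in that argument, here is a minimal back-of-the-envelope sketch in Python. The RTX 4070 figures are its published paper specs; the M4 Max FP32 throughput and GPU power draw are third-party estimates (Apple publishes neither, and the ~50 W figure is just the "roughly 4x" claim above applied to the 4070's 200 W), so treat every number as illustrative rather than measured.

```python
# Back-of-the-envelope bandwidth-per-TFLOP comparison.
# NOTE: RTX 4070 numbers are published paper specs; the M4 Max FP32 TFLOPS
# and ~50 W GPU power are third-party estimates, used purely for illustration.

specs = {
    # name:              (memory bandwidth GB/s, FP32 TFLOPS, approx. GPU power W)
    "RTX 4070 desktop": (504, 29.0, 200),  # 192-bit GDDR6X at 21 Gbps, 200 W TGP
    "M4 Max (40-core)": (546, 18.0, 50),   # unified LPDDR5X; TFLOPS and watts are estimates
}

for name, (bw_gbs, tflops, watts) in specs.items():
    bw_per_tflop = bw_gbs / tflops  # GB/s of DRAM bandwidth per TFLOP of FP32 compute
    print(f"{name:18s}: {bw_per_tflop:5.1f} GB/s per TFLOP at roughly {watts} W")
```

On those rough numbers the M4 Max ends up with nearly twice the bandwidth per TFLOP of the desktop 4070, which fits the idea that a bandwidth-heavy Blender render narrows the gap more than raw FP32 throughput alone would suggest.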
Quote from: TruthIsThere on November 20, 2024, 02:57:48
Nah, it's just best to take someone's tool that many of today's creators DO NOT USE THAT OFTEN (Blender) over Premiere Pro, Maya, inference workloads, etc. 😏
Quote from: Toortle on November 20, 2024, 02:43:14
Quote from: TruthIsThere on November 20, 2024, 02:39:59
Fake news! Blender's data, which is publicly available on Blender's own website (listed in the source for the article), is...
Real-time numbers (not from 3rd-party benches... 😂)
🥁🥁🥁🥁
...a third-party benchmark? In other words, Blender is a third party of Blender? Ok.
Quote from: GeorgeS on November 20, 2024, 02:15:37
Am I the only one who thinks that quite possibly the 'Blender benchmark' may not be optimized for the Nvidia hardware/drivers?
Surely SOMETHING must be going on for an M4 to trade blows with a desktop chip that consumes an order of magnitude more power.
Quote from: TruthIsThere on November 20, 2024, 02:39:59
Fake news! Blender's data, which is publicly available on Blender's own website (listed in the source for the article), is...
Real-time numbers (not from 3rd-party benches... 😂)