Topic summary

Posted by RobertJasiek

- Yesterday at 19:55:10

Quote from: John Doe on Yesterday at 16:42:17
4*RTX5090 don't work in a "distributed" 128GB VRAM scenario [...]
the multiple 5090 set-up is unfeasible due to a lack of fast interconnect.

Whether distributing work across several GPUs works depends on the software. Some programs scale almost linearly; others fail completely.
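One way to picture why some programs scale almost linearly across GPUs while others barely gain anything is Amdahl's law (a standard model swapped in here for illustration; nobody in the thread cites it). The sketch assumes a fixed fraction `p` of the work can be split across `n` GPUs and ignores communication cost entirely. The `p` values are hypothetical, and the model cannot capture software that refuses to run on multiple GPUs at all.

```python
# Minimal sketch: Amdahl's law as one way to see why multi-GPU scaling
# varies so much between programs. The parallel fractions used below are
# hypothetical illustrations, not measurements of any real software.


def amdahl_speedup(parallel_fraction: float, n_gpus: int) -> float:
    """Ideal speedup on n_gpus when only parallel_fraction of the work
    can be split across devices (communication overhead ignored)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_gpus)


if __name__ == "__main__":
    for p in (0.99, 0.9, 0.5):
        print(f"p={p:.2f}: 4 GPUs -> {amdahl_speedup(p, 4):.2f}x")
```

With 4 GPUs, p = 0.99 gives about 3.88x while p = 0.5 gives only 1.6x; adding inter-GPU transfer time would push real-world numbers lower still.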

Posted by John Doe

- Yesterday at 16:42:17

Quote from: RobertJasiek on March 12, 2025, 19:33:47
[...] Hardware expense is another aspect (M3 Ultra 512GB unified memory is all fine and well until you realise it is €10,000 and 4*RTX5090 might be an alternative if distributed 128GB VRAM should be enough). [...]

All good and well, but with one exception: 4*RTX5090 don't work in a "distributed" 128GB VRAM scenario, because Nvidia, in their ever-growing greed, killed NVLink on consumer cards after the 30 series. Without direct hardware communication like Infinity Fabric, NVLink etc., the attempt to build a distributed system fails immediately due to the gigantic overhead of the south-north bridge comms, and also due to the hardware limits of even PCIe 5.0, whose x16 bandwidth as it stands sits at "just" 64 GB/s. That's roughly half of the NVLink bandwidth of the Nvidia 30xx series, and when true enterprise solutions are considered, which dwarf even the 15k fully-specced M3 Ultra Studio, the interconnect bandwidth reaches almost the TB/s realm.

Anyhow, TL;DR:

1. No, the multiple 5090 set-up is unfeasible due to a lack of fast interconnect.
2. Attempting to circumvent that hardware limitation results in an overhead clusterfuck, i.e. it's pointless.
3. Special enterprise hardware can do whatever peeps try to achieve with "cheap" 4x V100 or multiple Mac Studios, but it will cost 150k-500k.
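A rough way to put the overhead argument into numbers is to compare ideal transfer times. This sketch takes the 64 GB/s PCIe 5.0 figure from the post and assumes about 1792 GB/s for the RTX 5090's own memory bandwidth. The 2 GB payload is hypothetical, and real links add latency and protocol overhead on top.

```python
# Back-of-the-envelope sketch: how long does it take to move data between
# GPUs over PCIe compared with reading it from the card's own VRAM?
# Assumptions: 64 GB/s for PCIe 5.0 x16 (the figure quoted in the post),
# ~1792 GB/s for RTX 5090 on-card memory bandwidth, and ideal links with
# no latency or protocol overhead. The 2 GB payload is hypothetical.

PCIE5_X16_GBPS = 64.0       # GB/s, figure from the post
RTX5090_VRAM_GBPS = 1792.0  # GB/s, assumed on-card GDDR7 bandwidth


def transfer_ms(payload_gb: float, bandwidth_gbps: float) -> float:
    """Ideal transfer time in milliseconds for payload_gb at bandwidth_gbps."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb / bandwidth_gbps * 1000.0


if __name__ == "__main__":
    payload = 2.0  # GB, hypothetical data moved between cards per step
    over_pcie = transfer_ms(payload, PCIE5_X16_GBPS)
    in_vram = transfer_ms(payload, RTX5090_VRAM_GBPS)
    print(f"PCIe 5.0 x16: {over_pcie:.2f} ms")
    print(f"local VRAM:   {in_vram:.2f} ms")
    print(f"ratio:        {over_pcie / in_vram:.0f}x slower over PCIe")
```

Under these assumptions the PCIe hop is 28x slower than reading local VRAM (31.25 ms vs. about 1.12 ms). How much that hurts depends on how much data a given program actually moves between cards per step, which is where the software dependence comes in.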

Posted by Baalzie

- March 17, 2025, 23:19:07

Quote from: MigitMD on March 12, 2025, 17:59:01
Now, scale down that 184 to 45.6, roll it back to 5 nm and retest. How does it fare now?

So you're saying ALL comparisons between Nvidia and AMD, and especially between generations, should be scrapped and ONLY looked at from a per-core perspective?

I think you wouldn't like that very much, and it would be ridiculous. Obviously.
It would mean you could never upgrade unless each core brought more efficiency.
We'd still be sitting here with 3dfx and Matrox 450s if we did that... Utterly preposterous...

Posted by Mietek

- March 16, 2025, 04:38:55

No Apple CPU supports real-time path tracing in Twinmotion/Epic, so they're not usable for serious work in those popular tools.

Posted by RobertJasiek

- March 12, 2025, 19:33:47

Quote from: MigitMD on March 12, 2025, 17:59:01
Unified is for both system and video.

That is subject to limitations (ca. 25% needs to stay reserved for the system) and to assignment (one can choose how much to use for either purpose).
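The split described above can be sketched as simple arithmetic. This assumes the post's rule of thumb that roughly 25% of unified memory stays reserved for the system; the real default limit is set by macOS and may differ by version and configuration. The 96 and 256 GB sizes are just examples alongside the 512 GB from the thread.

```python
# Minimal sketch of the memory split described above, assuming the
# "ca. 25% needs to stay for the system" rule of thumb from the post.
# The actual default limit depends on the macOS version and machine.


def gpu_assignable_gb(total_unified_gb: float, system_reserve: float = 0.25) -> float:
    """Unified memory left for GPU use after reserving a fraction for the system."""
    if not 0.0 <= system_reserve < 1.0:
        raise ValueError("system_reserve must be in [0, 1)")
    return total_unified_gb * (1.0 - system_reserve)


if __name__ == "__main__":
    for total in (96, 256, 512):  # example sizes; 512 GB is from the thread
        print(f"{total} GB unified -> about {gpu_assignable_gb(total):.0f} GB for the GPU")
```

Under this assumption a 512 GB machine leaves about 384 GB for GPU work, still far more than 4 x 32 GB of VRAM, but only useful if the software can actually use it.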

Quote
an SoC that has 4 times the amount of transistors on a more advanced node, will do better.

While your analysis has some value, your conclusion is wrong, because the "software stack" (drivers, libraries and applications) and each particular program's requirement for RAM or VRAM (or unified memory assigned as either) also have a very large impact. Hardware expense is another aspect (an M3 Ultra with 512GB of unified memory is all well and good until you realise it costs €10,000, and 4*RTX5090 might be an alternative if distributed 128GB of VRAM is enough).

If software is available for, or optimised for, only one system, it will not work, or will work only badly, on other systems. If a VRAM limit is essential for a program, it will only run on systems with enough VRAM (or assigned unified memory). Otherwise, software might be designed for both systems. While big LLMs might prefer large unified memory, most other AI software prefers Nvidia GPUs and libraries. There have been several examples where choosing the right system meant dozens of times greater speed, also in the Nvidia vs. AMD comparison.
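Whether a model runs at all often comes down to whether its weights fit. This rough sketch assumes weights only (no KV cache, activations or framework overhead), 2/1/0.5 bytes per parameter for fp16/int8/int4, 32 GB per RTX 5090, and 384 GB of assignable unified memory (512 GB minus the 25% system reserve mentioned in the post). The 70B model is hypothetical, and `can_split` stands in for whether a given program can actually distribute a model across cards.

```python
# Rough sketch: does a model's weight footprint fit in the memory budget?
# Assumptions: weights only (no KV cache, activations or framework overhead),
# memory per parameter set by the numeric precision, and a distributed setup
# only helps if the software can actually split the model across cards.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, per_device_gb: float,
         n_devices: int = 1, can_split: bool = False) -> bool:
    """True if the weights fit on one device, or across n_devices when the
    software supports splitting the model between them."""
    need = weights_gb(params_billion, precision)
    if can_split:
        return need <= per_device_gb * n_devices
    return need <= per_device_gb


if __name__ == "__main__":
    model_b = 70  # hypothetical 70B-parameter model
    for prec in ("fp16", "int4"):
        print(prec,
              "| 4x32 GB split:", fits(model_b, prec, 32, 4, can_split=True),
              "| 4x32 GB no split:", fits(model_b, prec, 32, 4, can_split=False),
              "| 384 GB unified:", fits(model_b, prec, 384))
```

For the hypothetical 70B model, fp16 weights (140 GB) fit only in the unified-memory budget, while int4 weights (35 GB) fit across four cards only if the software can split them, since they exceed a single card's 32 GB.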

Never just believe hardware numbers; always find out on which system your preferred software will run at all, or run faster, before buying hardware!

Posted by MigitMD

- March 12, 2025, 17:59:01

Let's put this in perspective a bit, shall we?

The Apple M3 Ultra has 184 BILLION transistors at 3 nm.
The RTX 5070 Ti has 45.6 billion transistors at 5 nm.

184 / 45.6 ≈ 4.04, so the M3 Ultra has about 4 times as many transistors.
And up to 512 GB of unified memory, NOT VRAM. VRAM is dedicated video RAM. Unified is for both system and video.
So yeah, an SoC that has 4 times the amount of transistors on a more advanced node, will do better.

Now, scale down that 184 to 45.6, roll it back to 5 nm and retest. How does it fare now?
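The normalization proposed above can be written out as follows. It assumes benchmark performance scales linearly with transistor count, which is the premise of the argument rather than an established rule, and it makes no attempt to account for the 3 nm vs. 5 nm difference. The transistor counts are the ones given in the post; no benchmark score is supplied here, so `normalized_score` needs a real result plugged in.

```python
# Sketch of the proposed normalization: scale a benchmark score by
# transistor count. Assumes performance scales linearly with transistor
# count (the premise of the argument, not an established rule) and
# ignores the process-node difference entirely.

M3_ULTRA_TRANSISTORS_B = 184.0    # billions, from the post
RTX_5070_TI_TRANSISTORS_B = 45.6  # billions, from the post


def transistor_ratio(a_billion: float, b_billion: float) -> float:
    """How many times more transistors chip A has than chip B."""
    return a_billion / b_billion


def normalized_score(score: float, transistors_b: float, reference_b: float) -> float:
    """Score scaled to what a chip with reference_b billion transistors
    would get under the linear-scaling assumption."""
    return score * reference_b / transistors_b


if __name__ == "__main__":
    ratio = transistor_ratio(M3_ULTRA_TRANSISTORS_B, RTX_5070_TI_TRANSISTORS_B)
    print(f"transistor ratio: {ratio:.2f}")  # 4.04
```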

Posted by TruthIsThere

- March 12, 2025, 10:42:38

Yeah, the WORLD has been down this road before (the results for the entire M4 line... just lies from shills, etc., when it comes to inference workloads, etc.), with the TRUE REAL-TIME RESULTS exposing these weak SoCs' staged performance.

NO MOBILE CHIP will ever... EVER... compete with its rival flagship, or mid-tier, desktop GPU (heck, the M series line IS NOT even comparable to a dedicated flagship mobile GPU in REAL-TIME results), unless dedicated GPU vendors (really, just NVIDIA) stop producing dGPUs.

Either show REAL-TIME PERFORMANCE (resolution setting, AI model used, video codec used, duration of video revealed, the avg. FPS... on & on...) or it NEVER happened! 😏
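The report requested above could be captured in a small record like this. The field names and sample values are hypothetical; average FPS is computed as total frames divided by total run time.

```python
# Sketch of a real-time benchmark record: the settings that were used plus
# an average FPS derived from measured frame times. Field names and the
# example values are hypothetical.

from dataclasses import dataclass, field


@dataclass
class RealTimeRun:
    resolution: str
    ai_model: str
    video_codec: str
    frame_times_ms: list[float] = field(default_factory=list)

    @property
    def duration_s(self) -> float:
        """Total run time in seconds, summed from per-frame times."""
        return sum(self.frame_times_ms) / 1000.0

    @property
    def avg_fps(self) -> float:
        """Frames divided by total time (not the mean of per-frame FPS)."""
        if not self.frame_times_ms:
            raise ValueError("no frames recorded")
        return len(self.frame_times_ms) / self.duration_s


if __name__ == "__main__":
    run = RealTimeRun("3840x2160", "example-model", "HEVC", [16.7, 16.7, 33.3])
    print(f"{run.resolution}, {run.ai_model}, {run.video_codec}: "
          f"{run.duration_s:.3f} s, avg {run.avg_fps:.1f} FPS")
```

Dividing frames by total time, rather than averaging per-frame FPS values, avoids overweighting the fast frames, which would otherwise inflate the reported average.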

Posted by Redaktion

- March 11, 2025, 18:50:35

Apple's M3 Ultra SoC is a massive ARM chip that packs a powerful 32-core CPU and an 80-core GPU, allowing for performance that trades blows with high-end workstations. If the early GPU benchmark scores are anything to go by, it sure does seem that the M3 Ultra is ready to take on RDNA 4 and Nvidia RTX 50 GPUs in most, if not all, workloads.

https://www.notebookcheck.net/Apple-M3-Ultra-crushes-Nvidia-GeForce-RTX-5070-Ti-in-GPU-benchmark-but-falls-short-of-RTX-5080.977089.0.html
