Quote from: WhoCares on September 03, 2023, 00:00:54
there are also a few tasks nVidia cards can't do at all -
I have never claimed that a) one system is always better than another, or b) one system can do every task at a given cost and execution time.
Quote
for example, ProRes hardware encoding.
It is well known that hardware encoding is an advantage for specific tasks, such as specific video codecs. Every chip family (Apple, Intel, Nvidia etc.) ships some such hardware encoders, and unsurprisingly they help for those specific tasks. Even so, testers have shown that it also depends on the encoding and decoding settings whether the hardware path is used at all, and which system transcodes faster. Tech Notice, for instance, found that some settings make Apple M machines much faster than x64 PCs at transcoding, while other settings make them much slower. A blanket statement about ProRes is too general - settings matter too.
Quote
Also very large neural net model training could be faster on Apple, since some models can't fit in relatively small video memory of consumer nVidia cards.
It could be, or it could not. For it even to have a chance of being faster, the software must be written for the system in question and its chips.
A neural net is one type of machine learning, and the following applies to every kind. Training an AI usually requires much more memory (VRAM and RAM, or unified memory) than applying a pretrained one, and insufficient memory can make a task impossible beyond certain limits.
I apply a pretrained AI on 12 GB VRAM and 64 GB RAM. After ca. 2 hours of execution it fills about 0.8 GB of VRAM and all 64 GB of RAM. I could have spent another €200 for 128 GB RAM, but 64 GB is good enough for me. This exemplifies that it depends on the AI software whether much VRAM, much RAM, or both are needed. (AAA 3D games, by contrast, are known to sometimes need more than 12 GB VRAM.) If I trained this AI, rather than merely applying it, I would profit from gigantic amounts of VRAM (and RAM).

How does this compare to 96 GB of Apple M unified memory? Within ca. 2 hours, 64 GB would already be filled by RAM-like use, leaving only 32 GB for VRAM-like use - and to exploit even that, more than 64 GB of RAM-like use would be needed at the same time. So in practice, 96 GB of Apple M unified memory behaves more like 24 GB VRAM + 64 GB RAM, a combination available in a PC as an RTX 4090 (ca. €1650) + 64 GB RAM (ca. €200). Therefore, 96 GB of unified memory is not all that impressive, except for its lower TDP. For the AI I use, an RTX 4070 + 64 GB RAM is 32.5 times faster and 5 times as efficient as an Apple M1; an Apple M2 with 96 GB might compare less badly, but you get the idea: Nvidia GPUs and Nvidia libraries can be much faster and more efficient than Apple M unified memory. I suspect some AI (video?) software can be found that fares relatively better on Apple M, perhaps because it also uses some of the hardware transcoding.
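The unified-versus-discrete comparison above is just arithmetic; here is a minimal sketch of it (the 64 GB RAM-like footprint is the figure from my workload, everything else is illustrative):

```python
# Sketch of the memory split argued above: once the RAM-like footprint
# of a workload is subtracted from a unified pool, only the remainder
# is left for VRAM-like use. Figures follow the example in the text.

def vram_like_budget(unified_gb: float, ram_like_need_gb: float) -> float:
    """Unified memory left over for GPU-style allocations."""
    return max(unified_gb - ram_like_need_gb, 0.0)

# 96 GB Apple M unified pool, 64 GB already taken by RAM-like use:
remaining = vram_like_budget(96, 64)
print(f"VRAM-like budget: {remaining} GB")  # 32 GB

# Discrete PC for comparison: the VRAM budget is fixed by the card.
rtx_4090_vram = 24
print(f"RTX 4090 VRAM: {rtx_4090_vram} GB")
```

So the unified pool's effective VRAM-like budget ends up in the same league as a single high-end consumer card, which is the whole point of the comparison.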
It becomes interesting, read: expensive, if we go beyond prosumer Apple M or prosumer PC limitations and demand, say, 1 TB VRAM, 1 TB RAM, a 128-core CPU and suitably fast networking / buses. That is the territory of €100,000 to €5 million, and limitless beyond. Or, more modestly, build an AI workstation with, say, a 64-core Threadripper, 8 × RTX 4090 and 256 GB RAM for roughly €20,000 ~ €50,000.
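A back-of-the-envelope tally for such a workstation; only the RTX 4090 and 64 GB RAM unit prices are the ones quoted earlier, the CPU, platform and storage figures are loose assumptions for illustration:

```python
# Rough cost tally (EUR) for the modest AI workstation sketched above.
# Only the GPU and RAM unit prices come from the text; the CPU and
# platform/storage/PSU figures are assumptions for illustration.
parts_eur = {
    "64-core Threadripper (assumed)": 7000,
    "8 x RTX 4090 @ ~1650": 8 * 1650,
    "256 GB RAM (4 x 64 GB @ ~200)": 4 * 200,
    "board, chassis, PSUs, storage (assumed)": 3000,
}
total = sum(parts_eur.values())
for name, price in parts_eur.items():
    print(f"{name:42s} EUR {price:>6d}")
print(f"{'total':42s} EUR {total:>6d}")  # lands in the €20,000+ band
```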
For such hardware, the AI software might, or might not, be designed to work with distributed memory (especially VRAM) and distributed dGPUs; it often simply depends on how that software is written. There may be specific tasks that cannot run computationally on distributed systems, but usually it is possible. At the software layer, Apple M's unified memory model can also be useful where the computation calls for it. However, recall the 96 GB limit: this advantage becomes meaningless once much more memory is needed and distributed hardware cannot be avoided. I think that, computationally, every algorithm can be transferred from a unified to a distributed approach - at the cost of some computational speed, but hardly orders of magnitude.
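The unified-to-distributed transfer can be sketched in a few lines; a toy dot product stands in for the real workload, and the "devices" are just list slices:

```python
# Minimal sketch: an algorithm written against one "unified" memory
# pool can usually be re-expressed over distributed pools by
# partitioning the data, computing locally, and reducing the partial
# results. The reduce step is where some speed is lost in practice.

def dot_unified(a, b):
    # Single-pool version: all data lives in one memory space.
    return sum(x * y for x, y in zip(a, b))

def dot_distributed(a, b, n_devices=2):
    # Partition the vectors across hypothetical devices, compute
    # partial dot products locally, then combine (the "reduce").
    size = len(a)
    chunk = (size + n_devices - 1) // n_devices
    partials = [
        dot_unified(a[i:i + chunk], b[i:i + chunk])
        for i in range(0, size, chunk)
    ]
    return sum(partials)

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
assert dot_unified(a, b) == dot_distributed(a, b)  # same result, split memory
```

The result is identical; only the communication cost of combining partials differs, which is why the slowdown is a constant factor rather than orders of magnitude.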
Quote
If your model can't fit in RAM, your model training will be orders of magnitude slower.
Orders of magnitude slower really only becomes an issue if VRAM / RAM / unified memory is exceeded and permanent (SSD) storage must be used. Of course, we need enough volatile memory (and fast enough chips).
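The "orders of magnitude" claim follows directly from typical bandwidth figures; the numbers below are ballpark assumptions, not measurements:

```python
# Ballpark bandwidths (GB/s, assumed typical values) showing why
# spilling from volatile memory to SSD costs orders of magnitude.
bandwidth_gbps = {
    "GDDR6X VRAM": 1000,   # high-end consumer GPU, approx.
    "DDR5 RAM": 60,        # dual-channel desktop, approx.
    "NVMe SSD": 7,         # PCIe 4.0 sequential, approx.
}
vram = bandwidth_gbps["GDDR6X VRAM"]
ssd = bandwidth_gbps["NVMe SSD"]
print(f"VRAM vs SSD: ~{vram / ssd:.0f}x")  # roughly two orders of magnitude
```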
Quote
Middle-level Apple laptop has 24-32 Gb of RAM, while high-level Apple laptop/desktop has 64-96 Gb of RAM - and all of it could be used by GPU/Neural Engine part of SoC. Have you ever seen nVidia card with such amount of RAM?
Brainwashing. See above. Just because there is 96 GB of unified memory does not mean 96 GB is available for either RAM-like or VRAM-like use, because the other kind is needed simultaneously.
Quote
Do you know what price is it of?
Much less than an Apple M2 96 GB computer (a PC costs roughly €3000 with an RTX 4090 at 24 GB VRAM, versus €4500 for the Apple, if my quick Idealo check is right), because PC RAM is cheap. Only the step from 12 to 24 GB VRAM is outrageous (€1000 of the €3000).
For the unlikely split of 48 GB VRAM + 48 GB RAM, a PC with 2 × RTX 4090 costs ca. €4600, similar to Apple's prices. However, I would then really expect a need for at least 128 GB RAM (PC: €4850, while Apple is impossible, since 128 + 48 GB = 176 GB of unified memory would be needed; maybe an Apple M3 for €€€€€?).
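The price comparison of the last two paragraphs, spelled out (all figures are the rough Idealo-style estimates quoted above):

```python
# EUR figures quoted in the text for the price comparison.
pc_4090_64gb    = 3000   # one RTX 4090 (24 GB VRAM), 64 GB RAM
apple_m2_96gb   = 4500   # Apple M2 with 96 GB unified memory
pc_2x4090_64gb  = 4600   # two RTX 4090s (48 GB VRAM), 64 GB RAM
pc_2x4090_128gb = 4850   # same, upgraded to 128 GB RAM

# The Apple side of the 48 GB VRAM + 128 GB RAM scenario would need:
needed_unified_gb = 48 + 128
print(f"Needed unified memory: {needed_unified_gb} GB")  # 176 GB > 96 GB cap
assert needed_unified_gb > 96  # beyond any current Apple M configuration
```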