I'd just like to add that Tom Petersen has been in a few videos discussing Xe2 Battlemage in Lunar Lake. In these talks he goes over the improvements and what caused issues/bottlenecks in the previous Alchemist architecture.
One thing I often hear him mention is the change in the vector unit. (The Xe core, the main computational building block of Intel's render slice, is composed of two things: a vector engine and a matrix engine.) The move from SIMD8 to SIMD16 within the vector unit apparently brought significant improvements, not just resolving bugs in games but also improving overall efficiency. Apparently most games are designed around SIMD16 pipelines, because that was the industry standard that both ATI and Nvidia settled on decades ago, and many games have certain SIMD preferences, so moving to a wider SIMD improves game compatibility. (SIMD stands for single instruction, multiple data, btw.)
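To make the SIMD-width point a bit more concrete, here is a rough C++ sketch (my own illustration, not Intel's actual scheduler) of how a group of shader invocations gets chopped into hardware passes depending on the SIMD width of the vector unit:

```cpp
#include <cstdio>

// Toy model: a "wave" of shader invocations is executed in chunks of
// simd_width lanes per pass. Narrower hardware needs more passes (and more
// instruction issues) for the same work, which is one way a SIMD8 design
// can lose efficiency on code written with SIMD16 in mind.
int passes_needed(int wave_size, int simd_width) {
    return (wave_size + simd_width - 1) / simd_width;  // round up
}

int main() {
    const int wave_size = 32;  // e.g. a 32-invocation thread group
    std::printf("SIMD8  needs %d passes for %d invocations\n",
                passes_needed(wave_size, 8), wave_size);   // 4 passes
    std::printf("SIMD16 needs %d passes for %d invocations\n",
                passes_needed(wave_size, 16), wave_size);  // 2 passes
    return 0;
}
```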
They were also emulating certain commands in software. One example is ExecuteIndirect, which many modern / next-generation game engines use to accelerate command lists. In Xe2, this is implemented directly in the hardware itself.
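For anyone curious what that command actually is: ExecuteIndirect is a D3D12 call where the CPU records a single command and the GPU pulls the actual draw parameters from a buffer. Here is a minimal, hedged C++ sketch just to show the shape of the call (buffer setup omitted; names like argBuffer and maxDraws are placeholders, this is not Intel's or any engine's actual code):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch only: device, cmdList and argBuffer are assumed to already exist.
void RecordIndirectDraws(ID3D12Device* device,
                         ID3D12GraphicsCommandList* cmdList,
                         ID3D12Resource* argBuffer,  // holds D3D12_DRAW_INDEXED_ARGUMENTS entries
                         UINT maxDraws)
{
    // Describe what one indirect command looks like: a plain indexed draw.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs = &arg;

    ComPtr<ID3D12CommandSignature> cmdSig;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&cmdSig));

    // One API call issues up to maxDraws draws whose parameters live in a
    // GPU buffer. If the hardware can't consume this natively, the driver
    // has to unroll it, which is the emulation cost mentioned above.
    cmdList->ExecuteIndirect(cmdSig.Get(), maxDraws, argBuffer, 0, nullptr, 0);
}
```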
I don't think there's any single reason that simply explains why current Intel graphics is so bad. It's more that graphics is complicated. This was Intel's first real attempt at making larger GPUs geared towards serious graphics workloads, and there were many shortcomings in both hardware and software. Generally speaking, game code not being tested / optimized for their stack doesn't help either. (Chicken-and-egg problem: they have limited market share, so game dev studios don't bother trying, as they have a limited amount of resources and time as well.) Overcoming these obstacles takes time, and Nvidia and AMD have had decades of it. And yes, I'm aware Intel has been making graphics for equally as long and has made multiple previous attempts. But Xe was fairly recent and a completely different architecture with drivers written from scratch. Just look at how much Qualcomm is struggling with their Adreno drivers on Windows laptops.
Still trying to answer the topic question: "Why does the Intel Arc 8-Xe-core iGPU still lag behind the AMD Radeon 780m?"
I would say it's because Intel Alchemist does not have the specific hardware optimizations for gaming (keyword: short workqueues). Thus its state is similar to AMD Vega. Only with Battlemage did they start making these hardware optimizations for gaming, similar to AMD RDNA.
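To give a rough idea of what "short workqueues" means in practice, here is my own simplified model, assuming the commonly cited GCN behaviour (64-wide wavefronts issued over a 16-lane SIMD, so 4 cycles per instruction) versus RDNA (32-wide waves on a 32-lane SIMD, 1 cycle). Short, bursty game shaders benefit a lot from the lower per-wave latency:

```cpp
#include <cstdio>

// Minimum cycles to push ONE wavefront through a SIMD for a given number of
// instructions, ignoring memory latency and co-issue (very rough model).
int min_issue_cycles(int wave_width, int simd_lanes, int instructions) {
    int cycles_per_instr = wave_width / simd_lanes;  // 64/16 = 4 (GCN), 32/32 = 1 (RDNA)
    return cycles_per_instr * instructions;
}

int main() {
    const int instrs = 20;  // a short game shader
    std::printf("GCN  (wave64 on SIMD16): %d cycles\n", min_issue_cycles(64, 16, instrs));
    std::printf("RDNA (wave32 on SIMD32): %d cycles\n", min_issue_cycles(32, 32, instrs));
    return 0;
}
```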
Sorry guys... I probably spread some misinformation (partially caused by confusing or wrong information on websites).
The part about AMD having 2x the ALUs is probably wrong. The amount of ALUs between GCN and RDNA seems to remain the same, but they adjusted the ALUs in RDNA so they allow shorter workqueues, which brought big performance gains in gaming. For a deep dive I suggest reading the following article: hardwaretimes.com/difference-between-amd-rdna-vs-gcn-gpu-architectures/
So the hardware still looks like this:
-Radeon 740m has 256 ALUs (and not 512 as I suggested)
-Radeon 760m has 512 ALUs (and not 1024 as I suggested) - the same amount of ALUs as the Vega 8
-Radeon 780m has 768 ALUs (and not 1536 as I suggested)
The performance gains seem exceptionally large for the 740m. It has the same performance as the Vega 8 with only half the ALUs. And the 780m is only 50% better in performance although it has 200% more ALUs. Strange. But maybe that's a "sweet spot" thing.
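A quick back-of-envelope on those corrected numbers (the relative performance values below are just the rough ratios claimed above, not real benchmark scores):

```cpp
#include <cstdio>

int main() {
    // Relative gaming performance, normalized to Vega 8 = 1.0 (assumed from
    // the rough claims above: 740m roughly equals Vega 8, 780m ~50% faster).
    struct IGpu { const char* name; int alus; double rel_perf; };
    const IGpu gpus[] = {
        {"Vega 8 (GCN)", 512, 1.0},
        {"Radeon 740m",  256, 1.0},
        {"Radeon 780m",  768, 1.5},
    };
    for (const IGpu& g : gpus) {
        // Per-ALU efficiency: how much performance each ALU contributes.
        std::printf("%-13s %4d ALUs  perf/ALU = %.4f\n",
                    g.name, g.alus, g.rel_perf / g.alus);
    }
    // The 740m comes out at roughly 2x Vega's perf/ALU, while the 780m's
    // extra ALUs scale much less than linearly - matching the "sweet spot"
    // remark above.
    return 0;
}
```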
Anyways... my original points about "direct and fair competition" between Intel iGPUs and AMD iGPUs are obsolete. They were based on false information.
Technically it still looks like this to me:
-Intel Alchemist seems like the AMD GCN moment (similarities in performance and power consumption)
-Intel Battlemage seems like the AMD RDNA moment (similarities in refined design, efficiency)
But in the end those are all irrelevant technical details. The market does not wait for one or the other, and neither does the average Joe customer. Thus it's ok to only compare GPUs and iGPUs which are available at the same time.
I also want to add another piece of information, which can confirm the points above:
The old 5700G with Vega 8 has 8 CUs and 512 shaders. The recent 8500G with 740m has 4 CUs and 256 shaders.
Strangely enough, gaming benchmarks show that the 8500G is as strong as the 5700G. How is that possible? It's because the Radeon 740m has 2x the ALUs per CU compared to Vega. So in the end the 8500G also has 512 ALUs, the same amount as the 5700G.
Back to topic... For a fair comparison the following ones would be direct competitors:
-The direct competitor for the Intel Arc-4-Xe (512 ALUs) should be the Radeon 740m (512 ALUs) or the Vega 8 (512 ALUs)
-The direct competitor for the Intel Arc-8-Xe (1024 ALUs) should be the Radeon 760m (1024 ALUs)
Let's be honest, despite Intel's attempts to improve their graphics, their iGPU still struggles to keep up with AMD's iGPU, especially in gaming. People say it's all because of unoptimized drivers from Intel - but is that really *the* reason for it? Maybe not...
Please look here:
The Intel Arc-8-Xe iGPU has 8 Xe cores, which are 1024 ALUs total. The AMD Radeon 780m has 12 CUs, which are - according to AnandTech - 1536 ALUs total.
So the Radeon 780m has 50% more ALUs. Really? Why does no one mention that? This is a very significant difference.
And if that is true, some things suddenly make sense again. Firstly, it makes sense why Intel struggles so hard to keep up with AMD's iGPU: far fewer ALUs. Secondly, it makes sense why Intel can only keep up with AMD at higher power consumption: raising power consumption is a simple way to compensate for lacking hardware.
And with all that in mind, most comparisons aren't exactly fair. Comparisons should be made between iGPUs with the same amount of ALUs. The direct competitor of the Intel Arc-8-Xe is the Radeon 760m, because both have the same amount of ALUs (1024). Anything above that (780m, 880m, 890m) is already specced ~50% higher.
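If you want to sanity-check the ratios quoted in this post yourself (taking the 1024 and 1536 figures here at face value; note the correction further up revises the AMD counts downwards), here is a trivial sketch:

```cpp
#include <cstdio>

int main() {
    // ALU counts as quoted in this post.
    const int arc_8xe    = 8 * 128;   // 8 Xe cores  -> 1024 ALUs
    const int radeon760m = 1024;      // per the pairing above
    const int radeon780m = 12 * 128;  // 12 CUs      -> 1536 ALUs (disputed in the correction above)

    std::printf("780m vs Arc 8-Xe: %+.0f%% ALUs\n",
                100.0 * radeon780m / arc_8xe - 100.0);   // +50%
    std::printf("760m vs Arc 8-Xe: %+.0f%% ALUs\n",
                100.0 * radeon760m / arc_8xe - 100.0);   // +0%, the "fair" matchup
    return 0;
}
```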