I'd be really careful about drawing such definite conclusions at this point in the game. "ML" is an umbrella term covering several different kinds of optimization that are possible in hardware:
1) Is the silicon specifically configured for tensor operations (e.g., dedicated matrix-multiply units), or is it more general-purpose?
2) What kinds of pipeline operations are optimized?
3) Does the general app space have access to the dedicated hardware, or is it reserved for first-party features? (See the sketch below.)
The article doesn't seem to fill in these gaps, and we don't yet have hardware in our hands or commentary from Google to add the missing context.
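On question 3: the usual access path for a third-party Android app is NNAPI, not the silicon directly. A minimal sketch of what that looks like, assuming the TensorFlow Lite Kotlin API (the delegate class and options are real TFLite APIs; whether Tensor's ML blocks are fully and efficiently exposed through this path is exactly the open question):

    import org.tensorflow.lite.Interpreter
    import org.tensorflow.lite.nnapi.NnApiDelegate
    import java.nio.MappedByteBuffer

    // Route inference through NNAPI so the vendor driver can claim ops for
    // whatever accelerator it exposes (NPU, DSP, GPU). Ops the driver does
    // not support silently fall back to the CPU, which matters for benchmarks.
    fun buildAcceleratedInterpreter(model: MappedByteBuffer): Interpreter {
        val delegate = NnApiDelegate(
            NnApiDelegate.Options()
                // Ask for sustained throughput rather than one-shot latency.
                .setExecutionPreference(
                    NnApiDelegate.Options.EXECUTION_PREFERENCE_SUSTAINED_SPEED
                )
        )
        return Interpreter(model, Interpreter.Options().addDelegate(delegate))
    }

Note the fallback behavior: the app never learns, without extra instrumentation, how much of its graph actually landed on the accelerator.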
I suspect that Apple absolutely has a lead here. They invested earlier. They have an entire team that has been working on this for a while. They have a lot of dedicated/custom IP. They hand-tuned the engine of their own car, so to speak.
Google, by contrast, has a year-old generic Samsung car that they bolted a supercharger onto.
I think once we understand more about the hardware specifics of Tensor, how they appear to applications, and how the pipelines can be used, we'll learn not just that "App XYZ has a gap of 70%" but WHY that is. The why is important. It could be that Tensor is 70% less capable. It could be that it's 20% less capable, but the specific model in the benchmark wasn't running on the ML hardware at all, or was running on functions the hardware devotes fewer resources to. Or... well... that's the problem: we don't know the important context of "why" yet.
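One crude way to probe that "why", sketched under the same TFLite assumptions as above (buffer shapes are placeholders for whatever model you'd actually test): time the same model with and without the NNAPI delegate. If the "accelerated" run is barely faster than plain CPU, the benchmark likely never reached the ML hardware, which would explain a gap without telling you anything about the silicon itself.

    import org.tensorflow.lite.Interpreter
    import org.tensorflow.lite.nnapi.NnApiDelegate
    import java.nio.ByteBuffer
    import java.nio.ByteOrder
    import java.nio.MappedByteBuffer

    // Median latency over several runs; shapes are placeholders for a
    // hypothetical 224x224 RGB float model with 1000 output classes.
    fun medianLatencyMs(interpreter: Interpreter, runs: Int = 50): Double {
        val input = ByteBuffer.allocateDirect(224 * 224 * 3 * 4)
            .order(ByteOrder.nativeOrder())
        val output = ByteBuffer.allocateDirect(1000 * 4)
            .order(ByteOrder.nativeOrder())
        repeat(5) {                       // warm-up, not timed
            input.rewind(); output.rewind()
            interpreter.run(input, output)
        }
        val times = (0 until runs).map {
            input.rewind(); output.rewind()
            val t0 = System.nanoTime()
            interpreter.run(input, output)
            (System.nanoTime() - t0) / 1e6
        }
        return times.sorted()[runs / 2]
    }

    fun compare(model: MappedByteBuffer) {
        val cpuOnly = Interpreter(model, Interpreter.Options())
        val delegated = Interpreter(
            model, Interpreter.Options().addDelegate(NnApiDelegate())
        )
        // Similar numbers here suggest the model never left the CPU.
        println("CPU: ${medianLatencyMs(cpuOnly)} ms, " +
                "NNAPI: ${medianLatencyMs(delegated)} ms")
    }

Until someone does that kind of breakdown on a Tensor device, a single headline percentage can't distinguish "weaker hardware" from "unused hardware".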
All this article can really say with authority is: Google runs this widget-spinning synthetic program 70% slower than Apple, apparently in line with CPU-level benchmarks from prior SoCs.