Well, I talked nonsense earlier, I totally mixed up the units...
But after redoing the calculations, I don't understand the 11 GWh figure.
So, we had 39.3M GPU-hours at 700 W per GPU, which works out to 27.51 GWh, not 11.
The best cloud datacenters have an overhead (extra consumption for cooling, etc.) of about 10%, while the average is 58%.
Let's take 15% overhead, which puts us closer to 31.6 GWh.
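For anyone who wants to check, here's a quick sketch of the arithmetic (the 39.3M GPU-hours, 700 W and 15% overhead are just the figures quoted above, not official numbers):

```python
# Back-of-the-envelope check of the training energy figure
gpu_hours = 39.3e6        # 39.3M GPU-hours (figure from this thread)
gpu_power_kw = 0.7        # 700 W per GPU
overhead = 1.15           # assumed 15% datacenter overhead (best ~10%, average ~58%)

gpu_energy_gwh = gpu_hours * gpu_power_kw / 1e6   # kWh -> GWh
total_gwh = gpu_energy_gwh * overhead

print(f"GPU energy alone: {gpu_energy_gwh:.2f} GWh")   # ~27.51 GWh
print(f"With overhead:    {total_gwh:.1f} GWh")        # ~31.6 GWh
```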
Google uses around 15 TWh a year, i.e. roughly 41 GWh/day.
A US citizen uses on average 11.2 MWh/year, so the training alone uses the equivalent of the yearly consumption of about 2,800 Americans.
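And the comparison step, same back-of-the-envelope approach (the 15 TWh/year for Google and 11.2 MWh/year per US citizen are the figures I used above):

```python
# Putting the ~31.6 GWh training estimate in perspective
total_gwh = 31.6

google_twh_per_year = 15
google_gwh_per_day = google_twh_per_year * 1000 / 365          # ~41 GWh/day

us_citizen_mwh_per_year = 11.2
equivalent_citizens = total_gwh * 1000 / us_citizen_mwh_per_year  # ~2,820 people

print(f"Google daily use:        {google_gwh_per_day:.0f} GWh/day")
print(f"Equivalent US citizens:  {equivalent_citizens:.0f} person-years")
```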
And that is just the training; I'm afraid it's nothing compared to running the thing for all cloud users.