But after doing the calculations, I don't understand the 11 GWh figure. We had 39.3M hours of work on 700 W GPUs, which means 27.51 GWh, not 11. The best cloud datacenters have an overhead (extra consumption for cooling, etc.) of about 10%; the average is 58%.
Let's take 15% overhead, which puts us closer to 31.6 GWh. For scale, Google uses around 15 TWh a year, i.e. about 41 GWh/day.
A US citizen uses on average 11.2 MWh/year, so the training alone used the equivalent of the yearly consumption of about 2,800 Americans.
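For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same back-of-envelope calculation; the 15% overhead is the assumption picked above, sitting between the best-case (~10%) and average (~58%) datacenter figures.

```python
# Back-of-envelope check of the training-energy numbers above.
gpu_hours = 39.3e6            # GPU-hours reported for training
gpu_power_w = 700             # watts per GPU
overhead = 0.15               # assumed datacenter overhead (cooling, etc.)

raw_gwh = gpu_hours * gpu_power_w / 1e9        # Wh -> GWh
total_gwh = raw_gwh * (1 + overhead)

us_person_mwh_per_year = 11.2                  # average US consumption
equivalent_people = total_gwh * 1000 / us_person_mwh_per_year

print(f"Raw GPU energy:  {raw_gwh:.2f} GWh")            # ~27.51 GWh
print(f"With overhead:   {total_gwh:.1f} GWh")           # ~31.6 GWh
print(f"Equivalent to ~{equivalent_people:,.0f} Americans for a year")  # ~2,800
```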
And that is just the training; I'm afraid it's nothing compared to running the thing for all cloud users.
Meta has unveiled its biggest, smartest, most-neutered Llama 3.1 405B AI for royalty-free use. The 750 GB, 405-billion-parameter large language model (LLM) is one of the biggest ever released and performs competitively with flagship competitors Anthropic Claude 3.5 Sonnet and OpenAI GPT-4o.