You've got some funky math. How on Earth did you arrive at 256 GB/s? Gen3 offers 8 GT/s per lane. With 16 lanes, you get 128 GT/s. With 128b/130b encoding (which is why they specify transfers and not bits), you get about 126 Gb/s of bandwidth (= 15.75 GB/s). Actual throughput is going to be a bit less because of protocol overhead such as packet headers and error checking. That's why Gen4 was so eagerly awaited: you need it to feed 200 Gb/s server interconnects. The only alternative was to use 32 Gen3 lanes for a single-port network card (two 16x cards linked together; this is also used in multi-socket systems to avoid having to go through the inter-socket bus).
Not to mention that, even if I overlook you forgetting to account for encoding, 256 GB/s isn't double what Gen5 offers. 16x Gen4 offers 31.5 GB/s. 16x Gen5 is going to offer 63 GB/s. You're talking quadruple. And that's not what they promised, right?
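If you want to check the arithmetic yourself, here's a rough Python sketch. The Gen3 rate and the 128b/130b encoding are the ones quoted above; the 16 and 32 GT/s rates for Gen4 and Gen5 assume the per-lane rate doubles each generation, which matches the 31.5 and 63 GB/s figures. The function name and table are made up for illustration, and the results are per direction, before any protocol overhead.

```python
# Raw per-lane transfer rates in GT/s.
# Gen3 is from the post; Gen4/Gen5 assume a doubling per generation.
RATES_GT_S = {"Gen3": 8, "Gen4": 16, "Gen5": 32}

# 128b/130b encoding: 128 payload bits for every 130 bits on the wire.
ENCODING = 128 / 130


def link_bandwidth_gbytes(gen, lanes=16):
    """Usable bandwidth in GB/s, one direction, before protocol overhead."""
    gbits = RATES_GT_S[gen] * lanes * ENCODING  # Gb/s after encoding
    return gbits / 8                            # bits -> bytes


for gen in RATES_GT_S:
    print(f"{gen} x16: {link_bandwidth_gbytes(gen):.2f} GB/s")

# How the claimed 256 GB/s compares to a 16-lane Gen5 link.
print(f"256 / Gen5 x16 = {256 / link_bandwidth_gbytes('Gen5'):.2f}x")
```

That works out to roughly 15.75, 31.5 and 63 GB/s, and 256 GB/s is a bit over four times the Gen5 figure.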
Development takes time. Although they might find a consumer application for it, I imagine it will be of interest primarily to the server market: fast interconnects for clusters, large NVMe arrays, fast access to VRAM. That's a very different game.