Zen 4 is going to have SMT4?
> lowering core count while maintaining multi-thread performance with 4 threads per core
That's not really how it works though. You get diminishing returns with SMT.
SMT2 gives you up to a 1.6x performance improvement over SMT1 in the most favorable cases, but it can also drop to a 10% performance loss in unfavorable ones. Most applications lie between 1x and 1.4x.
SMT4 shows diminishing returns. With 4x the threads, you end up with a 2.5x performance improvement over SMT1 in the most favorable cases, but up to a 60% (!!) performance loss in unfavorable cases. Most applications lie between 0.8x and 1.9x.
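To put numbers on the diminishing returns, here's a quick sketch that divides those figures by the thread count. The dictionary and function names are made up for illustration, and I'm reading "10% loss" as 0.9x and "60% loss" as 0.4x of SMT1 throughput, which is my interpretation of the figures above.

```python
# Per-core throughput relative to SMT1, using the figures quoted above.
# Assumption: "10% loss" is read as 0.9x and "60% loss" as 0.4x.
FIGURES = {
    "SMT1": {"threads": 1, "best": 1.0, "worst": 1.0},
    "SMT2": {"threads": 2, "best": 1.6, "worst": 0.9},
    "SMT4": {"threads": 4, "best": 2.5, "worst": 0.4},
}


def per_thread(mode, case):
    """Average throughput of one thread, relative to a lone SMT1 thread."""
    entry = FIGURES[mode]
    return entry[case] / entry["threads"]


def marginal_gain(from_mode, to_mode, case):
    """Throughput ratio obtained by switching from one SMT mode to another."""
    return FIGURES[to_mode][case] / FIGURES[from_mode][case]


if __name__ == "__main__":
    for mode in FIGURES:
        print(mode, "best-case per-thread throughput:", per_thread(mode, "best"))
    print("SMT2 -> SMT4 best-case gain:", marginal_gain("SMT2", "SMT4", "best"))
```

Even in the best case, each SMT4 thread averages 0.625x of a lone thread (versus 0.8x for SMT2), and doubling the threads from SMT2 to SMT4 buys only about 1.56x more throughput.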
In particular, when executing a parallel application, more parallelism can lead to worse results whenever shared resources need to be accessed. You have more threads fighting for the same resource, leading to contention and wasted CPU cycles. This is especially likely when some threads execute "faster" than others, as in SMT (SMT threads run instructions during the "wasted" CPU cycles of another "main" thread).
So not only is an SMT thread not as "fast" as a thread running alone on a core, but throwing more, slower threads at a problem can also degrade performance.
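As a rough check on the "fewer cores, same multi-thread performance" idea, here's a back-of-the-envelope sketch. The 16-core baseline is purely hypothetical, the function name is made up, and it assumes throughput scales linearly with core count and that an application sits at the same end of the range in both SMT modes, neither of which is guaranteed.

```python
import math


def cores_needed(baseline_cores, baseline_speedup, new_speedup):
    """Cores required in the new SMT mode to match the baseline's total throughput.

    Speedups are per-core throughput relative to SMT1.
    Assumes throughput scales linearly with core count (a simplification).
    """
    target = baseline_cores * baseline_speedup
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(target / new_speedup, 9))


if __name__ == "__main__":
    baseline = 16  # hypothetical SMT2 part, not a real product spec
    print("best case:        ", cores_needed(baseline, 1.6, 2.5))
    print("typical, high end:", cores_needed(baseline, 1.4, 1.9))
    print("typical, low end: ", cores_needed(baseline, 1.0, 0.8))
```

With these figures, an SMT4 part would need 11 cores in the best case and 12 at the high end of the typical range to match 16 SMT2 cores, but 20 cores at the low end, i.e. more cores, not fewer.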
Source: "An SMT-Selection Metric to Improve Multithreaded Applications' Performance", page 9, available on ResearchGate.