The Ryzen processor put AMD back in the game after it had lagged behind Intel for several years. The Zen core has a throughput of five instructions per clock cycle, which is the record so far. The throughput is particularly high for 128-bit vector code. The Ryzen can calculate four 128-bit floating point vectors per clock cycle, or two 256-bit vectors.
The high throughput places a higher burden on programmers and compilers to utilize the increased instruction-level parallelism in single-threaded applications. The core throughput is so high that it makes good sense to run two threads per core. Some other processors with lower core throughput are likely to see a serious performance drop when two threads compete for the limited resources.
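As an illustration of what utilizing instruction-level parallelism can mean in single-threaded code, the following minimal sketch in C compares a sum with one long dependency chain to a sum split over several independent accumulators. The function names and the choice of four accumulators are assumptions made for this example only; the best number depends on the latency and throughput of the addition instruction on the processor in question.

```c
#include <stddef.h>

/* One accumulator: every addition must wait for the previous one,
   so the loop is limited by the latency of the add instruction. */
float sum_serial(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Four independent accumulators (an illustrative choice): the four
   dependency chains can overlap in the pipeline, so a core with high
   throughput has more work to do in each clock cycle. */
float sum_unrolled4(const float *a, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Handle the remaining 0-3 elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

The second version adds the elements in a different order, so the result may differ slightly in rounding. This is why a compiler will normally not make this transformation on floating point code by itself unless relaxed floating point options are enabled.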
The new µop cache is an important improvement which removes the bottleneck of instruction fetching and decoding in most of the critical loops.
The large caches at all levels are a particularly important improvement, but the cache bandwidth is limited to 32 bytes per clock cycle, which is less than the best competing Intel processors offer.