First, a look at the von Neumann architecture:


Next, a look at its rival, the Harvard architecture:


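The main difference between the two is that the von Neumann architecture does not distinguish between data and instructions and keeps both in the same memory, whereas the Harvard architecture stores them separately, in an Instruction Memory and a Data Memory.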

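Because instructions and data live in the same memory, an instruction fetch and a data access cannot happen at the same time; they would contend for the same memory path, so they have to take turns. Today CPU speed far exceeds memory-access speed, so the CPU has to waste time waiting for data. In the Harvard architecture, instructions and data are stored separately, so the CPU can prefetch instructions while it waits for data, which gives higher CPU utilization.
The drop in CPU utilization (throughput) caused by keeping instructions and data in the same memory is the von Neumann bottleneck.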
Wikipedia explains it as follows:

The shared bus between the program memory and data memory leads to the von
Neumann bottleneck, the limited throughput (data transfer rate) between the
central processing unit (CPU) and memory compared to the amount of memory.
Because the single bus can only access one of the two classes of memory at a
time, throughput is lower than the rate at which the CPU can work. This
seriously limits the effective processing speed when the CPU is required to
perform minimal processing on large amounts of data. The CPU is continually
forced to wait for needed data to be transferred to or from memory. Since CPU
speed and memory size have increased much faster than the throughput between
them, the bottleneck has become more of a problem, a problem whose severity
increases with every newer generation of CPU.
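
The cost of sharing one bus can be illustrated with a toy cycle count. This is only a sketch under simplifying assumptions that are not from the source: every bus transfer takes exactly one cycle, there is no cache, and the function names `cycles_shared_bus` and `cycles_split_bus` are made up for this illustration. Each program is a list of booleans saying whether the instruction also needs a data access.

```python
def cycles_shared_bus(instructions):
    """Single shared bus: fetches and data accesses must take turns."""
    cycles = 0
    for needs_data in instructions:
        cycles += 1          # fetch the instruction over the shared bus
        if needs_data:
            cycles += 1      # the data access waits for the same bus
    return cycles


def cycles_split_bus(instructions):
    """Separate instruction and data buses (Harvard style).

    Assumption: the data access of instruction i overlaps with the
    fetch of instruction i + 1, so only the last data access adds a cycle.
    """
    if not instructions:
        return 0
    cycles = len(instructions)   # one fetch slot per instruction
    if instructions[-1]:
        cycles += 1              # nothing left to overlap the final data access with
    return cycles


program = [True, False, True, True]  # hypothetical instruction mix
print(cycles_shared_bus(program), cycles_split_bus(program))  # prints: 7 5
```

In this model, the more instructions touch data, the wider the gap between the two designs becomes. This matches the quoted point that the bottleneck hurts most when the CPU does minimal processing on large amounts of data.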

Ways to mitigate the von Neumann bottleneck include:
1. Providing a cache between the CPU and the main memory
2. Providing separate caches or separate access paths for data and instructions (the so-called Modified Harvard architecture)
3. Using branch predictor algorithms and logic
4. Providing a limited CPU stack or other on-chip scratchpad memory to reduce memory access
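
As a rough illustration of the first item, the sketch below models a tiny direct-mapped cache and counts how many accesses are served without going to main memory. The class name `DirectMappedCache`, the one-address-per-line layout, and the access pattern are all assumptions chosen for brevity; this is not a model of any real CPU.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # address currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (which fills the line)."""
        index = address % self.num_lines
        if self.lines[index] == address:
            self.hits += 1
            return True
        self.lines[index] = address      # a miss costs a trip to main memory
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4)
# A loop that touches the same four addresses three times.
for _ in range(3):
    for address in (0, 1, 2, 3):
        cache.access(address)
print(cache.hits, cache.misses)  # prints: 8 4
```

Only the first pass over the data reaches main memory. The later passes are served from the cache, which is how a cache relieves pressure on the CPU-memory path.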

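In practice, the vast majority of modern computers use the so-called "Modified Harvard Architecture": instructions and data share a single address space, but their caches are separate. In main memory, instructions and data sit together. In the caches inside the CPU, the instruction cache and the data cache are still kept apart, so at execution time instructions and data come from two different places. You can think of it as a von Neumann model outside the CPU and a Harvard architecture inside it.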
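
The arrangement described here can be sketched as one memory backing two separate caches. This is a minimal model under stated assumptions: the class `ModifiedHarvardCore`, its method names, and the unbounded dictionary caches (no eviction, no coherence handling) are all hypothetical simplifications.

```python
class ModifiedHarvardCore:
    """One shared address space, but separate instruction and data caches."""

    def __init__(self, memory):
        self.memory = memory      # unified memory: code and data side by side
        self.icache = {}          # instruction cache (unbounded, for simplicity)
        self.dcache = {}          # data cache (unbounded, for simplicity)
        self.memory_reads = 0     # trips to the shared main memory

    def _read(self, cache, address):
        if address not in cache:
            cache[address] = self.memory[address]
            self.memory_reads += 1
        return cache[address]

    def fetch_instruction(self, address):
        return self._read(self.icache, address)

    def load_data(self, address):
        return self._read(self.dcache, address)


# Addresses 0-2 hold instructions and address 3 holds data, all in one memory.
core = ModifiedHarvardCore(["LOAD 3", "ADD 3", "HALT", 42])
core.fetch_instruction(0)   # miss: filled into the instruction cache
core.load_data(3)           # miss: filled into the data cache
core.fetch_instruction(0)   # hit: served from the instruction cache
print(core.memory_reads, sorted(core.icache), sorted(core.dcache))  # prints: 2 [0] [3]
```

Seen from main memory there is a single address space, as in the von Neumann model. Seen from the execution side, instructions and data arrive through two independent caches, as in the Harvard architecture.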

References:
von Neumann architecture
<https://zh.wikipedia.org/wiki/%E5%86%AF%C2%B7%E8%AF%BA%E4%BC%8A%E6%9B%BC%E7%BB%93%E6%9E%84>
Harvard architecture <https://zh.wikipedia.org/wiki/%E5%93%88%E4%BD%9B%E7%BB%93%E6%9E%84>
Zhihu: Why do computers still use the von Neumann architecture instead of the Harvard architecture?
<https://www.zhihu.com/question/22406681>