Study Log, Microelectronics (2): Pentium 4 Architecture

Introduction to the P4 architecture

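The Pentium 4 architecture is Intel's seventh-generation processor architecture. It is a superscalar design that can issue up to 4 instructions at once, so its peak throughput can in theory reach 4 instructions per clock cycle. Externally it exposes the CISC (x86) instruction set, while internally it executes RISC-like microinstructions. Compared with the sixth generation, the seventh generation brings some improvements, but overall the changes are not large. The figure below shows the Pentium 4 and Pentium II architectures. The differences are these: some structures were enlarged, for example the BTB was expanded to 4K entries; the pipeline was made deeper to support the higher clock speed; and the L1 instruction cache was moved from before the fetch stage to after the decoder and renamed the trace cache (the point is to store instructions that are already decoded, so the work of decoding them into microinstructions again is saved).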

[Figure: Pentium 4 and Pentium II architecture diagrams]

The P4 pipeline

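Compared with the previous-generation P3, the P4 has a deeper pipeline. The P3 pipeline has only 11 stages, while the P4 has 20, and the 90 nm version of the P4 has as many as 31. A deeper pipeline means each individual instruction passes through more stages before it completes, so per-instruction processing gets slower in terms of cycles. Now that the era of competing on clock frequency is over, most processors currently on the market use pipelines of around 13 stages.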
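To make the cost of a deeper pipeline concrete, here is a minimal sketch in Python. It is my own simplified model, not something from Intel's documentation: it assumes that a mispredicted branch flushes the whole pipeline, so the penalty in cycles equals the pipeline depth, and it ignores the higher clock speed that a deeper pipeline allows. The workload numbers (20% branches, 5% mispredicted) are made-up assumptions.

```python
def effective_ipc(issue_width, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Rough instructions-per-cycle estimate for a pipelined superscalar core.

    Simplified model: the ideal CPI is 1 / issue_width, and every mispredicted
    branch adds flush_penalty_cycles of stall on top of that.
    """
    base_cpi = 1.0 / issue_width
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Assumed workload: 20% of instructions are branches, 5% of those are mispredicted.
# Assumed penalty: one full pipeline refill, i.e. the pipeline depth in cycles.
for depth in (11, 20, 31):
    ipc = effective_ipc(issue_width=4, branch_fraction=0.2,
                        mispredict_rate=0.05, flush_penalty_cycles=depth)
    print(f"{depth}-stage pipeline: about {ipc:.2f} instructions per cycle")
```

Under these assumptions the estimate drops as the pipeline gets deeper, which is one way to read why very deep pipelines only pay off if the clock frequency gain is large enough.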

The P4 pipeline is shown in the figure below. For what each stage does, I will simply paste the original text:

[Figure: Pentium 4 pipeline stages]

  • TC Nxt IP: Trace cache next instruction pointer. This stage looks up the branch target buffer (BTB) for the next microinstruction to be executed. This step takes two stages.
  • TC Fetch: Trace cache fetch. Loads this microinstruction from the trace cache. This step takes two stages.
  • Drive: Sends the microinstruction to be processed to the resource allocator and register renaming circuit.
  • Alloc: Allocate. Checks which CPU resources will be needed by the microinstruction – for example, the memory load and store buffers.
  • Rename: If the program uses one of the eight standard x86 registers it will be renamed into one of the 128 internal registers present on Pentium 4. This step takes two stages.
  • Que: Queue. The microinstructions are put in queues according to their types (for example, integer or floating point). They are held in the queue until there is an open slot of the same type in the scheduler.
  • Sch: Schedule. Microinstructions are scheduled to be executed according to their type (integer, floating point, etc). Before arriving at this stage, all instructions are in order, i.e., in the same order they appear in the program. At this stage, the scheduler re-orders the instructions in order to keep all execution units full. For example, if a floating point unit is about to become available, the scheduler will look for a floating point instruction to send to this unit, even if the next instruction in the program is an integer one. The scheduler is the heart of the out-of-order engine of Intel 7th generation processors. This step takes three stages.
  • Disp: Dispatch. Sends the microinstructions to their corresponding execution engines. This step takes two stages.
  • RF: Register file. The internal registers, stored in the instructions pool, are read. This step takes two stages.
  • Ex: Execute. Microinstructions are executed.
  • Flgs: Flags. The microprocessor flags are updated.
  • Br Ck: Branch check. Checks whether the branch taken by the program is the same as the one predicted by the branch prediction circuit.
  • Drive: Sends the results of this check to the branch target buffer (BTB) at the processor's front end.
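As a quick sanity check on the list above, the stage counts can be added up in a few lines of Python. This is only a bookkeeping sketch: I assume that any step without an explicit "takes N stages" note occupies exactly one stage, which the pasted text does not state outright.

```python
# (step name, number of pipeline stages), in the order listed above.
# Steps with no explicit count are assumed to take one stage.
P4_PIPELINE = [
    ("TC Nxt IP", 2),
    ("TC Fetch", 2),
    ("Drive", 1),
    ("Alloc", 1),
    ("Rename", 2),
    ("Que", 1),
    ("Sch", 3),
    ("Disp", 2),
    ("RF", 2),
    ("Ex", 1),
    ("Flgs", 1),
    ("Br Ck", 1),
    ("Drive", 1),
]


def total_stages(pipeline):
    """Sum the stage counts of all steps."""
    return sum(count for _, count in pipeline)


def stage_numbers(pipeline):
    """Yield (first_stage, last_stage, name) using 1-based stage numbers."""
    position = 1
    for name, count in pipeline:
        yield position, position + count - 1, name
        position += count


for first, last, name in stage_numbers(P4_PIPELINE):
    print(f"stages {first:2d}-{last:2d}: {name}")
print("total:", total_stages(P4_PIPELINE))
```

With that assumption the total comes to 20, which matches the 20-stage figure given for the P4.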

Main structure of the Pentium 4

Although the pipeline is complex, overall the way the P4 handles an instruction can be divided into 4 phases:

>Fetch microinstructions in order*

>Decode into microinstructions*

>Execute (out of order)

>Commit in order
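The difference between out-of-order execution and in-order commit can be sketched with a toy model in Python. Everything here is a simplification of my own: `ready_cycle` is a hypothetical table saying when each instruction's operands become available, instruction names are assumed to be unique, and real hardware tracks this with a scheduler and a reorder buffer rather than a sort.

```python
def run(program, ready_cycle):
    """Toy model of out-of-order execution with in-order commit.

    program: instruction names in program order (assumed unique).
    ready_cycle: hypothetical cycle at which each instruction's operands are ready.
    Returns (execution_order, commit_order).
    """
    # Out-of-order execution: whichever instruction is ready first runs first.
    # sorted() is stable, so ties keep program order.
    execution_order = sorted(program, key=lambda name: ready_cycle[name])

    # In-order commit: an instruction may only retire once every earlier
    # instruction in program order has finished and retired.
    finished = set()
    commit_order = []
    next_to_commit = 0
    for name in execution_order:
        finished.add(name)
        while next_to_commit < len(program) and program[next_to_commit] in finished:
            commit_order.append(program[next_to_commit])
            next_to_commit += 1
    return execution_order, commit_order


program = ["load", "add", "mul"]
ready = {"load": 3, "add": 4, "mul": 1}  # assumed values for illustration
executed, committed = run(program, ready)
print("executed: ", executed)
print("committed:", committed)
```

In this example `mul` executes first because its operands are ready earliest, yet it still commits last, so the program-visible state changes in the original order.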

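For compatibility reasons, the architecture of a general-purpose PC must support CISC instructions (x86 and the like). So although RISC instructions are the better fit for fast execution, the processor needs a decoder that translates CISC instructions into RISC-like instructions before its RISC core can run them. Seen from the front end, the processor supports only CISC, but internally it works on RISC. To tell the two kinds apart, CISC instructions are usually just called instructions, while the internal RISC instructions are called microinstructions. Processors that use only RISC instructions are generally not used in general-purpose PCs and are more widely used in specialized embedded devices.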
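As an illustration of what the decoder does, here is a minimal Python sketch that splits one simplified x86-style `add` into RISC-like microinstructions. The micro-op names (`load`, `add`, `store`) and the temporaries `tmp_src` and `tmp_dst` are hypothetical; the real Pentium 4 micro-op encoding is not public in this form, and this only shows the idea.

```python
def decode(instruction):
    """Split a simplified x86-style instruction into RISC-like microinstructions.

    Only `add dst, src` is handled. Operands in square brackets are treated as
    memory. The micro-op names and temporaries are made up for illustration.
    """
    opcode, operands = instruction.split(maxsplit=1)
    dst, src = [part.strip() for part in operands.split(",")]
    if opcode != "add":
        raise ValueError(f"unsupported opcode: {opcode}")

    uops = []
    if src.startswith("["):
        # A memory source becomes an explicit load into a temporary.
        uops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if dst.startswith("["):
        # A memory destination becomes load, add, store.
        uops.append(("load", "tmp_dst", dst))
        uops.append(("add", "tmp_dst", "tmp_dst", src))
        uops.append(("store", dst, "tmp_dst"))
    else:
        uops.append(("add", dst, dst, src))
    return uops


print(decode("add eax, ebx"))
print(decode("add [mem], eax"))
```

A register-to-register `add` maps to a single microinstruction here, while the version with a memory destination expands into three, which is the kind of work the CISC-to-RISC decoder has to do.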

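The main structure of the fetch and decode stages is shown below. Its main purpose is to save the time spent on decoding inside loops: the cache that used to sit before the instruction fetch stage is now placed after the decoder, so the instructions stored in it are already decoded. They then go through the uops queue into the Allocator and Register Renamer, and from that point on the design can be understood as an ordinary superscalar processor: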

[Figure: Pentium 4 fetch and decode front end (decoder, trace cache, uops queue)]
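The benefit of caching decoded instructions in a loop can be sketched in Python. This is a heavily simplified stand-in of my own: a real trace cache stores sequences of microinstructions along a predicted path and has limited capacity, whereas this one is an unbounded dictionary keyed by address. `toy_decoder`, the addresses, and the loop body are all hypothetical.

```python
class TraceCache:
    """Keeps already-decoded microinstructions, keyed by instruction address."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.lines = {}
        self.decode_count = 0  # how many times the decoder was actually used

    def fetch(self, address, instruction):
        if address not in self.lines:
            # Miss: decode once and remember the result.
            self.decode_count += 1
            self.lines[address] = self.decoder(instruction)
        # Hit: return the stored microinstructions without decoding again.
        return self.lines[address]


def toy_decoder(instruction):
    # Hypothetical stand-in for the real x86 decoder.
    return [f"uop({instruction})"]


# Hypothetical three-instruction loop body, run for 10 iterations.
loop_body = {0x100: "add eax, ebx", 0x104: "dec ecx", 0x106: "jnz 0x100"}
cache = TraceCache(toy_decoder)
uop_queue = []
for _ in range(10):
    for address, instruction in loop_body.items():
        uop_queue.extend(cache.fetch(address, instruction))

print("microinstructions delivered:", len(uop_queue))
print("decoder invocations:", cache.decode_count)
```

Each instruction of the loop body is decoded only on the first iteration; every later iteration is served from the cache, which is the time saving the paragraph above describes.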