Further Work
While Blimp has made lots of progress as both a processor and framework for processor development, there are still improvements that could be made
Fixes
These improvements seek to address issues or shortcomings in the current design, and should be prioritized over enhancements
Simplify Sequence Number Generation
Currently, the sequence generation units (hw/fetch/SeqNumGen*)
maintain a vector representing which sequence numbers are in-flight
(seq_num_list). This approach was taken when the exact semantics of
sequence numbers were being defined, and allows for sequence numbers to
be freed in any order. However, since sequence numbers are freed upon a
commit (occuring in-order with respect to sequence numbers), this
vector could be removed, with in-flight sequence numbers only needing
a head and tail pointer to be represented. This would not only result
in area savings, as well as a one-cycle saving by directly updating the
tail pointer on a commit (instead of later reclaiming entries), but would
cause the area associated with \(N\) sequence numbers to scale overall
as \(\log(N)\).
Fix Memory Network Dependence
With the current memory network, there exists a possibility for deadlock
due to a dependency chain between imem and dmem:
A stalled fetch unit leads to a stalled
FPGAMem, since the response cannot be takenA stalled
FPGAMemleads to a stalled memory XU, since the request cannot be takenA stalled memory XU (in the case of another memory transaction) leads to a stalled pipeline, stalling the fetch unit
Currently, this is resolved by placing FIFOs in the processor at the
imem and dmem ports to buffer responses. If these FIFOs are sized
to the pipeline depth of the memory network (and assuming fair
arbitration), the memory network won’t stall; however, this approach isn’t
very modular and doesn’t scale well.
Instead, the dependency between imem and dmem in the memory
network should be broken. Some ways to do this include:
Having separate pipelines for both channels in the memory (requiring a dual-ported memory)
Having separate credits for both channels in the memory network, with memory able to service the other channel if one is stalled
SPI Enable
SPI communication is currently used to program the processor. However, unplugging the SPI lines once programming is finished (to allow the design to run on its own, not connected to a computer) can introduce noise that could be received and interpreted as a transaction by the SPI Minion. This behaviour was observed when testing and led to errors, especially if the “go” bit was written (stalling the processor). Instead, some mechanism outside of SPI should be introduced to enable SPI communication (perhaps a switch to enter “programming” mode), such that SPI programming can be disabled to allow an SPI channel to be disconnected.
Functions vs. Tasks
In later versions of Verilator than those tested with, functions are
(correctly) not allowed to call tasks.
This behavior is currently violated by the calling of init_mem in
the FL Memory (a task) by the corresponding function in the top-level
processor simulators (where init_mem is a function). Either could
be changed, but the best is likely to have the top-level simulators
export init_mem as a task.
(This error can be ignored by Verilator, but should be fixed instead to avoid unnecessary linting waivers)
Enhancements
These improvements seek to expand the functionality of Blimp’s processor models
Privileged ISA Support
While the current ISA can support many programs, it lacks the support for
operating system-level control. For this, the privileged RISCV ISA would
need to be implemented, with support for CSRs (and their effects) as well
as exceptions and interrupts. The basic CSRs needed for this would be
mtvec, mepc, mcause, and mstatus (for interrupts);
however, many more may wish to be implemented for various operating
systems, such as hardware privilege modes. A good reference/goal for
minimal support would be those necessary to run EGOS 2000,
also developed at Cornell.
EGOS CSRs
Examining EGOS, the following CSRs (in addition to those listed above) would be needed:
mhartidmvendoridmscratchmipmiemtimemtimecmp
Squashing Locations in the Processor
Recall from the Version 6 processor that currently, we can limit the scope of squashing by handling all control flow instruction in one cycle, ensuring that no potentially-squashed instructions make it past the DIU. However, interrupt/exception handling breaks this assumption (particularly exceptions, which would be handled in the WCU).
Before implementing the privileged ISA, Blimp will need to support squashing of instructions anywhere in the pipeline, including in XUs and the WCU (specifically the ROB) through squash notifications. While this requires many additional modules to be implemented, many should require minimal changes compared to existing versions, with a notable exception being the memory XU (to keep track of in-flight requests to determine which should be squashed, likely using an LSQ in an out-of-order architecture). However, it would also have the benefit of making the system more modular by removing the assumption on control flow resolution time.
Full Out-Of-Order Execution
Currently, Blimp only implements an out-of-order backend; this feature would extend out-of-order processing to include the DIU, making the processor fully out-of-order. This would require the addition of an issue queue in the DIU, which would likely tightly couple with the rename table to determine which registers have pending writes. The rename table would also have to be able to free specific pending writes based on their sequence number on a squash. Additionally, the memory XU would now require a load-store queue (LSQ) to resolve potential RAW hazards through memory (after some discussion, we determined that the best place for the LSQ to live is in an execute unit).
This change would also imply that instructions could be squashed anywhere in the pipeline; see the note above.
Superscalar Pipeline Support
Blimp currently only supports a single-issue pipeline; however, the
methodology supports a superscalar pipeline. For this, the best way to
modify the pipeline (ensuring backwards compatibility) would be to have
units take a parametrized array of interfaces (ex. for multiple
instructions being fetched at once, the FU would drive an array of F__D
interfaces).
The easiest path to implementation would be to start from the back of
the processor, and gradually replace each interface with a parametrized
array, modifying all of the units that it touches (i.e. support an
array of Commit notifications before worrying about multiple issue).
Fairness
Currently, a round-robin arbiter is used at in the WCU to ensure that all execute units are treated fairly and can make progress towards committing their instruction. Maintaining this arbitration may scale difficulty as the WCU supports multiple instructions; however, it’s not necessary to keep this selection fair. Fairness is likely useful from a performance standpoint; however, the WCU could instead have a fixed preference, knowing that other execute units will eventually make progress (as dependent instructions would stall in the DIU). Having such a fixed priority wasn’t adopted for single-issue versions as arbitrating was relatively simple; however, larger designs may prefer a fixed priority over fairly arbitrating N instructions to complete, as the lesser complexity is likely worth the slight performance decrease.
This change would also imply that instructions could make it into the pipeline at the same time as control flow instructions that would want to squash them; see the note on squashing above.
Formal Verification
While Blimp is extensively verified through unit and integration testing,
both the directed and golden tests are user-specified. An ongoing effort
has been to create formal models of each instruction, such that processors
can be formally verified.
For this, the existing InstTraceNotif interface (for producing an
instruction trace and verifying against the FL model) would need to be
replaced with the RVFI Interface,
to hook into RISCV’s formal verification framework.