

### Links

- My homepage: brunolevy.github.io
- Learn-fpga project: https://github.com/BrunoLevy/learn-fpga/



#### Research:

- Invent and test new architectures
- Arithmetic, encryption, energy, OS research
- Real-scale testbed for formal methods

- Curiosity: you want a deeper understanding of how things work
- Opportunity: you feel it is something that is going to be important in the near future

What is the most important industrial actor of AI?

Nvidia: 1000 billion dollars!!!

# Designing your own processor, why? Very special needs





Nature, Technology Feature, 07 December 2021: **How remouldable computer hardware is speeding up science.** Field-programmable gate arrays can speed up applications ranging from genomic alignment to deep learning. By Jeffrey M. Perkel.







# Designing your own processor, why? Very special needs: the cosmological computer





Invent an idealized mechanism, then program it (Babbage, Lovelace).





## Create a processor in Minecraft?














## Build out of discrete components?












PineAppleOne – RISC-V \$\$\$\$, slow, difficult



## You have access to a foundry?












Commercial: fewer than 100K units, forget it

Research: some possibilities using older processes (SkyWater, ...), but \$\$\$



RISC-V with exotic transistors?







Create a quantum computer?



















```verilog
module FemtoRV32(
   input         clk,

   output [31:0] mem_addr,  // address bus
   output [31:0] mem_wdata, // data to be written
   output  [3:0] mem_wmask, // write mask for the 4 bytes of each word
   input  [31:0] mem_rdata, // input lines for both data and instructions
   output        mem_rstrb, // active to initiate a memory read
   input         mem_rbusy, // asserted if memory is busy reading a value
   input         mem_wbusy, // asserted if memory is busy writing a value

   input         reset      // set to 0 to reset the processor
);

   parameter  RESET_ADDR = 32'h00000000;
   parameter  ADDR_WIDTH = 24;
   localparam ADDR_PAD   = {(32-ADDR_WIDTH){1'b0}}; // pads addresses to 32 bits

   // ---------------- Instruction decoding ----------------
   reg  [31:2] instr;               // latched instruction (bits 1:0 are not stored)
   wire  [4:0] rdId = instr[11:7];  // destination register

   // funct3 decoded in one-hot form: funct3Is[v] <=> funct3 == v
   (* onehot *) wire [7:0] funct3Is = 8'b00000001 << instr[14:12];

   // The five immediate formats
   wire [31:0] Uimm = {    instr[31],   instr[30:12], {12{1'b0}}};
   wire [31:0] Iimm = {{21{instr[31]}}, instr[30:20]};
   wire [31:0] Simm = {{21{instr[31]}}, instr[30:25], instr[11:7]};
   wire [31:0] Bimm = {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};
   wire [31:0] Jimm = {{12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0};

   // The 10 RV32I instruction classes
   wire isLoad   = (instr[6:2] == 5'b00000); // rd <- mem[rs1+Iimm]
   wire isALUimm = (instr[6:2] == 5'b00100); // rd <- rs1 OP Iimm
   wire isAUIPC  = (instr[6:2] == 5'b00101); // rd <- PC + Uimm
   wire isStore  = (instr[6:2] == 5'b01000); // mem[rs1+Simm] <- rs2
   wire isALUreg = (instr[6:2] == 5'b01100); // rd <- rs1 OP rs2
   wire isLUI    = (instr[6:2] == 5'b01101); // rd <- Uimm
   wire isBranch = (instr[6:2] == 5'b11000); // if(rs1 OP rs2) PC<-PC+Bimm
   wire isJALR   = (instr[6:2] == 5'b11001); // rd <- PC+4; PC<-rs1+Iimm
   wire isJAL    = (instr[6:2] == 5'b11011); // rd <- PC+4; PC<-PC+Jimm
   wire isSYSTEM = (instr[6:2] == 5'b11100); // rd <- cycles
   wire isALU    = isALUimm | isALUreg;

   // ---------------- Register file ----------------
   reg [31:0] rs1;
   reg [31:0] rs2;
   reg [31:0] registerFile [31:0];

   // ---------------- ALU (combinatorial, except shifts) ----------------
   wire [31:0] aluIn1 = rs1;
   wire [31:0] aluIn2 = isALUreg | isBranch ? rs2 : Iimm;

   reg  [31:0] aluReg;        // internal ALU register, used by shifts
   reg   [4:0] aluShamt;      // remaining shift amount
   wire        aluBusy = |aluShamt;
   wire        aluWr;         // starts a shift

   wire [31:0] aluPlus  = aluIn1 + aluIn2;
   // A single 33-bit subtraction gives SUB and all comparisons
   wire [32:0] aluMinus = {1'b1, ~aluIn2} + {1'b0, aluIn1} + 33'b1;
   wire        LT  = (aluIn1[31] ^ aluIn2[31]) ? aluIn1[31] : aluMinus[32];
   wire        LTU = aluMinus[32];
   wire        EQ  = (aluMinus[31:0] == 0);

   wire funct3IsShift = funct3Is[1] | funct3Is[5];

   wire [31:0] aluOut =
     (funct3Is[0]   ? instr[30] & instr[5] ? aluMinus[31:0] : aluPlus : 32'b0) |
     (funct3Is[2]   ? {31'b0, LT}     : 32'b0) |
     (funct3Is[3]   ? {31'b0, LTU}    : 32'b0) |
     (funct3Is[4]   ? aluIn1 ^ aluIn2 : 32'b0) |
     (funct3Is[6]   ? aluIn1 | aluIn2 : 32'b0) |
     (funct3Is[7]   ? aluIn1 & aluIn2 : 32'b0) |
     (funct3IsShift ? aluReg          : 32'b0);

   // Shifts are done one bit per clock cycle
   always @(posedge clk) begin
      if (aluWr) begin
         if (funct3IsShift) begin // SLL, SRA, SRL
            aluReg   <= aluIn1;
            aluShamt <= aluIn2[4:0];
         end
      end
      if (|aluShamt) begin
         aluShamt <= aluShamt - 1;
         aluReg   <= funct3Is[1] ? aluReg << 1 :             // SLL
                     {instr[30] & aluReg[31], aluReg[31:1]}; // SRA, SRL
      end
   end

   // ---------------- Branch predicate ----------------
   wire predicate =
        funct3Is[0] &  EQ  | // BEQ
        funct3Is[1] & !EQ  | // BNE
        funct3Is[4] &  LT  | // BLT
        funct3Is[5] & !LT  | // BGE
        funct3Is[6] &  LTU | // BLTU
        funct3Is[7] & !LTU ; // BGEU

   // ---------------- PC and address computation ----------------
   reg  [ADDR_WIDTH-1:0] PC;
   wire [ADDR_WIDTH-1:0] PCplus4   = PC + 4;
   // PC + (isJAL ? Jimm : isAUIPC ? Uimm : Bimm)
   wire [ADDR_WIDTH-1:0] PCplusImm = PC + ( instr[3] ? Jimm[ADDR_WIDTH-1:0] :
                                            instr[4] ? Uimm[ADDR_WIDTH-1:0] :
                                                       Bimm[ADDR_WIDTH-1:0] );
   // instr[5] is set for stores (Simm) and cleared for loads (Iimm)
   wire [ADDR_WIDTH-1:0] loadstore_addr = rs1[ADDR_WIDTH-1:0] +
                  (instr[5] ? Simm[ADDR_WIDTH-1:0] : Iimm[ADDR_WIDTH-1:0]);

   // ---------------- State machine encoding (one-hot) ----------------
   localparam FETCH_INSTR_bit     = 0;
   localparam WAIT_INSTR_bit      = 1;
   localparam EXECUTE_bit         = 2;
   localparam WAIT_ALU_OR_MEM_bit = 3;
   localparam NB_STATES           = 4;

   localparam FETCH_INSTR     = 1 << FETCH_INSTR_bit;
   localparam WAIT_INSTR      = 1 << WAIT_INSTR_bit;
   localparam EXECUTE         = 1 << EXECUTE_bit;
   localparam WAIT_ALU_OR_MEM = 1 << WAIT_ALU_OR_MEM_bit;

   (* onehot *) reg [NB_STATES-1:0] state;

   assign mem_addr = {ADDR_PAD,
                      state[WAIT_INSTR_bit] | state[FETCH_INSTR_bit] ?
                      PC : loadstore_addr};

   // ---------------- Load / store (byte and halfword lanes) ----------------
   wire mem_byteAccess     = instr[13:12] == 2'b00;
   wire mem_halfwordAccess = instr[13:12] == 2'b01;

   wire [15:0] LOAD_halfword =
               loadstore_addr[1] ? mem_rdata[31:16] : mem_rdata[15:0];
   wire  [7:0] LOAD_byte =
               loadstore_addr[0] ? LOAD_halfword[15:8] : LOAD_halfword[7:0];
   // Sign-extend unless funct3[2] (instr[14]) is set (LBU, LHU)
   wire LOAD_sign =
        !instr[14] & (mem_byteAccess ? LOAD_byte[7] : LOAD_halfword[15]);
   wire [31:0] LOAD_data =
        mem_byteAccess     ? {{24{LOAD_sign}}, LOAD_byte}     :
        mem_halfwordAccess ? {{16{LOAD_sign}}, LOAD_halfword} :
                             mem_rdata;

   assign mem_wdata[ 7: 0] = rs2[7:0];
   assign mem_wdata[15: 8] = loadstore_addr[0] ? rs2[7:0]  : rs2[15: 8];
   assign mem_wdata[23:16] = loadstore_addr[1] ? rs2[7:0]  : rs2[23:16];
   assign mem_wdata[31:24] = loadstore_addr[0] ? rs2[7:0]  :
                             loadstore_addr[1] ? rs2[15:8] : rs2[31:24];

   wire [3:0] STORE_wmask =
        mem_byteAccess ?
           (loadstore_addr[1] ?
              (loadstore_addr[0] ? 4'b1000 : 4'b0100) :
              (loadstore_addr[0] ? 4'b0010 : 4'b0001)) :
        mem_halfwordAccess ?
           (loadstore_addr[1] ? 4'b1100 : 4'b0011) :
        4'b1111;

   // ---------------- Write-back ----------------
   reg [31:0] cycles;
   always @(posedge clk) cycles <= cycles + 1;

   wire [31:0] writeBackData =
      (isSYSTEM       ? cycles                : 32'b0) |
      (isLUI          ? Uimm                  : 32'b0) |
      (isALU          ? aluOut                : 32'b0) |
      (isAUIPC        ? {ADDR_PAD, PCplusImm} : 32'b0) |
      (isJALR | isJAL ? {ADDR_PAD, PCplus4}   : 32'b0) |
      (isLoad         ? LOAD_data             : 32'b0);

   wire writeBack = ~(isBranch | isStore) &
                    (state[EXECUTE_bit] | state[WAIT_ALU_OR_MEM_bit]);

   always @(posedge clk) begin
      if (writeBack)
         if (rdId != 0)
            registerFile[rdId] <= writeBackData;
   end

   // ---------------- Control ----------------
   assign mem_rstrb = state[EXECUTE_bit] & isLoad | state[FETCH_INSTR_bit];
   assign mem_wmask = {4{state[EXECUTE_bit] & isStore}} & STORE_wmask;
   assign aluWr     = state[EXECUTE_bit] & isALU;

   wire jumpToPCplusImm = isJAL | (isBranch & predicate);
   wire needToWait      = isLoad | isStore | isALU & funct3IsShift;

   always @(posedge clk) begin
      if (!reset) begin
         state <= WAIT_ALU_OR_MEM; // wait for !mem_wbusy before the first fetch
         PC    <= RESET_ADDR[ADDR_WIDTH-1:0];
      end else
      (* parallel_case *)
      case (1'b1)
         state[WAIT_INSTR_bit]: begin
            if (!mem_rbusy) begin
               rs1   <= registerFile[mem_rdata[19:15]];
               rs2   <= registerFile[mem_rdata[24:20]];
               instr <= mem_rdata[31:2];
               state <= EXECUTE;
            end
         end
         state[EXECUTE_bit]: begin
            PC <= isJALR          ? {aluPlus[ADDR_WIDTH-1:1], 1'b0} :
                  jumpToPCplusImm ? PCplusImm :
                  PCplus4;
            state <= needToWait ? WAIT_ALU_OR_MEM : FETCH_INSTR;
         end
         state[WAIT_ALU_OR_MEM_bit]: begin
            if (!aluBusy & !mem_rbusy & !mem_wbusy) state <= FETCH_INSTR;
         end
         default: state <= WAIT_INSTR; // FETCH_INSTR
      endcase
   end

endmodule
```
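The trickier parts of the datapath above are easier to check against a small software model. Below is a Python sketch, not part of learn-fpga, of three of them: the five immediate formats, the single 33-bit subtraction that yields `LT`, `LTU` and `EQ`, and the byte-lane write mask used for stores. The function names `immediates`, `compare` and `store_wmask` are hypothetical. They only mirror the Verilog signals `Uimm`/`Iimm`/`Simm`/`Bimm`/`Jimm`, `aluMinus` and `STORE_wmask`.

```python
MASK32 = 0xFFFFFFFF


def bits(x, hi, lo):
    """Extract bits hi..lo (inclusive) of x, like instr[hi:lo] in Verilog."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value, width):
    """Sign-extend a width-bit value to a 32-bit two's complement word."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK32


def immediates(instr):
    """The five RV32I immediate formats (hypothetical model of Uimm..Jimm)."""
    u = instr & 0xFFFFF000
    i = sext(bits(instr, 31, 20), 12)
    s = sext((bits(instr, 31, 25) << 5) | bits(instr, 11, 7), 12)
    b = sext((bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11)
             | (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1), 13)
    j = sext((bits(instr, 31, 31) << 20) | (bits(instr, 19, 12) << 12)
             | (bits(instr, 20, 20) << 11) | (bits(instr, 30, 21) << 1), 21)
    return {"U": u, "I": i, "S": s, "B": b, "J": j}


def compare(a, b):
    """(LT, LTU, EQ) from one 33-bit subtraction, like aluMinus in the core."""
    a &= MASK32
    b &= MASK32
    # {1'b1, ~b} + {1'b0, a} + 1, truncated to 33 bits
    minus = (((1 << 32) | (~b & MASK32)) + a + 1) & ((1 << 33) - 1)
    ltu = (minus >> 32) & 1          # bit 32 is set iff a < b (unsigned)
    a31, b31 = a >> 31, b >> 31
    lt = a31 if (a31 ^ b31) else ltu  # signs differ: a < b iff a is negative
    eq = int((minus & MASK32) == 0)
    return lt, ltu, eq


def store_wmask(funct3, addr):
    """4-bit write mask: funct3[1:0] = 00 byte, 01 halfword, otherwise word."""
    size = funct3 & 3
    if size == 0:
        return 1 << (addr & 3)                 # one byte lane
    if size == 1:
        return 0b1100 if addr & 2 else 0b0011  # upper or lower halfword
    return 0b1111                              # full word
```

For instance, `immediates(0xFFF00093)["I"]` decodes the immediate of `addi x1, x0, -1` as `0xFFFFFFFF`. A model like this can act as a golden reference when the Verilog is simulated: feed the same instruction words to both and compare the outputs.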

## What makes it easy?

- RISC-V ISA + software ecosystem (gcc, Linux, ...)
- Open-source tools: Yosys / NextPNR
- Cheap FPGAs







## **Learn-FPGA project - Vision**

https://github.com/BrunoLevy/learn-fpga





- It is going to be important (remember, Nvidia!)
- People and competence are the critical resource




- From-zero-to-RISC-V courses, teaching material
- Small hardware requirement (50 euros per student)
- Small RISC-V design (200 lines)








Pipelined design, FPU





https://github.com/BrunoLevy/TordBoyau

#### Performance (RV32I) (A35T/Vivado)

| branch prediction | CoreMarks/MHz | DMips/MHz | Raystones | LUTs | FFs |
|-------------------|---------------|-----------|-----------|------|-----|
| none              | 0.928         | 1.298     | 5.665     | 909  | 517 |
| static (BTFNT)    | 1.118         | 1.488     | 6.633     | 938  | 516 |
| static + RAS      | 1.147         | 1.528     | 6.795     | 1040 | 676 |
| gshare            | 1.124         | 1.562     | 7.186     | 1297 | 547 |
| gshare + RAS      | 1.153         | 1.606     | 7.375     | 1388 | 711 |

#### Performance (RV32IM) (A35T/Vivado)

| branch prediction | CoreMarks/MHz | DMips/MHz | Raystones | LUTs | FFs |
|-------------------|---------------|-----------|-----------|------|-----|
| none              | 2.387         | 1.341     | 15.296    | 1368 | 681 |
| static (BTFNT)    | 2.763         | 1.545     | 16.097    | 1363 | 680 |
| static + RAS      | 2.790         | 1.579     | 16.476    | 1478 | 840 |
| gshare            | 2.837         | 1.597     | 17.753    | 1760 | 711 |
| gshare + RAS      | 2.866         | 1.634     | 18.215    | 1801 | 875 |
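The tables compare three branch-prediction schemes. Here is a rough Python illustration of what each one does. It is not the TordBoyau implementation, whose table sizes and update policies may differ. The three parts are: static BTFNT (backward taken, forward not taken), a gshare predictor that indexes 2-bit saturating counters with the PC XORed with a global history register, and a return address stack (RAS) that predicts the target of function returns. The table size (`index_bits=8`) and stack depth (`depth=4`) are arbitrary assumptions.

```python
def btfnt_predict(branch_offset):
    """Static BTFNT: backward branches (loops) taken, forward branches not taken."""
    return branch_offset < 0


class Gshare:
    """Global-history predictor: (PC XOR history) indexes 2-bit counters."""

    def __init__(self, index_bits=8):  # hypothetical table size
        self.mask = (1 << index_bits) - 1
        self.history = 0                           # last branch outcomes, 1 bit each
        self.counters = [1] * (1 << index_bits)    # start "weakly not taken"

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # counter 2 or 3 => taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class ReturnAddressStack:
    """Pushes return addresses on calls, predicts return targets by popping."""

    def __init__(self, depth=4):  # hypothetical depth
        self.depth = depth
        self.stack = []

    def push(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: drop the oldest entry
        self.stack.append(return_addr)

    def pop(self):
        return self.stack.pop() if self.stack else None  # None: no prediction
```

This structure explains the trade-off visible in the tables. BTFNT costs almost no logic. gshare and the RAS need extra storage (more LUTs and/or FFs) to predict more branches correctly, which shows up as higher CoreMarks/MHz and DMips/MHz.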

*Figure: annotated pipeline simulation trace (`lw x6,16(x3)`, `and x6,x6,x5`, `bne x6,x0,0x1fc`), illustrating branch prediction (a `predict miss` on the `bne`), pipeline control (stall / flush) and data hazards for rs1/rs2.*
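The trace above mentions data hazards on rs1/rs2 and pipeline control (stall / flush). Below is a minimal Python sketch of the two decisions involved. It assumes a classic 5-stage pipeline in which ALU results are forwarded, so only a load immediately followed by an instruction that reads the loaded register forces a stall. The `Instr` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical decoded-instruction record (register indices only)."""
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    is_load: bool = False
    reads_rs2: bool = True


def load_use_stall(d_stage, e_stage):
    """Stall F and D (bubble in E) if D reads a register a load in E has not produced yet."""
    if d_stage is None or e_stage is None or not e_stage.is_load or e_stage.rd == 0:
        return False
    return d_stage.rs1 == e_stage.rd or (d_stage.reads_rs2 and d_stage.rs2 == e_stage.rd)


def flush_on_mispredict(predicted_taken, actual_taken):
    """Younger instructions fetched along the wrong path must be flushed."""
    return predicted_taken != actual_taken
```

With the instructions from the trace, `lw x6,16(x3)` in E and `and x6,x6,x5` in D give `load_use_stall(...) == True`, because `and` reads `x6` as rs1 before the load has written it.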




To come (help wanted):

- Privileged instructions, running Linux
- OoO (out-of-order execution), Tomasulo
- Port tutorials to other HDLs
- LiteX integration, LiteOS, porting RiscOS



### What's next?

- PULP platform (ETH Zurich)
- Why don't we have a French RISC-V design?

