This FPGA Project was started November 2011. It is still in formative stage, but has been abandoned as a pipelined radix-2 solution was more efficient.

I'm going to implement a generic Radix4 division package in VHDL.

## Algorithm

Unlike simple binary division Radix 4 division generates two bits of the result for each operation, decreasing latency. To decide what bits to add to the quotient it compares against the divisor, 2*divisor and 3*divisor.

It can either be implemented as feedback (so the same gates are used for each bit pair) or as a pipeline.

See Binary division for information, and C code for it.

## Interface

• Dividend - Unsigned integer
• Divisor - Unsigned integer
• Clock signal - Unsigned integer

Constant (generic) information also needed:

• Width of Dividend
• Width of Divisor
• Width of Quotient

Outputs:

• Quotient

## Top-level VDHL entity

```entity radix4division is
Generic (divisor_width  : natural := 4;
dividend_width : natural;
quotient_width : natural;
);
Port ( Clk      : in  STD_LOGIC;
dividend : in  STD_LOGIC_VECTOR(dividend_width-1 downto 0);
divisor  : in  STD_LOGIC_VECTOR(divisor_width-1  downto 0);
quotient : out STD_LOGIC_VECTOR(quotient_width-1 downto 0);
);

```

## Each step

Inputs:

• Existing quotient
• Current dividend
• Divisor
• Clock signal
• Bit offset of bits to generate

Constant (generic) information also needed:

• Width of Divisor
• Width of Quotient
• Bit offset of bits to generate

Outputs:

• updated quotient
• updated dividend
• Divisor

It is possible to reduce the length of comparisons - rather than comparing n2-2 bits only 4 bits need subtraction - the lowwer order bits can just be ORed together.

Equations for first step of a/b:

• Comparing against divisor
• x1 <= (0 & a(n-1 ... n-2)))
• y1 <= (OR(b(n-1 ... 2)) & b(1 ... 0))
• out1 <= (x-y)(1 ... 0) & a(n-3 ... 0)
• Comparing against 2*divisor
• x2 <= (0 & a(n-1))
• y2 <= (OR(b(n-1 ... 1)) & b(0))
• out2 <= (x-y)(0) & a(n-2 ... 0)
• Comparing against 3*divisor
• sum <= (0 & b(1 ... 0)) + (0 & b(0) & 0)
• x3 <= (0 & a(n-1) & a(n-2))
• y3 <= (OR(d(n-1 ... 1),sum(2)) & sum(1 ... 0))
• out3 <= (x-y)(1 ... 0) & a(n-3 ... 0)

which 'out' is passed to the next stage is selected based on which of out1, out2 or out3 were positive (if any).

If you have n bits, you need 'n/2' stages, and if you number stages from (n/2-1) (dealing to the high bits) to 0 (dealing to bits 1 and 0) the equations become more generic:

For stage i (need to verify!):

• Comparing against divisor
• x1 <= (0 & a(n-1 ... 2i) )
• y1 <= (OR(b(n-1 ... n-2i)) & b(n-2i ... 0))
• out1 <= (x-y)(n/2-i-1 ... 0) & a(2i-1 ... 0)
• Comparing against 2*divisor
• x2 <= (0 & a(n-1 ... 2i+1) )
• y2 <= (OR(b(n-1 ... n-2i+1)) & b(0))
• out2 <= (x-y)(n/2-2i-2..0) & a(2i-1 ... 0)
• Comparing against 3*divisor
• sum <= (0 & b(n-2i-1 .. 0)) + (0 & b(n-2i-2 ... 0) & 0)
• x3 <= (0 & a(n-1 ... 2i))
• y3 <= (OR(d(n-1 ... n-2i),sum(2)) & sum(1 ... 0))
• out3 <= (x-y)(n/2-i-1 ... 0) & a(2i-1 ... 0)