Mandelbrot NG

From Hamsterworks Wiki!

This FPGA Project was written in May and June of 2015.

Inspired by a post on Hackaday, I set about making a Mandelbrot viewer that would calculate every pixel in real time, using as much of an FPGA's resources as I could.

This is the top level of my fractal viewer. The big difference from my other attempts is that it doesn't have a frame buffer - all pixels are completely calculated every time they are shown.

This allows for very smooth scrolling and zooming, with no restrictions on the speed of scrolling. However the 'depth' is limited by the resources available on the FPGA and the pixel clock rate.

As you can see in the block diagram, the design is pretty simple and has no feedback, which allows it to sprawl out over the FPGA without any timing issues:

Mandelbrot ng structure.png

The clocking, user interface and VGA generator are on the right, followed by the calculation stages, and then the VGA output.

The performance can be tuned to match the FPGA's size, its speed (Fmax) and the screen resolution.

Config rules:

constant clocks_per_pixel : integer := 9;

This is the number of FPGA clocks per display pixel. It can be anything from 1 to 12. If anything above 12 is used the pipeline in "stage" will eject a pixel too early.

constant stages : integer := 12;

Number of processing stages in the processing pipeline. Alter this to change the number of DSP blocks used.

These values allow you to explore to a depth of 9*12 = 108 iterations. On a larger FPGA you can increase the values and see more patterns.
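
As a quick sanity check of the arithmetic above, here is a minimal simulation-only sketch. The entity name config_check is hypothetical and not part of the project; the two constants are copied from top_level.vhd, and the asserts only restate the rules given in the text.

```vhdl
-- Simulation-only sketch; the entity name config_check is hypothetical.
entity config_check is
end entity config_check;

architecture sim of config_check is
    -- Values copied from top_level.vhd
    constant clocks_per_pixel : integer := 9;
    constant stages           : integer := 12;
    -- Depth as described in the text: clocks per pixel times stages
    constant max_iterations   : integer := clocks_per_pixel * stages;
begin
    -- The text limits clocks_per_pixel to the range 1 to 12
    assert clocks_per_pixel >= 1 and clocks_per_pixel <= 12
        report "clocks_per_pixel must be between 1 and 12"
        severity failure;

    assert max_iterations = 108
        report "expected a depth of 9*12 = 108 iterations"
        severity failure;
end architecture sim;
```

Changing either constant in a copy of this check is a cheap way to see what depth a new configuration would give before running a full build.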

Also in stage.vhd, you have the option to implement some of the multiplications using LUTs rather than DSP blocks. Adjusting this can allow you to include extra stages. It also gives you a bit of flexibility in the latency of the multiplier used, which helps in meeting timing. It is currently configured to use 7 DSP blocks per stage, out of a maximum of 10 (the other three are implemented in LUTs). This gives 120 multipliers (12 stages of 10) on an FPGA with only 90 DSP blocks.

Oh, it is also quite power efficient, at under 2W for 27,000 million multiplications per second (120 multipliers running at 225 MHz).
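
The resource and throughput figures can be worked through in a few lines. This is only a sketch: the entity name throughput_check is hypothetical, the clock is derived from the MMCM settings in top_level.vhd (100 MHz in, multiplied by 9, divided by 4), and it assumes every multiplier delivers one product per clock.

```vhdl
-- Simulation-only sketch; the entity name throughput_check is hypothetical.
entity throughput_check is
end entity throughput_check;

architecture sim of throughput_check is
    -- Figures taken from the text
    constant stages          : integer := 12;
    constant mults_per_stage : integer := 10;
    constant dsp_per_stage   : integer := 7;
    constant dsp_available   : integer := 90;

    constant total_mults     : integer := stages * mults_per_stage;
    constant total_dsp       : integer := stages * dsp_per_stage;

    -- MMCM settings from top_level.vhd (CLKFBOUT_MULT_F, DIVCLK_DIVIDE, CLKOUT0_DIVIDE_F)
    constant clkin_mhz       : real := 100.0;
    constant clk_mhz         : real := clkin_mhz * 9.0 / (1.0 * 4.0);

    -- Assumption: each multiplier is fully pipelined (one product per clock)
    constant mults_per_sec   : real := real(total_mults) * clk_mhz * 1.0e6;
begin
    assert total_mults = 120
        report "expected 120 multipliers" severity failure;

    -- 84 DSP blocks fit in the 90 available; the other 36 multipliers are built from LUTs
    assert total_dsp = 84 and total_dsp <= dsp_available
        report "DSP budget exceeded" severity failure;

    assert clk_mhz = 225.0
        report "expected a 225 MHz system clock" severity failure;

    -- 120 * 225e6 = 27e9, i.e. 27,000 million multiplications per second
    assert abs(mults_per_sec - 27.0e9) < 1.0
        report "expected 27e9 multiplications per second" severity failure;
end architecture sim;
```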

Important note: If you do extend the iteration depth, remember to add more colours in vga_output.vhd.

Source files

Here's a zip file for all source files, except the multiplier IPs. You will have to make them yourself - just pay special attention to the data width and that they are unsigned.

mandelbrot_ng.zip

top_level.vhd

This is the top level to tie everything together.

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : top_level.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- Date    : 9th May 2015
--
-- This is the top level of my fractal viewer. The big difference from 
-- my other attempts is that it doesn't have a frame buffer - all
-- pixels are completely calculated every time they are shown.
--
-- This allows for very smooth scrolling and zooming, with no 
-- restrictions on the speed of scrolling. However the 'depth' is 
-- limited by the resources available on the FPGA and the pixel 
-- clock rate.
--
-- The performance can be tuned to match the Fmax of the FPGA size,
-- FPGA speed and the screen resolution.
--
-- Config rules:
-- 
-- constant clocks_per_pixel : integer := 9;
--     This is the number of clocks per display pixel
--     It can be anything from 1 to 12. If anything above 12  
--     is used the pipeline in "stage" will eject a pixel too
--     early.
--
-- constant stages : integer := 12;
--     Number of processing stages in the processing pipeline
--     Alter this to change the number of DSP blocks used
--
-- With these values it will allow you to explore at depths of
-- 9*12 = 108 iterations. On a larger FPGA you can increase
-- the values.
--
-- Also in stage.vhd, you have the option to implement some of the 
-- multiplications using LUTs rather than DSP blocks. Adjusting
-- this can allow you to include extra stages.
-- 
-- It is currently configured to use 7 DSP blocks per stage, out
-- of a maximum of 10 (the other three are implemented in LUTs). 
--
--
  
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

Library UNISIM;
use UNISIM.vcomponents.all;

entity top_level is
    Port ( 
        clk100    : in STD_LOGIC;
        
        btnU      : in STD_LOGIC;
        btnD      : in STD_LOGIC;
        btnL      : in STD_LOGIC;
        btnR      : in STD_LOGIC;
        btnC      : in STD_LOGIC;
        vga_hsync : out std_logic;
        vga_vsync : out std_logic;
        vga_red   : out std_logic_vector(3 downto 0);
        vga_green : out std_logic_vector(3 downto 0);
        vga_blue  : out std_logic_vector(3 downto 0)
        );
end top_level;

architecture Behavioral of top_level is
    constant stages : integer := 12;
    constant clocks_per_pixel : integer := 9;
    signal clk : std_logic;

    signal blank : std_logic := '0';
    signal hsync : std_logic := '0';
    signal vsync : std_logic := '0';
    component vga_gen is
        Generic ( 
            pixel_len : integer
        );
        port (
           clk   : in  std_logic;
            
           blank : out std_logic;
           hsync : out std_logic;
           vsync : out std_logic
           );
    end component;    

    component user_interface is
        port (
            clk       : in STD_LOGIC;
            btnU      : in STD_LOGIC;
            btnD      : in STD_LOGIC;
            btnL      : in STD_LOGIC;
            btnR      : in STD_LOGIC;
            btnC      : in STD_LOGIC;
            vsync     : in STD_LOGIC;
            x         : out std_logic_vector(34 downto 0);
            y         : out std_logic_vector(34 downto 0);
            scale     : out std_logic_vector(34 downto 0)
           );
    end component;
                                                --- these are in 4.31 fixed-point signed binary
    signal x      : std_logic_vector(34 downto 0) := (others => '0');
    signal y      : std_logic_vector(34 downto 0) := (others => '0');
    signal scale  : std_logic_vector(34 downto 0) := (others => '0');
    
    signal ca_new   : std_logic_vector(34 downto 0) := (others => '0');
    signal cb_new   : std_logic_vector(34 downto 0) := (others => '0');
    signal sync_new : std_logic_vector( 2 downto 0) := (others => '0');


    type a_fixed_point is array (0 to stages) of std_logic_vector(34 downto 0);
    type a_count       is array (0 to stages) of std_logic_vector(7 downto 0);
    type a_sync        is array (0 to stages) of std_logic_vector(2 downto 0);

    signal ca         : a_fixed_point := (others => (others => '0'));
    signal cb         : a_fixed_point := (others => (others => '0'));
    signal a          : a_fixed_point := (others => (others => '0'));
    signal b          : a_fixed_point := (others => (others => '0'));
    signal iterations : a_count       := (0 => "00000001", others => (others => '0'));
    signal sync       : a_sync        := (others => (others => '0'));
    signal overflow   : std_logic_vector(stages downto 0) := (others => '0');

    component generate_constants is
        Generic ( 
            pixel_len : integer
        );
        port (
            clk       : in std_logic;

            blank_in  : in std_logic;
            hsync_in  : in std_logic;
            vsync_in  : in std_logic;

            x         : in  std_logic_vector;
            y         : in  std_logic_vector;
            x_step    : in  std_logic_vector;
            y_step    : in  std_logic_vector;
            
            blank_out : out std_logic;
            hsync_out : out std_logic;
            vsync_out : out std_logic;
            
            ca        : out std_logic_vector;
            cb        : out std_logic_vector
        );
    end component;
    
    component stage is 
    generic (
        phase_len : integer 
    );
    port (
        clk    : std_logic;
        -- Inputs
        ca_in        : in std_logic_vector; -- The real constant
        cb_in        : in std_logic_vector; -- The imaginary constant
        a_in         : in std_logic_vector; -- the current real value
        b_in         : in std_logic_vector; -- the current imaginary value
        i_in         : in std_logic_vector; -- the current increment count
        overflow_in  : in std_logic;        -- has an overflow occurred?
        sync_in      : in std_logic_vector; -- any control/video signals along for the ride

        ca_out       : out std_logic_vector;
        cb_out       : out std_logic_vector;
        a_out        : out std_logic_vector;
        b_out        : out std_logic_vector;
        i_out        : out std_logic_vector;
        overflow_out : out std_logic;
        sync_out     : out std_logic_vector
    );
    end component;

    component vga_output is
        Generic ( 
            pixel_len : integer
            );
        Port ( clk : in STD_LOGIC;
               hsync_in : in STD_LOGIC;
               vsync_in : in STD_LOGIC;
               blank_in : in STD_LOGIC;
               iterations_in : in STD_LOGIC_VECTOR(7 downto 0);
               vga_hsync : out std_logic;
               vga_vsync : out std_logic;
               vga_red   : out std_logic_vector(3 downto 0);
               vga_green : out std_logic_vector(3 downto 0);
               vga_blue  : out std_logic_vector(3 downto 0));
    end component;

    signal clkfb : std_logic;
begin


   MMCME2_BASE_inst : MMCME2_BASE
   generic map (
      BANDWIDTH => "OPTIMIZED",  -- Jitter programming (OPTIMIZED, HIGH, LOW)
      CLKFBOUT_MULT_F => 9.0,    -- Multiply value for all CLKOUT (2.000-64.000).
      CLKFBOUT_PHASE => 0.0,     -- Phase offset in degrees of CLKFB (-360.000-360.000).
      CLKIN1_PERIOD => 10.0,      -- Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz).
      -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for each CLKOUT (1-128)
      CLKOUT1_DIVIDE => 1,
      CLKOUT2_DIVIDE => 1,
      CLKOUT3_DIVIDE => 1,
      CLKOUT4_DIVIDE => 1,
      CLKOUT5_DIVIDE => 1,
      CLKOUT6_DIVIDE => 1,
      CLKOUT0_DIVIDE_F => 4.0,   -- Divide amount for CLKOUT0 (1.000-128.000).
      -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for each CLKOUT (0.01-0.99).
      CLKOUT0_DUTY_CYCLE => 0.5,
      CLKOUT1_DUTY_CYCLE => 0.5,
      CLKOUT2_DUTY_CYCLE => 0.5,
      CLKOUT3_DUTY_CYCLE => 0.5,
      CLKOUT4_DUTY_CYCLE => 0.5,
      CLKOUT5_DUTY_CYCLE => 0.5,
      CLKOUT6_DUTY_CYCLE => 0.5,
      -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for each CLKOUT (-360.000-360.000).
      CLKOUT0_PHASE => 0.0,
      CLKOUT1_PHASE => 0.0,
      CLKOUT2_PHASE => 0.0,
      CLKOUT3_PHASE => 0.0,
      CLKOUT4_PHASE => 0.0,
      CLKOUT5_PHASE => 0.0,
      CLKOUT6_PHASE => 0.0,
      CLKOUT4_CASCADE => FALSE,  -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE)
      DIVCLK_DIVIDE => 1,        -- Master division value (1-106)
      REF_JITTER1 => 0.0,        -- Reference input jitter in UI (0.000-0.999).
      STARTUP_WAIT => FALSE      -- Delays DONE until MMCM is locked (FALSE, TRUE)
   )
   port map (
      -- Clock Outputs: 1-bit (each) output: User configurable clock outputs
      CLKOUT0   => clk,    -- 1-bit output: CLKOUT0
      CLKOUT0B  => open,   -- 1-bit output: Inverted CLKOUT0
      CLKOUT1   => open,   -- 1-bit output: CLKOUT1
      CLKOUT1B  => open,   -- 1-bit output: Inverted CLKOUT1
      CLKOUT2   => open,   -- 1-bit output: CLKOUT2
      CLKOUT2B  => open,   -- 1-bit output: Inverted CLKOUT2
      CLKOUT3   => open,   -- 1-bit output: CLKOUT3
      CLKOUT3B  => open,   -- 1-bit output: Inverted CLKOUT3
      CLKOUT4   => open,   -- 1-bit output: CLKOUT4
      CLKOUT5   => open,   -- 1-bit output: CLKOUT5
      CLKOUT6   => open,   -- 1-bit output: CLKOUT6
      -- Feedback Clocks: 1-bit (each) output: Clock feedback ports
      CLKFBOUT  => clkfb,  -- 1-bit output: Feedback clock
      CLKFBOUTB => open,   -- 1-bit output: Inverted CLKFBOUT
      -- Status Ports: 1-bit (each) output: MMCM status ports
      LOCKED    => open,   -- 1-bit output: LOCK
      -- Clock Inputs: 1-bit (each) input: Clock input
      CLKIN1    => clk100, -- 1-bit input: Clock
      -- Control Ports: 1-bit (each) input: MMCM control ports
      PWRDWN    => '0',    -- 1-bit input: Power-down
      RST       => '0',    -- 1-bit input: Reset
      -- Feedback Clocks: 1-bit (each) input: Clock feedback ports
      CLKFBIN   => clkfb   -- 1-bit input: Feedback clock
   );

i_ui: user_interface  port map (
           clk   => clk,
           btnU  => btnU,
           btnD  => btnD,
           btnL  => btnL,
           btnR  => btnR,
           btnC  => btnC,
           vsync => vsync,
           x     => x,
           y     => y,
           scale => scale
          );

i_vga_gen: vga_gen generic map (
            pixel_len => clocks_per_pixel
    ) port map (
        clk   => clk,
        blank => blank, 
        hsync => hsync,
        vsync => vsync
    );

i_generate_constants: generate_constants Generic map ( 
            pixel_len => clocks_per_pixel
    )
    port map (
        clk       => clk,

        blank_in  => blank,
        hsync_in  => hsync,
        vsync_in  => vsync,
        
        x         => x,
        y         => y,
        x_step    => scale, 
        y_step    => scale,
        
        blank_out => sync(0)(0),
        hsync_out => sync(0)(1),
        vsync_out => sync(0)(2),
        
        ca        => ca(0),
        cb        => cb(0)
    );
    a(0)          <= (others =>'0'); 
    b(0)          <= (others =>'0'); 
    iterations(0) <= (others =>'1');
    overflow(0)   <= '0';  

g1: for i in 1 to stages generate
i_stage_1: stage generic map (
            phase_len => clocks_per_pixel
        ) port map (
            clk        => clk,
            -- Inputs
            ca_in        => ca(i-1),
            cb_in        => cb(i-1),
            a_in         => a(i-1), 
            b_in         => b(i-1), 
            i_in         => iterations(i-1), 
            overflow_in  => overflow(i-1),
            sync_in      => sync(i-1),
    
            ca_out       => ca(i),
            cb_out       => cb(i),
            a_out        => a(i),
            b_out        => b(i),
            i_out        => iterations(i),
            overflow_out => overflow(i),
            sync_out     => sync(i)
        );
end generate;

i_vga_output: vga_output Generic map ( 
            pixel_len => clocks_per_pixel
            )
        Port map ( 
            clk => clk,
            hsync_in => sync(stages)(1),
            vsync_in => sync(stages)(2),
            blank_in => sync(stages)(0),
            iterations_in => iterations(stages),
            vga_hsync => vga_hsync, 
            vga_vsync => vga_vsync,
            vga_red   => vga_red,
            vga_green => vga_green,
            vga_blue  => vga_blue
       );
end Behavioral;
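
To make the 4.31 fixed-point comment in top_level.vhd concrete, here is a small worked example. It is a sketch under stated assumptions: the entity name fixed_point_check is hypothetical, the starting scale (only bit 23 set) is the initial value of scale_internal in user_interface.vhd, and scale is treated as the step per pixel because top_level.vhd wires it to both x_step and y_step.

```vhdl
-- Simulation-only sketch; the entity name fixed_point_check is hypothetical.
entity fixed_point_check is
end entity fixed_point_check;

architecture sim of fixed_point_check is
    -- 4.31 format: 4 integer bits (including sign) and 31 fractional bits
    constant int_bits  : integer := 4;
    constant frac_bits : integer := 31;

    -- Initial scale has only bit 23 set, so its value is 2^23 / 2^31 = 2^-8
    constant step      : real := 2.0**23 / 2.0**frac_bits;

    -- Visible area is 640 x 480 pixels, one step per pixel in each direction
    constant width     : real := 640.0 * step;
    constant height    : real := 480.0 * step;
begin
    assert int_bits + frac_bits = 35
        report "vectors are 35 bits wide (34 downto 0)" severity failure;

    assert step = 0.00390625
        report "expected a step of 2^-8 per pixel" severity failure;

    assert width = 2.5 and height = 1.875
        report "expected an initial view of 2.5 by 1.875" severity failure;
end architecture sim;
```

So at power-up each pixel is 1/256 of a unit apart, and the screen covers a region 2.5 wide by 1.875 high in the complex plane.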

user_interface.vhd

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : user_interface.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- This is the user interface of my fractal viewer. All it does is
-- wait for the VGA vertical sync to be asserted, and then updates the 
-- top/left of the screen and the zoom/scale factor based on the 
-- five buttons (the centre button switches left/right to zooming)
--
-- It had to be pipelined a little to meet timing, so is a little bit
-- ungainly
--
------------------------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity user_interface is
    port (
            clk       : in STD_LOGIC;
            btnU      : in STD_LOGIC;
            btnD      : in STD_LOGIC;
            btnL      : in STD_LOGIC;
            btnR      : in STD_LOGIC;
            btnC      : in STD_LOGIC;
            vsync     : in STD_LOGIC;
            x         : out std_logic_vector(34 downto 0);
            y         : out std_logic_vector(34 downto 0);
            scale     : out std_logic_vector(34 downto 0)
    );
end user_interface;

architecture Behavioral of user_interface is
                                                --- these are in 4.31 fixed-point signed binary
    signal x_internal          : unsigned(34 downto 0)  := (others => '0');
    signal y_internal          : unsigned(34 downto 0)  := (others => '0');

    signal x_left              : unsigned(34 downto 0)         := (others => '0');
    signal x_right             : unsigned(34 downto 0)         := (others => '0');
    signal y_up                : unsigned(34 downto 0)         := (others => '0');
    signal y_down              : unsigned(34 downto 0)         := (others => '0');

    signal scale_left          : unsigned(34 downto 0)     := (others => '0');
    signal scale_right         : unsigned(34 downto 0)     := (others => '0');
    signal scale_right_sub     : unsigned(28 downto 0)     := (others => '0');
    
    signal scale_internal      : unsigned(34 downto 0)  := (23 => '1', others => '0');
    signal scale_internal_last : unsigned(34 downto 0) := (23 => '1', others => '0');

    signal x_buffer            : std_logic_vector(34 downto 0) := (others => '0');
    signal y_buffer            : std_logic_vector(34 downto 0) := (others => '0');
    signal scale_buffer        : std_logic_vector(34 downto 0) := (others => '0');

    signal vsync_last     : std_logic;
    signal update_now     : std_logic;

    signal btnU_sync      : STD_LOGIC := '0';
    signal btnD_sync      : STD_LOGIC := '0';
    signal btnL_sync      : STD_LOGIC := '0';
    signal btnR_sync      : STD_LOGIC := '0';
    signal btnC_sync      : STD_LOGIC := '0';

begin
    x     <= std_logic_vector(x_buffer);
    y     <= std_logic_vector(y_buffer);
    scale <= std_logic_vector(scale_buffer);
clk_proc: process(clk)
    begin
        if rising_edge(clk) then            
            x_buffer     <= std_logic_vector(x_internal
                          - (scale_internal(scale_internal'high-8 downto 0)&"00000000")
                          - (scale_internal(scale_internal'high-6 downto 0)&"000000"));
            y_buffer     <= std_logic_vector(y_internal
                          - (scale_internal(scale_internal'high-8 downto 0)&"00000000")
                          + (scale_internal(scale_internal'high-4 downto 0)&"0000"));
            scale_buffer <= std_logic_vector(scale_internal);


            if update_now = '1' then
                if btnC_sync = '0' then
                    if btnL_sync = '1' then
                        x_internal <= x_left;
                    end if;
                    if btnR_sync = '1' then
                        x_internal <= x_right;
                    end if;
                    if btnU_sync = '1' then
                        y_internal <= y_up;
                    end if;
                    if btnD_sync = '1' then
                        y_internal <= y_down;
                    end if;
                else
                    if btnL_sync = '1' then
                        scale_internal <= scale_left;
                    end if;
                    if btnR_sync = '1' then
                        scale_internal <= scale_right;
                    end if;
                end if;                
            end if;

            x_left  <= x_internal - (scale_internal(scale_internal'high-1 downto 0) & '0');
            x_right <= x_internal + (scale_internal(scale_internal'high-1 downto 0) & '0');
            y_up    <= y_internal - (scale_internal(scale_internal'high-1 downto 0) & '0');
            y_down  <= y_internal + (scale_internal(scale_internal'high-1 downto 0) & '0');

           scale_right <= scale_internal - scale_right_sub;
            
            if scale_internal(scale_internal'high downto 6) = 0 then
                if scale_internal(10 downto 0) /= 1 then
                   scale_right_sub <= (0=>'1', others => '0');
                else 
                    scale_right_sub <= (others => '0');
                end if;
            else
               scale_right_sub <= scale_internal(scale_internal'high downto 6); 
            end if;

            if scale_internal(scale_internal'high downto 6) = 0 then
               scale_left <= scale_internal + 1;
            else
               scale_left <= scale_internal + scale_internal(scale_internal'high downto 6);
            end if;
        
            if vsync_last = '0' and vsync = '1' then
                update_now <= '1';
            else
                update_now <= '0';
            end if;

            btnU_sync <= btnU;
            btnD_sync <= btnD;
            btnL_sync <= btnL;
            btnR_sync <= btnR;
            btnC_sync <= btnC;

            vsync_last <= vsync;
        end if;
    end process;
end Behavioral;
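
The zoom logic above changes the scale by 1/64 of its current value on each update. A minimal sketch of that arithmetic follows, assuming the starting scale from the listing (bit 23 set); the entity name zoom_step_check is hypothetical, and shift_right is used here as an equivalent of the scale_internal'high downto 6 slice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Simulation-only sketch; the entity name zoom_step_check is hypothetical.
entity zoom_step_check is
end entity zoom_step_check;

architecture sim of zoom_step_check is
    -- Initial value of scale_internal in user_interface.vhd
    constant scale       : unsigned(34 downto 0) := (23 => '1', others => '0');
    -- Dropping the low 6 bits is the same as dividing by 64
    constant delta       : unsigned(34 downto 0) := shift_right(scale, 6);
    -- Centre + right: smaller step per pixel, i.e. zooming in
    constant scale_right : unsigned(34 downto 0) := scale - delta;
    -- Centre + left: larger step per pixel, i.e. zooming out
    constant scale_left  : unsigned(34 downto 0) := scale + delta;
begin
    assert delta = 131072         -- 2^17
        report "unexpected zoom delta" severity failure;

    assert scale_right = 8257536  -- 2^23 - 2^17
        report "unexpected zoom-in result" severity failure;

    assert scale_left = 8519680   -- 2^23 + 2^17
        report "unexpected zoom-out result" severity failure;
end architecture sim;
```

Because the update only happens on the rising edge of vsync, holding a zoom button changes the scale by about 1.6% per frame.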

vga_gen.vhd

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : vga_gen.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- Date    : 9th May 2015
--
-- Generate the VGA 640x480 timing signals, where the pixel clock is 
-- clk/pixel_len. 
-- 
-----------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity vga_gen is
    Generic ( 
        pixel_len : integer
    );
    Port ( clk : in STD_LOGIC;
           blank : out STD_LOGIC := '0';
           hsync : out STD_LOGIC := '0';
           vsync : out STD_LOGIC := '0');
end vga_gen;

architecture Behavioral of vga_gen is
    signal x : unsigned(10 downto 0) := (others => '0');
    signal y : unsigned(10 downto 0) := (others => '0');
    signal phase  : std_logic_vector(pixel_len-1 downto 0) := (0 => '1', others => '0');                                 
begin

clk_proc: process(clk)
    begin
        if rising_edge(clk) then
            if phase(0) = '1' then
                if x = 639 then
                    blank <= '1';
                elsif x = 799 and (y = 524 or y < 479) then
                    blank <= '0';            
                end if;
                
                if x = 640+16-1 then
                    hsync <= '1';
                elsif x = 640+16+96-1 then
                    hsync <= '0';
                end if;
    
                if x = 799 then
                    x <= (others => '0');
                    
                    if y = 480+10-1 then
                        vsync  <= '1';
                    elsif y = 480+10+2-1 then
                        vsync  <= '0';
                    end if;
                    
                    if y = 524 then
                        y <= (others => '0');
                    else
                        y <= y +1;
                    end if;
                else
                    x <= x + 1;        
                end if;            
            end if;            
            phase <= phase(0) & phase(phase'high downto 1);
        end if;
    end process;
end Behavioral;
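
The counter limits in vga_gen imply the usual 640x480 porch widths and a refresh rate close to 60 Hz. The sketch below works that out; the entity name vga_timing_check is hypothetical, and the 225 MHz clock is an assumption taken from the MMCM settings in top_level.vhd.

```vhdl
-- Simulation-only sketch; the entity name vga_timing_check is hypothetical.
entity vga_timing_check is
end entity vga_timing_check;

architecture sim of vga_timing_check is
    -- x counts 0 to 799 and y counts 0 to 524 in vga_gen
    constant h_total   : integer := 800;
    constant v_total   : integer := 525;
    -- Visible + front porch + sync, as used in the comparisons in vga_gen
    constant h_back    : integer := h_total - (640 + 16 + 96);
    constant v_back    : integer := v_total - (480 + 10 + 2);

    -- Assumption: 225 MHz system clock and pixel_len = 9, as in top_level.vhd
    constant clk_hz    : real    := 225.0e6;
    constant pixel_len : integer := 9;
    constant pixel_hz  : real    := clk_hz / real(pixel_len);
    constant frame_hz  : real    := pixel_hz / real(h_total * v_total);
begin
    assert h_back = 48 and v_back = 33
        report "unexpected back porch" severity failure;

    assert pixel_hz = 25.0e6
        report "expected a 25 MHz pixel rate" severity failure;

    -- 25e6 / (800 * 525) is roughly 59.52 frames per second
    assert frame_hz > 59.5 and frame_hz < 59.6
        report "unexpected frame rate" severity failure;
end architecture sim;
```

If clocks_per_pixel is changed, the system clock has to change with it to keep the pixel rate at 25 MHz.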

generate_constants.vhd

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : generate_constants.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- Date    : 9th May 2015
--
-- By following the sync signals, generate the 'ca' and 'cb' constants that 
-- are sent through to the calculation pipeline.
--
------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity generate_constants is
    Generic ( 
        pixel_len : integer
    );
    port (
        clk       : in std_logic;

        blank_in  : in std_logic;
        hsync_in  : in std_logic;
        vsync_in  : in std_logic;

        x         : in  std_logic_vector;
        y         : in  std_logic_vector;
        x_step    : in  std_logic_vector;
        y_step    : in  std_logic_vector;

        blank_out : out std_logic;
        hsync_out : out std_logic;
        vsync_out : out std_logic;
        
        ca        : out std_logic_vector;
        cb        : out std_logic_vector
);
end generate_constants;

architecture Behavioral of generate_constants is
    signal current_x  : std_logic_vector(x'range) := (34 => '1', 33=> '1', 32=> '1', 31=> '1', others => '0');
    signal current_y  : std_logic_vector(y'range) := (34 => '1', 33=> '1', 32=> '1', 31=> '1', others => '0');
    signal blank_last : std_logic                 := '0';
    signal phase  : std_logic_vector(pixel_len-1 downto 0) := (0 => '1', others => '0');                                 
begin
    ca <= current_x;
    cb <= current_y;
    
process(clk)
    begin
        if rising_edge(clk) then
            if phase(0) = '1' then
                if blank_in = '1' then
                    current_x <= x;
                else
                    current_x <= std_logic_vector(unsigned(current_x) + unsigned(x_step));
                end if; 
    
                if vsync_in = '1' then
                    current_y <= y;
                elsif blank_last = '1' and blank_in = '0' then
                    current_y <= std_logic_vector(unsigned(current_y) + unsigned(y_step));
                end if;
                
                blank_last <= blank_in;
                blank_out  <= blank_in;
                hsync_out  <= hsync_in;
                vsync_out  <= vsync_in;
            end if;        
            phase <= phase(0) & phase(phase'high downto 1);
        end if;
    end process;
end Behavioral;
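
Both vga_gen and generate_constants use a rotating one-hot phase register so that their logic only advances once per pixel. Here is a minimal sketch of that rotation, with a hypothetical entity name phase_check and pixel_len fixed at 9; it simply counts how often phase(0) is set over three pixel periods.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only sketch; the entity name phase_check is hypothetical.
entity phase_check is
end entity phase_check;

architecture sim of phase_check is
    constant pixel_len : integer := 9;
begin
    process
        -- Same initial value and rotation as the phase signal in the listings
        variable phase : std_logic_vector(pixel_len-1 downto 0)
                       := (0 => '1', others => '0');
        variable hits  : integer := 0;
    begin
        -- Each loop pass stands in for one rising clock edge
        for n in 1 to 3 * pixel_len loop
            if phase(0) = '1' then
                hits := hits + 1;   -- this is the clock on which a pixel advances
            end if;
            -- Rotate right: bit 0 wraps around to the top bit
            phase := phase(0) & phase(phase'high downto 1);
        end loop;

        -- Exactly one active clock per pixel period
        assert hits = 3
            report "expected one active phase per pixel" severity failure;
        wait;
    end process;
end architecture sim;
```

A one-hot shift register like this avoids a counter and comparator, at the cost of pixel_len flip-flops.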

stage.vhd

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : stage.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- Date    : 9th May 2015
--
-- This is the calculation engine of my fractal viewer. It is a 13 stage
-- pipeline, that evicts one set of values when each new input value is presented.
-- If you present new input one cycle in three, then each item will go around the 
-- pipeline three times before being ejected. 
--
-- Multiple instances of stages can be cascaded together to get the depth you wish.
--
-- Although the a*b multiplication is inferred, the a*a multiplication is achieved
-- using primitives. This has two benefits:
--
-- * It saves DSP blocks, as a:b * a:b can be calculated as a*a:0:0 + 2*a*b:0 + b*b,
--   rather than a*a:0:0 + a*b:0:0 + a*b:0:0 + b*b, which uses four DSP blocks.
--   This allows a stage to use 10 multiplications, rather than the 12 that is 
--   inferred.
-- * Some of the multiplications can be implemented using LUTs rather than DSP blocks,
--   increasing the FPGA LUT resource usage and increasing the number of stages that
--   can fit on a given device.
--
-- The pipeline is set up using long/wide shift registers that cover all steps of
-- the pipeline, and it is left up to the optimizer to trim out the large number
-- of unused registers. It makes the code a lot simpler.
------------------------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity stage is
    generic (
        phase_len : integer 
    );
      port (
        clk          : in std_logic;
        -- Inputs
        ca_in        : in std_logic_vector; -- The real constant
        cb_in        : in std_logic_vector; -- The imaginary constant
        a_in         : in std_logic_vector; -- the current real value
        b_in         : in std_logic_vector; -- the current imaginary value
        i_in         : in std_logic_vector; -- the current increment count
        overflow_in  : in std_logic;        -- has an overflow occurred?
        sync_in      : in std_logic_vector; -- any control/video signals along for the ride

        ca_out       : out std_logic_vector;
        cb_out       : out std_logic_vector;
        a_out        : out std_logic_vector;
        b_out        : out std_logic_vector;
        i_out        : out std_logic_vector;
        overflow_out : out std_logic;
        sync_out     : out std_logic_vector
    );
end entity;

architecture stage_arch of stage is
   constant latency : integer := 13;
   constant scale   : integer := 4;
   constant mult_size : integer := 35;

  component mult_u17_u17_l4 IS
  PORT (
    CLK : IN STD_LOGIC;
    A : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    B : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    P : OUT STD_LOGIC_VECTOR(33 DOWNTO 0)
  );
  end component;
  component mult_u17_u17_l4_lut IS
  PORT (
    CLK : IN STD_LOGIC;
    A : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    B : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    P : OUT STD_LOGIC_VECTOR(33 DOWNTO 0)
  );
  end component;

  component mult_u17_u17_l5_lut IS
  PORT (
    CLK : IN STD_LOGIC;
    A : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    B : IN STD_LOGIC_VECTOR(16 DOWNTO 0);
    P : OUT STD_LOGIC_VECTOR(33 DOWNTO 0)
  );
  end component;
   
   signal phase  : std_logic_vector(phase_len-1 downto 0) := (0 => '1', others => '0');                                 

   type a_sync is array(latency-1 downto 0) of std_logic_vector(sync_in'range);
   signal sync : a_sync := (others => (others =>'0'));

   type a_i is array (latency-1 downto 0) of unsigned(i_in'range);
   signal i : a_i  := (others => (others =>'0'));

   type a_ca is array (latency-1 downto 0) of signed(ca_in'range);
   signal ca : a_ca  := (others => (others =>'0'));

   type a_cb is array (latency-1 downto 0) of signed(cb_in'range);
   signal cb : a_cb  := (others => (others =>'0'));

   type a_a is array (latency-1 downto 0) of signed(a_in'range);
   signal a : a_a  := (others => (others =>'0'));


   type a_b is array (latency-1 downto 0) of signed(b_in'range);
   signal b : a_b  := (others => (others =>'0'));
  
  ----------------------------------------------------- 
   -- Working storage. Most of this gets optimized away
  ----------------------------------------------------- 
   type a_a_abs is array (latency-1 downto 0) of unsigned(a_in'range);
   signal a_abs : a_a_abs := (others => (others =>'0'));

   type a_b_abs is array (latency-1 downto 0) of unsigned(a_in'range);
   signal b_abs : a_b_abs := (others => (others =>'0'));

   type a_a_times_b is array (latency-1 downto 0) of signed(a_in'length+b_in'length-1 downto 0);
   signal a_times_b : a_a_times_b  := (others => (others =>'0'));

   type a_a_squared is array (latency-1 downto 0) of unsigned(a_in'length+a_in'length-1 downto 0);
   signal a_squared      : a_a_squared  := (others => (others =>'0'));

   type a_a_squared_partial is array (latency-1 downto 0) of unsigned(33 downto 0);
   signal a_squared_hh : a_a_squared_partial  := (others => (others =>'0'));
   signal a_squared_hl : a_a_squared_partial  := (others => (others =>'0'));
   signal a_squared_ll : a_a_squared_partial  := (others => (others =>'0'));

   type a_b_squared is array (latency-1 downto 0) of unsigned(b_in'length+b_in'length-1 downto 0);
   signal b_squared      : a_b_squared  := (others => (others =>'0'));
   
   type a_b_squared_partial is array (latency-1 downto 0) of unsigned(33 downto 0);
   signal b_squared_hh : a_b_squared_partial  := (others => (others =>'0'));
   signal b_squared_hl : a_b_squared_partial  := (others => (others =>'0'));
   signal b_squared_ll : a_b_squared_partial  := (others => (others =>'0'));

   type a_magnitude is array (latency-1 downto 0) of unsigned(a_in'length-1 downto 0);
   signal magnitude : a_magnitude  := (others => (others =>'0'));

   signal overflow   : std_logic_vector(latency -1 downto 0) := (others => '0');
   signal mult_fault : std_logic; 
   
   signal a_s_hh : std_logic_vector(33 downto 0);
   signal a_s_hl : std_logic_vector(33 downto 0);
   signal a_s_ll : std_logic_vector(33 downto 0);

   signal b_s_hh : std_logic_vector(33 downto 0);
   signal b_s_hl : std_logic_vector(33 downto 0);
   signal b_s_ll : std_logic_vector(33 downto 0);
   
begin
--   mult_fault <= '0' when signed(a_squared(2)) = a_squared_orig(2) else '1';
clk_proc: process(clk)
    begin
        if rising_edge(clk) then
           -----------------------------------------------------------------
           -- First, pass all the signals through; later assignments
           -- overwrite the intermediate values with results of calculations.
           --
           -- We also rely on the optimizer to remove any inferred shift
           -- registers that lead to dead ends. This gives lots of
           -- warnings but allows for easy incremental development.
           ------------------------------------------------------------

           if phase(0) = '1' then
               overflow <= overflow_in    & overflow(latency-1 downto 1);
               sync     <= sync_in        & sync(latency-1 downto 1);
               ca       <= signed(ca_in)  & ca(latency-1 downto 1);  
               cb       <= signed(cb_in)  & cb(latency-1 downto 1);
               a        <= signed(a_in)   & a(latency-1 downto 1);
               b        <= signed(b_in)   & b(latency-1 downto 1);
               i        <= unsigned(i_in) & i(latency-1 downto 1);
               
               overflow_out <= overflow(0);
               sync_out     <= sync(0);
               ca_out       <= std_logic_vector(ca(0));
               cb_out       <= std_logic_vector(cb(0));
               a_out        <= std_logic_vector(a(0));
               b_out        <= std_logic_vector(b(0)); 
               i_out        <= std_logic_vector(i(0));
           else
               overflow <= overflow(0) & overflow(latency-1 downto 1);
               sync     <= sync(0)     & sync(latency-1 downto 1);
               ca       <= ca(0)       & ca(latency-1 downto 1);  
               cb       <= cb(0)       & cb(latency-1 downto 1);
               a        <= a(0)        & a(latency-1 downto 1);
               b        <= b(0)        & b(latency-1 downto 1);
               i        <= i(0)        & i(latency-1 downto 1);
           end if;
           phase <= phase(0) & phase(phase'high downto 1);

           ----------------------------------------------------
           -- Now do the same for the working storage
           ----------------------------------------------------
           a_times_b      <= a_times_b(latency-1)      & a_times_b(latency-1 downto 1);

           a_abs          <= a_abs(latency-1)          & a_abs(latency-1 downto 1);
           a_squared      <= a_squared(latency-1)      & a_squared(latency-1 downto 1);
           a_squared_hh(3 downto 0) <= a_squared_hh(4 downto 1);
           a_squared_hl(4 downto 0) <= a_squared_hl(5 downto 1);
           a_squared_ll(4 downto 0) <= a_squared_ll(5 downto 1);

           b_abs          <= b_abs(latency-1)          & b_abs(latency-1 downto 1);
           b_squared      <= b_squared(latency-1)      & b_squared(latency-1 downto 1);
           b_squared_hh(3 downto 0) <= b_squared_hh(4 downto 1);
           b_squared_hl(4 downto 0) <= b_squared_hl(5 downto 1);
           b_squared_ll(4 downto 0) <= b_squared_ll(5 downto 1);
           magnitude      <= magnitude(latency-1)      & magnitude(latency-1 downto 1);

-------------------
-- Pipeline stage 0
-------------------
           ----------------------------------------------------
           -- Increase the iteration count, as long as we have
           -- not had an overflow. We don't need to check if 'i' 
           -- will roll over as we will have a fixed number of 
           -- stages it will pass through.
           ----------------------------------------------------
           if overflow(1) = '0' then
             i(0) <= i(1)+1;
           end if;

           ----------------------------------------------------
           -- Add on the constants to the real and imaginary
           -- parts
           ----------------------------------------------------
           a(0) <= a(1) + ca(1);
           b(0) <= b(1) + cb(1);

-------------------
-- Pipeline stage 1
-------------------
           --------------------------------------
           -- Check for overflow in the magnitude
           --------------------------------------
           if magnitude(2)(magnitude(2)'high downto magnitude(2)'high-1) /= "00" then
             overflow(1) <= '1';
           end if;
-------------------
-- Pipeline stage 2
-------------------
           ----------------------------------------------------
           -- Compute
           --  a <= a*a-b*b
           --  b <= 2*a*b;
           ----------------------------------------------------
           a(2) <= (others => '0');
           a(2)(a(2)'high downto a(2)'high-mult_size+1) <= signed(a_squared(3)(65 downto 65-mult_size+1) - b_squared(3)(65 downto 65-mult_size+1));
           b(2) <= (others => '0');
           b(2)(b(2)'high downto b(2)'high-mult_size+1) <= a_times_b(3)(64 downto 64-mult_size+1); -- Note - implicit scaling by 2 in the bit slice used
           magnitude(2) <= (others => '0');
           magnitude(2)(magnitude(2)'high downto magnitude(2)'high-mult_size+1) <= a_squared(3)(65 downto 65-mult_size+1) + b_squared(3)(65 downto 65-mult_size+1);
         
           if a_squared(3)(a_squared(3)'high downto a_squared(3)'high-5) /= "000000" or 
              b_squared(3)(b_squared(3)'high downto b_squared(3)'high-5) /= "000000"  then
             overflow(2) <= '1';
           end if;

-------------------
-- Pipeline stage 3
-------------------
            -- Add the high partial product (hh, shifted left by 34 bits) to the sum from stage 4
            a_squared(3) <=  a_squared(4) + (a_squared_hh(4) & "00" & x"00000000"); 
            b_squared(3) <=  b_squared(4) + (b_squared_hh(4) & "00" & x"00000000"); 
-------------------
-- Pipeline stage 4
-------------------
            -- Combine the cross partial product (hl, shifted left by 18 bits) with the low one (ll)
            a_squared(4) <= (others => '0'); 
            a_squared(4)(52 downto 0) <=  ("0" & a_squared_hl(5) & "00" & x"0000") + a_squared_ll(5);
            b_squared(4) <= (others => '0'); 
            b_squared(4)(52 downto 0) <=  ("0" & b_squared_hl(5) & "00" & x"0000") + b_squared_ll(5); 
-------------------
-- Pipeline stage 8
-------------------

--   These have been replaced with explicitly defined multipliers
--   to optimize resource usage.
--              
--            a_squared_orig(8) <= a(9) * a(9);
--            b_squared_orig(8) <= b(9) * b(9);

            a_times_b(8) <= a(9) * b(9);
            if a(9)(a(9)'high) /= a(9)(a(9)'high-1) or b(9)(b(9)'high) /= b(9)(b(9)'high-1) then
             overflow(8) <= '1';
            end if;   
-------------------
-- Pipeline stage 10
-------------------
            if a(11)(a(11)'high) = '1' then
                a_abs(10) <= unsigned((not a(11)) + 1);
            else
                a_abs(10) <= unsigned(a(11));
            end if;

            if b(11)(b(11)'high) = '1' then
                b_abs(10) <= unsigned((not b(11)) + 1);
            else
                b_abs(10) <= unsigned(b(11));
            end if;
        end if;
    end process;
    
    
m_a_hh: mult_u17_u17_l5_lut PORT MAP (
        clk => clk,
        a => std_logic_vector(a_abs(9)(33 downto 17)),
        b => std_logic_vector(a_abs(9)(33 downto 17)),
        p => a_s_hh
    );
    a_squared_hh(4) <= unsigned(a_s_hh);

m_a_hl: mult_u17_u17_l4 PORT MAP (
        clk => clk,
        a => std_logic_vector(a_abs(9)(33 downto 17)),
        b => std_logic_vector(a_abs(9)(16 downto  0)),
        p => a_s_hl
    );
    a_squared_hl(5) <= unsigned(a_s_hl);

-- NOTE - this is picking up the value one cycle early
-- (and using a multiplier with a latency of 5) to allow
-- optimal performance and meet timing on an Artix-7 -1 part
m_a_ll: mult_u17_u17_l5_lut PORT MAP (
        clk => clk,
        a => std_logic_vector(a_abs(10)(16 downto  0)),
        b => std_logic_vector(a_abs(10)(16 downto  0)),
        p => a_s_ll
    );
    a_squared_ll(5) <= unsigned(a_s_ll);

m_b_hh: mult_u17_u17_l5_lut PORT MAP (
        clk => clk,
        a => std_logic_vector(b_abs(9)(33 downto 17)),
        b => std_logic_vector(b_abs(9)(33 downto 17)),
        p => b_s_hh
    );
    b_squared_hh(4) <= unsigned(b_s_hh);
m_b_hl: mult_u17_u17_l4 PORT MAP (
        clk => clk,
        a => std_logic_vector(b_abs(9)(33 downto 17)),
        b => std_logic_vector(b_abs(9)(16 downto  0)),
        p => b_s_hl
    );
    b_squared_hl(5) <= unsigned(b_s_hl);

m_b_ll: mult_u17_u17_l4 PORT MAP (
        clk => clk,
        a => std_logic_vector(b_abs(9)(16 downto 0)),
        b => std_logic_vector(b_abs(9)(16 downto 0)),
        p => b_s_ll
    );
    b_squared_ll(5) <= unsigned(b_s_ll);                      
end architecture;
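
As a reading aid for the squaring logic above: the six 17x17 multipliers and the additions in pipeline stages 3 and 4 appear to implement a split of the low 34 bits of each absolute value into two 17-bit halves. This is my interpretation of the listing, not something the source states, and the symbols h and l are introduced here purely for illustration (h is a_abs bits 33 downto 17, l is bits 16 downto 0).

```latex
\begin{aligned}
|a|   &= h \cdot 2^{17} + l, \qquad 0 \le h, l < 2^{17} \\
|a|^2 &= h^2 \cdot 2^{34} + 2hl \cdot 2^{17} + l^2 \\
      &= \underbrace{h^2}_{\texttt{a\_squared\_hh}} \cdot 2^{34}
       + \underbrace{hl}_{\texttt{a\_squared\_hl}} \cdot 2^{18}
       + \underbrace{l^2}_{\texttt{a\_squared\_ll}}
\end{aligned}
```

Stage 4 forms hl * 2^18 + ll (the "00" & x"0000" padding is the 18-bit shift), which is below 2^53 and so fits in bits 52 downto 0. Stage 3 then adds hh * 2^34 (the "00" & x"00000000" padding). Because the two cross terms of a square are equal, three multipliers per square are enough instead of four. A quick check with h = 1 and l = 1: |a| = 131073, and 2^34 + 2^18 + 1 = 17180131329 = 131073^2.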

vga_output.vhd

-----------------------------------------------------------------------------
-- Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
--
-- File : vga_output.vhd
--
-- Author : Mike Field <hamster@snap.net.nz>
--
-- Date    : 9th May 2015
--
-- Convert the iteration count into a colour
--
----------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity vga_output is
    Generic ( 
        pixel_len : integer
        );
    Port ( clk : in STD_LOGIC;
           hsync_in : in STD_LOGIC;
           vsync_in : in STD_LOGIC;
           blank_in : in STD_LOGIC;
           iterations_in : in STD_LOGIC_VECTOR(7 downto 0);
           vga_hsync : out std_logic;
           vga_vsync : out std_logic;
           vga_red   : out std_logic_vector(3 downto 0);
           vga_green : out std_logic_vector(3 downto 0);
           vga_blue  : out std_logic_vector(3 downto 0));
end vga_output;

architecture Behavioral of vga_output is
    signal phase  : std_logic_vector(pixel_len-1 downto 0) := (0 => '1', others => '0');
    signal colour : std_logic_vector(11 downto 0) := (others => '0');
    type a_palette is array(0 to 255) of std_logic_vector(11 downto 0);
    signal palette : a_palette := (
        x"000",x"001",x"002",x"003",  x"004",x"005",x"006",x"007",
        x"008",x"009",x"00A",x"00B",  x"00C",x"00D",x"00E",x"00F",
        x"01F",x"02F",x"03F",x"04F",  x"05F",x"06F",x"07F",x"08F",
        x"09F",x"0AF",x"0BF",x"0CF",  x"0DF",x"0EF",x"0FF",x"0FE",
--32
        x"0FD",x"0FC",x"0FB",x"0FA",  x"0F9",x"0F8",x"0F7",x"0F6",
        x"0F5",x"0F4",x"0F3",x"0F2",  x"0F1",x"0F0",x"1F0",x"2F0",
        x"3F0",x"4F0",x"5F0",x"6F0",  x"7F0",x"8F0",x"9F0",x"AF0",
        x"BF0",x"CF0",x"DF0",x"EF0",  x"EF0",x"FF0",x"EE1",x"DD2",
--64
        x"CC3",x"BB4",x"AA5",x"996",  x"887",x"778",x"669",x"55A",
        x"44B",x"33C",x"22D",x"11E",  x"00F",x"10F",x"20F",x"30F",
        x"40F",x"50F",x"60F",x"70F",  x"80F",x"90F",x"A0F",x"B0F",
        x"C0F",x"D0F",x"E0F",x"F0F",  x"F1F",x"F2F",x"F3F",x"F4F",
--96
        x"F5F",x"F6F",x"F7F",x"F8F",  x"F9F",x"FAF",x"FBF",x"FCF",
        x"FDF",x"FEF",x"FFF",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
-- 128                                 
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
-- 160
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
-- 192
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
-- 224
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000",
        x"000",x"000",x"000",x"000",  x"000",x"000",x"000",x"000");
begin
    vga_red   <= colour(11 downto 8);
    vga_green <= colour( 7 downto 4);
    vga_blue  <= colour( 3 downto 0);
       
vga_buffer: process(clk)
    begin
        if rising_edge(clk) then
            if phase(1) = '1' then
                vga_hsync <= not hsync_in; -- Negative sync pulse
                vga_vsync <= not vsync_in; -- Negative sync pulse
                if blank_in = '0' then  
                    colour <= palette(to_integer(unsigned(iterations_in)));
                else
                    colour <= (others =>'0');
                end if;
            end if;            
            -- Control the loop's phase
            phase <= phase(0) & phase(phase'high downto 1);
        end if;
    end process;

end Behavioral;
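
The phase signal above is a one-hot ring: a single '1' rotates through pixel_len bits, so any one bit (such as phase(1), which gates the output registers) is high once every pixel_len clocks. Below is a minimal stand-alone testbench sketch of that behaviour. It assumes pixel_len = 9 purely for illustration; the entity name phase_ring_tb and the 10 ns clock period are made up here and are not part of the project.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench, not part of the project sources.
entity phase_ring_tb is
end phase_ring_tb;

architecture sim of phase_ring_tb is
    constant pixel_len : integer := 9;  -- assumed value, for illustration only
    signal clk   : std_logic := '0';
    -- Same declaration and initial value as in vga_output
    signal phase : std_logic_vector(pixel_len-1 downto 0) := (0 => '1', others => '0');
begin
    -- Same rotate statement as in vga_output
    rotate: process(clk)
    begin
        if rising_edge(clk) then
            phase <= phase(0) & phase(phase'high downto 1);
        end if;
    end process;

    -- Drive the clock and check where the '1' is after each rising edge
    check: process
    begin
        for n in 1 to 3*pixel_len loop
            clk <= '1';
            wait for 5 ns;
            -- After n rising edges the '1' is back in bit 0 only when
            -- n is a multiple of pixel_len.
            assert (phase(0) = '1') = (n mod pixel_len = 0)
                report "phase(0) out of step at edge " & integer'image(n)
                severity failure;
            clk <= '0';
            wait for 5 ns;
        end loop;
        report "phase ring check finished";
        wait;
    end process;
end sim;
```

In a simulator such as GHDL this should run to the final report without tripping the assertion. Changing pixel_len in the sketch shows how the enable rate follows the number of clocks per pixel.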

basys3.xdc

###########################
# Project: mandelbrot_ng - my next-gen FPGA Mandelbrot Fractal Viewer
#
# File : basys3.xdc
#
# Author : Mike Field <hamster@snap.net.nz>
#
###########################

set_property PACKAGE_PIN W5 [get_ports clk100]							
	set_property IOSTANDARD LVCMOS33 [get_ports clk100]
	create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk100]

##Buttons
set_property PACKAGE_PIN U18 [get_ports btnC]						
	set_property IOSTANDARD LVCMOS33 [get_ports btnC]
set_property PACKAGE_PIN T18 [get_ports btnU]						
	set_property IOSTANDARD LVCMOS33 [get_ports btnU]
set_property PACKAGE_PIN W19 [get_ports btnL]						
	set_property IOSTANDARD LVCMOS33 [get_ports btnL]
set_property PACKAGE_PIN T17 [get_ports btnR]						
	set_property IOSTANDARD LVCMOS33 [get_ports btnR]
set_property PACKAGE_PIN U17 [get_ports btnD]						
	set_property IOSTANDARD LVCMOS33 [get_ports btnD]
		
##VGA Connector
    set_property PACKAGE_PIN G19 [get_ports {vga_red[0]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_red[0]}]
    set_property PACKAGE_PIN H19 [get_ports {vga_red[1]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_red[1]}]
    set_property PACKAGE_PIN J19 [get_ports {vga_red[2]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_red[2]}]
    set_property PACKAGE_PIN N19 [get_ports {vga_red[3]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_red[3]}]
    set_property PACKAGE_PIN N18 [get_ports {vga_blue[0]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_blue[0]}]
    set_property PACKAGE_PIN L18 [get_ports {vga_blue[1]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_blue[1]}]
    set_property PACKAGE_PIN K18 [get_ports {vga_blue[2]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_blue[2]}]
    set_property PACKAGE_PIN J18 [get_ports {vga_blue[3]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_blue[3]}]
    set_property PACKAGE_PIN J17 [get_ports {vga_green[0]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_green[0]}]
    set_property PACKAGE_PIN H17 [get_ports {vga_green[1]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_green[1]}]
    set_property PACKAGE_PIN G17 [get_ports {vga_green[2]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_green[2]}]
    set_property PACKAGE_PIN D17 [get_ports {vga_green[3]}]                
        set_property IOSTANDARD LVCMOS33 [get_ports {vga_green[3]}]
    set_property PACKAGE_PIN P19 [get_ports vga_hsync]                        
        set_property IOSTANDARD LVCMOS33 [get_ports vga_hsync]
    set_property PACKAGE_PIN R19 [get_ports vga_vsync]                        
        set_property IOSTANDARD LVCMOS33 [get_ports vga_vsync]

Prebuilt file for Basys3

Here is a taster of it, though it is only of use if you own a Basys3 board.

Mandelbrot ng basys3 usage.jpg

File:Mandelbrog ng basys3.zip

Use the buttons to scroll, and hold the center button together with the left or right button to zoom.

I've still got to fix up the user interface and colour palette...
