Minimal DVI-D

From Hamsterworks Wiki!


This FPGA Project was completed in March 2015.


In the good old days you could generate VGA video with ease: just set up the right video clock, waggle the horizontal and vertical sync signals at the right times, and send low colour-depth video through DACs made out of a few resistors. It is not so easy now that most monitors, and more and more FPGA development boards, have all-digital interfaces, forcing you to implement and debug complex high-speed physical layers and protocols just to bring up the simplest of displays.

Well, the good old days are (almost) back!

By keeping the resolution relatively low, and by using only a carefully selected subset of TMDS symbols, the simplest of interfaces (DVI-D) can be up and running in a couple of pages of VHDL.

If you find this project interesting, you might also be interested in my Minimal HDMI project.


What are DVI-D and TMDS

DVI-D (and HDMI) carry video data over three high-speed serial channels plus a clock channel. For each pixel, 8 bits are sent for each of its red, green and blue components, and the clock channel carries the pixel clock. To let the receiver synchronize with the incoming data, the 8-bit values are converted to 10-bit symbols using a coding scheme called TMDS (Transition Minimized Differential Signaling). For pixel data, these symbols have only a few edges (at most five); for example, 0001111100 contains two edges.

For values whose encoded data bits do not contain equal numbers of 1s and 0s, two alternate symbols can be used: one with more ones than zeros in its eight data bits, and one with fewer. The two symbols are bitwise inverses of each other except for bit 8, which allows the source to keep the stream averaging out at 50% ones and 50% zeros, even if the entire picture is completely black.
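The Python sketch below (written for this page, following the decoding rule in the DVI 1.0 specification rather than any code from this project) decodes a 10-bit symbol and checks the "almost inverse" claim for the value 0x2F. It assumes symbols are written bit 9 first, as in the VHDL literals further down, so the rightmost character is the first bit on the wire. The two symbols are the ones a standard encoder produces for 0x2F under different running disparities.

```python
def tmds_decode(symbol: str) -> int:
    """Decode one 10-bit TMDS data symbol, written bit 9 first."""
    q = [int(c) for c in reversed(symbol)]          # q[0] is the first bit sent
    # Bit 9 set means the transmitter inverted the eight data bits
    d = [1 - b for b in q[:8]] if q[9] else q[:8]
    value = [d[0]]
    for i in range(1, 8):
        x = d[i] ^ d[i - 1]
        # Bit 8 says whether XOR (1) or XNOR (0) chaining was used to encode
        value.append(x if q[8] else 1 - x)
    return sum(bit << i for i, bit in enumerate(value))


# The two alternate symbols for the value 0x2F
low, high = "1010110000", "0001001111"
print(hex(tmds_decode(low)), hex(tmds_decode(high)))   # 0x2f 0x2f
print(low[2:].count("1"), high[2:].count("1"))         # data bits: 3 ones vs 5 ones
same = [9 - i for i, (a, b) in enumerate(zip(low, high)) if a == b]
print(same)                                            # [8]
```

Only bit 8 (the XOR/XNOR flag) is shared; bit 9 and the eight data bits are all flipped.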

Along with the video data, there are four CTL (control period) symbols that are sent when the data channels are not sending pixels. These all have seven or more edges, more than any pixel symbol, and have been selected so that the receiver can uniquely identify the start and end of each 10-bit symbol. CTL0 and CTL1 also have five 1s and five 0s (CTL2 and CTL3 have four and six 1s respectively). In addition to this role, the CTL symbols on channel 0 carry the horizontal and vertical sync signals, whereas channels 1 and 2 just send the same symbol over and over until the next pixel needs to be sent.
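Counting edges makes the difference between control and pixel symbols easy to see. This small Python check (an illustration, not part of the project) counts transitions and 1s in the four CTL symbols and in the eight pixel symbols used by the encoder source later on this page.

```python
CTL = {"CTL0": "1101010100", "CTL1": "0010101011",
       "CTL2": "0101010100", "CTL3": "1010101011"}

# The eight pixel symbols from the encoder's case statement
PIXELS = ["0111110000", "0001001111", "0111001100", "0010001111",
          "0000101111", "1000111001", "1000011011", "1011110000"]


def edges(symbol: str) -> int:
    """Number of 0->1 and 1->0 transitions between adjacent bits of a symbol."""
    return sum(a != b for a, b in zip(symbol, symbol[1:]))


for name, sym in CTL.items():
    print(name, sym, "edges:", edges(sym), "ones:", sym.count("1"))
print("most edges in a pixel symbol:", max(edges(s) for s in PIXELS))
```

The CTL symbols have 7 or 8 edges, while none of the pixel symbols used here has more than 4.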

Running disparity

As already mentioned, by counting the number of ones and zeros sent over time, and selecting the appropriate symbol whenever there is a choice, the stream can be kept very close to 50% ones and 50% zeros. This is great because, even with a heavily attenuated signal, the receiver will still see the incoming signal waggling above and below its medium-term average (over a dozen or so symbols). If there were long runs of just 0s or just 1s this average would drift, making the signal much harder to receive.

Keeping track of this long-term average is most of the hard work in implementing DVI-D on an FPGA: there is a very short feedback loop in which you have to count the ones and zeros, then select which symbol to send next to keep the disparity at a minimum. Getting this right is really quite tricky, as the standard expresses it in quite a functional way (see page 29 of http://www.cs.unc.edu/~stc/FAQs/Video/dvi_spec-V1_0.pdf if you want to make your head hurt!).
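To give a feel for what the specification asks for, here is a rough Python transcription of the DVI 1.0 TMDS encoding algorithm. It is a sketch for checking symbol tables on a PC, not synthesizable and not taken from this project. `cnt` is the running disparity (1s minus 0s sent so far), and symbols are returned bit 9 first to match the VHDL literals.

```python
from __future__ import annotations


def tmds_encode(value: int, cnt: int) -> tuple[str, int]:
    """Encode one 8-bit value following the DVI 1.0 TMDS algorithm.

    Returns (symbol, new_cnt). The symbol is written bit 9 first, so its
    rightmost character is the first bit on the wire.
    """
    d = [(value >> i) & 1 for i in range(8)]
    # Stage 1: transition minimisation by XOR or XNOR chaining
    use_xnor = sum(d) > 4 or (sum(d) == 4 and d[0] == 0)
    q_m = [d[0]]
    for i in range(1, 8):
        x = q_m[i - 1] ^ d[i]
        q_m.append(1 - x if use_xnor else x)
    q_m8 = 0 if use_xnor else 1
    n1 = sum(q_m)
    n0 = 8 - n1
    inverted = [1 - b for b in q_m]

    # Stage 2: DC balancing, deciding whether to invert the data bits
    if cnt == 0 or n1 == n0:
        q9 = 1 - q_m8
        data = q_m if q_m8 else inverted
        cnt += (n1 - n0) if q_m8 else (n0 - n1)
    elif (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        q9 = 1
        data = inverted
        cnt += 2 * q_m8 + (n0 - n1)
    else:
        q9 = 0
        data = q_m
        cnt += -2 * (1 - q_m8) + (n1 - n0)

    q = data + [q_m8, q9]                          # q[0] .. q[9]
    return "".join(str(b) for b in reversed(q)), cnt


cnt = 0
for v in (0x10, 0xEF, 0x2F, 0x2F):
    symbol, cnt = tmds_encode(v, cnt)
    print(hex(v), symbol, cnt)
# 0x10 0111110000 0
# 0xef 1011110000 0
# 0x2f 1010110000 -2
# 0x2f 0001001111 -2
```

Feeding 0x2F twice shows the feedback loop at work: the first call emits a symbol with four 1s and moves `cnt` to -2, so the second call picks the alternate symbol with five 1s, which happens to be the one this project uses.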

My hack

The true minimal implementation of DVI-D would require the ability to send six symbols - the four control symbols and two pixel values:

  • CTL0 = 1101010100
  • CTL1 = 0010101011
  • CTL2 = 0101010100
  • CTL3 = 1010101011
  • Pixel value #1 - 0111110000 - the TMDS symbol for 0x10
  • Pixel value #2 - 1011110000 - the TMDS symbol for 0xEF

These two pixel values were chosen because they lie near the bottom and top of the 0-255 range and their TMDS symbols are 'balanced' (five 1s and five 0s), which eliminates the need to count the bits in the stream.
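A quick numerical check of that claim, as a Python sketch (not from the original project): because each of the two symbols has exactly five 1s, the disparity of any run of them is zero, whereas repeating an unbalanced symbol drifts steadily. The unbalanced example, "0100000000", is one of the two symbols a standard TMDS encoder can produce for 0x00.

```python
from itertools import product

BALANCED = ["0111110000", "1011110000"]   # the symbols for 0x10 and 0xEF


def disparity(symbols) -> int:
    """Ones minus zeros over a sequence of 10-bit symbols."""
    return sum(s.count("1") - s.count("0") for s in symbols)


# Every possible run of eight pixels using only the two balanced symbols
worst = max(abs(disparity(run)) for run in product(BALANCED, repeat=8))
print(worst)                              # 0

# Repeating an unbalanced symbol drifts by -8 per symbol
print(disparity(["0100000000"] * 100))    # -800
```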

A (very short) line of video could look like this:

Channel 0 : 1101010100-1101010100-0010101011-0010101011-1101010100-1101010100-1011110000-0111110000-0111110000-1011110000

Channel 1 : 1101010100-1101010100-1101010100-1101010100-1101010100-1101010100-1011110000-0111110000-1011110000-0111110000

Channel 2 : 1101010100-1101010100-1101010100-1101010100-1101010100-1101010100-1011110000-0111110000-0111110000-1011110000

This can be decoded as:

Channel 0 : CTL0-CTL0-CTL1-CTL1-CTL0-CTL0-0xEF-0x10-0x10-0xEF

Channel 1 : CTL0-CTL0-CTL0-CTL0-CTL0-CTL0-0xEF-0x10-0xEF-0x10

Channel 2 : CTL0-CTL0-CTL0-CTL0-CTL0-CTL0-0xEF-0x10-0x10-0xEF
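The decoding above can also be done mechanically. This Python sketch (an illustration only) splits each channel string into 10-bit symbols, names them, and then reads the four pixel periods back as (red, green, blue) triples. CTL1 on channel 0 means HSYNC is high and VSYNC low, since the encoder source later on this page uses `vsync & hsync` to pick the control symbol.

```python
from __future__ import annotations

NAMES = {"1101010100": "CTL0", "0010101011": "CTL1",
         "0101010100": "CTL2", "1010101011": "CTL3",
         "0111110000": "0x10", "1011110000": "0xEF"}

channels = {
    0: "1101010100-1101010100-0010101011-0010101011-1101010100-1101010100-"
       "1011110000-0111110000-0111110000-1011110000",
    1: "1101010100-1101010100-1101010100-1101010100-1101010100-1101010100-"
       "1011110000-0111110000-1011110000-0111110000",
    2: "1101010100-1101010100-1101010100-1101010100-1101010100-1101010100-"
       "1011110000-0111110000-0111110000-1011110000",
}


def decode_line(line: str) -> list[str]:
    """Split a channel's bit string into 10-bit symbols and name each one."""
    return [NAMES[s] for s in line.split("-")]


decoded = {ch: decode_line(line) for ch, line in channels.items()}
for ch, names in decoded.items():
    print(f"Channel {ch} :", "-".join(names))

# The last four symbols are pixels: red on channel 2, green on 1, blue on 0
pixels = list(zip(decoded[2][6:], decoded[1][6:], decoded[0][6:]))
print(pixels)   # white, black, green, magenta
```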

However, 8-colour displays are just a bit too old school, so I've found eight symbols (including the two above) that are more or less evenly spaced between the low and high values. That gives three bits per channel, or nine bits per pixel.
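Just how even the spacing is can be checked with a few lines of Python. This sketch takes the eight values and symbols from the encoder source below, confirms that each symbol has five 1s, and compares the values against eight perfectly even steps from 0x10 to 0xEF.

```python
# Colour levels and symbols from the encoder's case statement
TABLE = {0x10: "0111110000", 0x2F: "0001001111", 0x54: "0111001100",
         0x6F: "0010001111", 0x8F: "0000101111", 0xB4: "1000111001",
         0xD2: "1000011011", 0xEF: "1011110000"}

levels = list(TABLE)
balanced = all(sym.count("1") == 5 for sym in TABLE.values())

# Eight perfectly even levels from the lowest value to the highest
step = (levels[-1] - levels[0]) / 7
ideal = [levels[0] + k * step for k in range(8)]
errors = [actual - target for actual, target in zip(levels, ideal)]

print("all symbols have five 1s:", balanced)
for actual, target, err in zip(levels, ideal, errors):
    print(f"0x{actual:02X} = {actual:3d}   ideal {target:6.1f}   error {err:+.1f}")
largest = max(abs(e) for e in errors)
print(f"largest error: {largest:.2f}")   # 4.71, against a step of about 31.9
```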

With that out of the way, you should now be able to follow what is going on!

Size

This is very compact - it requires only 17 Spartan 6 slices (72 registers and 34 LUTs) and one PLL to implement. That is only about one percent of the 1,430 slices in a Xilinx Spartan 6 LX9, and it only requires 8 I/O pins too!

Enhancements

Here are some possible enhancements that can be made:

  • Add the 16 HDMI TERC4 symbols and the guard band symbols, then add the ability to send an AVI InfoFrame. This would let you switch to limited-range ("studio level") video, where 16 is black and 235 is white, so the existing TMDS codes cover the full range of the display. Once that is working, you could also send 8-channel audio packets to the display.
  • Implement a proper TMDS encoder, allowing access to 24-bit RGB.
  • Use a serializer to generate the output. At the moment, using DDR registers limits the design to pixel clocks of about 50MHz; a serializer lets you use the full bandwidth of the output pins (1,050 Mb/s for Spartan 6).
  • Extend the existing system to 4 bits per channel, or reduce the blue channel to 2 bits to give the equivalent of the 8-bit (RRRGGGBB) analog VGA more commonly found on FPGA dev boards.
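The DDR limit follows from some simple arithmetic, sketched here in Python: each channel carries ten bits per pixel, so a DDR output register needs a clock of five times the pixel clock, which is what the PLL in the source produces (CLKFBOUT_MULT => 10, CLKOUT0_DIVIDE => 2). The 1,050 Mb/s figure is taken from the text above; check the data sheet for your device and speed grade.

```python
from __future__ import annotations


def link_rates(pixel_clock_hz: float) -> dict[str, float]:
    """Per-channel serial bit rate and DDR register clock for a pixel clock."""
    bit_rate = 10 * pixel_clock_hz   # ten TMDS bits per pixel on each channel
    ddr_clock = bit_rate / 2         # a DDR output sends two bits per clock
    return {"bit_rate": bit_rate, "ddr_clock": ddr_clock}


print(link_rates(50e6))   # {'bit_rate': 500000000.0, 'ddr_clock': 250000000.0}

# Pixel clock that would fill a 1,050 Mb/s output pin
print(1_050e6 / 10)       # 105000000.0
```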

Source file

Here are two versions of the source. They both implement exactly the same thing but in different ways.

The first uses a couple of 'for loops' and 'for generate' structures to keep the code small:

-------------------------------------------------------------------
-- minimalDVID_encoder.vhd : A quick and dirty DVI-D implementation
--
-- Author: Mike Field <hamster@snap.net.nz>
--
-- DVI-D uses TMDS as the 'on the wire' protocol, where each 8-bit
-- value is mapped to one or two 10-bit symbols, depending on how
-- many 1s or 0s have been sent. This makes it a DC balanced protocol,
-- as a correctly implemented stream will have (almost) an equal 
-- number of 1s and 0s. 
--
-- Because of this the implementation is quite complex. By restricting
-- the symbols to a subset of eight symbols, all of which have
-- five ones (and therefore five zeros) this complexity drops away
-- leaving a simple implementation. Combined with a DDR register to 
-- send the symbols the complexity is kept very low.
--
-------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library UNISIM;
use UNISIM.VComponents.all;

entity MinimalDVID_encoder is
    Port ( clk                   : in  STD_LOGIC;
           hsync,  vsync,  blank : in  STD_LOGIC;
           red,    green,  blue  : in  STD_LOGIC_VECTOR (2 downto 0);
           hdmi_p, hdmi_n        : out STD_LOGIC_VECTOR (3 downto 0));
end MinimalDVID_encoder;

architecture Behavioral of MinimalDVID_encoder is
   type a_symbols     is array (0 to 3) of std_logic_vector(9 downto 0);
   type a_colours     is array (0 to 2) of std_logic_vector(2 downto 0);
   type a_ctls        is array (0 to 2) of std_logic_vector(1 downto 0);
   type a_output_bits is array (0 to 3) of std_logic_vector(1 downto 0);

   signal symbols        : a_symbols     := (others => (others => '0'));
   signal high_speed_sr  : a_symbols     := (others => (others => '0'));
   signal colours        : a_colours     := (others => (others => '0'));
   signal ctls           : a_ctls        := (others => (others => '0'));
   signal output_bits    : a_output_bits := (others => (others => '0'));
   
   -- Controlling when the transfers into the high speed domain occur
   signal latch_high_speed : std_logic_vector(4 downto 0) := "00001";
   
   -- The signals from the DDR outputs to the output buffers
   signal serial_outputs : std_logic_vector(3 downto 0);

   -- For generating the x5 clocks
   signal clk_x5,  clk_x5_unbuffered  : std_logic;
   signal clk_feedback    : std_logic;

begin
   ctls(0) <= vsync & hsync; -- syncs are set in the channel 0 CTL periods

   colours(0) <= blue;
   colours(1) <= green;
   colours(2) <= red;

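clk_proc: process(clk)
   begin
      if rising_edge(clk) then
         -- The clock channel symbol is static. It is assigned inside this
         -- process because the loop below makes the process a driver of
         -- the whole 'symbols' array; a separate concurrent assignment to
         -- symbols(3) would be a second driver and resolve to 'X' in
         -- simulation.
         symbols(3) <= "0000011111";
         for i in 0 to 2 loop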
            if blank = '1' then
               case ctls(i) is 
                  when "00"   => symbols(i) <= "1101010100";
                  when "01"   => symbols(i) <= "0010101011";
                  when "10"   => symbols(i) <= "0101010100";
                  when others => symbols(i) <= "1010101011";      
               end case;
            else
               case colours(i) is 
                  ---  Colour                   TMDS symbol   Value 
                  when "000"  => symbols(i) <= "0111110000"; -- 0x10
                  when "001"  => symbols(i) <= "0001001111"; -- 0x2F
                  when "010"  => symbols(i) <= "0111001100"; -- 0x54
                  when "011"  => symbols(i) <= "0010001111"; -- 0x6F
                  when "100"  => symbols(i) <= "0000101111"; -- 0x8F
                  when "101"  => symbols(i) <= "1000111001"; -- 0xB4
                  when "110"  => symbols(i) <= "1000011011"; -- 0xD2
                  when others => symbols(i) <= "1011110000"; -- 0xEF
               end case;
            end if;
         end loop;
       end if;
   end process;

process(clk_x5)
   begin
      ---------------------------------------------------------------
      -- Move the 10-bit words into the high-speed
      -- clock domain once every five cycles. 
      -- 
      -- Then send out two bits every clock cycle using DDR output
      -- registers.
      ---------------------------------------------------------------   
      if rising_edge(clk_x5) then
         for i in 0 to 3 loop
            output_bits(i)  <= high_speed_sr(i)(1 downto 0);
            if latch_high_speed(0) = '1' then
               high_speed_sr(i) <= symbols(i);
            else
               high_speed_sr(i) <= "00" & high_speed_sr(i)(9 downto 2);
            end if;
         end loop;
         latch_high_speed <= latch_high_speed(0) & latch_high_speed(4 downto 1);
      end if;
   end process;

g1:   for i in 0 to 3 generate
   --------------------------------------------------------
   -- Convert the TMDS codes into a serial stream, two bits 
   -- at a time using a DDR register
   --------------------------------------------------------
      to_serial: ODDR2
         generic map(DDR_ALIGNMENT => "C0", INIT => '0', SRTYPE => "ASYNC") 
         port map (C0 => clk_x5,  C1 => not clk_x5, CE => '1', R => '0', S => '0',
                   D0 => output_bits(i)(0), D1 => output_bits(i)(1), Q => serial_outputs(i));
      OBUFDS_c0  : OBUFDS port map ( O  => hdmi_p(i), OB => hdmi_n(i), I => serial_outputs(i));
   end generate;
    
   ------------------------------------------------------------------
   -- Use a PLL to generate a x5 clock, which is used to drive 
   -- the DDR registers. This allows 10 bits to be sent for every 
   -- pixel clock
   ------------------------------------------------------------------
PLL_BASE_inst : PLL_BASE generic map (
      CLKFBOUT_MULT => 10,                  
      CLKOUT0_DIVIDE => 2,
      CLKOUT0_PHASE => 0.0,   -- Output 5x original frequency
      CLK_FEEDBACK => "CLKFBOUT",
      CLKIN_PERIOD => 13.33,  -- ns (75 MHz); set this to your pixel clock period
      DIVCLK_DIVIDE => 1
   ) port map (
      CLKFBOUT => clk_feedback, 
      CLKOUT0  => clk_x5_unbuffered,
      CLKFBIN  => clk_feedback,    
      CLKIN    => clk, 
      RST      => '0'
   );

BUFG_pclkx5  : BUFG port map ( I => clk_x5_unbuffered,  O => clk_x5);

end Behavioral;

The second is more explicit, and might be more suited to experimentation if you want to do different things on different channels, like making an 8-bit (RRRGGGBB) VGA output.

-------------------------------------------------------------------
-- minimalDVID_encoder.vhd : A quick and dirty DVI-D implementation
--
-- Author: Mike Field <hamster@snap.net.nz>
--
-- DVI-D uses TMDS as the 'on the wire' protocol, where each 8-bit
-- value is mapped to one or two 10-bit symbols, depending on how
-- many 1s or 0s have been sent. This makes it a DC balanced protocol,
-- as a correctly implemented stream will have (almost) an equal 
-- number of 1s and 0s. 
--
-- Because of this the implementation is quite complex. By restricting
-- the symbols to a subset of eight symbols, all of which have
-- five ones (and therefore five zeros) this complexity drops away
-- leaving a simple implementation. Combined with a DDR register to 
-- send the symbols the complexity is kept very low.
--
-------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library UNISIM;
use UNISIM.VComponents.all;

entity MinimalDVID_encoder is
    Port ( clk    : in  STD_LOGIC;
           blank  : in  STD_LOGIC;
           hsync  : in  STD_LOGIC;
           vsync  : in  STD_LOGIC;
           red    : in  STD_LOGIC_VECTOR (2 downto 0);
           green  : in  STD_LOGIC_VECTOR (2 downto 0);
           blue   : in  STD_LOGIC_VECTOR (2 downto 0);
           hdmi_p : out STD_LOGIC_VECTOR (3 downto 0);
           hdmi_n : out STD_LOGIC_VECTOR (3 downto 0));
end MinimalDVID_encoder;

architecture Behavioral of MinimalDVID_encoder is
   -- For holding the outward bound TMDS symbols in the slow and fast domain
   signal c0_symbol, c0_high_speed  : std_logic_vector(9 downto 0) := (others => '0');
   signal c1_symbol, c1_high_speed  : std_logic_vector(9 downto 0) := (others => '0');
   signal c2_symbol, c2_high_speed  : std_logic_vector(9 downto 0) := (others => '0');   
   signal clk_high_speed            : std_logic_vector(9 downto 0) := (others => '0');
   signal c2_output_bits            : std_logic_vector(1 downto 0) := "00";
   signal c1_output_bits            : std_logic_vector(1 downto 0) := "00";
   signal c0_output_bits            : std_logic_vector(1 downto 0) := "00";
   signal clk_output_bits           : std_logic_vector(1 downto 0) := "00";

   -- Controlling the transfers into the high speed domain
   signal latch_high_speed : std_logic_vector(4 downto 0) := "00001";
   
   -- From the DDR outputs to the output buffers
   signal c0_serial, c1_serial, c2_serial, clk_serial : std_logic;

   -- For generating the x5 clocks
   signal clk_x5,  clk_x5_unbuffered  : std_logic;
   signal clk_feedback    : std_logic;

   -- To glue the HSYNC and VSYNC into the control character.
   signal syncs           : std_logic_vector(1 downto 0);

begin
   syncs <= vsync & hsync;

clk_proc: process(clk)
   begin
      if rising_edge(clk) then
         -----------------------------------------------
         -- Channel 0 carries the blue pixels, and also
         -- includes the HSYNC and VSYNCs during
         -- the CTL (blanking) periods.
         -----------------------------------------------
         if blank = '1' then
            case syncs is 
               when "00"   => c0_symbol <= "1101010100";
               when "01"   => c0_symbol <= "0010101011";
               when "10"   => c0_symbol <= "0101010100";
               when others => c0_symbol <= "1010101011";      
            end case;
         else
            case blue is 
               ---  Colour                   TMDS symbol   Value 
               when "000"  => c0_symbol <= "0111110000"; -- 0x10
               when "001"  => c0_symbol <= "0001001111"; -- 0x2F
               when "010"  => c0_symbol <= "0111001100"; -- 0x54
               when "011"  => c0_symbol <= "0010001111"; -- 0x6F
               when "100"  => c0_symbol <= "0000101111"; -- 0x8F
               when "101"  => c0_symbol <= "1000111001"; -- 0xB4
               when "110"  => c0_symbol <= "1000011011"; -- 0xD2
               when others => c0_symbol <= "1011110000"; -- 0xEF
            end case;
         end if;

         -----------------------------------------------
         -- Channel 1 carries the Green pixels
         -----------------------------------------------
         if blank = '1' then
            c1_symbol <= "1101010100";
         else
            case green is 
               when "000"  => c1_symbol <= "0111110000"; -- 0x10
               when "001"  => c1_symbol <= "0001001111"; -- 0x2F
               when "010"  => c1_symbol <= "0111001100"; -- 0x54
               when "011"  => c1_symbol <= "0010001111"; -- 0x6F
               when "100"  => c1_symbol <= "0000101111"; -- 0x8F
               when "101"  => c1_symbol <= "1000111001"; -- 0xB4
               when "110"  => c1_symbol <= "1000011011"; -- 0xD2
               when others => c1_symbol <= "1011110000"; -- 0xEF
            end case;
         end if;

         -----------------------------------------------
         -- Channel 2 carries the Red pixels
         -----------------------------------------------
         if blank = '1' then
            c2_symbol <= "1101010100";
         else
            case red is 
               when "000"  => c2_symbol <= "0111110000"; -- 0x10
               when "001"  => c2_symbol <= "0001001111"; -- 0x2F
               when "010"  => c2_symbol <= "0111001100"; -- 0x54
               when "011"  => c2_symbol <= "0010001111"; -- 0x6F
               when "100"  => c2_symbol <= "0000101111"; -- 0x8F
               when "101"  => c2_symbol <= "1000111001"; -- 0xB4
               when "110"  => c2_symbol <= "1000011011"; -- 0xD2
               when others => c2_symbol <= "1011110000"; -- 0xEF
            end case;
         end if;
       end if;
   end process;

process(clk_x5)
   begin
      ---------------------------------------------------------------
      -- Move the 10-bit words into the high-speed
      -- clock domain once every five cycles. 
      -- 
      -- Then send out two bits every clock cycle using DDR output
      -- registers.
      ---------------------------------------------------------------   
      if rising_edge(clk_x5) then
         c0_output_bits  <= c0_high_speed(1 downto 0);
         c1_output_bits  <= c1_high_speed(1 downto 0);
         c2_output_bits  <= c2_high_speed(1 downto 0);
         clk_output_bits <= clk_high_speed(1 downto 0);

         if latch_high_speed(0) = '1' then
            c0_high_speed   <= c0_symbol;
            c1_high_speed   <= c1_symbol;
            c2_high_speed   <= c2_symbol;
            clk_high_speed  <= "0000011111";
         else
            c0_high_speed   <= "00" & c0_high_speed(9 downto 2);
            c1_high_speed   <= "00" & c1_high_speed(9 downto 2);
            c2_high_speed   <= "00" & c2_high_speed(9 downto 2);
            clk_high_speed  <= "00" & clk_high_speed(9 downto 2);
         end if;
         latch_high_speed <= latch_high_speed(0) & latch_high_speed(4 downto 1);
      end if;
   end process;

   ------------------------------------------------------------------
   -- Convert the TMDS codes into a serial stream, two bits at a time
   ------------------------------------------------------------------
c0_to_serial: ODDR2
   generic map(DDR_ALIGNMENT => "C0", INIT => '0', SRTYPE => "ASYNC") 
   port map (C0 => clk_x5,  C1 => not clk_x5, CE => '1', R => '0', S => '0',
             D0 => c0_output_bits(0), D1 => c0_output_bits(1), Q => c0_serial);
-- hdmi_p/hdmi_n(i) carry TMDS channel i, as in the first version; (3) is the clock
OBUFDS_c0  : OBUFDS port map ( O  => hdmi_p(0), OB => hdmi_n(0), I => c0_serial);

c1_to_serial: ODDR2
   generic map(DDR_ALIGNMENT => "C0", INIT => '0', SRTYPE => "ASYNC") 
   port map (C0 => clk_x5,  C1 => not clk_x5, CE => '1', R => '0', S => '0',
             D0 => c1_output_bits(0), D1 => c1_output_bits(1), Q => c1_serial);
OBUFDS_c1  : OBUFDS port map ( O  => hdmi_p(1), OB => hdmi_n(1), I => c1_serial);
   
c2_to_serial: ODDR2
   generic map(DDR_ALIGNMENT => "C0", INIT => '0', SRTYPE => "ASYNC") 
   port map (C0 => clk_x5,  C1 => not clk_x5, CE => '1', R => '0', S => '0',
             D0 => c2_output_bits(0), D1 => c2_output_bits(1), Q => c2_serial);
OBUFDS_c2  : OBUFDS port map ( O  => hdmi_p(2), OB => hdmi_n(2), I => c2_serial);

clk_to_serial: ODDR2
   generic map(DDR_ALIGNMENT => "C0", INIT => '0', SRTYPE => "ASYNC") 
   port map (C0 => clk_x5,  C1 => not clk_x5, CE => '1', R => '0', S => '0',
             D0 => clk_output_bits(0), D1 => clk_output_bits(1), Q => clk_serial);
OBUFDS_clk : OBUFDS port map ( O  => hdmi_p(3), OB => hdmi_n(3), I => clk_serial);
   
   ------------------------------------------------------------------
   -- Use a PLL to generate a x5 clock, which is used to drive 
   -- the DDR registers. This allows 10 bits to be sent for every 
   -- pixel clock
   ------------------------------------------------------------------
PLL_BASE_inst : PLL_BASE
   generic map (
      CLKFBOUT_MULT  => 10,
      CLKOUT0_DIVIDE => 2,
      CLKOUT0_PHASE  => 0.0,     -- Output 5x original frequency
      CLK_FEEDBACK   => "CLKFBOUT",
      CLKIN_PERIOD   => 13.33,   -- ns (75 MHz); set this to your pixel clock period
      DIVCLK_DIVIDE  => 1
   )
   port map (
      CLKFBOUT => clk_feedback,
      CLKOUT0  => clk_x5_unbuffered,
      CLKFBIN  => clk_feedback,
      CLKIN    => clk,
      RST      => '0'
   );

BUFG_pclkx5  : BUFG port map ( I => clk_x5_unbuffered,  O => clk_x5);

end Behavioral;
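Both versions use the same 5:1 "gearbox" in the clk_x5 process. To show the bit order it produces, here is a cycle-by-cycle Python model of that process feeding an ODDR2 (written for this page, not generated from the VHDL). It assumes D0 leaves the pin before D1 in each clk_x5 cycle, and that a new symbol is ready at every latch pulse.

```python
from __future__ import annotations


def serialize(symbols: list[str]) -> str:
    """Cycle-by-cycle model of the clk_x5 process followed by an ODDR2.

    Assumes a new symbol is waiting every time the latch pulse fires,
    i.e. clk and clk_x5 stay phase aligned as the PLL arranges.
    """
    latch = [1, 0, 0, 0, 0]        # latch_high_speed(0) .. (4), starts "00001"
    sr = [0] * 10                  # high_speed_sr(0) .. (9)
    output_bits = [0, 0]           # output_bits(0), output_bits(1)
    pending = list(symbols)
    wire = []
    for _ in range(5 * len(symbols) + 2):     # +2 cycles of pipeline delay
        # ODDR2: D0 is driven in the first half of the cycle, D1 in the second
        wire.extend(output_bits)
        # Compute next-state values from current ones (VHDL signal semantics)
        next_output_bits = sr[0:2]
        if latch[0]:
            symbol = pending.pop(0) if pending else "0" * 10
            next_sr = [int(c) for c in reversed(symbol)]   # literal is bit 9 first
        else:
            next_sr = sr[2:] + [0, 0]                       # "00" & sr(9 downto 2)
        next_latch = latch[1:] + latch[:1]                  # latch(0) & latch(4 downto 1)
        output_bits, sr, latch = next_output_bits, next_sr, next_latch
    return "".join(str(b) for b in wire)


stream = serialize(["0111110000", "1011110000"])
print(stream[:4])   # 0000
print(stream[4:])   # 00001111100000111101
```

The first four zeros are just the registers filling; after that each symbol goes out bit 0 first, which is the order TMDS requires.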

Using it

Using it is pretty simple - just plumb it in as you would with a standard VGA port, and take the TMDS pairs to the outside world.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dvid_output_test is
    Port ( clk50         : in  STD_LOGIC;

           hdmi_out_p : out  STD_LOGIC_VECTOR(3 downto 0);
           hdmi_out_n : out  STD_LOGIC_VECTOR(3 downto 0));
end dvid_output_test;

architecture Behavioral of dvid_output_test is

   COMPONENT vga_gen
   PORT(
      clk50           : IN std_logic;          
      pixel_clock     : OUT std_logic;
      red_p           : OUT std_logic_vector(7 downto 0);
      green_p         : OUT std_logic_vector(7 downto 0);
      blue_p          : OUT std_logic_vector(7 downto 0);
      blank           : OUT std_logic;
      hsync           : OUT std_logic;
      vsync           : OUT std_logic
      );
   END COMPONENT;


   COMPONENT MinimalDVID_encoder
   PORT(
      clk : IN std_logic;
      blank : IN std_logic;
      hsync : IN std_logic;
      vsync : IN std_logic;
      red : IN std_logic_vector(2 downto 0);
      green : IN std_logic_vector(2 downto 0);
      blue : IN std_logic_vector(2 downto 0);          
      hdmi_p : OUT std_logic_vector(3 downto 0);
      hdmi_n : OUT std_logic_vector(3 downto 0)
      );
   END COMPONENT;
      
   signal pixel_clock     : std_logic;

   signal red_p   : std_logic_vector(7 downto 0);
   signal green_p : std_logic_vector(7 downto 0);
   signal blue_p  : std_logic_vector(7 downto 0);
   signal blank   : std_logic;
   signal hsync   : std_logic;
   signal vsync   : std_logic;          

begin

---------------------------------------
-- Generate an 800x600 VGA test pattern
---------------------------------------
Inst_vga_gen: vga_gen PORT MAP(
      clk50 => clk50,
      pixel_clock     => pixel_clock,      
      red_p           => red_p,
      green_p         => green_p,
      blue_p          => blue_p,
      blank           => blank,
      hsync           => hsync,
      vsync           => vsync
   );

---------------------------------------------------
-- Convert 9 bits of the VGA signals to the DVI-D/TMDS output 
---------------------------------------------------
Inst_MinimalDVID_encoder: MinimalDVID_encoder PORT MAP(
      clk    => pixel_clock,
      blank  => blank,
      hsync  => hsync,
      vsync  => vsync,
      red    => red_p(7 downto 5),
      green  => green_p(7 downto 5),
      blue   => blue_p(7 downto 5),
      hdmi_p => hdmi_out_p,
      hdmi_n => hdmi_out_n
   );

end Behavioral;
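The test harness simply passes the top three bits of each 8-bit colour (red_p(7 downto 5) and so on) to the encoder. This hypothetical Python helper shows which of the eight levels an 8-bit value ends up as on the wire; vga_gen's actual pattern is not shown on this page, so the input values are arbitrary examples.

```python
# The eight levels the encoder can send, indexed by its 3-bit colour input
LEVELS = [0x10, 0x2F, 0x54, 0x6F, 0x8F, 0xB4, 0xD2, 0xEF]


def level_on_wire(colour8: int) -> int:
    """Level sent for an 8-bit colour when only bits 7..5 reach the encoder."""
    return LEVELS[(colour8 >> 5) & 0b111]


for c in (0x00, 0x1F, 0x20, 0x80, 0xFF):
    print(f"0x{c:02X} -> 0x{level_on_wire(c):02X}")
# 0x00 -> 0x10
# 0x1F -> 0x10
# 0x20 -> 0x2F
# 0x80 -> 0x8F
# 0xFF -> 0xEF
```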
