



Key advantages of our second-generation SPE:

  - most flexible and fastest routing concept (unlimited routing)

  - high-speed interface for co-simulation (PCIe Gen2)

  - fastest interface for FPGA debugging (4 GB/s)

  - two 4 GB DDR3 SDRAMs on each FPGA board




The SPE-Systems (System-Prototyping-Emulator) consist of a pre-assembled collection of FPGA-Boards with Switch-Routing-Boards in between. The second generation, SPE Gen2, is plugged into a motherboard with 7 PCIe Gen2 slots, providing a wide range of technology enhancements to our customers. SPE Gen2 also supports two 4 GB DDR3 SDRAMs on each FPGA-Board.

Structure:

The complete SPE-Systems can host FPGA-Boards with up to 1200 user pins (typically ~960). FPGA-Boards from different FPGA vendors (Altera, Xilinx) and of different FPGA families (e.g. Virtex-4/5/6) can be mixed, so that each system can grow successively over the years. Switch-Routing-Boards are placed between the FPGA-Boards and connect the various I/Os of the FPGAs. By default the systems are pre-assembled with Switch-Routing-Boards to guarantee the best possible routing structure. Alternatively, Fixed-Routing-Boards can be used.

Each FPGA-Board has a main, central FPGA and a support FPGA, which are connected via a number of data lines. The support FPGA implements a PCIe Gen2 interface, so each FPGA-Board can communicate with the CPU on the motherboard. This interface is used for co-simulation (SCE-MI, ...), high-speed debugging (TotalScope) and system configuration.

The connected FPGA-Boards are plugged into an ASUS P6T7 WS SuperComputer motherboard to provide a tightly coupled prototyping-PC system. The system is always delivered with a state-of-the-art PC configuration (CPU, ...).

Routing:

The patented switch routing solution guarantees the best possible routing structure. Every mathematically possible connectivity between n FPGAs (n = 2, ..., 4) can be routed on the system. EDAptability guarantees that this structure imposes no routing limitations.

Due to the short distance between the FPGAs and switches, it is also the fastest possible routing structure. The worst-case system frequency (register-to-register) in a system based on 4 Altera FPGAs is 108 MHz (1200 I/Os switching at the same time/clock edge). This switch routing structure supports a unique pin multiplexing scheme for connectivity at a 333 MHz wave-pipelining data rate over 1200 signals.

For more detailed information about the routing structure, please refer to the paper published in IEEE Transactions on VLSI.

IO-Access:

There are two different ways to access the system. The first is to use the connections on each FPGA-Board. This gives direct access to the central FPGA on the FPGA-Board, and standard or customer-specific application-specific boards (ASBs) can be attached there. There are 400 connections on the top and 200 on the back. The second way is to use the front or back side of a complete system as an alternative to Switch-Boards. If FPGAs with 1200 I/Os are used, the user then has access to 2400 signals on the front and 2400 signals on the back. Each signal can be routed to one pin of the individual FPGAs. If FPGAs with 960 I/Os are used, 1920 signals are accessible on the front and 1920 on the back.
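The two figures quoted above (2400 signals per side for 1200-I/O FPGAs, 1920 per side for 960-I/O FPGAs) both correspond to twice the FPGA I/O count. The following minimal sketch applies that ratio to other I/O counts. The factor of 2 is inferred from these two data points only and is an assumption, as is the function name; it is not a documented formula.

```python
# Assumption: signals per side = 2 x FPGA I/O count, inferred from
# 2400 / 1200 == 1920 / 960 == 2. Not a documented specification.
SIGNALS_PER_FPGA_IO = 2


def accessible_signals(fpga_ios: int) -> dict:
    """Estimate the signals accessible at the front and back of a complete system."""
    per_side = SIGNALS_PER_FPGA_IO * fpga_ios
    return {"front": per_side, "back": per_side, "total": 2 * per_side}


for ios in (1200, 960):
    print(ios, accessible_signals(ios))
```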


FPGA debugging:

Standard FPGA debugging technologies can be used. In addition, EDAptability provides the integrated TotalScope debugger for the SPE Gen2. TotalScope provides 100% RTL signal visibility at speed without the need for re-synthesis. A test structure is automatically inserted into the RTL. After synthesis and place and route, register values are streamed out during the test via PCIe Gen2 x8 at a rate of 4 GB/s. These register values are used to post-simulate a temporarily generated local model of the design. The signal values are dumped into a VCD file.
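The 4 GB/s figure matches the raw payload rate of a PCIe Gen2 x8 link: 5 GT/s per lane with 8b/10b encoding gives 500 MB/s per lane and direction. The sketch below reproduces that arithmetic and, as an illustration, converts it into a per-clock-cycle budget for a design running at an example clock of 80 MHz. Protocol overhead (packet headers, flow control) is ignored, and the function names are hypothetical.

```python
# PCIe Gen2 signals at 5 GT/s per lane and uses 8b/10b encoding,
# so 8 of every 10 transferred bits carry payload.
GEN2_TRANSFERS_PER_S = 5_000_000_000


def pcie_gen2_bytes_per_s(lanes: int) -> int:
    """Raw payload bandwidth per direction, ignoring protocol overhead."""
    payload_bits_per_s = lanes * GEN2_TRANSFERS_PER_S * 8 // 10
    return payload_bits_per_s // 8


def bits_per_design_cycle(lanes: int, design_clock_hz: int) -> int:
    """Upper bound on bits that can be streamed out per design clock cycle."""
    return pcie_gen2_bytes_per_s(lanes) * 8 // design_clock_hz


print(pcie_gen2_bytes_per_s(8))              # bytes per second for an x8 link
print(bits_per_design_cycle(8, 80_000_000))  # bits per cycle at 80 MHz
```

Under these assumptions an x8 link carries 4,000,000,000 bytes per second, which at 80 MHz corresponds to at most 400 bits per design clock cycle.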

From the user's perspective, the TotalScope technology looks like a simulator. Signals are selected at RTL level, the "simulation" is rerun if necessary, and the signals are displayed. Then other signals and time windows of interest are selected, and the process is repeated until the bug is found. The difference is that the "simulation" runs on the FPGA-based prototyping system (running at, for example, 80 MHz) and true register behavior is traced. It is important to note that no re-synthesis is needed during the debugging process or because of debugging-related issues.

All RTL signals can be displayed, including registers, memory contents and non-register (combinatorial) signals. The signal types also remain as defined (enumerations, arrays, records, ...). The signals are dumped into a VCD file, which can be opened with EDAptability's VCD viewer or any third-party VCD viewer.

The clocks do not need to be controlled. The system can run freely and continuously, and clocks can also be driven from an external source. The post-simulation starts shortly before the user-selected time window; the time between the start of the run and the selected window does not need to be simulated. There is no direct impact on the timing behavior of the design and only a minor impact on the area.
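Because the result is a standard VCD file, it can also be post-processed with a script. The following is a minimal sketch of a reader for a common subset of the VCD format (scalar and vector value changes, no hierarchy handling). The sample dump and the signal names in it are made up for illustration and are not TotalScope output.

```python
def parse_vcd(text):
    """Return {signal_name: [(time, value), ...]} for a simple VCD dump."""
    names = {}    # identifier code -> signal name
    changes = {}  # signal name -> list of (time, value)
    time = 0
    tokens = iter(text.split())
    for tok in tokens:
        if tok == "$var":
            # $var <type> <width> <identifier> <name> [range] $end
            next(tokens)  # type, unused
            next(tokens)  # width, unused
            ident = next(tokens)
            name = next(tokens)
            names[ident] = name
            changes[name] = []
            for t in tokens:  # skip an optional bit range up to $end
                if t == "$end":
                    break
        elif tok in ("$dumpvars", "$dumpall", "$dumpon", "$dumpoff", "$end"):
            continue  # value changes inside these blocks are parsed normally
        elif tok.startswith("$"):
            for t in tokens:  # skip $scope, $timescale, $comment, ...
                if t == "$end":
                    break
        elif tok.startswith("#"):
            time = int(tok[1:])
        elif tok[0] in "bBrR":
            # vector or real change: value token followed by the identifier
            changes[names[next(tokens)]].append((time, tok[1:]))
        elif tok[0] in "01xXzZ":
            # scalar change: value and identifier fused into one token
            changes[names[tok[1:]]].append((time, tok[0]))
    return changes


# Hypothetical sample dump, for illustration only.
SAMPLE = """
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var reg 4 % count [3:0] $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 %
$end
#5
1!
b0001 %
#10
0!
"""

waves = parse_vcd(SAMPLE)
for name, events in waves.items():
    print(name, events)
```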

Co-Simulation:

All known co-simulation techniques (cycle-driven or transaction-driven) can be applied. The high-speed PCIe Gen2 interface on the state-of-the-art ASUS board allows extremely fast co-simulation, most likely the fastest available in the industry.

General:

The system has 10 global clock resources. The clock management devices on each FPGA-Board guarantee a global clock skew below 200 ps. Each global clock resource can be driven from any FPGA, from an external clock source, or by one of the multipliers (factor 4...40) or dividers (divider 2...40) in one of the clock management devices on each FPGA-Board. These multipliers/dividers can in turn be driven by any FPGA, global clock source or external clock source. The insertion delay of external clocks can be adjusted.
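As an illustration of the multiplier and divider ranges above, the following sketch checks whether a target frequency can be derived from a given source clock with a single multiplier or divider stage. It assumes integer factors with inclusive ranges and does not model chaining of stages or any frequency limits of the clock management devices; the function name is hypothetical.

```python
# Assumptions: integer factors, inclusive ranges, one stage only.
MULTIPLIERS = range(4, 41)  # factor 4...40
DIVIDERS = range(2, 41)     # divider 2...40


def find_clock_setting(source_hz: int, target_hz: int):
    """Return ("multiply", m) or ("divide", d) that yields target_hz, else None."""
    for m in MULTIPLIERS:
        if source_hz * m == target_hz:
            return ("multiply", m)
    for d in DIVIDERS:
        if target_hz * d == source_hz:
            return ("divide", d)
    return None


print(find_clock_setting(10_000_000, 80_000_000))   # 10 MHz source, 80 MHz target
print(find_clock_setting(100_000_000, 25_000_000))  # 100 MHz source, 25 MHz target
print(find_clock_setting(10_000_000, 20_000_000))   # factor 2 is below the multiplier range
```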

SPE-Control, a license-free software tool that runs on all operating systems, handles the switch setting mechanism and controls the download of configuration data via PCIe. It generates a system Verilog file for use in third-party partitioning tools. EDAptability's partitioner has an integrated switch setting optimizer, and the resulting switch setting definition can be read by SPE-Control.

Casing:


The SPE-System is mounted inside a PC tower (Lian Li). The left side panel is modified to allow access to the system from the outside.


For further questions, feel free to contact EDAptability. We are always glad to hear from you.