Performance

Opal Kelly’s FrontPanel consists of HDL modules within the FPGA, firmware on the USB microcontroller (or PCIe bridge device), and an API on the PC, all of which have been optimized for both performance and a clean abstraction.  Our latest FrontPanel-3 release improves performance significantly while adding several features that customers have requested.

Achieving the highest level of performance for your particular application requires an understanding of the components being used and of how choices such as transfer length and buffer size affect performance.  By following a few simple strategies and applying these notes, your application can perform well and still benefit from the ease of use and flexible abstraction that FrontPanel provides.

Measured Performance

Measured performance figures in this section were taken on an Athlon 64 X2 4800+ machine running Windows XP SP2.  USB performance can vary significantly depending on a number of factors including the motherboard make and model, specific driver versions installed, and machine load.  The PipeTest application can be used as a simple benchmark.

Wires and Triggers

Wires and triggers provide the most basic form of communication between the FPGA and the PC.  From a performance perspective, wires can be read or written from several hundred to several thousand times per second, depending on the interface.  All WireIns are written simultaneously, regardless of which ones you are interested in.  Similarly, all WireOuts are read simultaneously.

Activating a TriggerIn is a very fast operation and can be performed over 1,000 times per second.  Only one trigger is written per call.  Updating TriggerOuts is similar to reading all WireOuts: all TriggerOuts are read simultaneously.

Since Wire and Trigger updates are always blocking API calls, these measurements provide some indication of the latency performance of the device.

Measured Performance (CPS = Calls Per Second)

API CALL             USB 3.0 (CPS)   USB 2.0 (CPS)   PCIE (CPS)
UpdateWireIns        5,000+          1,000+          4,000+
UpdateWireOuts       4,000+          800+            3,000+
ActivateTriggerIn    8,000+          2,000+          66,000+
UpdateTriggerOuts    4,000+          800+            3,000+
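
Because each of these calls blocks until it completes, a calls-per-second figure translates directly into an average per-call latency of roughly 1/CPS.  The following is a minimal timing sketch, not part of FrontPanel: `measure_cps` and `latency_us` are hypothetical helper names, and the callable passed in is assumed to be a blocking call such as a bound UpdateWireIns method on an open device.

```python
import time

def measure_cps(call, duration_s=1.0):
    """Invoke a blocking callable repeatedly and return calls per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        call()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

def latency_us(cps):
    """Average per-call latency in microseconds implied by a CPS figure."""
    return 1e6 / cps

# Lower-bound figures from the measured table: UpdateWireIns at 1,000+ CPS
# (USB 2.0) implies at most about 1000 us per call, and 5,000+ CPS
# (USB 3.0) implies at most about 200 us per call.
usb2_latency = latency_us(1000)
usb3_latency = latency_us(5000)
```

On real hardware, the same helper could be pointed at any of the four calls in the table to compare a given machine against the published figures.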

Pipes (Bulk Transfers)

Pipes are the fastest way to transmit or receive bulk data.  Due to overhead, performance is best with long transfers.  Each time you perform a pipe transfer, several layers of setup are required including those at the firmware level, API level, and operating system level.  Therefore, it is best to design around using long transfers, if possible.  This generally means using large buffer sizes on the FPGA and relying on external memory when possible.
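
The effect of per-transfer overhead can be approximated with a simple model in which every transfer pays a fixed setup cost before data moves at the peak bus rate.  This is an illustrative sketch only: the function name and both parameter values are assumptions, not measured FrontPanel figures.

```python
def effective_throughput(length_bytes, setup_s, peak_bytes_per_s):
    """Throughput of one transfer when a fixed setup cost precedes the data phase."""
    transfer_s = setup_s + length_bytes / peak_bytes_per_s
    return length_bytes / transfer_s

# Illustrative parameters only (assumptions, not measurements).
SETUP_S = 250e-6    # fixed per-transfer setup cost, in seconds
PEAK = 38e6         # sustained bus rate, in bytes per second

for length in (1024, 64 * 1024, 8 * 1024 * 1024):
    mb_per_s = effective_throughput(length, SETUP_S, PEAK) / 1e6
    print(f"{length:>8} bytes: {mb_per_s:6.2f} MB/s")
```

In this model, short transfers are dominated by the setup term, while long transfers approach the peak rate without ever reaching it.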

Low-latency, high-bandwidth transfers present a special challenge to any protocol and USB (and therefore FrontPanel) is no different.  In this case, the two goals are at odds: trying to perform many operations and still achieve high bandwidth.  The problem is that the overhead associated with setting up each transfer cuts into the time available to perform the data transfer.

It is important to note that Windows, Linux, and Mac OS X are not real-time operating systems.  They are complex systems that may have many other processes taking higher priority at any given time.  Therefore, it is often the case that simple operations (like moving a window) dramatically reduce transfer bandwidth.  This should be a consideration when designing the buffering for any bandwidth-dependent application.
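
As a rough sizing aid, the buffering needed on the FPGA side is the stream rate multiplied by the longest host stall you intend to tolerate, plus some margin.  The sketch below assumes a constant-rate stream; the function name, the safety margin, and the example numbers are all hypothetical.

```python
import math

def required_buffer_bytes(stream_bytes_per_s, worst_stall_ms, margin=2.0):
    """Bytes the FPGA side must absorb while the host is not servicing the pipe."""
    return math.ceil(stream_bytes_per_s * worst_stall_ms * margin / 1000)

# Example with assumed numbers: a 20 MB/s stream, a 50 ms host stall,
# and a 2x safety margin.
needed = required_buffer_bytes(20_000_000, 50)
```

A result larger than the available on-chip memory is one indication that external memory is worth considering.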

NOTE: Pipes in FrontPanel-3 are actually a subset of Block-Throttled Pipes where the EP_READY signal is always asserted, thus disabling any throttling.  Also, block sizes are always 1024 bytes except for the last block which may be smaller to account for the total length of the transfer.  Block sizes are 64 bytes when the device is enumerated at full-speed.
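
The block division described in the note can be sketched as follows.  `block_lengths` is a hypothetical helper written for illustration; it only reproduces the arithmetic and does not call the FrontPanel API.

```python
def block_lengths(total_bytes, full_speed=False):
    """Split a transfer length into the block sizes described in the note."""
    block = 64 if full_speed else 1024    # 64-byte blocks at full-speed
    full, remainder = divmod(total_bytes, block)
    lengths = [block] * full
    if remainder:
        lengths.append(remainder)         # final, shorter block
    return lengths

blocks = block_lengths(2500)              # [1024, 1024, 452]
```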

Measured Performance

All values in MB/s (megabytes per second).  Writes measured with WriteToPipeIn.  Reads measured with ReadFromPipeOut.

                   USB 3.0           USB 2.0           PCI EXPRESS
TRANSFER LENGTH    WRITE    READ     WRITE    READ     WRITE    READ
128 bytes          0.06     0.12     0.100    0.100    TBD      TBD
256 bytes          0.12     0.24     0.100    0.200    TBD      TBD
512 bytes          0.24     0.49     0.300    0.400    TBD      TBD
1.0 kB             0.49     0.98     0.700    0.800    16.1     15.8
4.0 kB             0.98     3.91     2.8      3.1      58.7     66.6
16.0 kB            7.81     15.6     8.9      10.4     100      125
64.0 kB            31.3     55.0     20.8     23.2     100      172
256 kB             125      150      31.8     32.7     100      185
1.0 MB             252      258      36.5     36.7     100      200
4.0 MB             313      313      37.9     37.9     100      200
8.0 MB             321      318      38.2     38.1     100      200
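
One way to use these figures is to find the shortest transfer length that reaches a chosen fraction of the peak rate.  The sketch below does this for the USB 2.0 read column; it assumes that 1 kB means 1024 bytes, and the helper name is hypothetical.

```python
# (transfer length in bytes, MB/s) pairs copied from the USB 2.0 read column.
# Assumption: 1 kB = 1024 bytes and 1 MB = 1024 * 1024 bytes.
USB2_READ = [
    (128, 0.100), (256, 0.200), (512, 0.400), (1024, 0.800),
    (4 * 1024, 3.1), (16 * 1024, 10.4), (64 * 1024, 23.2),
    (256 * 1024, 32.7), (1024 ** 2, 36.7), (4 * 1024 ** 2, 37.9),
    (8 * 1024 ** 2, 38.1),
]

def smallest_length_reaching(samples, fraction):
    """Return the shortest measured length whose rate is >= fraction of the peak."""
    peak = max(rate for _, rate in samples)
    for length, rate in samples:
        if rate >= fraction * peak:
            return length
    return None  # only possible when fraction > 1

length_for_90_percent = smallest_length_reaching(USB2_READ, 0.9)
```

For this column, 90% of the peak is first reached at the 1.0 MB row, which suggests that little is gained from longer USB 2.0 reads on the measured machine.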

Block-Throttled Pipes (Bulk Transfers)

Block-Throttled Pipes are available only in FrontPanel-3 implementations on USB devices.  They provide equivalent performance to the standard pipe except that the FPGA can throttle the data transfer at the block level.  The block size is programmable by the user, with the highest performance achieved at the largest (1,024-byte) block size.

BTPipes are an excellent way to achieve high performance with smaller buffer sizes because the FPGA can negotiate the transfer at a low level without incurring the significant overhead of setting up a new transfer for each small buffer block.
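
The idea of block-level throttling can be illustrated with a toy model of a single read in which a block moves only while the FPGA signals ready.  This is a simplified simulation under stated assumptions, not the FrontPanel protocol: ready is modeled as "the FIFO holds at least one full block", time advances in abstract ticks, and all names are hypothetical.

```python
def simulate_block_throttled_read(total_bytes, block_bytes, fifo_bytes, fill_per_tick):
    """Toy model: within ONE transfer, a block moves only when the FPGA is ready."""
    if total_bytes % block_bytes:
        raise ValueError("this sketch assumes a whole number of blocks")
    if block_bytes > fifo_bytes:
        raise ValueError("the FIFO must hold at least one block")
    if fill_per_tick <= 0:
        raise ValueError("the FPGA must produce data for the read to finish")
    fifo_level = 0
    moved = 0
    stalled_ticks = 0
    while moved < total_bytes:
        # FPGA logic produces data each tick, limited by FIFO capacity.
        fifo_level = min(fifo_bytes, fifo_level + fill_per_tick)
        ep_ready = fifo_level >= block_bytes
        if ep_ready:
            fifo_level -= block_bytes
            moved += block_bytes
        else:
            stalled_ticks += 1   # the host waits; no new transfer is set up
    return moved, stalled_ticks

moved, stalls = simulate_block_throttled_read(4096, 1024, 2048, 512)
```

The point of the model is that the stalls are absorbed inside one transfer, so the per-transfer setup cost is paid once rather than once per small buffer.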

Measured Performance

All measurements taken with an 8-MB transfer length.

                        USB 2.0                                      USB 3.0
BLOCK LENGTH (BYTES)    WriteToBlockPipeIn    ReadFromBlockPipeOut   WriteToBlockPipeIn    ReadFromBlockPipeOut
4                       353 kB/s              266 kB/s               Not Supported         Not Supported
16                      1.33 MB/s             1.03 MB/s              72.07 MB/s            88.89 MB/s
64                      4.88 MB/s             3.98 MB/s              222.22 MB/s           186.05 MB/s
256                     17.7 MB/s             14.0 MB/s              296.30 MB/s           275.86 MB/s
300                     20.6 MB/s             13.8 MB/s              Not Supported         Not Supported
400                     24.8 MB/s             16.9 MB/s              Not Supported         Not Supported
512                     29.9 MB/s             24.5 MB/s              307.69 MB/s           296.30 MB/s
600                     32.8 MB/s             21.9 MB/s              Not Supported         Not Supported
700                     35.1 MB/s             22.4 MB/s              Not Supported         Not Supported
800                     35.7 MB/s             23.0 MB/s              Not Supported         Not Supported
900                     35.0 MB/s             22.7 MB/s              Not Supported         Not Supported
1024                    38.2 MB/s             38.1 MB/s              320 MB/s              320 MB/s

Isochronous Transfers?

FrontPanel does not support USB isochronous transfers.  It is true that isochronous transfers can negotiate for guaranteed bandwidth on the USB which can be very helpful when trying to build a system that must deliver certain performance to the end-user.  However, this guarantee comes at a significant price: isochronous transfers do not provide the same level of error-detection and error-correction that the more reliable USB bulk transfers provide.  Furthermore, the guarantee is only for bus bandwidth and says nothing about the operating system’s capabilities.

If an error occurs during the transmission of a bulk transfer, the host will request that the missing packet be repeated.  The host will also properly reconstitute the transmission so that everything is properly sequenced.

With isochronous transfers, the bandwidth and latency requirements trump delivery accuracy.  Therefore, it is possible that some data may be lost in this pursuit.  Isochronous transfers were created for things such as multimedia content that requires on-time delivery.  But if the host is too busy or something interrupts the transfer, a few missing frames of video or a few milliseconds of audio are considered expendable.
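
The trade-off between the two transfer types can be illustrated with a toy simulation.  This is a deliberately simplified model rather than the USB protocol itself: each packet is damaged with a fixed, assumed probability, bulk delivery retries until the packet arrives, and isochronous delivery gets one attempt per packet.  All names and numbers are hypothetical.

```python
import random

def deliver_bulk(packets, error_rate, rng):
    """Retry each packet until it arrives; order and content are preserved."""
    if not 0 <= error_rate < 1:
        raise ValueError("error_rate must be in [0, 1) for retries to terminate")
    received, attempts = [], 0
    for packet in packets:
        while True:
            attempts += 1
            if rng.random() >= error_rate:    # arrived intact
                received.append(packet)
                break
    return received, attempts

def deliver_isochronous(packets, error_rate, rng):
    """One attempt per packet in its time slot; a damaged packet is dropped."""
    return [p for p in packets if rng.random() >= error_rate]

rng = random.Random(0)                        # seeded for repeatability
frames = list(range(100))
bulk, attempts = deliver_bulk(frames, 0.1, rng)
iso = deliver_isochronous(frames, 0.1, rng)
```

In this model, the bulk path always returns every packet in order at the cost of extra attempts, while the isochronous path takes a fixed number of attempts and may return fewer packets.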