Performance
Opal Kelly’s FrontPanel consists of HDL modules within the FPGA, firmware on the USB microcontroller (or PCIe bridge device), and an API on the PC that have been optimized for both performance and a clean abstraction. Our latest FrontPanel-3 release has improved performance significantly while offering several features that customers have requested.
Achieving the highest level of performance for your particular application requires an understanding of the components involved and how each one affects performance. By following a few simple strategies and applying these notes, your application will be a top performer and still benefit from the ease of use and flexible abstraction that only FrontPanel provides.
Measured Performance
Measured performance figures in this section were taken on an Athlon 64 X2 4800+ machine running Windows XP SP2. USB performance can vary significantly depending on a number of factors including the motherboard make and model, specific driver versions installed, and machine load. The PipeTest application can be used as a simple benchmark.
Wires and Triggers
Wires and triggers provide the most basic form of communication between the FPGA and the PC. From a performance perspective, wires can be updated from several hundred to several thousand times per second, depending on the interface. All WireIns are written simultaneously, regardless of which ones you are interested in. Similarly, all WireOuts are read simultaneously.
Activating a TriggerIn is a very fast operation and can be performed over 1,000 times per second. Only one trigger is written per call. Updating TriggerOuts is similar to reading all WireOuts: all TriggerOuts are read simultaneously.
Since Wire and Trigger updates are always blocking API calls, these measurements provide some indication of the latency performance of the device.
Measured Performance (CPS = Calls Per Second)
API CALL | USB 3.0 (CPS) | USB 2.0 (CPS) | PCIE (CPS) |
---|---|---|---|
UpdateWireIns | 5,000+ | 1,000+ | 4,000+ |
UpdateWireOuts | 4,000+ | 800+ | 3,000+ |
ActivateTriggerIn | 8,000+ | 2,000+ | 66,000+ |
UpdateTriggerOuts | 4,000+ | 800+ | 3,000+ |
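Because these calls block, a calls-per-second figure like those in the table can be estimated by timing a loop. The following is a minimal sketch, not the PipeTest implementation: `measure_cps` and `fake_update_wire_outs` are hypothetical names, and the stand-in call simply sleeps for about 1 ms so that the example runs without hardware.

```python
import time

def measure_cps(call, duration_s=0.2):
    """Invoke `call` repeatedly for about `duration_s` seconds and
    return the observed rate in calls per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        call()          # blocking call: returns only when the update is done
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

def fake_update_wire_outs():
    """Hypothetical stand-in for a blocking API call (about 1 ms of latency)."""
    time.sleep(0.001)

cps = measure_cps(fake_update_wire_outs)
print(f"{cps:.0f} calls per second")
```

With a real device, the stand-in would be replaced by the blocking call of interest (for example, one of the API calls named in the table). The reciprocal of the result gives a rough per-call latency.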
Pipes (Bulk Transfers)
Pipes are the fastest way to transmit or receive bulk data. Due to overhead, performance is best with long transfers. Each time you perform a pipe transfer, several layers of setup are required including those at the firmware level, API level, and operating system level. Therefore, it is best to design around using long transfers, if possible. This generally means using large buffer sizes on the FPGA and relying on external memory when possible.
Low-latency, high-bandwidth transfers present a special challenge to any protocol and USB (and therefore FrontPanel) is no different. In this case, the two goals are at odds: trying to perform many operations and still achieve high bandwidth. The problem is that the overhead associated with setting up each transfer cuts into the time available to perform the data transfer.
It is important to note that Windows, Linux, and Mac OS X are not real-time operating systems. They are complex systems that may have many other processes taking higher priority at any given time. Therefore, it is often the case that simple operations (like moving a window) dramatically reduce transfer bandwidth. This should be a consideration when designing the buffering for any bandwidth-dependent application.
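As a rough sizing aid, the buffer must hold whatever the data source produces while the host is not servicing transfers. The sketch below assumes a constant data rate and a single worst-case stall; the function name, the safety factor, and the example figures are illustrative assumptions, not measured values.

```python
def buffer_bytes_for_stall(data_rate_bytes_per_s, stall_ms, safety_factor=2):
    """Bytes that accumulate during a host stall, times a safety margin.

    All arguments are integers so the arithmetic stays exact.
    """
    if data_rate_bytes_per_s < 0 or stall_ms < 0 or safety_factor < 1:
        raise ValueError("rate and stall must be non-negative, factor >= 1")
    return data_rate_bytes_per_s * stall_ms * safety_factor // 1000

# Hypothetical example: a 20 MB/s stream and a 50 ms host stall.
needed = buffer_bytes_for_stall(20_000_000, 50)
print(f"buffer at least {needed / 1e6:.1f} MB")
```

A buffer of this size usually exceeds on-chip FPGA memory at higher data rates, which is one reason to rely on external memory for bandwidth-dependent designs.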
NOTE: Pipes in FrontPanel-3 are actually a subset of Block-Throttled Pipes where the EP_READY signal is always asserted, thus disabling any throttling. Also, block sizes are always 1024 bytes except for the last block which may be smaller to account for the total length of the transfer. Block sizes are 64 bytes when the device is enumerated at full-speed.
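The block layout described in the note can be sketched as follows. This only illustrates the arithmetic (full blocks followed by an optional shorter final block); `pipe_block_sizes` is a hypothetical helper, not part of the FrontPanel API.

```python
def pipe_block_sizes(transfer_length, block_size=1024):
    """Split a transfer into full blocks plus an optional shorter last block.

    block_size would be 1024 normally, or 64 when the device is
    enumerated at full-speed.
    """
    if transfer_length < 0 or block_size <= 0:
        raise ValueError("length must be >= 0 and block size > 0")
    full_blocks, remainder = divmod(transfer_length, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)   # last block covers the leftover bytes
    return sizes

blocks = pipe_block_sizes(2500)
print(blocks)  # [1024, 1024, 452]
```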
Measured Performance
All values in MB/s (megabytes per second). Writes measured with WriteToPipeIn. Reads measured with ReadFromPipeOut.
TRANSFER LENGTH | USB 3.0 WRITE | USB 3.0 READ | USB 2.0 WRITE | USB 2.0 READ | PCI EXPRESS WRITE | PCI EXPRESS READ |
---|---|---|---|---|---|---|
128 bytes | 0.06 | 0.12 | 0.100 | 0.100 | TBD | TBD |
256 bytes | 0.12 | 0.24 | 0.100 | 0.200 | TBD | TBD |
512 bytes | 0.24 | 0.49 | 0.300 | 0.400 | TBD | TBD |
1.0 kB | 0.49 | 0.98 | 0.700 | 0.800 | 16.1 | 15.8 |
4.0 kB | 0.98 | 3.91 | 2.8 | 3.1 | 58.7 | 66.6 |
16.0 kB | 7.81 | 15.6 | 8.9 | 10.4 | 100 | 125 |
64.0 kB | 31.3 | 55.0 | 20.8 | 23.2 | 100 | 172 |
256 kB | 125 | 150 | 31.8 | 32.7 | 100 | 185 |
1.0 MB | 252 | 258 | 36.5 | 36.7 | 100 | 200 |
4.0 MB | 313 | 313 | 37.9 | 37.9 | 100 | 200 |
8.0 MB | 321 | 318 | 38.2 | 38.1 | 100 | 200 |
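The trend in the table is consistent with a simple model in which every transfer pays a fixed setup cost before data moves at the peak rate. The sketch below is a simplified model, not a description of the actual firmware or driver behavior; the 1.2 ms overhead and 38 MB/s peak are assumptions chosen by eye to loosely resemble the USB 2.0 columns, with MB taken as 10^6 bytes.

```python
def effective_throughput(length_bytes, overhead_s, peak_bytes_per_s):
    """Throughput of one transfer that pays a fixed setup cost
    and then streams at the peak rate."""
    transfer_time = overhead_s + length_bytes / peak_bytes_per_s
    return length_bytes / transfer_time

# Assumed, illustrative parameters (not measured values).
OVERHEAD_S = 1.2e-3   # per-transfer setup cost in seconds
PEAK = 38e6           # peak streaming rate in bytes per second

for length in (1024, 16 * 1024, 64 * 1024, 1024 * 1024, 8 * 1024 * 1024):
    rate = effective_throughput(length, OVERHEAD_S, PEAK)
    print(f"{length:>8} bytes: {rate / 1e6:5.1f} MB/s")
```

Under this model, throughput approaches the peak rate only when the streaming time is much longer than the setup cost, which is why long transfers perform best.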
Block-Throttled Pipes (Bulk Transfers)
Block-Throttled Pipes are available only in FrontPanel-3 implementations on USB devices. They provide performance equivalent to the standard pipe except that the FPGA can throttle the data transfer at the block level. The block size is programmable by the user, with the highest performance achieved at the largest (1,024-byte) block size.
BTPipes are an excellent way to achieve high performance with smaller buffer sizes because the FPGA can negotiate the transfer at a low level without incurring the significant overhead of setting up a new transfer for each small buffer block.
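The idea of block-level throttling can be illustrated with a toy simulation of a write into a small FPGA FIFO. This is a conceptual sketch only: time is divided into abstract ticks, the host offers at most one block per tick, and the ready condition (room for one full block) is an assumption about how a design might drive EP_READY. None of the names below are FrontPanel API calls.

```python
def simulate_bt_write(total_bytes, block_size, fifo_capacity, drain_per_tick):
    """Toy model: the host sends a block only when the FIFO has room for it."""
    if total_bytes % block_size:
        raise ValueError("this sketch assumes a whole number of blocks")
    if fifo_capacity < block_size or drain_per_tick <= 0:
        raise ValueError("FIFO must hold one block and must drain")
    fifo = sent = ticks = peak = stalled = 0
    while sent < total_bytes or fifo > 0:
        ep_ready = (fifo_capacity - fifo) >= block_size  # room for a block?
        if sent < total_bytes:
            if ep_ready:
                fifo += block_size      # host delivers one block
                sent += block_size
            else:
                stalled += 1            # host waits; no new transfer is set up
        peak = max(peak, fifo)
        fifo -= min(fifo, drain_per_tick)  # FPGA logic consumes data
        ticks += 1
    return {"ticks": ticks, "peak_fill": peak, "stalled_ticks": stalled}

result = simulate_bt_write(total_bytes=4096, block_size=256,
                           fifo_capacity=512, drain_per_tick=128)
print(result)
```

In this model the FIFO never exceeds its capacity even though it is much smaller than the total transfer, because the host stalls whenever there is no room for a full block.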
Measured Performance
All measurements taken with an 8-MB transfer length.
BLOCK LENGTH (BYTES) | USB 2.0 WriteToBlockPipeIn | USB 2.0 ReadFromBlockPipeOut | USB 3.0 WriteToBlockPipeIn | USB 3.0 ReadFromBlockPipeOut |
---|---|---|---|---|
4 | 353 kB / s | 266 kB / s | Not Supported | Not Supported |
16 | 1.33 MB / s | 1.03 MB / s | 72.07 MB / s | 88.89 MB / s |
64 | 4.88 MB / s | 3.98 MB / s | 222.22 MB / s | 186.05 MB / s |
256 | 17.7 MB / s | 14.0 MB / s | 296.30 MB / s | 275.86 MB / s |
300 | 20.6 MB / s | 13.8 MB / s | Not Supported | Not Supported |
400 | 24.8 MB / s | 16.9 MB / s | Not Supported | Not Supported |
512 | 29.9 MB / s | 24.5 MB / s | 307.69 MB / s | 296.30 MB / s |
600 | 32.8 MB / s | 21.9 MB / s | Not Supported | Not Supported |
700 | 35.1 MB / s | 22.4 MB / s | Not Supported | Not Supported |
800 | 35.7 MB / s | 23.0 MB / s | Not Supported | Not Supported |
900 | 35.0 MB / s | 22.7 MB / s | Not Supported | Not Supported |
1024 | 38.2 MB / s | 38.1 MB / s | 320 MB / s | 320 MB / s |
Isochronous Transfers?
FrontPanel does not support USB isochronous transfers. It is true that isochronous transfers can negotiate for guaranteed bandwidth on the USB which can be very helpful when trying to build a system that must deliver certain performance to the end-user. However, this guarantee comes at a significant price: isochronous transfers do not provide the same level of error-detection and error-correction that the more reliable USB bulk transfers provide. Furthermore, the guarantee is only for bus bandwidth and says nothing about the operating system’s capabilities.
If an error occurs during the transmission of a bulk transfer, the host will request that the missing packet be repeated. The host will also properly reconstitute the transmission so that everything is properly sequenced.
With isochronous transfers, the bandwidth and latency requirements trump delivery accuracy. Therefore, it is possible that some data may be lost in this pursuit. Isochronous transfers were created for things such as multimedia content that requires on-time delivery. But if the host is too busy or something interrupts the transfer, a few missing frames of video or a few milliseconds of audio are considered expendable.
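The difference in delivery guarantees can be pictured with a toy model. This is a deliberately simplified sketch, not an implementation of the USB protocol: it assumes a lost packet always succeeds when repeated, and `deliver` and its parameters are hypothetical names.

```python
def deliver(packets, lost_on_first_try, retry):
    """Return what the receiver ends up with, in original order.

    retry=True  models bulk-style delivery: a lost packet is repeated.
    retry=False models isochronous-style delivery: a lost packet is dropped.
    """
    received = []
    for index, packet in enumerate(packets):
        if index in lost_on_first_try and not retry:
            continue                 # dropped to stay on schedule
        received.append(packet)      # arrived first time or after a repeat
    return received

frames = ["f0", "f1", "f2", "f3", "f4"]
lost = {1, 3}                        # indices lost on the first attempt

bulk = deliver(frames, lost, retry=True)
iso = deliver(frames, lost, retry=False)
print(bulk)  # ['f0', 'f1', 'f2', 'f3', 'f4']
print(iso)   # ['f0', 'f2', 'f4']
```

The bulk-style result is complete and in order at the cost of extra time for repeats, while the isochronous-style result is on time but missing data, which is acceptable for audio or video and not for most FPGA data streams.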