Fastvideo - high speed cameras and software

Fast CUDA JPEG encoder

We have wide experience with high speed cameras and software, and one common problem is how to increase the duration of high speed video recording. All our cameras stream video to PC RAM through a CameraLink interface and a PCI-Express frame grabber. In that case the total recording time is limited by the amount of free RAM, which is usually in the range of 1-12 GBytes. The data rate of a conventional high speed camera is about 650 MB/s or more, so one can record up to about 16 seconds of high speed video in raw format.
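The recording-time limit above is simple arithmetic: buffer size divided by data rate. A minimal sketch follows; the function name and the use of decimal units (1 GB = 10^9 bytes) are assumptions made for illustration, not something the text specifies.

```python
def recording_seconds(free_ram_bytes: float, data_rate_bytes_per_s: float) -> float:
    """Seconds of raw video that fit in RAM at a constant camera data rate."""
    if data_rate_bytes_per_s <= 0:
        raise ValueError("data rate must be positive")
    return free_ram_bytes / data_rate_bytes_per_s


GB = 10**9  # assumption: decimal gigabytes
MB = 10**6  # assumption: decimal megabytes

# Free RAM sizes within the 1-12 GByte range quoted in the text.
for ram_gb in (1, 4, 12):
    secs = recording_seconds(ram_gb * GB, 650 * MB)
    print(f"{ram_gb:2d} GB free RAM -> {secs:.1f} s of raw video")
```

Under these assumptions, 16 seconds at exactly 650 MB/s corresponds to a buffer of about 10.4 GB, which sits inside the quoted 1-12 GByte range.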

The basic idea for improving the situation is straightforward: compress the stream coming into the PC with a lossy algorithm at a compression ratio of 10-20 times, so that the compressed data stream can be written to HDD/SSD/RAID in real time. A typical CPU cannot cope with this task, so we decided to use a GPU with NVIDIA CUDA technology instead. We believe video cards are a good fit here because they let us apply NVIDIA's latest advances in parallel computing.
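To see why a 10-20x compression ratio matters, the sketch below computes the compressed stream rate for the 650 MB/s camera rate quoted earlier and compares it with a storage write speed. The 100 MB/s disk figure and all names are hypothetical, chosen only to illustrate the comparison.

```python
def compressed_rate_mb_s(raw_rate_mb_s: float, ratio: float) -> float:
    """Data rate of the stream after compression by the given ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_rate_mb_s / ratio


RAW_RATE_MB_S = 650.0    # camera data rate quoted in the text
DISK_WRITE_MB_S = 100.0  # hypothetical sustained write speed of the storage

for ratio in (1, 10, 20):
    rate = compressed_rate_mb_s(RAW_RATE_MB_S, ratio)
    verdict = "fits" if rate <= DISK_WRITE_MB_S else "too fast for this disk"
    print(f"ratio {ratio:2d}:1 -> {rate:6.1f} MB/s ({verdict})")
```

With these numbers the raw stream exceeds the assumed disk speed, while the 10:1 and 20:1 streams (65 MB/s and 32.5 MB/s) stay below it.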

We have implemented the JPEG lossy compression algorithm (Baseline JPEG for 8/24-bit images) because it meets our requirements. The main requirements are the following:

  • the algorithm should be adaptable to parallel computation
  • good image quality at compression ratios from 10 to 20 times
  • the algorithm should not be too computationally intensive
  • the task should divide into the maximum number of sub-tasks
  • minimum amount of memory per thread
  • popular open standard with multiple encoders and decoders available on different platforms

The lossy JPEG compression algorithm meets all of these criteria for parallel computation on the GPU. We developed the software without using standard libraries such as NVIDIA NPP or cuBLAS. To create a high-performance JPEG encoder for high speed video applications, we also optimized the algorithm for GPU computation. The result is a full, performance-oriented implementation of Baseline JPEG. Compression and decompression are very fast because every stage of the Baseline JPEG algorithm runs in parallel on the GPU. Our JPEG codec is much faster than the best commercial multithreaded JPEG codecs for multicore CPUs.

Fast JPEG image compression features for CUDA JPEG codec

  • Implementation is 100% compliant with JPEG Baseline standard
  • Baseline JPEG compression and decompression for grayscale (8-bit) and color (24-bit) images with arbitrary width and height
  • Extremely fast lossy image encoding and decoding with variable compression ratio
  • Maximum input image size is 12000 x 12000 or even more (optional)
  • Subsampling modes: 4:4:4, 4:2:2, 4:2:0
  • JPEG image quality in the range from 1 to 100%
  • Read/edit/write any EXIF section
  • Data input: 8/24-bit images from RAM/HDD/RAID/SSD
  • Data output: final compressed/uncompressed image in RAM/HDD/RAID/SSD
  • Continuous data mode (input one image after another)
  • Standard set of computations for parallel implementation of Baseline JPEG compression and decompression
    • JPEG Encoding on GPU: Input data parsing, Color Transform, 2D DCT, Quantization, Zig-zag, AC/DC, DPCM, RLE, Huffman, Byte stuffing, JFIF formatting
    • JPEG Decoding on GPU: JFIF parsing, Restart marker search, Inverse Huffman, Inverse RLE, Inverse DPCM, AC/DC, Inverse Zig-zag, Inverse Quantization, IDCT, Inverse Color Transform, Output formatting
  • Optimized for the latest NVIDIA GPUs
  • Optional integration with OpenGL and FFMPEG, MJPEG output
  • Compatible with Windows-7/8 and Linux (32/64)
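Three of the encoding stages listed above (2D DCT, quantization, zig-zag) operate on independent 8 x 8 blocks, which is what makes them easy to parallelize. The sketch below is a naive CPU reference in plain Python, not the GPU kernels of the codec described here. The flat quantization step of 16 is a hypothetical stand-in for a real quantization table, and all function names are illustrative.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Naive 2D DCT-II of one 8x8 block, using the JPEG normalization."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Divide by a flat step and round (hypothetical table, for illustration)."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_order():
    """(row, col) pairs in JPEG zig-zag scan order."""
    order = []
    for d in range(2 * N - 1):
        cells = [(r, d - r) for r in range(N) if 0 <= d - r < N]
        if d % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(cells)
    return order


def encode_block(samples):
    """Level shift, DCT, quantize and zig-zag one block of 8-bit samples."""
    shifted = [[s - 128 for s in row] for row in samples]
    q = quantize(dct_2d(shifted))
    return [q[r][c] for r, c in zigzag_order()]


# A flat gray block: all energy ends up in the DC coefficient.
flat = [[144] * N for _ in range(N)]
print(encode_block(flat)[:4])
```

Because `encode_block` reads nothing outside its own block, every block of an image can be processed at the same time, for example one block per group of GPU threads.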

We have succeeded in parallelizing all stages of the JPEG algorithm, including entropy encoding and decoding. There was a widespread opinion that Huffman coding could only be done serially. In our solution Huffman coding is fully parallel and is no longer a bottleneck. We no longer off-load anything from the GPU to the CPU to make the JPEG codec faster. The CUDA JPEG codec is extremely fast and runs entirely on the GPU.
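The text does not say how the Huffman stage was parallelized. One commonly described approach, shown here purely as an assumption and not as the method used in this codec, is to compute each codeword's length first, take an exclusive prefix sum of the lengths to get every codeword's starting bit position, and then let each worker write its own codeword independently. The sketch below emulates that on the CPU; the names and the toy codewords are illustrative.

```python
from itertools import accumulate


def bit_offsets(code_lengths):
    """Exclusive prefix sum: starting bit position of each codeword."""
    return list(accumulate(code_lengths, initial=0))[:-1]


def pack(codes):
    """Place (value, length) codewords into one bit list.

    Each loop iteration writes a disjoint bit range, so on a GPU every
    iteration could be handled by a separate thread.
    """
    lengths = [length for _, length in codes]
    offsets = bit_offsets(lengths)
    bits = [0] * sum(lengths)
    for (value, length), start in zip(codes, offsets):
        for i in range(length):
            bits[start + i] = (value >> (length - 1 - i)) & 1
    return bits


# Three toy codewords: 10, 0, 111
print(bit_offsets([2, 1, 3]))
print(pack([(0b10, 2), (0b0, 1), (0b111, 3)]))
```

On the decoding side, the restart markers named in the feature list split the JPEG bitstream into segments that can be decoded independently of one another.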

Benchmarks for JPEG encoding on GeForce GTX 980

We now need just 1.13 ms for Baseline JPEG encoding of a 24-bit color image with 3840 x 2160 resolution, JPEG quality 90% and 4:2:0 subsampling (this corresponds to a compression ratio of about 10:1). If we include DeviceIO latency (copying image data from host to GPU memory and back), the total compression time is 3.5 ms.

  • JPEG encoding performance for GPU computations ~ 21 GByte/s
  • JPEG encoding performance with DeviceIO latency ~ 6.8 GByte/s
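The throughput figures above follow from the uncompressed image size divided by the encoding time. The sketch below redoes that arithmetic; the function name is illustrative, and since the text does not state whether GByte means 10^9 or 2^30 bytes, both are printed.

```python
def throughput_bytes_per_s(width, height, bytes_per_pixel, time_ms):
    """Uncompressed input bytes processed per second."""
    return width * height * bytes_per_pixel / (time_ms * 1e-3)


# 4K, 24-bit color, with the two timings quoted in the text.
for label, time_ms in (("GPU only", 1.13), ("with DeviceIO", 3.5)):
    rate = throughput_bytes_per_s(3840, 2160, 3, time_ms)
    print(f"{label:14s}: {rate / 1e9:5.2f} GB/s, {rate / 2**30:5.2f} GiB/s")
```

The quoted values of about 21 and 6.8 GByte/s fall between the decimal and binary results, which is consistent with rounded timings.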

We chose the above compression parameters because they correspond to so-called "visually lossless" compression, an approach that is widely used in real applications. For a 20 MPix image with the same compression parameters, performance reaches 25 GByte/s.

Benchmarks for CUDA Discrete Cosine Transform (2D DCT) on GeForce GTX 980

The latest version of the DCT (Discrete Cosine Transform) from the CUDA JPEG codec shows the following timings on NVIDIA GeForce GTX 980 for a 4K image with resolution 3840 x 2160 (8-bit or 24-bit):

  • 8-bit grayscale image ~ 160 microseconds
  • 24-bit color, subsampling 4:2:0 ~ 340 microseconds
  • 24-bit color, subsampling 4:2:2 ~ 380 microseconds
  • 24-bit color, subsampling 4:4:4 ~ 470 microseconds

These timings cover the DCT only and exclude DeviceIO latency (copying image data from RAM to GPU memory and back).
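The subsampling mode changes how many 8 x 8 blocks the DCT has to process, which gives a rough sense of the relative workload behind the timings above. The sketch below counts blocks per mode; it is a simplification that ignores MCU padding, and the function and table names are illustrative.

```python
import math

# Horizontal and vertical chroma subsampling factors per mode.
CHROMA_FACTORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def blocks(width, height):
    """Number of 8x8 blocks needed to cover one image plane."""
    return math.ceil(width / 8) * math.ceil(height / 8)


def dct_blocks(width, height, mode=None):
    """Total 8x8 blocks: luma only if mode is None, else luma plus two chroma planes."""
    luma = blocks(width, height)
    if mode is None:
        return luma
    h, v = CHROMA_FACTORS[mode]
    chroma = blocks(math.ceil(width / h), math.ceil(height / v))
    return luma + 2 * chroma


for mode in (None, "4:2:0", "4:2:2", "4:4:4"):
    name = mode or "grayscale"
    print(f"{name:9s}: {dct_blocks(3840, 2160, mode)} blocks")
```

For a 4K image the block counts are in the ratio 1 : 1.5 : 2 : 3, while the measured timings do not scale exactly with them.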

Options for CUDA JPEG image compressor

We offer a fast SDK for GPU image processing. Below are some benchmarks for combined debayering and JPEG encoding on an NVIDIA GPU (timings include DeviceIO latency):

  • Debayer DFPD + JPEG compression (quality 90%, subsampling 4:2:0) for a Full HD image takes 1.0 ms (on NVIDIA GeForce GTX 980)
  • Debayer DFPD + JPEG compression (quality 90%, subsampling 4:2:0) for a 4K image takes 3.1 ms (on NVIDIA GeForce GTX 980)

We can also extend our JPEG compression software with a fast image processing pipeline for high speed and high resolution cameras: bad pixel removal, dark frame subtraction, shading correction, white balance, demosaicing, color correction, image filtering, denoising, LUT, gamma, color management, histogram, online resize, crop, rotate, sharpening, OpenGL output, integration with FFMPEG, Bayer compression, etc.


We license the CUDA JPEG codec and other components of the GPU Image Processing SDK to software developers, camera manufacturers, internet providers, software integrators, etc. Our SDK is used in a wide range of imaging applications. A demo version, documentation, licensing information and a quotation are available upon request. We also offer custom software design according to an agreed specification. If you need a significant speed-up for your image processing application, don't hesitate to contact us.

Roadmap 2015 for further improvements of CUDA JPEG Codec

  • 12-bit CUDA JPEG codec
  • Minimum memory usage on GPU
  • More input/output formats for JPEG codec

For any further information about the CUDA JPEG codec or a free trial, please contact us by email.

Russia, Moscow 129344, Iskra Street 17-A, Build. 3
Phone: +7 (495)-542-04-49