This document demonstrates ping-pong textures, a technique common in GPGPU applications, and applies it to scale-space filtering as a post-processing step.
Ping-pong textures are a pair of texture surfaces that a shader uses as both input and output. The shader program reads one texture as input, performs some computation, and writes the output to the second texture. In the next iteration, the second texture becomes the input and the output is written to the first. The two textures switch roles on every iteration, hence the name ping-pong textures. They are analogous to temporary variables in a program running on a CPU.
We focus on the ping-pong texture implementation in DirectX and explain how we use it to implement scale-space filtering, either as a post-processing step on a rendered scene or for plain image processing.
The next sections explain ping-pong textures and their use in implementing scale-space filtering. The final section outlines how the technique can be used to implement the FFT algorithm on the GPU.
This paper does not discuss benchmarking or optimization.
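The role swap described above can be sketched on the CPU with two plain lists standing in for the two textures. This is only an illustration, not the DirectX implementation: the names `ping_pong` and `box_blur` are hypothetical, and the 3-tap average is an arbitrary stand-in for whatever the shader computes in each pass.

```python
def ping_pong(initial, step, iterations):
    """Run `step` repeatedly, alternating between two buffers."""
    # Two equally sized buffers stand in for the pair of textures.
    buffers = [list(initial), [0.0] * len(initial)]
    src, dst = 0, 1
    for _ in range(iterations):
        step(buffers[src], buffers[dst])  # read from src, write to dst
        src, dst = dst, src               # swap roles for the next pass
    return buffers[src]                   # src now holds the latest output


def box_blur(inp, out):
    """Hypothetical per-pass kernel: 3-tap average with clamped borders."""
    n = len(inp)
    for i in range(n):
        left = inp[max(i - 1, 0)]
        right = inp[min(i + 1, n - 1)]
        out[i] = (left + inp[i] + right) / 3.0


result = ping_pong([0.0, 0.0, 9.0, 0.0, 0.0], box_blur, 2)
print(result)
```

Each pass reads only from one buffer and writes only to the other, so no pass ever sees partially updated data. That property is the reason for keeping two surfaces instead of updating one in place.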
The following diagram shows a representation of ping-pong textures. The oval shapes represent the pixel shaders. The rectangles represent the textures used for the ping-pong operations. The arrows show the directions of the data flow.
The rasterizer delivers the data to the pixel shader PS_0. The output of PS_0 is written to Texture_0. Shader PS_1 takes this data as input and writes its output to Texture_1. Shader PS_2 in turn takes that as input and writes its output to Texture_0. Texture_0 and Texture_1 are the ping-pong textures in this example. Our implementation uses two pixel shaders, PS_1 and PS_2, but depending on the application a single pixel shader may be enough to drive the ping-pong textures.
We apply this technique in our scale-space filtering application by iterating over the two pixel shaders (PS_1 and PS_2) while interchanging the two textures (Texture_0 and Texture_1) as input and output.
Scale-space filtering is explained in detail by Witkin [1]. Scale-space is a concept used in image processing, signal processing, and pattern recognition. Scale-space filtering provides a basis for handling input signals at different scales, parameterized by a variable called the scale parameter. This representation of a signal is called its scale-space representation.
In our case the signal is an image. If the input to our application is a 3D scene, we render the scene to a texture and use that as the input image. If the input is an image, we load the image directly as a texture. The input to the filtering process is therefore always an image. For simplicity, we use the terms texture surface and image interchangeably wherever the meaning is clear from context.
The most frequently used scale-space is the linear scale-space, also known as the Gaussian scale-space. In the rest of this document we discuss only the linear scale-space, with a two-dimensional image as the input signal.
The scale-space representation provides a family of representations of the same base image. We can select any of those representations using a single parameter (scale). In our application, we have provided a slider to select the scale parameter. The application will generate a scale-space representation of the image based on this parameter, by convolving the image I(x, y), with the filter F(x, y, p) shown below:
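The filter equation itself is not reproduced in this copy of the document. As a sketch only, and assuming that p plays the role of the Gaussian variance as in the standard linear scale-space formulation, the filter would take the form:

$$
F(x, y, p) = \frac{1}{2 \pi p} \exp\!\left( -\frac{x^{2} + y^{2}}{2p} \right), \qquad p > 0
$$

Under that assumption the kernel narrows to a unit impulse as p approaches 0 and widens as p grows.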
In the above equation, p is the scale parameter. By changing p, we get different scale-space representations of I(x, y). For p = 0 the filter becomes an identity operation and we recover the original image. For increasing positive values of p, the output is an increasingly smoothed image.
Gaussian kernels are separable into horizontal and vertical components. In other words the above 2D filter can be separated into two 1D filters as shown below:
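The separated equation is also missing from this copy. A sketch of the factorization, assuming the 2D filter is a Gaussian with variance p and using G as a hypothetical name for the 1D kernel, is:

$$
F(x, y, p) = G(x, p)\, G(y, p), \qquad G(u, p) = \frac{1}{\sqrt{2 \pi p}} \exp\!\left( -\frac{u^{2}}{2p} \right)
$$

Multiplying the two 1D factors gives the normalization 1/(2πp) and the exponent -(x² + y²)/(2p) of a 2D Gaussian, so convolving with G along x and then along y is equivalent to a single convolution with F.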
The separable filter is computationally less expensive than the non-separable one: for a kernel that is k taps wide, the two 1D passes need about 2k texture samples per pixel, whereas a single 2D pass needs k × k. In our implementation, we convolve the image in the horizontal direction (Horizontal Pass or HP) and then convolve the resulting image in the vertical direction (Vertical Pass or VP).
The input scene is first rendered to the texture Texture_0. If the input is an image, we render a screen-aligned polygon and use the image as the texture map. Shader PS_0 renders the 3D scene or the 2D image into Texture_0. For a 2D image, PS_0 is an identity texture-sampling shader that simply copies the image into Texture_0.
Initially, the image in Texture_0 is the zeroth scale representation of the input image. This corresponds to the parameter value p = 0 in the scale-space. In our implementation, the minimum (leftmost) position of the slider corresponds to this level in the scale-space representation of the signal.
Shader PS_1 takes Texture_0 as input and convolves the image with the HP filter, writing the output to Texture_1. Shader PS_2 takes Texture_1 as input, convolves it with the VP filter, and writes the output back to Texture_0. After both filters (HP and VP) have been applied, Texture_0 holds the level 1 scale-space representation of the image.
This process of applying the HP and VP filters in succession is repeated until the desired representation level is reached. Once the required number of blurring passes is complete, the final data is sent to the viewport surface for display.
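The equivalence of the two-pass and single-pass forms can be checked numerically. The following is a minimal CPU sketch rather than the shader code: the helper names, the clamped border addressing, and the reading of p as the Gaussian variance are all assumptions made for illustration.

```python
import math


def gaussian_taps(p, radius):
    """Normalized samples of a 1D Gaussian; p is assumed to be the variance."""
    if p <= 0:
        # Limit case: a unit impulse, so filtering leaves the image unchanged.
        return [1.0 if k == 0 else 0.0 for k in range(-radius, radius + 1)]
    weights = [math.exp(-(k * k) / (2.0 * p)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def clamp(v, lo, hi):
    return min(max(v, lo), hi)


def blur_pass(img, taps, dx, dy):
    """Convolve along one axis: (dx, dy) = (1, 0) for HP, (0, 1) for VP."""
    r = len(taps) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, t in enumerate(taps):
                xx = clamp(x + (k - r) * dx, 0, w - 1)
                yy = clamp(y + (k - r) * dy, 0, h - 1)
                acc += t * img[yy][xx]
            out[y][x] = acc
    return out


def blur_2d(img, taps):
    """Reference: direct 2D convolution with the outer product of the taps."""
    r = len(taps) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, tj in enumerate(taps):
                for k, tk in enumerate(taps):
                    yy = clamp(y + j - r, 0, h - 1)
                    xx = clamp(x + k - r, 0, w - 1)
                    acc += tj * tk * img[yy][xx]
            out[y][x] = acc
    return out


image = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(5)]
taps = gaussian_taps(p=1.5, radius=2)
separable = blur_pass(blur_pass(image, taps, 1, 0), taps, 0, 1)  # HP then VP
direct = blur_2d(image, taps)
max_diff = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print("largest difference between the two results:", max_diff)
```

With a radius of 2 the kernel is 5 taps wide, so the two passes read 10 samples per pixel where the direct form reads 25. The two results differ only by floating-point rounding.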
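The pass sequence can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions: nested lists stand in for the two textures, `run_pass` and `scale_space` are hypothetical names, and a fixed 3-tap binomial kernel replaces the Gaussian used by the real shaders.

```python
TAPS = (0.25, 0.5, 0.25)  # hypothetical 3-tap binomial stand-in for the Gaussian


def run_pass(src, dst, dx, dy):
    """One filter pass: read every sample from src, write every result to dst."""
    h, w = len(src), len(src[0])
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, t in enumerate(TAPS):
                xx = min(max(x + (k - 1) * dx, 0), w - 1)  # clamp at the borders
                yy = min(max(y + (k - 1) * dy, 0), h - 1)
                acc += t * src[yy][xx]
            dst[y][x] = acc


def scale_space(image, level):
    """Return the image after `level` rounds of HP followed by VP."""
    h, w = len(image), len(image[0])
    texture_0 = [row[:] for row in image]      # role of PS_0: copy input in
    texture_1 = [[0.0] * w for _ in range(h)]
    for _ in range(level):
        run_pass(texture_0, texture_1, 1, 0)   # role of PS_1: HP, 0 -> 1
        run_pass(texture_1, texture_0, 0, 1)   # role of PS_2: VP, 1 -> 0
    return texture_0                           # result is always in Texture_0


impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 16.0
level_1 = scale_space(impulse, 1)
for row in level_1:
    print(row)
```

Because each level consists of one HP and one VP pass, the result lands back in the first buffer after every level, so the display step can always read from the same surface regardless of the level chosen.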

The two buttons on the right allow the user to select either a 3D model or a 2D image as the input signal, and either a gray scale or an RGB color rendering. The four images, going clockwise from the top left, show a full color 3D model, a gray scale 3D model, a full color image, and a gray scale image. All four are rendered at the zeroth scale.
The slider on the right side allows the user to adjust the scale level of the image to be displayed. The scale increases from left to right. Below we show a sequence of four outputs for different scales to show the effect of scale-space filtering, on the rendering of a 3D model as well as on a 2D image.
There are several applications of scale-space filtering in image analysis and image processing, in addition to other fields.
The Gaussian scale-space is not the only option for scale-space filtering, but it is the canonical and most frequently used one. We selected the Gaussian filter largely because it is inherently separable, which makes it a good vehicle for demonstrating ping-pong textures.
We intend to use this framework to implement the Fast Fourier Transform that is optimally load balanced between the CPU and the GPU.
We plan to implement the Cooley-Tukey FFT algorithm. In addition to the ping-pong textures, this requires the ability to send an array of floating point data to the graphics subsystem. That facility is already implemented in the current framework.
Our motivation for this framework was to create scalable applications for benchmarking pixel shader programs with different characteristics, such as those doing texture sampling of different types, those generating textures procedurally, and those with high arithmetic intensity (think GPGPU). The current release of the framework fulfills that requirement.
This document described ping-pong textures and, as an example of their use, their application to scale-space filtering. We implemented this as a precursor to implementing the FFT on the GPU. The host program already contains all the pieces required for the FFT, including the pixel shader setups and texture maps necessary for implementing the Cooley-Tukey algorithm.
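The source does not show the planned FFT design. As a sketch of how ping-pong buffers could fit a radix-2 Cooley-Tukey FFT, the following CPU code runs each butterfly stage as one pass that reads from one buffer and writes to the other. The function names and the gather-style formulation, in which every output element computes its own value the way a pixel shader would, are assumptions made for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_ping_pong(signal):
    """Radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Initial pass: scatter the input into bit-reversed order.
    src = [0j] * n
    for i, value in enumerate(signal):
        src[bit_reverse(i, bits)] = complex(value)
    dst = [0j] * n

    half = 1
    while half < n:                      # one pass per butterfly stage
        size = 2 * half
        for i in range(n):               # each output computes itself
            pos = i % size
            k = pos % half
            w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
            if pos < half:
                dst[i] = src[i] + w * src[i + half]
            else:
                dst[i] = src[i - half] - w * src[i]
        src, dst = dst, src              # ping-pong: swap buffer roles
        half = size
    return src


spectrum = fft_ping_pong([1, 0, 0, 0])
print(spectrum)
```

An input of length N needs log2(N) butterfly passes after the initial reordering pass, and every pass reads only the previous pass's output, which is the same access pattern as the HP and VP passes of the scale-space filter.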
References
[1] Witkin, A. P. "Scale-space filtering." Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany, 1019–1022, 1983.
[2] Cooley, J. W. and Tukey, J. W. "An Algorithm for the Machine Calculation of Complex Fourier Series." Math. Comput. 19, 297–301, 1965.
Raghu Muthyalampalli is a Software Engineer in the Software Solutions Group at Intel Corporation.
Shankar Swamy is a Senior Graphics Architect in the Software Solutions Group at Intel Corporation.
This white paper, as well as the software described in it, is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.
Copyright © 2007, Intel Corporation. All rights reserved.