Publication:
2-D wavelet transform enhancement on general-purpose microprocessors: memory hierarchy and SIMD parallelism exploitation

Publication Date
2002
Publisher
Springer-Verlag Berlin
Abstract
This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. Our SIMD-oriented tuning, however, has been performed exclusively at the source-code level: we have reordered some loops and introduced some modifications that allow automatic vectorization. Given the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by lowering the level of abstraction (compiler intrinsics or assembly code).
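The loop reordering mentioned in the abstract can be illustrated with a minimal C sketch. It is not the authors' code. It assumes a row-major image of floats, a generic FIR filter applied vertically with downsampling by 2, and no boundary extension. The function names (`out_rows_for`, `vfilter_naive`, `vfilter_reordered`) are hypothetical. The first version walks down each column, so consecutive accesses are `cols` floats apart. The second moves the column loop innermost, so every access is unit-stride, which is the access pattern that a vectorizing compiler can typically map to SIMD instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Output rows for a `taps`-coefficient filter with downsampling by 2.
   No boundary extension is applied (a simplifying assumption). */
size_t out_rows_for(size_t rows, size_t taps)
{
    assert(taps >= 1 && rows >= taps);
    return (rows - taps) / 2 + 1;
}

/* Column-by-column traversal: the inner loops step through memory
   with a stride of `cols` floats, which hurts locality and usually
   prevents automatic vectorization. */
void vfilter_naive(const float *in, float *out, size_t rows, size_t cols,
                   const float *h, size_t taps)
{
    size_t n = out_rows_for(rows, taps);
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < n; i++) {
            float acc = 0.0f;
            for (size_t k = 0; k < taps; k++)
                acc += h[k] * in[(2 * i + k) * cols + j];
            out[i * cols + j] = acc;
        }
    }
}

/* Reordered traversal: the column loop is innermost, so source and
   destination rows are read and written with unit stride. `restrict`
   tells the compiler that the buffers do not overlap. */
void vfilter_reordered(const float *restrict in, float *restrict out,
                       size_t rows, size_t cols,
                       const float *restrict h, size_t taps)
{
    size_t n = out_rows_for(rows, taps);
    for (size_t i = 0; i < n; i++) {
        float *dst = out + i * cols;
        for (size_t j = 0; j < cols; j++)
            dst[j] = 0.0f;
        for (size_t k = 0; k < taps; k++) {
            const float *src = in + (2 * i + k) * cols;
            const float hk = h[k];
            for (size_t j = 0; j < cols; j++)
                dst[j] += hk * src[j];   /* unit-stride multiply-add */
        }
    }
}
```

Both functions accumulate the taps in the same order, so they compute the same values. Whether a given compiler actually vectorizes the inner loop depends on its version and flags, so its vectorization report is worth checking.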
Description
9th International Conference on High Performance Computing (HiPC 2002), December 18-21, 2002, Bangalore, India.