Publication:
Wavelet transform for large scale image processing on modern microprocessors

Publication Date
2003
Publisher
Springer-Verlag Berlin
Abstract
In this paper we discuss several issues relevant to the vectorization of a 2-D Discrete Wavelet Transform (DWT) on current microprocessors. Our research builds on previous studies of the efficient exploitation of the memory hierarchy, which has a tremendous impact on performance. We have extended this work with a more detailed analysis based on hardware performance counters and with a study of vectorization; in particular, we have used the Intel SSE (Streaming SIMD Extensions) instruction set. Most of our optimizations are performed at the source-code level to allow automatic vectorization, though some compiler intrinsic functions have been introduced to enhance performance. Given the high level of abstraction at which the optimizations are performed, the results obtained on an Intel Pentium III microprocessor are quite satisfactory, although further improvement could be obtained through more extensive use of compiler intrinsics.
Description
© 2003 Springer Verlag. 5th International Conference on High Performance Computing for Computational Science (VECPAR 2002), June 26-28, 2002, Oporto, Portugal.