Publication:
A power-efficient and scalable load-store queue design

Publication Date
2005
Authors
Castro, F.
Huang, M. C.
Tirado Fernández, Francisco
Publisher
Springer-Verlag Berlin
Abstract
The load-store queue (LQ-SQ) of modern superscalar processors is responsible for maintaining the order of memory operations. As the performance gap between processing speed and memory access widens, the capacity requirements for the LQ-SQ increase, and its design becomes a challenge due to its content-addressable memory (CAM) structure. In this paper we propose an efficient load-store queue state filtering mechanism that provides a significant energy reduction (on average 35% in the LQ-SQ and 3.5% in the whole processor) and incurs only a negligible performance loss of less than 0.6%.
Description
© Springer-Verlag Berlin Heidelberg 2005. We want to thank Simha Sethumadhavan for his helpful and thorough comments. Presented at the 15th International Workshop on Power and Timing Modeling, Optimization and Simulation, September 21-23, 2005, Leuven, Belgium.