Publication:
Write-aware replacement policies for PCM-based systems

Publication Date
2015-09
Publisher
Oxford Univ Press
Abstract
The gap between processor and memory speeds is one of the greatest challenges designers face in building more powerful computer systems. In addition, Dynamic Random Access Memory (DRAM) technology now scales poorly, which has led to new memory technologies being considered as replacements for conventional DRAM. Phase-Change Memory (PCM) is currently regarded as the prime contender because of its higher scalability and lower leakage. Compared with DRAM, however, PCM has drawbacks, namely lower endurance, higher dynamic energy consumption and higher write latency, that must be mitigated before it can serve as the main memory technology for the next generation of computers. This work addresses the PCM endurance constraint. To that end, we analyze conventional cache replacement policies in terms of the number of writebacks to main memory that they cause, and we propose new replacement algorithms for the last-level cache (LLC) that aim to cut the write traffic to memory and thereby increase PCM lifetime without degrading system performance. We target general-purpose processors equipped with this kind of non-volatile main memory, and we exhaustively evaluate the proposed policies in both single- and multi-core environments. Experimental results show that, on average, compared with a conventional Least Recently Used (LRU) algorithm, some of our proposals reduce the number of writes to main memory by up to 20-30%, depending on the scenario evaluated. This extends memory endurance by up to 20-45% and reduces energy consumption in the memory hierarchy by up to 9%, with hardly any performance degradation.
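To illustrate why the choice of eviction victim affects write traffic, the following is a minimal sketch and not one of the policies proposed in the paper. It models a single fully associative cache set and compares plain LRU with a hypothetical "clean-first" variant that evicts the least recently used clean line when one exists. The function name `simulate`, the `prefer_clean` flag and the toy trace are all assumptions made for illustration.

```python
from collections import OrderedDict


def simulate(trace, ways, prefer_clean):
    """Return (misses, writebacks) for one fully associative cache set.

    trace: iterable of (address, is_write) pairs.
    prefer_clean: if True, evict the least recently used clean line when one
    exists (hypothetical write-aware variant); otherwise use plain LRU.
    Dirty lines still resident at the end are not counted as writebacks.
    """
    lines = OrderedDict()  # address -> dirty flag, ordered from LRU to MRU
    misses = writebacks = 0
    for addr, is_write in trace:
        if addr in lines:
            # Hit: a write marks the line dirty; the line becomes MRU.
            lines[addr] = lines[addr] or is_write
            lines.move_to_end(addr)
            continue
        misses += 1
        if len(lines) >= ways:
            victim = next(iter(lines))  # default victim: the LRU line
            if prefer_clean:
                # Oldest clean line if any, else fall back to the LRU line.
                victim = next(
                    (a for a, dirty in lines.items() if not dirty), victim
                )
            if lines.pop(victim):  # evicting a dirty line costs a writeback
                writebacks += 1
        lines[addr] = is_write
    return misses, writebacks


# Toy trace (assumed): block "A" is written repeatedly, others are read once.
trace = [("A", True), ("B", False), ("C", False),
         ("A", True), ("D", False), ("A", True)]

lru_result = simulate(trace, ways=2, prefer_clean=False)
clean_first_result = simulate(trace, ways=2, prefer_clean=True)
print("LRU         (misses, writebacks):", lru_result)
print("clean-first (misses, writebacks):", clean_first_result)
```

On this trace the clean-first variant keeps the frequently written block resident and so avoids a writeback. On other access patterns, favoring clean victims can raise the miss rate, which is the kind of trade-off between write traffic and performance that the abstract says the proposed policies are evaluated against.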
Description
This work has been supported in part by the Spanish government through the research contracts CICYT-TIN 2008/508 and TIN2012-32180, and by the HIPEAC-3 European Network of Excellence. It was also supported by a scholarship grant from the University of Costa Rica and the Costa Rican Ministry of Science and Technology (MICIT and CONICIT).