UpDown System - Accelerating Graph and Sparse Computations at Massive Scale
(builds on Recode/UDP for efficient data transformation and 10x10 Systematic Heterogeneity)



UpDown System
The UpDown System is a flexible, general-purpose data-movement and graph computing system. Its key novel feature is the UpDown accelerator, which delivers efficient fine-grained parallel threads, massive memory parallelism, event-driven execution, and the powerful data transformation capabilities of the Recode/UDP engine. The UpDown accelerator emerged from ideas explored in Project 38, a joint DOE-DOD effort to identify new breakout architecture features for memory systems. We gratefully acknowledge support from IARPA. Press coverage of the UpDown Project is available here.

Recode/UDP
Of the accelerators built in the 10x10 project, the most interesting were all data-oriented. The three data-oriented accelerators (generalized pattern matching, small sort, and gather-scatter) were merged into a new architecture called the Unstructured Data Processor (UDP/Recode), sometimes also called the Unified Automata Processor (UAP). UDP/Recode is also being explored as part of Project 38, a joint DOE-DOD effort to identify new breakout architecture features for memory systems. We gratefully acknowledge support from DARPA, Samsung, DOE, DoD, and the National Science Foundation.


10x10 and Data Movement
Our approach is to compute the representation of data in order to optimize data movement and storage. Data movement (from memory, from SSD, within a parallel machine, even in the wires across a chip) is the critical cost and performance limiter in computer systems. We are building architectures that enable efficient, rapid transformation of information encodings to reduce size and computation cost: UAP, UDP, and now the Recoding Engine. Initial designs and studies show that benefits of 4x to 1000x can be achieved in specific cases. Critical challenges include how to expose these new ideas to software (e.g., through transformer libraries, or by viewing C++ arrays as abstract data types backed by a different concrete implementation), as well as a variety of functional and implementation architecture issues. These efforts grew out of the 10x10 project, which pursued a principled, systematic approach to heterogeneity in computer architecture (see Andrew A. Chien, "10x10 must replace 90/10: the Future of Computer Architecture", Salishan Conference on High Performance Computing, May 2010). A 10x10 architecture uses deep workload analysis to drive co-design of a federated heterogeneous architecture: it exploits customization for energy efficiency, while federating a set of customized engines to achieve general-purpose coverage. The 10x10 project built 7 accelerators and federated them in a study that assessed the overall benefit.
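
The idea that changing a data encoding reduces the bytes that must be moved can be illustrated with a small software analogy. The sketch below is an assumption chosen for illustration only, not the Recode engine's actual encoding, interface, or hardware design: it stores a sorted list of neighbor vertex IDs (as in a sparse graph adjacency list) as delta-encoded variable-length integers (LEB128-style varints) instead of fixed-width 64-bit values. The function names `encode_delta_varint` and `decode_delta_varint` are hypothetical.

```python
# Hypothetical software illustration of "recoding" a representation;
# this is NOT the Recode/UDP API or its hardware encoding.
from typing import List


def encode_delta_varint(sorted_ids: List[int]) -> bytes:
    """Encode a non-decreasing list of non-negative ints as varint deltas."""
    out = bytearray()
    prev = 0
    for x in sorted_ids:
        delta = x - prev
        if delta < 0:
            raise ValueError("input must be sorted in non-decreasing order")
        prev = x
        # Emit 7 payload bits per byte; the high bit means "more bytes follow".
        while True:
            low7 = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(low7 | 0x80)
            else:
                out.append(low7)
                break
    return bytes(out)


def decode_delta_varint(data: bytes) -> List[int]:
    """Invert encode_delta_varint, recovering the original IDs."""
    ids: List[int] = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: keep accumulating this delta
        else:
            prev += value  # delta complete: add it to the running ID
            ids.append(prev)
            value = 0
            shift = 0
    return ids


if __name__ == "__main__":
    neighbors = [1000, 1003, 1004, 1010, 1200, 5000]
    packed = encode_delta_varint(neighbors)
    fixed_width_bytes = 8 * len(neighbors)  # same IDs as 64-bit integers
    print(len(packed), "bytes vs", fixed_width_bytes, "bytes")
    assert decode_delta_varint(packed) == neighbors
```

For the six IDs in the example, the deltas (1000, 3, 1, 6, 190, 3800) fit in 9 bytes instead of 48, so the program prints `9 bytes vs 48 bytes`. Savings from any real encoding depend heavily on the data; this toy case only shows the mechanism.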

Publications
  1. Yuqing Wang, Andronicus Rajasukumar, Tianshuo Su, Marziyeh Nourian, Jose M Monsalve Diaz, Ahsan Pervaiz, Jerry Ding, Charles Colley, Wenyi Wang, Yanjing Li, David F. Gleich, Hank Hoffmann, and Andrew A. Chien, "Efficiently Exploiting Irregular Parallelism Using Keys at Scale", Workshop on Languages and Compilers for Parallel Computing, Lexington, KY, November 2023.
  2. Andronicus Rajasukumar, "UpDown: An Intelligent Data Movement Architecture for Large Scale Graph Processing", Master's Thesis, March 2023; also UChicago Technical Report 2023-03.
  3. Chen Zou, "High-Performance Architectures for Data Center Computational Storage", PhD Thesis, December 2022.
  4. Marziyeh Nourian, Tri Nguyen, Andrew A. Chien, and Michela Becchi, "Data Transformation Acceleration using Deterministic Finite-State Transducers", 2022 IEEE Conference on Big Data, December 2022, Osaka, Japan.
  5. Chen Zou and Andrew A. Chien, "ASSASIN: Architecture Support for Stream Computing to Accelerate Computational Storage", ACM/IEEE MICRO-55, October 2022, Chicago, IL.
  6. Chen Zou, Hui Zhang, Yang Seok Ki, and Andrew A. Chien, "PSACS: Highly-Parallel Shuffle Accelerator on Computational Storage", International Conference on Computer Design (ICCD 2021), October 2021.
  7. Chen Zou, Andrew A. Chien, Robert Gardner, and Ilija Vukotic, "Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites", IEEE Cluster 2021, September 7-10, 2021 (poster).
  8. Mandy La, "A Particle in Cell Performance Model on the CS-2", work done on the Cerebras CS-1 and CS-2, Bachelor's Thesis, June 2021.
  9. Hao Jiang, Chunwei Liu, John Paparrizos, Andrew A. Chien, Jihong Ma, and Aaron Elmore, "Good to the Last Bit: Data-Driven Encoding with CodecDB", SIGMOD 2021, June 2021.
  10. Arjun Rawal, "Exploiting Domain-specific Data Properties to Improve Compression for High Energy Physics Data", Master's Thesis, June 3, 2020.
  11. Chen Zou and Andrew A. Chien, "Empowering Architects and Designers: A Classification of What Functions to Accelerate in Storage", CS Technical Report, June 2020.
  12. Olivia Weng and Andrew A. Chien, "Evaluating Achievable Latency and Cost: SSD Latency Predictors (MittOS Model Inference)", Accelerated Machine Learning (AccML) Workshop at HiPEAC 2020, Bologna, Italy, January 2020.
  13. Chen Zou, Andrew A. Chien, John Shalf, Ray Bair, et al., "Project 38: Accelerating Architecture Innovation into Fieldable Extreme-Scale Systems (A Cross-Agency Effort)", Poster at ACM/IEEE International Conference on Supercomputing (SC'19), Denver, Colorado, November 2019.
  14. Yuanwei Fang, Chen Zou, and Andrew A. Chien, "Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture", in Proceedings of the 45th International Conference on Very Large Data Bases (VLDB), Los Angeles, August 2019.
  15. Yuanwei Fang, "Extreme Acceleration and Seamless Integration of Raw Data Processing", UChicago, Computer Science, PhD Thesis, June 2019.
  16. Chen Zou, "Memory Hierarchy Designs for Tiled Heterogeneous Architectures", MS Paper, April 2019.
  17. Arjun Rawal, Yuanwei Fang, and Andrew A. Chien, "Programmable Acceleration for Sparse Matrices in a Data-movement Limited World", Heterogeneous Computing Workshop 2019, Rio de Janeiro, Brazil, May 2019, affiliated with the International Parallel and Distributed Processing Symposium (IPDPS).
  18. Yuanwei Fang, Chen Zou, Aaron Elmore, and Andrew A. Chien, "UDP: A Programmable Accelerator for Extract-Transform-Load Workloads and More", in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50), October 2017, Boston, Massachusetts.
  19. Yuanwei Fang, Andrew A. Chien, Andrew Lehane, and Lee Barford, "Performance of Parallel Prefix Circuit Transition Localization of Pulsed Waveforms", IEEE International Instrumentation and Measurement Technology Conference, May 23-26, 2016, Taipei, Taiwan.
  20. Tung Hoang, Amirali Shambayati, and Andrew A. Chien, "A Data Layout Transformation (DLT) Accelerator: Architectural Support for Data Movement Optimization in Accelerated Systems", Design, Automation and Test in Europe (DATE), 14-18 March 2016, Dresden, Germany; also Department of Computer Science Technical Report, University of Chicago, March 2015.
  21. Yuanwei Fang, Tung Hoang, Michela Becchi, and Andrew A. Chien, "Fast Support for Unstructured Data Processing: The Unified Automata Processor", in Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-48), December 2015, Honolulu, Hawaii.
  22. Yuanwei Fang, "EFFCLIP + UAP: Unified, Efficient Representation and Architecture for Automata Processing", Master's Thesis, November 2015.
  23. Tung Thanh Hoang, Amirali Shambayati, Henry Hoffmann, and Andrew A. Chien, "Does Arithmetic Logic Dominate Data Movement? A Systematic Comparison of Energy-Efficiency for FFT Accelerators", in Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015), pages 66-67, Toronto, Ontario, August 2015; also Department of Computer Science Technical Report, University of Chicago, January 2015.
  24. A. Chien, D. Vasudevan, T. Hoang, Y. Fang, and A. Shambayati, "10x10: A Case Study for Federated Heterogeneous Computing", Computer Architecture News, Volume 43, Issue 3, May 2015, pages 2-9; also available as UChicago Computer Science Technical Report 2015-08.
  25. Yuanwei Fang, Andrew Lehane, and Andrew A. Chien, "EffCLiP: Efficient Coupled-Linear Packing", Department of Computer Science Technical Report 2015-5, January 2015.
  26. Dilip Vasudevan and Andrew A. Chien, "BNB: Bit-Nibble-Byte Microengine for Accelerating Low-Level Bit Operations", in Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), Pittsburgh, PA, May 2015.
  27. Yuanwei Fang, Tung Hoang, Michela Becchi, and Andrew A. Chien, "The Unified Automata Processor", November 2014.
  28. Tung Hoang, Calvin Deutschbein, Hank Hoffmann, and Andrew A. Chien, "Performance and Energy Limits of a Processor Integrated FFT Accelerator", in High-performance Extreme Computing (HPEC-2014), September 2014, Waltham, Massachusetts.
  29. Yuanwei Fang, Raihan Rasool, Dilip Vasudevan, and Andrew A. Chien, "Generalized Pattern Matching Micro-engine", in 4th Workshop on Architectures and Systems for Big Data (ASBD) held with the International Symposium on Computer Architecture (ISCA), June 2014, Minneapolis, Minnesota.
  30. Amirali Shambayati, "Data Layout Transformation Micro-engine: A Specialized Architecture to Manage Data Movements for Performance and Energy Efficiency", Master's Thesis, March 2014.
  31. Andrew A. Chien and Vijay Karamcheti, "Moore's Law: The First Ending and A New Beginning", IEEE Computer Magazine, December 2013.
  32. P. Cicotti, L. Carrington, and Andrew A. Chien, "Towards Application-specific Memory Reconfiguration for Energy Efficiency", in Proceedings of the First Workshop on Energy Efficient Supercomputing, November 2013, at the ACM/IEEE Conference on Supercomputing.
  33. Apala Guha, Yao Zhang, Raihan ur Rasool, and Andrew A. Chien, "Calibrating the Relationship between Hardware Customization and Energy Efficiency", University of Chicago, Department of Computer Science Technical Report 2013-04, July 2013.
  34. Cicotti, Carrington, and Chien, "Customizing Caches for Energy Efficiency: A Workload Driven Approach", University of Chicago CS-TR-2013-06, available from https://www.cs.uchicago.edu/research/publications/techreports/TR-2013-06.
  35. Apala Guha, Yao Zhang, Raihan ur Rasool, and Andrew A. Chien, "Systematic Evaluation of Workload Clustering for Extremely Energy-Efficient Architectures", SIGARCH Computer Architecture News 41, 2 (May 2013), pages 22-29.
  36. Yao Zhang, Mark Sinclair II, and Andrew A. Chien, "Improving Performance Portability in OpenCL Programs", in the IEEE International Supercomputing Conference (ISC), June 16-20, 2013, Leipzig, Germany.
  37. Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Rinku Gupta, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, and Boyana Norris, "Exascale Workload Characterization and Architecture Implications", 21st High Performance Computing Symposium, at 2013 SCS Spring Simulation Multi-conference (SpringSim '13), April 7-10, 2013, San Diego, CA. (Best Paper Award Winner!)
  38. Andrew A. Chien and Vijay Karamcheti, "Moore's Law: The First Ending and A New Beginning", IEEE Computer Magazine, 2013. Also available as UChicago CS TR 2012-06.
  39. Rinku Gupta, Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, and Boyana Norris, "Exascale Workload Characterization and Architecture Implications", 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013, Poster.
  40. Rinku Gupta, Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, and Boyana Norris, "An Exascale Workload Study", ACM/IEEE Conference on Supercomputing, November 2012, Poster.
  41. Apala Guha and Andrew A. Chien, "Systematic Evaluation of Workload Clustering for Designing Heterogeneous, General-purpose Architectures", June 2012, available as UChicago CS TR 2012-05.
  42. Apala Guha and Andrew A. Chien, "The 10x10 Foundation for Heterogeneity", January 2012, available as UChicago CS TR 2012-01.
  43. Shekhar Borkar and Andrew A. Chien, "The Future of Microprocessors", Communications of the Association for Computing Machinery (CACM), May 2011.
  44. Mark Gahagan, Allan Snavely, and Andrew A. Chien, "10x10: A General-purpose Architectural Approach to Heterogeneity and Energy-efficiency", International Conference on Computational Science (ICCS 2011), Singapore, June 2011.
  45. Andrew A. Chien, "10x10 must replace 90/10: the Future of Computer Architecture", Salishan Conference on High Performance Computing, May 2010.

We gratefully acknowledge support for the above architecture research projects from IARPA, the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA), and Keysight Corporation (formerly Agilent).