About Me
I am a Performance & Capacity Engineer at Facebook/Meta. I work on a hardware performance team within Facebook Infra that:
- defines the hardware roadmap for CPU, memory, accelerators, and storage.
- builds hardware solutions and productionizes future platforms.
- evaluates next-gen hardware products with real-world applications and traffic.
Before joining Facebook in Oct 2019, I spent 5+ years in CPU architecture/micro-architecture design:
- at Samsung, working on the ARM CPU cores that go into the high-end Exynos SoCs for Samsung's premium Galaxy phones.
- at Hygon, working on x86 CPU cores and SoCs targeting the server and HPC market in China.
I received my PhD from the University of Wisconsin-Madison in 2015, working with Prof. Nam Sung Kim,
and my bachelor's degree from Peking University in 2010.
Industry Experience
- Performance & Capacity Engineer at Facebook/Meta, Oct 2019 - Present
- CPU/SoC Performance Architect at Hygon Austin R&D Center, Oct 2017 - Sep 2019
  - Worked on building the performance model for AMD Zen1 from scratch; 1 US + 2 CN patents granted
  - Architected a data prefetcher for the next-gen CPU
    - Designed a pointer engine to capture memory access patterns on complex data structures
  - Built performance study & analysis infrastructure for server CPU architectural design
    - Drove model vs. RTL correlation, improving the correlation ratio from ~70% to 90%+
    - Built a SimPoint trace analysis framework, a performance study flow, and a statistics analysis tool
- CPU Performance Architect at Samsung Austin R&D Center, Jun 2015 - Oct 2017
  - Involved in 4 generations of Exynos mobile CPU design; 2 US patents granted
  - Made Exynos M4 score the highest in the Geekbench v4 memory latency test among all competitors
  - CPU architecture performance modeling and analysis
    - Developed a cycle-accurate performance model, with a focus on the memory system (LS/PF/MMU/L2)
    - Analyzed workload characteristics & architectural performance bottlenecks
  - Selected performance features I architected:
    - Memory disambiguation, streaming detection & handling, and memcpy optimization in Exynos M3
    - Spatially correlated prefetcher and cache conflict reduction technique in Exynos M4
- Co-Op Engineer at AMD Research, Jan 2012 - Aug 2012
  - Joint optimization of OpenCL workload partitioning and dynamic voltage/frequency/core scaling (DVFS)
Education
- Ph.D. in Computer Architecture, University of Wisconsin-Madison, 2015
  - Thesis: Heterogeneous processors and memory systems
  - Published 6 first-author & 4 second-author papers; 3 US patents granted
  - An integrated gem5+GPGPU-Sim simulator: http://cpu-gpu-sim.ece.wisc.edu
- B.S. in Microelectronics, Peking University, 2010
Publications
- TMO: Transparent Memory Offloading in Datacenters
Johannes Weiner, Niket Agarwal, Dan Schatzberg, Leon Yang, Hao Wang, Blaise Sanouillet, Bikash Sharma, Tejun Heo, Mayank Jain, Chunqiang Tang, Dimitrios Skarlatos
ACM Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Feb. 2022.
- Ghost routers: energy-efficient asymmetric multicore processors with symmetric NoCs
Hyojun Son, Hanjoon Kim, Hao Wang, Nam Sung Kim, John Kim
IEEE/ACM Int. Symp. on Networks-on-Chip (NOCS), Oct. 2019.
- DUANG: Fast and lightweight page migration in asymmetric memory systems
Hao Wang, Jie Zhang, Sharmila Shridhar, Minwoo Lee, Myoungsoo Jung, Nam Sung Kim
IEEE Int. Symp. on High-Performance Computer Architecture (HPCA), Feb. 2016.
- Workload-Aware Optimal Power Allocation on Single-Chip Heterogeneous Processors
Jaeyoung Jang, Hao Wang, Nam Sung Kim, Euijin Kwon, Jae Lee
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
- Alloy: Parallel-Serial Memory Channel Architecture for Single-Chip Heterogeneous Processor Systems
Hao Wang, Changjae Park, Gyungsu Byun, Jung Ho Ahn, Nam Sung Kim
IEEE Int. Symp. on High-Performance Computer Architecture (HPCA), Feb. 2015.
- Memory Scheduling Toward High-Throughput Cooperative Heterogeneous Computing
Hao Wang, Ripudaman Singh, Michael Schulte, Nam Sung Kim
IEEE/ACM Int. Conf. on Parallel Architecture and Compilation Techniques (PACT), Aug. 2014.
- Maximizing Throughput of Power/Thermal-constrained Processors by Balancing Power Consumption of Cores
Abhishek A. Sinkar, Hao Wang, Nam Sung Kim
IEEE Int. Symp. on Quality Electronic Design (ISQED), Mar. 2014.
- Improving Platform Energy and Chip Area Trade-off in Near-Threshold Computing Environment
Hao Wang, Abhishek A. Sinkar, Nam Sung Kim
IEEE/ACM Int. Conf. on Computer Aided Design (ICCAD), Nov. 2013.
- Improving Throughput of Many-core Processors Based on Unreliable Emerging Devices under Power Constraint
Hao Wang, Nam Sung Kim
IEEE Micro Magazine, vol. 33, no. 4, July-Aug. 2013.
- Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors
Hao Wang, Vijay Sathish, Ripudaman Singh, Michael Schulte, Nam Sung Kim
IEEE/ACM Int. Conf. on Parallel Architecture and Compilation Techniques (PACT), Sep. 2012.
- Workload-Aware Voltage Regulator Optimization for Power Efficient Multi-Core Processors
Abhishek A. Sinkar, Hao Wang, Nam Sung Kim
IEEE/ACM Design, Automation and Test in Europe (DATE), Mar. 2012.
- Asymmetric Issues of FinFET Device after Hot Carrier Injection and Impact on Digital and Analog Circuits
Chenyue Ma, Hao Wang, Xiufang Zhang, Frank He, Yadong He, Xing Zhang, Xinnan Lin
IEEE Int. Symp. on Quality Electronic Design (ISQED), Mar. 2010.
Links