Technology, Process and Cost
NVIDIA Tesla P100 Graphics Processing Unit (GPU) with HBM2
By Yole SystemPlus
TSMC CoWoS – Samsung HBM2 – 2.5D and 3D Packaging
Targeted at high-performance computing (HPC) and deep learning, the NVIDIA Tesla P100 is the world’s first artificial intelligence supercomputing data center GPU.
It uses several leading-edge technologies, including 3D stacked memory with 2.5D integration on a silicon interposer in a Chip-on-Wafer-on-Substrate (CoWoS) process.
With memory performance improved threefold over the NVIDIA Maxwell architecture, the Tesla P100 accelerators are equipped with 12GB or 16GB of second-generation High Bandwidth Memory (HBM2).
HBM2 greatly increases memory capacity and bandwidth over first-generation HBM1 technology. HBM1 was limited to 1GB of memory per stack: four dynamic random access memory (DRAM) dies, each with a maximum capacity of 256MB, and 125GB/sec of bandwidth.
HBM2 allows up to 8GB of memory per stack: eight stacked DRAM dies, each with a maximum capacity of 1GB, and 180GB/sec of bandwidth.
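The per-stack figures above follow from simple arithmetic: stack capacity is the number of DRAM dies multiplied by the capacity of each die. The short sketch below reproduces that calculation using only the numbers quoted in the text. The function name `stack_capacity_gb` is hypothetical, and the conversion of 1GB = 1024MB is an assumption.

```python
# Per-stack figures as quoted in the text (capacities in MB, bandwidth in GB/sec).
MB_PER_GB = 1024  # assumption: binary convention, 1GB = 1024MB


def stack_capacity_gb(dies_per_stack: int, die_capacity_mb: int) -> float:
    """Capacity of one memory stack in GB: dies per stack times capacity per die."""
    return dies_per_stack * die_capacity_mb / MB_PER_GB


hbm1_gb = stack_capacity_gb(dies_per_stack=4, die_capacity_mb=256)
hbm2_gb = stack_capacity_gb(dies_per_stack=8, die_capacity_mb=1024)

# Generation-over-generation ratios from the quoted numbers.
capacity_ratio = hbm2_gb / hbm1_gb
bandwidth_ratio = 180 / 125

print(f"HBM1: {hbm1_gb:g}GB per stack")
print(f"HBM2: {hbm2_gb:g}GB per stack")
print(f"Capacity gain: {capacity_ratio:g}x, bandwidth gain: {bandwidth_ratio:.2f}x")
```

With these inputs, the maximum capacity per stack grows by a larger factor than the per-stack bandwidth.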
The single 55mm x 55mm 12-layer ball grid array (BGA) package of the NVIDIA Tesla P100 includes more than 3,500 mm² of silicon area. Two industry leaders, TSMC and Samsung, had to come together to deliver this much silicon area in a package.
TSMC is the main provider for the Tesla P100. Using its 2.5D CoWoS platform, it manufactures the GP100 GPU die, featuring a 16nm FinFET process and 15.3 billion transistors.
It also produces the large silicon interposer on which the GPU is assembled, at the wafer level, together with its four HBM2 stacks.
Samsung provides the HBM2 stacks. A 3D assembly process yields stacks composed of four 1GB DRAM dies and one buffer die, connected by via-middle through-silicon vias (TSVs) and micro-bumps.
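As a consistency check on the configuration described above, the package-level memory totals can be derived from the stack count and die count. This is a minimal sketch using only figures from the text. The aggregate bandwidth is an assumption: it supposes that the 180GB/sec per-stack figure applies to each of the four stacks and that the stacks operate in parallel.

```python
# Configuration as described in the text.
STACKS = 4            # HBM2 stacks assembled next to the GPU on the interposer
DIES_PER_STACK = 4    # 1GB DRAM dies per stack (the buffer die holds no capacity)
DIE_CAPACITY_GB = 1

# Assumption: the quoted per-stack bandwidth applies to every stack in parallel.
STACK_BANDWIDTH_GB_PER_SEC = 180

total_capacity_gb = STACKS * DIES_PER_STACK * DIE_CAPACITY_GB
aggregate_bandwidth_gb_per_sec = STACKS * STACK_BANDWIDTH_GB_PER_SEC

print(f"Total HBM2 capacity: {total_capacity_gb}GB")
print(f"Idealized aggregate bandwidth: {aggregate_bandwidth_gb_per_sec}GB/sec")
```

The capacity result matches the 16GB variant mentioned earlier; the bandwidth figure is an idealized sum, not a measured value.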
The report includes a complete physical analysis of the packaging process, with details on all technical choices regarding process, equipment and materials.
Also, the complete manufacturing supply chain is described and manufacturing costs are calculated.
The report also compares the Tesla P100 with AMD’s Fury X, which uses HBM1 and 2D assembly, to explain the appeal of moving to the HBM2 and CoWoS 2.5D platforms.
Finally, it describes NVIDIA’s key module design and related process choices.
REVERSE COSTING WITH
- Detailed photos and cross-sections
- Precise measurements
- Material analysis
- Manufacturing process flow
- Supply chain evaluation
- Manufacturing cost analysis
- Estimated sales price
Overview / Introduction
Company Profile
Physical Analysis
- Physical analysis methodology
- RCP SiP Packaging analysis
  - Package view and dimensions
  - Package x-ray view
  - Package opening: RDL, line/space width
  - Package cross-section: RDL, bumps, Via Frame
- Physical Analysis Comparison
  - SiP vs discrete
  - TSMC’s InFO
  - Shinko’s MCeP
- Die analysis: APE, PMIC, Flash Memory
  - Die view and dimensions
  - Die cross-section
  - Die process
Manufacturing Process Flow
- Die Fabrication Unit: APE, PMIC, Flash Memory
- Packaging Fabrication Unit
- RCP SiP Package Process Flow
Cost Analysis
- Overview of the Cost Analysis
- Supply Chain Description
- Yield Hypotheses
- Die Cost Analyses: APE, PMIC, Flash Memory
  - Front-end Cost
  - Wafers and Die Costs
- RCP SiP Package Cost Analysis
  - RCP SiP wafer front-end Cost
  - RCP SiP cost by process step