How GPUs are made
These really are, as Paul Simon sang, the days of miracle and wonder. It seems almost impossible to believe that engineers have managed to design and build a machine whose components measure as little as 40nm across. That's around one-thousandth the diameter of a human hair.
Yet we're making such devices right now. They're called graphics processing units, and they're starting to challenge the central processing unit for its long-held title of the most magical piece of engineering found in a computer.
What threat is the CPU facing? Consider a top-of-the-range desktop processor like the quad-core Sandy Bridge variant of Intel's Core i7. It boasts just short of a billion transistors. A leading-edge graphics processing unit like AMD's Cayman, as used in the Radeon HD 6970 graphics card, clocks up a massive 2.64 billion transistors.
Given that it boasts no fewer than 1,536 shader processors, 24 SIMD (single instruction multiple data) engines and 32 ROPs (render output units), this perhaps isn't surprising. This is the story of how AMD GPUs are made - how an idea becomes silicon using some of the most advanced and intricate engineering techniques around. Read on as we delve into the real days of miracle and wonder.
1. The high level design
Designing a graphics processing unit doesn't start with any thought of transistors or copper tracks, but with something called the product requirement specification, or PRS - a prioritised definition of all the features the new chip must have. It might not sound wildly exciting, but the PRS acts as the checklist throughout the whole design process.
Given that design is a very costly exercise, in terms of time and money, it's vitally important that the PRS provides an adequate answer to the question: 'What exactly is it that we're trying to build here?'
Typically it will take six months to complete the PRS. Thousands of engineers, including architects, hardware designers, board designers, validation engineers, software engineers and firmware/BIOS engineers will be involved, as will representatives from product management, technology management and developer relations.
The document takes the form of a database and could include over 1,000 features, each of which could be anything from an odd sentence to a 100-page specification.
Another output from the high level design - one that most technically savvy PC users will be familiar with - is a block diagram. Although it bears no resemblance to how the elements of the GPU will be arranged on the chip, it includes each of the major functional blocks and shows how signals pass between them.
2. Floorplan and netlist
Teams of engineers now set to work on two distinct areas of the design. First, the floorplan must be defined. This is a physical representation that will take account of how large each block is expected to be and where it should be positioned relative to other blocks.
Here, account is taken of how many signals pass between the blocks with the aim of reducing the lengths of the pathways. Meanwhile, other engineers work on the component level design of each of the blocks.
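To make the floorplanning trade-off concrete, here's a minimal sketch in Python. The block names, signal counts and the 2x2 grid of slots are all made up for illustration, and real floorplanning tools are far more sophisticated than this brute-force search. The idea is simply that each candidate placement is scored by the number of signals between two blocks multiplied by the distance between them, and the cheapest placement wins.

```python
from itertools import permutations

# Hypothetical functional blocks and signal counts between them
# (illustrative numbers only, not taken from any real GPU).
BLOCKS = ["shader", "cache", "memctl", "rop"]
SIGNALS = {
    ("shader", "cache"): 512,
    ("cache", "memctl"): 256,
    ("shader", "rop"): 128,
    ("rop", "memctl"): 64,
}
# Four available positions on a 2x2 grid.
SLOTS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def wiring_cost(placement):
    """Sum of (signal count x Manhattan distance) over all connected blocks."""
    total = 0
    for (a, b), count in SIGNALS.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += count * (abs(xa - xb) + abs(ya - yb))
    return total


def best_floorplan():
    """Try every assignment of blocks to slots and keep the cheapest one."""
    best = None
    for slots in permutations(SLOTS):
        placement = dict(zip(BLOCKS, slots))
        cost = wiring_cost(placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    cost, placement = best_floorplan()
    print(cost, placement)
```

Trying every arrangement only works for a handful of blocks. It's here purely to show what "reducing the lengths of the pathways" means as a number that can be compared between layouts.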
However, this is nothing like electronic circuit design as envisaged by the layperson. Instead of a circuit diagram, the design is created in a hardware description language like VHDL. If you're interested, the compound acronym stands for VHSIC hardware description language, where VHSIC in turn stands for very-high-speed integrated circuit.
Looking much like a programming language, this way of generating circuits provides many of the benefits on offer to the software engineer. Most importantly, circuits can be defined hierarchically so, for example, having defined a logic OR gate from individual transistors, this can be used in the definition of a more complicated block like a one-bit adder.
In the same way, increasingly sophisticated building blocks are built up by reusing what's already been created. Often the designer won't even have to define the building blocks, because they'll be available from third-party libraries.
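The hierarchical approach can be sketched in a few lines of Python. This isn't a hardware description language, and the choice of a NAND function as the lowest-level primitive (standing in for a handful of transistors) is our assumption, but it shows how each level is defined purely in terms of the level below it.

```python
def nand(a, b):
    """Lowest-level primitive, standing in for a few transistors."""
    return 1 - (a & b)


# Simple gates, each defined only in terms of the primitive.
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    return and_(or_(a, b), nand(a, b))


def full_adder(a, b, carry_in):
    """One-bit adder built from the gates above."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out


def ripple_adder(x_bits, y_bits):
    """Multi-bit adder that reuses the one-bit adder (least significant bit first)."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        total, carry = full_adder(a, b, carry)
        result.append(total)
    return result, carry
```

For example, `ripple_adder([1, 1, 0, 0], [1, 0, 1, 0])` adds 3 and 5, with the bits given least significant first.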
When the VHDL code is complete, it goes through a process called synthesis, which is the equivalent of compiling a programming language. Whereas compilation of a programming language checks the code for errors and, once it's error-free, generates a file containing individual processor instructions, the output of synthesis is called a netlist and it defines the connections between each and every component, including those 2.64 billion transistors.
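As a rough picture of what a netlist contains, here's a toy version in Python for a half adder. The record layout (`name`, `kind`, `inputs`, `output`) and the net names are invented for this sketch, and real netlist formats differ. Each entry is one component plus the named wires, or nets, it connects to. A small simulator then works out every net's value from the inputs.

```python
from collections import namedtuple

# One component instance: its name, gate type, input nets and output net.
Component = namedtuple("Component", "name kind inputs output")

# Behaviour of each gate type used in this toy example.
GATES = {
    "AND": lambda a, b: a & b,
    "XOR": lambda a, b: a ^ b,
}

# Hypothetical netlist for a half adder: two gates sharing nets "a" and "b".
HALF_ADDER = [
    Component("u1", "XOR", ("a", "b"), "sum"),
    Component("u2", "AND", ("a", "b"), "carry"),
]


def simulate(netlist, inputs):
    """Evaluate components as soon as all of their input nets have values."""
    nets = dict(inputs)
    pending = list(netlist)
    while pending:
        ready = [c for c in pending if all(n in nets for n in c.inputs)]
        if not ready:
            raise ValueError("undriven net or combinational loop")
        for comp in ready:
            nets[comp.output] = GATES[comp.kind](*(nets[n] for n in comp.inputs))
            pending.remove(comp)
    return nets
```

Calling `simulate(HALF_ADDER, {"a": 1, "b": 1})` returns a dictionary holding the value on every net, including `sum` and `carry`.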
3. Circuit verification and emulation
The netlist could go directly into the mask making process, but this would be asking for trouble because designs as complicated as a GPU are never 100 per cent correct on the first attempt. What's more, given that a set of masks could cost $1 million, testing the design on real silicon would be prohibitively expensive.
Instead the design is verified and emulated - a hugely processor-intensive operation that requires supercomputing resources. Verification involves testing individual blocks with perhaps thousands of tests per block. Each time anything fails, the design team backtracks to correct the errors and then performs a full set of simulation tests to make sure the remedial action hasn't broken something that previously worked correctly.
Once all the individual blocks are operational, the team moves to emulation. This means exercising the GPU as a whole, but given the amount of processing time needed to simulate a multi-billion transistor chip, these tests might initially be nothing more complicated than drawing a single pixel.
In addition to functional testing, emulation also ensures that the chip meets its requirements in terms of processing speed.
4. Making the masks
With the simulation out of the way the designers know that the circuit connections are correct, but so far, with the exception of the top-level floorplan, no thought has been given to where the components go on the chip. This is carried out using a special CAD package, driven by the floorplan.
This largely automated process places each component and routes the copper tracks that will ultimately connect them all together. The culmination of this process is a major milestone referred to as 'tape out', and marks the transition from design to fabrication.
Since AMD is a fabless semiconductor company, this is also the point at which it hands the baton to TSMC, its chosen foundry for GPUs. Before any chips can be manufactured though, the foundry needs to create a set of photographic masks that will be used in the photolithography - one for each of the many layers by which the circuit is built up on the chip.
Using the data supplied at tape out (which can be thought of as images of the patterns on each layer), the masks are created as a patterned layer of opaque metallic chromium on the surface of quartz glass.
5. Photolithography
Photolithography is the key to many subsequent steps involved in making the GPU, and although we're going to introduce it here, it will be used over and over again as the circuit is built up, layer by layer on the silicon wafer. It involves applying a patterned mask to the surface of the wafer so that subsequent chemical processes only affect those areas with gaps in the mask.
The letters (a) to (e) in the following description correspond to the steps in the diagram. First, a layer of photosensitive material called photoresist is applied on top of any layers that have already been created (a). This is done by putting solution on the wafer and then spinning it so the solution spreads into a thin, even layer.
When the solution has dried, the wafer is exposed to ultraviolet light (UV) through one of the masks (b). This process changes the chemical composition of the photoresist where the mask allows the ultraviolet light to pass through. The wafer is immersed in a tank of developer that dissolves away those portions of the photoresist that had been exposed to the UV light (c).
With a partial layer of photoresist now in place on the wafer, it's possible to carry out a chemical process that will only affect the wafer in those areas where the resist has been removed - we'll see an example of how this works in step 6 (d).
With the chemical process now completed, the remainder of the photoresist can be removed from the wafer using a solvent (e).
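Steps (a) to (e) can be mimicked with a very simplified model in Python, treating a cross-section of the wafer as a list of cells. This is only a sketch. It assumes the resist behaves as described above (exposed areas dissolve), and the function and parameter names are our own.

```python
def photolithography(layer, mask, process):
    """Apply `process` only to the cells of `layer` left uncovered by resist.

    layer   -- list describing the wafer surface, one entry per cell
    mask    -- list of booleans, True where the mask lets UV light through
    process -- function applied to each unprotected cell
    """
    # (a) Coat the whole surface with photoresist.
    resist = [True] * len(layer)
    # (b) Expose: UV light reaches the resist only through gaps in the mask.
    exposed = list(mask)
    # (c) Develop: the exposed resist dissolves away.
    resist = [covered and not hit for covered, hit in zip(resist, exposed)]
    # (d) Process: only cells no longer covered by resist are affected.
    layer = [cell if covered else process(cell)
             for cell, covered in zip(layer, resist)]
    # (e) Strip: remove the remaining resist, leaving the patterned layer.
    resist = [False] * len(layer)
    return layer
```

For instance, with a five-cell strip of plain silicon and a mask that's transparent over the second and third cells, a call such as `photolithography(["Si"] * 5, [False, True, True, False, False], lambda cell: "n-Si")` changes only those two cells.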
The silicon wafer will contain hundreds of individual chips, or dies to give them their correct name, so the exposure stage above (b) is carried out several times - once for each die - with the wafer being moved relative to the mask and the optical system between each exposure.
6. Patterned oxide layer
We saw in step 5 that the layer of photoresist forms a suitable barrier to many chemicals, thereby allowing a chemical process to be carried out only on portions of the wafer as defined by a mask. Other processes - most notably those involving hot gases - would destroy the photoresist, so a different type of resist is needed.
In these cases, a patterned oxide layer, otherwise known as a sacrificial oxide layer (because it's subsequently removed), is used as described in the following steps (a) to (d). Again, the letters relate to the diagram.
The wafer is covered with a layer of silicon dioxide, which completely coats all existing layers (a). The processes described in step 5 are now carried out (b), the chemical process referred to in part (d) being the dissolving of silicon dioxide using hydrofluoric acid.
The end result, therefore, is a partial layer of silicon dioxide in the pattern of the required features. The necessary chemical process is carried out - this will affect only those portions of the wafer where the patterned oxide layer is missing (c). The remnant of the oxide layer is removed, again using hydrofluoric acid (d).
7. Creating the transistors
A MOSFET (the type of transistor used in GPUs) is an electronic switch. In other words, it's an electronic component that uses a signal on one circuit to control the flow of current in another. This is the most fundamental requirement in digital electronics.
Here we see how a p-channel MOSFET is created, but n-channel MOSFETs are also required. They differ only in that one has n-type material where the other has p-type, and vice versa. In the following description, changes are made selectively to parts of the wafer by use of either a layer of photoresist as described in step 5, or a patterned oxide layer as described in step 6.
The wafer is bombarded with phosphorus ions that implant themselves into the silicon through the gaps in the photoresist to create so-called wells of n-type material. This is a modified form of silicon that has additional electrons to carry an electrical current.
Next, two smaller islands of p-type material are created within the n-type wells - these form the two electrodes known as the source and the drain of the MOSFETs. After this, a very thin insulating layer of silicon dioxide, just a few molecules thick, is deposited on the surface of the silicon between the source and the drain.
This is done using chemical vapour deposition (CVD), a process that takes place in a furnace, where gases react chemically to deposit a thin film of material on the surface of the wafer.
Finally, again using CVD, a layer of silicon is applied over the oxide layer to create the MOSFETs' third and final electrode, which is called the gate.
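To see why a switch controlled by a signal is all that digital logic needs, and why both n-channel and p-channel devices are required, here's a minimal Python sketch. It assumes the usual complementary arrangement, in which p-channel transistors connect the output to the supply and n-channel transistors connect it to ground. That arrangement is our assumption rather than something covered above, and the model ignores all analogue behaviour.

```python
def nmos_conducts(gate):
    """An n-channel MOSFET conducts when its gate is high."""
    return gate == 1


def pmos_conducts(gate):
    """A p-channel MOSFET conducts when its gate is low."""
    return gate == 0


def inverter(a):
    """One p-channel pull-up and one n-channel pull-down sharing an input."""
    pull_up = pmos_conducts(a)      # connects the output to the supply
    pull_down = nmos_conducts(a)    # connects the output to ground
    assert pull_up != pull_down     # exactly one path conducts
    return 1 if pull_up else 0


def nand_gate(a, b):
    """Two p-channel devices in parallel, two n-channel devices in series."""
    pull_up = pmos_conducts(a) or pmos_conducts(b)
    pull_down = nmos_conducts(a) and nmos_conducts(b)
    assert pull_up != pull_down
    return 1 if pull_up else 0
```

Every other logic gate can be built from NAND gates, which is how a few billion of these switches add up to a GPU.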
8. Connecting everything
We now have a wafer comprising hundreds of dies, each of which contains billions of transistors, but to convert these from isolated components into a working circuit they have to be connected using copper tracks.
First, an insulating layer of silicon dioxide is applied to the wafer so that the interconnecting tracks don't short all the MOSFETs together. Next, holes are etched in the silicon dioxide so that connections can be made to the MOSFETs' electrodes.
Then, trenches in the shape of the tracks are etched into the silicon dioxide before a layer of copper is applied by electro-plating. This covers the entire surface of the silicon dioxide, and fills the trenches and the holes to make contact with the MOSFETs.
Finally, the excess copper is removed using a process called chemical-mechanical polishing so that copper only remains in the trenches and holes. A single layer of copper interconnections isn't nearly enough to create a viable circuit. Since it isn't possible to connect everything in a single layer without making shorts, additional layers are used, each created in the same way as the first copper layer.
Increasing the number of layers can reduce the size of the chip, and some layers have to be dedicated to providing power, so ten or more layers aren't unusual in top-end chips.
9. Testing
The initial manufacturing processes are now complete to the extent that the wafer will contain several hundred dies, although typically not all of them will be functional.
The next task is to test the dies one at a time. This is carried out using a sophisticated piece of equipment called a wafer prober, which makes electrical contact with microscopic pads on the dies, ensuring correct registration by use of optical image recognition techniques. In this fully automated process, the machine remembers which of the dies passed the test.
Typically, for a state-of-the-art chip, the yield (the percentage of working dies) is in the 50-60 per cent area, although this will often improve as the manufacturing methods mature.
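The bookkeeping behind the yield figure is straightforward, as this small Python sketch shows. The wafer map format (a dictionary from die position to pass or fail) is an assumption made for illustration, not a description of what a real wafer prober records.

```python
def yield_percent(wafer_map):
    """Percentage of dies on the wafer that passed testing."""
    good = sum(1 for passed in wafer_map.values() if passed)
    return 100.0 * good / len(wafer_map)


def good_dies(wafer_map):
    """Positions of the dies worth packaging, in sorted order."""
    return sorted(pos for pos, passed in wafer_map.items() if passed)


if __name__ == "__main__":
    # Hypothetical results for a tiny four-die wafer, keyed by (row, column).
    results = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): False}
    print(yield_percent(results), good_dies(results))
```

The list of passing positions is what tells the packaging stage which dies to keep once the wafer has been cut up.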
10. Packaging
Once the testing process is complete, the wafer is sawn up into individual dies and the non-functional ones are discarded. The final step is to take the working dies - tiny rectangles of silicon that are far too flimsy to be used as regular electronic components - and package them into what most people think of as a chip, ready to be soldered onto a circuit board.
This involves bonding the die onto a substrate, and making connections between the minuscule pads on the die and the somewhat larger solder bumps that will eventually be used to make the electrical connections on a graphics card.
A final functional test is carried out to ensure that the packaging has been successful, and this miracle of digital technology is then ready to be shipped to the graphics card manufacturing facility.
We'd like to express our thanks to David Nalasco, Senior Technology Manager at AMD, who explained the intricacies of the design process to us.