Dead Hackers Society
  dhs.nu special feature #4
 
Paranoid of Paradox writes about the architecture and history of the PowerPC micro processors.
Published November 15, 2007
 


The unpopular successor - The PowerPC
The Paranoid / Paradox, paranoid(at)atari.org
 

 
History
Obviously, the PowerPC processor failed. It failed to keep up with the mighty Intel x86, failed to conquer the desktop PC market, failed to attract new customers and even failed to keep Apple, which now uses Intel x86 processors instead of PowerPCs.
On the other hand, the PowerPC is found in Nintendo's GameCube and Wii and even in Microsoft's Xbox 360, and it has been the basis for the Cell processor, the Sony and IBM cooperation that powers the PlayStation 3. The PowerPC is managing most of today's car engines as part of the engine control unit and is found in many gearbox control units and intelligent communication systems.
 
It seems like the PowerPC has quite a story to tell. So this is a fair attempt to give a brief overview of the history of this processor architecture, its strong and weak points, a programmer's view of the system and its future.
 
 
Actually, the history of the PowerPC is fairly complex and not easy to retrace because it involves two major companies that are not really famous for disclosing information of that kind, namely IBM and Apple. Still, IBM obviously must be considered the father of the PowerPC. Initially, IBM designed the RISC system called 801 in the late 70s. This spawned other projects such as the 16-register ROMP processor intended for intelligent office devices, but the lack of performance led to the America Project, which resulted in the creation of the POWER architecture - an acronym for "Performance Optimisation With Enhanced RISC" - and finally in IBM's high-end Unix workstations named RS/6000. However, the first incarnations of POWER processors were complex multi-chip systems, so IBM started to work on single-chip solutions for lower-end workstations and maybe even desktop computers.
To increase acceptance and not be limited to its own product range, IBM approached Apple and offered to design a desktop version of the POWER architecture in cooperation. At that time, Apple was using Motorola 680x0 processors in its Macintosh series, which were fairly powerful but suffered from relatively low clock rates and were therefore harder to advertise. Motorola had been working on its own - rather luckless - 88000 RISC processor series, and to benefit from Motorola's experience, Apple invited Motorola to join in. The so-called AIM alliance was founded.
 
The goal was to scale down the POWER architecture to a slightly less powerful, yet much cheaper system that would be named PowerPC, and the first work product was the PowerPC Instruction Set Architecture (ISA). The first silicon implementation was the PowerPC 601, sometimes also referred to as Generation 1. It allowed clock rates around 75MHz and was used in both Apple's PowerMac and low-end IBM RS/6000 workstations, which often served as front-end workstations for CAD/CAM systems running AIX. The PowerPC initially even attracted Microsoft, which ported Windows NT 3.51 to it, as IBM did with OS/2. However, Apple's PowerMac suffered from the fact that most of the available software - and even parts of the operating system - relied on an internal 68k emulation that consumed a measurable amount of CPU power. Also, the PPC601 was compatible with both the POWER and the PowerPC instruction sets, making it far more complex and expensive than it could have been.
 
As a result, Apple and IBM decided to take a different approach for the second generation. IBM focussed on the development of a high-end PowerPC also intended for use in low-end workstations, while Motorola focussed on the low-cost PowerPC intended for small desktop computers and notebooks. While the PPC604 performed fairly well and was used in IBM's RS/6000 43P series, the PPC603 suffered from a rather small L1 cache that severely slowed down the 68k emulation. Motorola later solved this by releasing the PowerPC 603e, which had a larger L1 cache and allowed clock speeds up to 200MHz, making it compete with the more expensive PPC604.
 
However, the PPC603 and its initially low performance made the PowerPC logo on the front of every PowerMac look rather uncool and outdated, especially now that Intel had started to aggressively advertise the "Pentium" processor with its trademark jingle and the "Intel Inside" logo. Apple needed to compete, so development of the next generation of PowerPC processors was started, to be marketed as "Generation 3" or "G3" processors. The PPC750, again a cooperation between IBM and Motorola, had enhanced integer and floating point units, on-chip support for L2 caches, faster bus speeds and allowed core clock rates around 300MHz. Apple used the new "G3" tag on both the new iMac computers and the new G3 PowerMac, keeping up with Intel's newest x86 processors for a while. However, Intel soon broke the 500MHz barrier, which was highly important for marketing purposes.
 
As a result, Apple pushed the development of the Generation 4, a close cooperation of Apple and Motorola. While this PPC7400 had many minor internal improvements, it was mainly famous for the introduction of a highly powerful vector unit named AltiVec (nicknamed "Velocity Engine" by Apple). Initially, Motorola promised clock speeds of 500MHz but could not deliver. Apple had already advertised a 500MHz PowerMac and now took the blame for not being able to deliver, which severely damaged its business relations with Motorola. However, Motorola not only managed to produce 500MHz processors from February 2000 on, it also revised the initial design in 2001 as the PPC7450 (G4+), allowing clock speeds up to 1.67GHz, increasing the number of independent integer and vector units and adding support for an L3 cache while keeping power consumption and heat dissipation low, which made the G4 a perfect solution for small computers (Mac mini) and notebooks.
 
Still, the business relations between Motorola and Apple had suffered severely, and Motorola also announced a reorganisation of the company that included spinning off the semiconductor business as "Freescale", which clearly showed little interest in the desktop processor market together with Apple since it meant low sales and low revenue. To be able to introduce a new high-end PowerMac, Apple returned to IBM. IBM, however, had lost track of the PowerPC development and created the so-called "G5", the PPC970, from its very own POWER4 architecture. While that meant the benefits of a dual-core, L3-supporting high-end architecture, it also meant higher power consumption, bulky chips and increased heat dissipation. As a result, Apple had major problems using the chip in the new iMac design and completely failed to implement the chip in notebook computers. Soon after, IBM gave up its non-embedded PowerPC division anyhow, handing it to a company named AMCC (Applied Micro Circuits Corporation). The management at Apple decided to discontinue support for the PowerPC architecture and teamed up with Intel.
 
 
Internals
Before looking at the PowerPC's future, let's have a closer look at its internals. What is special about the PowerPC, where are its weaknesses, where are its strengths and how is it programmed?
 
The most important aspect of the PowerPC is that it's a true and complete 32-Bit RISC system. All data paths are (at least) 32 Bits wide, and the later incarnations often feature internal 64-Bit data buses, pipelines and prefetch units. Still, the way the bits are numbered is slightly special:
Bit     0 . . . . . 7    8 . . . . . 15   16 . . . . . 23   24 . . . . . 31
Value   2^31 . . 2^24    2^23 . . 2^16    2^15 . . . 2^8    2^7 . . . . 2^0
In other words, bit "0" represents the most significant bit, with the value 2 to the power of 31, while bit "31" represents the least significant bit, with the value 2 to the power of 0, meaning 1. Later revisions even feature 64-Bit registers of which the lower 32 Bits are compatible with the PowerPC ISA. Due to the bit numbering convention, this implies that the lower 32 Bits are numbered 32 to 63 and are the ones manipulated by the PowerPC instruction set, leaving bits 0 to 31 available for extensions. Also, the terminology is fairly special. While PowerPC documentation refers to an 8-Bit operand as a "Byte", a 16-Bit operand is referred to as a "Half Word" and a 32-Bit operand is named "Word". The "AltiVec" vector unit then introduces the "Quad Word" for a 128-Bit operand.
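The numbering convention is easy to get wrong when reading register descriptions, so here is a minimal sketch of the translation in Python. The function names are made up for this illustration and do not come from any PowerPC tool.

```python
def ibm_bit_value(bit, width=32):
    """Value of a bit in PowerPC-style numbering, where bit 0 is the MSB."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


def to_lsb0(bit, width=32):
    """Translate a PowerPC bit number into the usual 'bit 0 = LSB' number."""
    return width - 1 - bit


print(hex(ibm_bit_value(0)))     # 0x80000000, the most significant bit
print(hex(ibm_bit_value(31)))    # 0x1, the least significant bit
print(to_lsb0(32, width=64))     # 31: bit 32 of a 64-Bit register
```

With a width of 64, bit 32 carries the same value as bit 0 of a 32-Bit register, which is why the 32-Bit instruction set ends up working on bits 32 to 63.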
 
However, the PowerPC is commonly neither strictly little nor strictly big endian. In fact, a little logic in its address generation can be used for byte swapping on aligned operands, allowing the PowerPC to be used in both big- and little-endian environments. The only exception is the G5 processor, which is big endian only.
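How a bit of address logic can stand in for real byte swapping is perhaps best seen in a toy model. The sketch below assumes the scheme usually described for the classic 32-Bit PowerPC little-endian mode: data is always stored big endian, but the lowest three address bits are XORed with a mask that depends on the operand size. The class and function names are hypothetical, and only aligned operands are modelled.

```python
def munge(addr, size):
    """Modify an address the way little-endian mode is assumed to do it."""
    if size not in (1, 2, 4, 8) or addr % size:
        raise ValueError("only aligned 1, 2, 4 or 8 byte operands are modelled")
    return addr ^ (8 - size)    # mask 7 for bytes, 6 for half words, 4 for words


class ToyMemory:
    """Byte-addressed memory whose data path is always big endian."""

    def __init__(self, size=16, little_endian_mode=False):
        self.data = bytearray(size)
        self.little_endian_mode = little_endian_mode

    def _physical(self, addr, size):
        return munge(addr, size) if self.little_endian_mode else addr

    def store(self, addr, value, size):
        p = self._physical(addr, size)
        self.data[p:p + size] = value.to_bytes(size, "big")

    def load(self, addr, size):
        p = self._physical(addr, size)
        return int.from_bytes(self.data[p:p + size], "big")


mem = ToyMemory(little_endian_mode=True)
mem.store(0, 0x11223344, 4)
print([hex(mem.load(a, 1)) for a in range(4)])   # ['0x44', '0x33', '0x22', '0x11']
```

The program sees the byte order of a little-endian machine although no byte was ever swapped inside the word. This only holds for aligned accesses, which matches the restriction mentioned above.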
 
Furthermore, the PowerPC has 32 general purpose registers that can be used with basically any kind of instruction - arithmetic, logical, shift, multiply and divide - and that also serve as address and index registers. Besides these general purpose registers, the PowerPC has special instructions to operate a set of non-memory-mapped but numbered special purpose registers, usually called SPRs.
Among them are, for example, the Link Register (LR), which acts as a shadow program counter and is used for subroutine/interrupt management, and the Count Register (CTR), which can be used for 0-cycle loops or as a branch target register. There are various others, depending on the CPU core used. Most of them are "read-only" for application software, but some cores also provide User Special Purpose General Registers (USPRG), which can be used to exchange data between application and driver software.
 
Naturally, the PowerPC supports both user and supervisor modes, much like the 680x0 series. While the registers above are available to software running in user mode, the supervisor mode gives access to a lot more SPRs that are, however, slightly more hardware dependent and are not explained in detail here. The intention is to run operating systems and low-level drivers requiring hardware access in supervisor mode and to run application software in user mode. To make sure this separation is reliable, the PowerPC not only sports privileged instructions that are allowed in supervisor mode only, but also a paged memory management unit that allows memory pages to be configured individually for user or supervisor (or both) access and to be flagged as read-only, read-write and executable.
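The page protection idea can be sketched in a few lines of Python. The flag names below (SR, SW, SX, UR, UW, UX) are an assumption in the style of per-mode permission bits; the sketch does not reproduce the real page table or TLB entry format of any PowerPC core, and the two example pages are invented.

```python
KINDS = {"read": "R", "write": "W", "execute": "X"}


def access_allowed(page_flags, supervisor_mode, kind):
    """Check one access against the flags of a (hypothetical) memory page."""
    flag = ("S" if supervisor_mode else "U") + KINDS[kind]
    return flag in page_flags


kernel_code = {"SR", "SX"}          # supervisor may read and execute
shared_data = {"SR", "SW", "UR"}    # user mode may only read

print(access_allowed(kernel_code, False, "read"))    # False
print(access_allowed(shared_data, False, "read"))    # True
print(access_allowed(shared_data, False, "write"))   # False
```

On real hardware a denied access raises an exception that the operating system handles in supervisor mode; here the function simply returns False.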
 
As a matter of fact, the most powerful characteristic of the PowerPC is rather invisible to the user: its superscalar design. The PowerPC is capable of executing more than 1 instruction simultaneously, namely integer instructions, branch processing, Load/Store operations and, if available, vector instructions and floating point operations, too. Also, many of these units exist multiple times; the MPC7450, for example, has 4 independent integer units. Due to the RISC-based Load/Store architecture, no PowerPC instruction occupies two of these units at once. This allows a program to perform a Load or Store operation while several integer, vector or floating point units perform calculations and a branch is being processed at the same time. Along with prefetch units and internal pipelines, this makes the PowerPC capable of executing more than 1 instruction per cycle.
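To get a feeling for what this means for the order of instructions, consider the following toy model in Python. It is a deliberately simplified sketch and not the pipeline of any real PowerPC core: it assumes one integer unit and one Load/Store unit, strictly in-order issue, at most one instruction per unit and cycle, and results that become usable one cycle after issue. The register names and the two instruction sequences are invented for the example.

```python
def issue_cycles(program):
    """Count the cycles a toy in-order, multi-issue machine needs.

    program: list of (unit, destination, sources) tuples in program order.
    """
    ready = {}    # register name -> cycle in which its value was produced
    cycle = 0
    i = 0
    while i < len(program):
        cycle += 1
        busy = set()
        while i < len(program):
            unit, dest, sources = program[i]
            waiting = any(ready.get(s, 0) >= cycle for s in sources)
            if unit in busy or waiting:
                break    # in-order issue: everything behind it waits, too
            busy.add(unit)
            ready[dest] = cycle
            i += 1
    return cycle


naive = [
    ("lsu", "r3", ["r1"]),          # load r3
    ("int", "r5", ["r3", "r3"]),    # needs r3 immediately
    ("lsu", "r4", ["r2"]),          # load r4
    ("int", "r6", ["r4", "r4"]),    # needs r4 immediately
    ("int", "r7", ["r8", "r9"]),    # independent work
]
reordered = [naive[0], naive[4], naive[2], naive[1], naive[3]]

print(issue_cycles(naive))        # 4
print(issue_cycles(reordered))    # 3
```

Both sequences compute the same values, but the reordered one keeps both units busy by moving independent work between a load and its first use.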
 
 
The Good, The Bad & The Ugly
From a programmer's point of view this makes the PowerPC a rather unique system with strong and weak parts, which are briefly explained below.
 
PowerPC - The good!
  • A pure 32-Bit RISC system
    Implies a small number of highly flexible instructions, a fixed instruction size of 4 bytes that implies optimum usage of pipelines and prefetches. The fairly abstract instruction set operates equally on all registers, eliminating all exceptions regarding addressing modes, target registers etc.,
     
  • Large set of General Purpose Registers
    mean a sufficient amount of data storage registers which are all equally powerful and capable of executing all PowerPC instructions, also, a standard of how to use these registers to maintain compatibility between object files or library routines is given by the PowerPC EABI,
     
  • Superscalar architecture
    allows the processor to perform several instructions in one cycle, especially due to the side-effect free pipeline and the option to protect critical code by forcing in-order execution,
     
  • Implicit L1 cache support
    which speeds up repeated bus access tremendously by being multiple-way associative and completely transparent to the user once it has been set up correctly.
     
PowerPC - The bad!
  • A pure 32-Bit RISC system
    which also means that any instruction, even a simple increment, requires 4 bytes, making code bulkier than on some other systems, also, the reduced instruction set leads to a large set of "virtual instructions" - meaning that a special case of a more powerful RISC instruction is assigned its own mnemonic but has no individual opcode,
     
  • Large Set of General Purpose Registers
    means the lack of a system stack pointer. In fact, the PowerPC does not sport a stack pointer. Instead, it offers "shadow registers" to back up register content in certain situations which must then be saved by software, commonly using one general purpose register to emulate a system stack pointer, thus increasing software overhead for routine duties,
     
  • Superscalar Architecture
    makes efficient coding more difficult. To gain the maximum processing power out of a PowerPC it is necessary to re-arrange parts of the code to allow simultaneous instruction execution and to minimize pipeline stalls. Some routines, such as a memcpy for example, are almost impossible to optimize for a superscalar system since the integer, vector or floating point units are basically not needed for this purpose,
     
  • Implicit L1 Cache support
    does not necessarily imply memory coherence. Intelligent peripherals with direct memory access require special treatment in order to guarantee data coherency.
     
PowerPC - The ugly!
  • A pure 32-Bit RISC System
    means that limiting every instruction to 4 Bytes makes usage of absolute parameters of 32-Bit size impossible. To actually load a 32-Bit absolute number, the PowerPC requires two separate instructions, increasing software overhead measurably,
     
  • Large Set of General Purpose Registers
    but no system stack pointer introduces software overhead for stack emulation and creates critical code sections that may in no case be interrupted, requiring even more software overhead. Also, the PowerPC does not support a single-instruction context switch (even though there is a load/store multi-word instruction in later revisions of the PowerPC),
     
  • Superscalar Architecture
    finally implies that optimized code becomes totally unreadable and basically impossible to debug and finally,
     
  • Implicit L1 Cache support
    spoils deterministic timing of code execution. While this is normally not relevant for ordinary applications, certain safety-critical code segments are required to behave fully deterministically, which leads to additional constraints on the software.
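Two of the points above, the two-instruction load of a 32-Bit constant and the "virtual instructions", can be made concrete by assembling the instruction words by hand. The Python sketch below uses the D-form layout (6-Bit opcode, two 5-Bit register fields, 16-Bit immediate) and the primary opcodes 15 for addis and 24 for ori; the helper names are made up for this example.

```python
def d_form(opcode, field1, field2, imm16):
    """Assemble a D-form word: opcode, two register fields, 16-Bit immediate."""
    return (opcode << 26) | (field1 << 21) | (field2 << 16) | (imm16 & 0xFFFF)


def lis(rd, imm16):
    # "lis rD,value" has no opcode of its own, it is "addis rD,0,value"
    return d_form(15, rd, 0, imm16)


def ori(ra, rs, imm16):
    # "ori rA,rS,value": the source register occupies the first field
    return d_form(24, rs, ra, imm16)


def load32(rd, value):
    """Two instructions are needed to load one 32-Bit constant."""
    return [lis(rd, value >> 16), ori(rd, rd, value & 0xFFFF)]


NOP = ori(0, 0, 0)    # "nop" is another mnemonic without its own opcode

print([f"{word:08X}" for word in load32(3, 0x12345678)])
# ['3C601234', '60635678']
print(f"{NOP:08X}")   # 60000000
```

Each word is exactly 4 bytes, so loading the constant costs 8 bytes of code, which is the overhead the list above complains about.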
     
     
PowerPC vs 680x0
Especially when comparing the PowerPC with the 680x0 architecture - which is by now being advertised as variable instruction length RISC - the differences are easy to spot:
  • The 680x0 features instruction length of 2 to 10 Bytes, allowing even 2 parameters of 32-Bit width per instruction which - depending on the optimisation level - requires up to 4 instructions on the PowerPC.
     
  • The 680x0 is not a Load/Store architecture, which is one of its biggest advantages. The 680x0 is capable of reading and writing memory in a single instruction, which is impossible on the PowerPC.
     
  • The 680x0 offers a move-multi instruction that saves all relevant registers in a single instruction or loads them in a similar manner, providing explicit support for multitasking environments.
     
  • The 680x0 features a system stack pointer that is being managed internally on several instructions or events, such as branch to subroutine, return from subroutine or interrupt execution, keeping software overhead rather low.
     
  • The 680x0 instruction set is easy to understand and operate with a most powerful move instruction that replaces many kinds of load, store and transfer instructions of other processor families.
     
  • While the 680x0 features a separate set of 8 Data and 8 Address Registers, many operations can be performed on both, also, the 680x0 sports a huge set of effective addressing modes, of which most apply to all instructions.
     
  • The original 68000 has been a 16-Bit processor capable of executing 32-Bit operations without software overhead, the later models from 68020 on have been true 32-Bit processors, of which the later models also sport cache support, prefetch units and pipelines. In fact, the 68060 is a superscalar system, too.
     
While the above list seems to show how much more powerful an 680x0 processor is with regard to the PowerPC architecture, most of the points listed have restrictions or bear other constraints the PowerPC covers up for in a certain way:
  • The variable length instruction set of the 680x0 makes the organisation of pipelines and prefetches difficult, especially the separation into code and data access. As a result of this, pipelines and prefetch units become more complex, making the processor either more expensive or less powerful. The PowerPC's fixed 32-Bit instruction set is a lot easier to manage through prefetches and pipelines, making the system less complex, more powerful and usually cheaper.
     
  • It's very similar regarding the 680x0 not being a Load/Store Architecture. Having single instructions capable of accessing memory twice affects cache, pipeline and buffering logic, making the system more complex again. The PowerPC can only perform a single memory access per instruction, again leading to a less complex access logic.
     
  • While the 680x0 does indeed feature a system stack pointer, many operations that internally manage the system stack are multi-cycle instructions, for example JSR and RTS. The PowerPC needs software overhead to actually emulate a system stack, however, most of these instructions run in a single cycle, making up for the lack of an internal system stack.
     
  • Also, the 680x0 supports fast context switching by providing move-multi instructions, still this kind of instruction requires many cycles to perform and bears other restrictions regarding effective addressing modes. While the PowerPC does not have this kind of support for fast context switching, the fact that most instructions execute in a single cycle allows the PowerPC to compete under most conditions.
     
  • The 680x0 has virtual instructions, too. There is, for example, no "clr" instruction that operates on address registers. Still, many assemblers accept such a "clr" instruction and convert it to a "suba" internally. Also, the 680x0 instruction set has restrictions regarding addressing modes for certain instructions that the PowerPC does not have.
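The software stack emulation mentioned in the list above can be pictured with a small Python model. It is only a sketch: the class, the addresses and the routines are hypothetical, and a real PowerPC would use instructions such as bl, blr, mflr and mtlr together with a general purpose register chosen as stack pointer by convention.

```python
class ToyCPU:
    def __init__(self):
        self.lr = None            # Link Register: latest return address
        self.sp = 64              # a general purpose register used as stack pointer
        self.memory = {}
        self.memory_accesses = 0

    def call(self, return_address, routine):
        self.lr = return_address  # a branch-and-link only touches the LR
        routine(self)
        return self.lr            # a return branches to whatever the LR holds

    def push_lr(self):            # software prologue, not done by the hardware
        self.sp -= 4
        self.memory[self.sp] = self.lr
        self.memory_accesses += 1

    def pop_lr(self):             # software epilogue
        self.lr = self.memory[self.sp]
        self.sp += 4
        self.memory_accesses += 1


def leaf(cpu):
    pass                          # calls nothing, so the LR stays intact


def careless(cpu):
    cpu.call(0x200, leaf)         # overwrites the LR without saving it


def careful(cpu):
    cpu.push_lr()
    cpu.call(0x200, leaf)
    cpu.pop_lr()


cpu = ToyCPU()
print(hex(cpu.call(0x100, leaf)))       # 0x100, without any memory access
print(hex(cpu.call(0x100, careless)))   # 0x200, the return address is lost
print(hex(cpu.call(0x100, careful)))    # 0x100, at the cost of two accesses
```

A leaf routine returns without touching memory at all, which a 680x0 JSR/RTS pair cannot do, while every nested call has to pay for saving and restoring the Link Register in software.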
     
Finally, the 680x0 has indeed gone through an evolution similar to that of the PowerPC. Still, this evolution has always been more complex and therefore more expensive for 680x0 processors due to the different architecture. In other words, while an architecture such as the PowerPC favours caches, pipelines and prefetches, even an internal separation of code and data flow (internal Harvard architecture), the 680x0 does not lend itself to them as easily. While it is without a doubt possible, it is more complex in every way.
 
 
PowerPC vs x86
Nevertheless, the main competitor of the PowerPC has been the x86 series, not the 680x0 series. So how does a PowerPC perform in comparison to this processor family?
First of all, the PowerPC is easier to scale. Having been a RISC system right from the beginning, the PowerPC has far fewer transistors than an x86 and is easier to optimize with regard to power consumption, heat dissipation and size. On the other hand, the PowerPC means more software overhead for certain services which an x86 handles with less software interaction. Because the x86 has a CISC-based instruction set and a small number of registers, some of which are even special purpose, individual operations can be decoupled from the rest of the system and optimized - something the PowerPC does not allow. This also allows more, longer and partly customized pipelines, which the PowerPC cannot offer. As a side-effect, parts of an x86 can be "overclocked" with regard to the rest of the CPU, which is not possible on the PowerPC.
The pipelines of the x86 are commonly considered "long and thin" while those of the PowerPC are usually called "short and fat", which is neither an advantage nor a disadvantage: the x86 pipelines allow higher clock rates but get less work done per cycle, while the PowerPC pipelines are more efficient but cannot be clocked as high. Naturally, this has its side-effects: x86 processors are more complex, more expensive and tend to show excessive heat dissipation and power consumption when operated at maximum workload. Also, most PowerPC instructions finish in a single cycle, while most of the x86 instructions actually used by software do not, due to the major revisions of the x86 instruction architecture (RISC86).
It should be noted nonetheless that especially the later PowerPC incarnations such as the PPC7450 suffer from a rather outdated bus logic that reduces data throughput in comparison to the more advanced x86 bus systems. While this is not an inherent drawback of the PowerPC architecture itself, it still affected the performance of existing PowerPC systems in comparison to x86 systems.
 
Still, it hasn't been the technical differences that made x86 conquer the world; a PowerPC is not - necessarily - less powerful than an x86 processor. In the end, it has rather been the market position. Not only do several producers of x86 processors compete with each other, they are also forced to expand the market and enlarge the customer base by attracting users that would otherwise have bought high-end workstations or servers. Motorola and IBM not only did not compete with each other, they also only needed to serve one customer, Apple. Because the PowerPC is easy to introduce to embedded systems thanks to its low power consumption, better scalability and high reliability under rough conditions, Motorola soon focussed on the embedded market. IBM first lost interest and sold its PowerPC department to AMCC, but now supplies Nintendo, Microsoft and even Sony. The high interest in the PowerPC has led to the foundation of Power.org, which gives other companies the option of taking part in the PowerPC development.
 
So, while the PowerPC has failed to establish itself in desktop computers, it runs in many embedded applications without the user even noticing the 32-Bit superscalar RISC core. And one thing is for certain: Even though the world will probably not see desktop computers employing PowerPC processors on a large scale anymore, the story of the PowerPC is far from being finished.
 
 
The Paranoid of Paradox
for Dead Hackers Society in 2007
 
 
References
References used in written form:
  • MPC5554/MPC5554 Reference Manual, Rev. 3.1, Freescale
     
  • MPC5566 Reference Manual, Rev 0.1, Freescale
     
  • e200z6 User's Manual, Rev. 0.1, Freescale
     
  • PowerPC Book E, Third Revision, 2002
     
  • Motorola RISC CPU (RCPU) Reference Manual, Freescale
     
  • MPC7450 Reference Manual, Rev. 5, 2005, Freescale
     
Tools, Soft- and Hardware used:
  • Metrowerks CodeWarrior v1.5 for MPC55xx, Metrowerks 2004
     
  • Freescale CodeWarrior v2.1 for MPC55xx, Freescale 2007
     
  • Lauterbach Trace32 On-Chip Debugger
     
  • Motorola MPC5554 Evaluation Board
     
© 1994-2024 Dead Hackers Society Contact: Anders Eriksson