dhs.nu special feature #4
Paranoid of Paradox writes about the architecture and history of the PowerPC microprocessors.
Published November 15, 2007
The unpopular successor - The PowerPC
The Paranoid / Paradox, paranoid(at)atari.org
History
Obviously, the PowerPC processor failed. It failed to keep up with the mighty
Intel x86, failed to conquer the desktop PC market, failed to attract new
customers and even failed to keep Apple, which now uses Intel x86 instead of
PowerPCs.
On the other hand, the PowerPC is found in Nintendo's GameCube and Wii, even
in Microsoft's Xbox 360, and has been the basis for the Cell processor, a Sony
and IBM cooperation that powers the PlayStation 3. The PowerPC also manages
many of today's car engines as part of the engine control unit and is found in
many gearbox control units and intelligent communication systems.
It seems like the PowerPC has quite a story to tell. So this is a fair
attempt to give a brief overview of the history of this processor architecture,
its strong and weak points, a programmer's view of the system and its future.
Actually, the history of the PowerPC is fairly complex and not easy to
recapture because it involves two major companies that are not really famous
for disclosing information of that kind, namely IBM and Apple.
Still, IBM obviously must be considered the father of the PowerPC. Initially,
IBM designed the RISC system called 801 in the late 70s. This spawned other
projects such as the 16-register ROMP processor intended for intelligent
office devices, but the lack of performance led to the America Project,
which resulted in the creation of the POWER architecture - an acronym for
"Performance Optimisation With Enhanced RISC" - and finally in IBM's high-end
Unix workstations named RS/6000.
However, the first incarnations of POWER processors were complex multi-chip
systems, so IBM started to work on single-chip solutions for lower-end
workstations and maybe even desktop computers.
In order to increase acceptance and not be limited to IBM's own product
range, IBM approached Apple and offered to design a desktop version of
the POWER architecture in cooperation. Back then, Apple was using Motorola
680x0 processors in their Macintosh series - which were fairly powerful at the
time but suffered from relatively low clock rates and were therefore harder
to advertise. Motorola had been working on their own - rather luckless -
88000 RISC processor series and, to benefit from Motorola's experience, Apple
invited Motorola to join in. The so-called AIM alliance was founded.
The goal was to scale down the POWER architecture to a slightly less
powerful, yet much cheaper system that would be nicknamed PowerPC and the
first work-product was the PowerPC Instruction Set Architecture (ISA).
The first silicon implementation was the PowerPC 601, sometimes also referred
to as Generation 1. It allowed clock rates around 75MHz and was used in both
Apple's PowerMac and low-end IBM RS/6000 workstations - often used as
front-end workstations for CAD/CAM systems running AIX. The PowerPC initially
even attracted Microsoft, which ported Windows NT 3.51 to it, and IBM did the
same with OS/2.
However, Apple's PowerMac was suffering from the fact that most of the
available software - and even parts of the operating system - relied on an
internal 68k emulation that consumed a noticeable amount of CPU power. Also,
the PPC601 was compatible with both the POWER and the PowerPC instruction
sets, making the PPC601 way more complex and expensive than it could have been.
As a result, Apple and IBM decided to take a different approach for
the second generation. IBM focussed on the development of a high-end PowerPC,
the PPC604, also intended for use in low-end workstations, while Motorola
focussed on the low-cost PowerPC, the PPC603, intended for small desktop
computers and notebooks.
While the PPC604 performed fairly well and was used in IBM's RS/6000 43P
series, the PPC603 suffered from a rather small L1 cache that severely
slowed down the 68k emulation. Motorola later solved this by releasing the
PowerPC 603e, which had a larger L1 cache and allowed clock speeds up to
200MHz, letting it compete with the more expensive PPC604.
However, the PPC603 and its initial low performance made the PowerPC logo on
the front side of every PowerMac look rather uncool and outdated, especially
now that Intel had started to aggressively advertise the "Pentium" processor
with its trademark jingle and the "Intel Inside" logo. Apple needed to compete,
so the development of the next generation of PowerPC processors was
started, to be marketed as "Generation 3" or "G3" processors. The PPC750, again
a cooperation between IBM and Motorola, had enhanced integer and floating
point units, on-chip support for L2 caches, faster bus speeds and allowed core
clock rates around 300MHz. Apple used the new "G3" tag on both the new iMac
computers and the new G3 PowerMac, keeping up with Intel's newest x86
processors for a while. However, Intel soon broke the 500MHz barrier, which was
highly important for marketing purposes.
As a result, Apple pushed the development of the Generation 4, a close
cooperation of Apple and Motorola. While this PPC7400 had many minor internal
improvements, it was mainly famous for the introduction of a highly
powerful vector unit named AltiVec (nicknamed "Velocity Engine" by Apple).
Initially, Motorola promised clock speeds of 500MHz but could not deliver.
Naturally, Apple had already advertised a 500MHz PowerMac and now took the
blame for not being able to deliver, severely spoiling business relations with
Motorola. However, not only did Motorola manage to produce 500MHz processors
from February 2000 on, they also managed to revise the initial design in 2001
as the PPC7450 (G4+), allowing clock speeds up to 1.67GHz, increasing the
number of independent integer and vector units and adding support for an L3
cache while keeping power consumption and heat dissipation low, making the G4
a perfect solution for small computers (Mac mini) and notebooks.
Still, the business relations between Motorola and Apple had severely
suffered. Motorola also announced a reorganization of the company, which
included spinning off the semiconductor business as "Freescale", and Freescale
clearly showed little interest in the desktop processor market together with
Apple since it meant low sales and low revenue. To be able to introduce a
new high-end PowerMac, Apple returned to IBM. IBM, however, had
lost track of the PowerPC development and created the so-called "G5", the
PPC970, from IBM's very own POWER4 architecture. While that meant the
benefits of having a dual-core, L3 supporting high-end architecture, it also
meant higher power consumption, bulky chips and increased heat dissipation. As
a result, Apple had major problems using the chip in the new
iMac design and completely failed to implement the chip in notebook computers.
Soon after, IBM handed its embedded PowerPC 4xx product line over to a company
named AMCC (Applied Micro Circuits Corporation). The management at Apple
decided to discontinue the support for the PowerPC architecture and teamed
up with Intel.
Internals
Before looking at the PowerPC's future, let's have a closer look at its
internals. What is special about the PowerPC, where are its weaknesses,
where are its strengths and how is it programmed?
The most important aspect of the PowerPC is that it's a true and complete
32-bit RISC system. All data paths are (at least) 32 bits wide, and the later
incarnations often feature internal 64-bit data busses, pipelines and
prefetch units. Still, the way the bits are numbered is slightly special:
Bit:    0 ....... 7    8 ...... 15    16 ..... 23    24 ..... 31
Value:  2^31 .. 2^24   2^23 .. 2^16   2^15 ... 2^8   2^7 .... 2^0
In other words, bit "0" represents the most significant bit, with the value
2 to the power of 31, while bit "31" represents the least significant bit,
with the value 2 to the power of 0, meaning 1.
Later revisions even feature 64-bit registers of which the lower 32 bits are
compatible with the PowerPC ISA. Due to the bit numbering convention, this
implies that bits 32 to 63 sit on the least significant side. Bits 32 to 63
are therefore the ones manipulated by the 32-bit PowerPC instruction set,
while bits 0 to 31 are available for extensions.
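The numbering convention above can be sketched in a few lines of Python. This
is only an illustration; the helper name bit_weight is made up for this
example and does not come from any PowerPC documentation.

```python
def bit_weight(bit: int, width: int = 32) -> int:
    """Numeric weight of a bit given in PowerPC numbering (bit 0 = MSB).

    bit_weight is a hypothetical helper used for illustration only.
    """
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit register")
    return 1 << (width - 1 - bit)


print(hex(bit_weight(0)))             # 0x80000000, the most significant bit
print(bit_weight(31))                 # 1, the least significant bit
print(hex(bit_weight(32, width=64)))  # 0x80000000
print(bit_weight(63, width=64))       # 1
```

Bit 32 of a 64-bit register carries the same weight as bit 0 of a 32-bit
register, which is why 32-bit code ends up working on bits 32 to 63.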
Also, the terminology is fairly special. PowerPC documentation refers
to an 8-bit operand as a "Byte", a 16-bit operand as a "Half Word" and a
32-bit operand as a "Word". The "AltiVec" vector unit then
introduces the "quad-word" for a 128-bit operand.
However, the PowerPC is, in general, neither strictly little nor big endian.
In fact, a little logic in its address generation can be used for byte
swapping on aligned operands, allowing the PowerPC to be used in both big- and
little-endian environments. The only exception is the G5 processor, which is
big endian only.
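To make the difference between the two byte orders concrete, here is a small
Python sketch using the standard struct module. It only shows how the same
32-bit word is laid out in memory in each mode; it does not model the address
logic of any particular PowerPC core.

```python
import struct

value = 0x12345678

# ">I" packs an unsigned 32-bit word big endian, "<I" little endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 12345678: most significant byte at the lowest address
print(little.hex())  # 78563412: least significant byte at the lowest address

# Reading little-endian data with the wrong byte order yields a swapped word.
swapped = struct.unpack(">I", little)[0]
print(hex(swapped))  # 0x78563412
```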
Furthermore, the PowerPC has 32 general purpose registers that can be used
with basically any kind of instruction: arithmetic, logical, shift, multiply
and divide instructions. They also serve as address and index registers.
Besides these general purpose registers, the PowerPC has special instructions
to access a set of non-memory-mapped but numbered special purpose registers,
usually called SPRs.
Among them is, for example, the Link Register (LR), which acts as
a shadow program counter and is used for subroutine/interrupt management, or
the Count Register (CTR), which can be used for 0-cycle loops or as a branch
target register. There are various others, depending on the CPU core used.
Most of them are "read-only" for application software, but there are also User
Special Purpose General Registers (USPRG) which can be used to exchange data
between application and driver software.
Naturally, the PowerPC supports both user and supervisor modes, much like
the 680x0 series. While the registers above are available to software running
in user mode, the supervisor mode gives access to a lot more SPRs that are,
however, slightly more hardware dependent and are not explained in
detail here. Naturally, the intention is to run operating systems and
low-level drivers requiring hardware access in supervisor mode and to run
application software in user mode. To make sure this separation is reliable,
the PowerPC not only sports privileged instructions that are only allowed in
supervisor mode but also a paged memory management unit that allows memory
pages to be configured individually for user or supervisor (or both) access.
It also allows setting flags for read-only, read-write and execute.
As a matter of fact, the most powerful characteristic of the PowerPC is
rather invisible to the user: its superscalar design. As such a system, the
PowerPC is actually capable of executing more than 1 instruction
simultaneously, namely integer instructions, branch processing, Load/Store
operations and, if available, vector instructions and floating point
operations, too. Also, many of these units exist several times; the MPC7450,
for example, has 4 independent integer units. Due to the RISC-based Load/Store
architecture, no PowerPC instruction can operate on two of these units at once.
This allows a program to perform a Load or Store operation while several
integer, vector or floating point units perform calculations and, at the same
time, a branch is being processed. Along with prefetch units and internal
pipelines, the PowerPC is capable of executing more than 1 instruction
per cycle.
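How several execution units lead to more than one instruction per cycle can
be illustrated with a toy model in Python. This is a heavily simplified
sketch and not a model of any real PowerPC core: it assumes in-order issue,
single-cycle latency for every instruction, and that a result only becomes
usable in the cycle after it was produced. The Instr class, the schedule
function and the unit names are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                    # mnemonic, for display only
    unit: str                    # hypothetical unit class: "int", "lsu", "branch"
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def schedule(program: List[Instr], units: Dict[str, int]) -> List[List[str]]:
    """Toy in-order issue: group instructions into cycles."""
    for ins in program:
        if units.get(ins.unit, 0) < 1:
            raise ValueError(f"no execution unit for {ins.text!r}")
    cycles = []
    i = 0
    while i < len(program):
        free = dict(units)       # every unit is available again each cycle
        written = set()          # registers produced during this cycle
        issued = []
        while i < len(program):
            ins = program[i]
            if free[ins.unit] == 0:
                break            # no unit of this kind left in this cycle
            if written & set(ins.srcs) or ins.dest in written:
                break            # depends on a result of this very cycle
            free[ins.unit] -= 1
            if ins.dest is not None:
                written.add(ins.dest)
            issued.append(ins.text)
            i += 1
        cycles.append(issued)
    return cycles


program = [
    Instr("lwz  r3,0(r9)", "lsu", "r3", ("r9",)),
    Instr("addi r4,r4,1", "int", "r4", ("r4",)),
    Instr("add  r5,r6,r7", "int", "r5", ("r6", "r7")),
    Instr("add  r8,r3,r5", "int", "r8", ("r3", "r5")),
    Instr("bdnz loop", "branch", "ctr", ("ctr",)),
]

wide = schedule(program, {"int": 2, "lsu": 1, "branch": 1})
narrow = schedule(program, {"int": 1, "lsu": 1, "branch": 1})
print(len(wide), len(narrow))   # 2 3
```

In this model the five instructions need two cycles with two integer units
and three cycles with one, compared to five cycles on a strictly scalar
machine.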
The Good, The Bad & The Ugly
From a programmer's point of view this makes the PowerPC a rather unique
system with strong and weak parts, which are briefly explained below.
PowerPC - The good!
- A pure 32-Bit RISC system
Implies a small number of highly flexible instructions, a fixed instruction
size of 4 bytes that implies optimum usage of pipelines and prefetches. The
fairly abstract instruction set operates equally on all registers,
eliminating all exceptions regarding addressing modes, target registers
etc.,
- Large set of General Purpose Registers
mean a sufficient amount of data storage registers which are all equally
powerful and capable of executing all PowerPC instructions, also, a
standard of how to use these registers to maintain compatibility between
object files or library routines is given by the PowerPC EABI,
- Superscalar architecture
allows the processor to perform several instructions in one cycle,
especially due to the side-effect free pipeline and the option to protect
critical code by forcing in-order execution,
- Implicit L1 cache support
which speeds up repeated bus access tremendously by being multiple-way
associative and completely transparent to the user once it has been set up
correctly.
PowerPC - The bad!
- A pure 32-Bit RISC system
which also means that any instruction, even a simple increment, requires
4 bytes, making code bulkier than on some other systems, also, the reduced
instruction set leads to a large set of "virtual instructions" - meaning
that a special case of a more powerful RISC instruction is assigned its own
mnemonic but has no individual opcode,
- Large Set of General Purpose Registers
means the lack of a system stack pointer. In fact, the PowerPC does not
sport a stack pointer. Instead, it offers "shadow registers" to back up
register content in certain situations which must then be saved by
software, commonly using one general purpose register to emulate a system
stack pointer, thus increasing software overhead for routine duties,
- Superscalar Architecture
makes efficient coding more difficult. To gain the maximum processing
power out of a PowerPC it is necessary to re-arrange parts of the code to
allow parallel instruction execution and to minimize pipeline stalls -
Some routines, such as a memcpy for example, are almost impossible to
optimize for a superscalar system since the integer, vector or floating
point unit is basically not needed for this purpose,
- Implicit L1 Cache support
does not necessarily imply memory coherence. Intelligent peripherals with
direct memory access require special treatment in order to guarantee
data coherency.
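The "virtual instructions" mentioned in the list above can be shown by
encoding a few of them. The following Python sketch builds the 32-bit
instruction words for "li", "nop" and "mr" from the real instructions they
stand for ("addi", "ori" and "or"). The helper functions are written for this
example only; the opcode numbers are the ones commonly documented for the
32-bit PowerPC instruction set.

```python
def d_form(opcode: int, field1: int, field2: int, imm16: int) -> int:
    """Encode a D-form instruction: 6-bit opcode, two 5-bit registers, 16-bit immediate."""
    return (opcode << 26) | (field1 << 21) | (field2 << 16) | (imm16 & 0xFFFF)


def addi(rd: int, ra: int, simm: int) -> int:
    return d_form(14, rd, ra, simm)          # addi rD,rA,SIMM


def ori(ra: int, rs: int, uimm: int) -> int:
    return d_form(24, rs, ra, uimm)          # ori rA,rS,UIMM


def or_(ra: int, rs: int, rb: int) -> int:
    # X-form: primary opcode 31, extended opcode 444
    return (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1)


# The "virtual" mnemonics are plain special cases of the instructions above.
def li(rd: int, simm: int) -> int:
    return addi(rd, 0, simm)                 # li rD,v  ==  addi rD,0,v


def nop() -> int:
    return ori(0, 0, 0)                      # nop      ==  ori 0,0,0


def mr(ra: int, rs: int) -> int:
    return or_(ra, rs, rs)                   # mr rA,rS ==  or rA,rS,rS


print(f"{li(3, 1):08X}")   # 38600001
print(f"{nop():08X}")      # 60000000
print(f"{mr(3, 4):08X}")   # 7C832378
```

Every word is exactly 4 bytes long, and none of the three mnemonics has an
opcode of its own.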
PowerPC - The ugly!
- A pure 32-Bit RISC System
means that limiting every instruction to 4 Bytes makes usage of absolute
parameters of 32-Bit size impossible. To actually load a 32-Bit absolute
number, the PowerPC requires two separate instructions, increasing
software overhead measurably,
- Large Set of General Purpose Registers
but no system stack pointer introduces software overhead for stack
emulation and creates critical code sections that must never be
interrupted, requiring even more software overhead. Also, the PowerPC
does not support a single-instruction context switch (even though there
is a load/store multi-word instruction in later revisions of the PowerPC),
- Superscalar Architecture
finally implies that optimized code becomes totally unreadable and
basically impossible to debug and finally,
- Implicit L1 Cache support
spoils deterministic timing of code execution. While this is normally not
relevant for ordinary applications, certain safety-critical code segments
are required to behave fully deterministically, which leads to additional
constraints on the software.
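The two-instruction load of a 32-bit constant mentioned above is commonly
written as a "lis" (load the upper 16 bits) followed by an "ori" (merge in the
lower 16 bits). The Python sketch below splits a constant accordingly and
checks that the pair rebuilds the original value. The function names are
invented for this example, and the generated text is only meant to show the
idea, not to match the syntax of a particular assembler.

```python
from typing import List


def load_imm32(rd: int, value: int) -> List[str]:
    """Return an instruction sequence that loads a 32-bit constant into rD."""
    if -0x8000 <= value <= 0x7FFF:
        # Fits into the signed 16-bit immediate of a single instruction.
        return [f"li r{rd}, {value}"]
    value &= 0xFFFFFFFF
    hi, lo = value >> 16, value & 0xFFFF
    return [f"lis r{rd}, 0x{hi:04X}", f"ori r{rd}, r{rd}, 0x{lo:04X}"]


def simulate_lis_ori(hi: int, lo: int) -> int:
    """What the lis/ori pair leaves in a 32-bit register."""
    reg = (hi << 16) & 0xFFFFFFFF    # lis: immediate goes to the upper half
    return reg | (lo & 0xFFFF)       # ori: OR the lower half into place


print(load_imm32(3, 42))           # ['li r3, 42']
print(load_imm32(3, 0xDEADBEEF))   # ['lis r3, 0xDEAD', 'ori r3, r3, 0xBEEF']
print(hex(simulate_lis_ori(0xDEAD, 0xBEEF)))   # 0xdeadbeef
```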
PowerPC vs 680x0
Especially when comparing the PowerPC with the 680x0 architecture - which is by
now being advertised as variable instruction length RISC - the differences are
easy to spot:
- The 680x0 features instruction lengths of 2 to 10 bytes, allowing even 2
parameters of 32-bit width per instruction, which - depending on the
optimisation level - requires up to 4 instructions on the PowerPC.
- The 680x0 is not a Load/Store architecture, which is one of its biggest
advantages. The 680x0 is capable of reading and writing memory in a single
instruction, which is impossible on the PowerPC.
- The 680x0 has a move-multi instruction, allowing it to save all relevant
registers in a single instruction or to load them in a similar manner,
providing explicit support for multitasking environments.
- The 680x0 features a system stack pointer that is being managed internally
on several instructions or events, such as branch to subroutine, return
from subroutine or interrupt execution, keeping software overhead rather
low.
- The 680x0 instruction set is easy to understand and operate with a most
powerful move instruction that replaces many kinds of load, store and
transfer instructions of other processor families.
- While the 680x0 features a separate set of 8 Data and 8 Address Registers,
many operations can be performed on both, also, the 680x0 sports a huge
set of effective addressing modes, of which most apply to all instructions.
- The original 68000 has been a 16-Bit processor capable of executing 32-Bit
operations without software overhead, the later models from 68020 on have
been true 32-Bit processors, of which the later models also sport cache
support, prefetch units and pipelines. In fact, the 68060 is a superscalar
system, too.
While the above list seems to show how much more powerful a 680x0 processor
is compared to the PowerPC architecture, most of the points listed have
restrictions or bear other constraints that the PowerPC makes up for in a
certain way:
- The variable length instruction set of the 680x0 makes the organisation of
pipelines and prefetches difficult, especially the separation into code
and data access. As a result of this, pipelines and prefetch units become
more complex, making the processor either more expensive or less powerful.
The PowerPC's fixed 32-Bit instruction set is a lot easier to manage through
prefetches and pipelines, making the system less complex, more powerful and
usually cheaper.
- It's very similar regarding the 680x0 not being a Load/Store Architecture.
Having single instructions capable of accessing memory twice affects cache,
pipeline and buffering logic, making the system more complex again. The
PowerPC can only perform a single memory access per instruction, again
leading to a less complex access logic.
- While the 680x0 does indeed feature a system stack pointer, many operations
that internally manage the system stack are multi-cycle instructions, for
example JSR and RTS. The PowerPC needs software overhead to actually emulate
a system stack, however, most of these instructions run in a single cycle,
making up for the lack of an internal system stack.
- Also, the 680x0 supports fast context switching by providing move-multi
instructions, still this kind of instruction requires many cycles to
perform and bears other restrictions regarding effective addressing modes.
While the PowerPC does not have this kind of support for fast context
switching, the fact that most instructions execute in a single cycle allows
the PowerPC to compete under most conditions.
- The 680x0 has virtual instructions, too. There is, for example, no
"clr" instruction that operates on address registers. Still, many assemblers
accept such a "clr" instruction and convert it to a "suba" internally.
Also, the 680x0 instruction set has restrictions regarding addressing modes
for certain instructions that the PowerPC does not have.
Finally, the 680x0 has indeed gone through an evolution similar to that of the
PowerPC. Still, due to the different architecture, this has always been more
complex and therefore more expensive for 680x0 processors. In other words,
while an architecture such as the PowerPC favours caches, pipelines and
prefetches, even an internal separation of code and data flow (internal
Harvard architectures), the 680x0 does not lend itself to them as easily.
While it is without a doubt possible, it is more complex in every way.
PowerPC vs x86
Nevertheless, the main competitor of the PowerPC has been the x86 series, not
the 680x0 series. So how does a PowerPC perform in comparison to this
processor family?
First of all, the PowerPC is easier to scale. Having been a RISC system right
from the beginning, the PowerPC has far fewer transistors than an x86
and is easier to optimize with regard to power consumption, heat dissipation
and size. On the other hand, the PowerPC does mean more software overhead for
certain services which an x86 handles with less software interaction. The x86
has a CISC-based instruction set and a small number of registers, some of
which are even special purpose, so individual operations can be decoupled from
the rest of the system and optimized - something the PowerPC does not allow.
Also, this feature of the x86 allows more, longer and in some parts customized
pipelines, which the PowerPC cannot do. As a side-effect, parts of an x86 can
be "overclocked" with regard to the rest of the CPU, which the PowerPC can't.
The pipelines of x86 are commonly considered "long and thin" while those of the
PowerPC are usually called "short and fat", which is neither an advantage nor a
disadvantage: The x86 pipelines allow higher clock rates but get less work done
per cycle, the PowerPC pipelines are more efficient but cannot be clocked as high.
Naturally, this has its side-effects: x86 processors are more complex, more
expensive and tend to show excessive heat dissipation and power consumption
when operated at maximum workload. Also, most PowerPC instructions finish in a
single cycle, while most of the x86 instructions used by software effectively
do not, due to the major revisions of the x86 instruction
architecture (RISC86). It should be noted nonetheless that especially the
later PowerPC incarnations such as the PPC7450 suffer from a rather outdated
bus logic that reduces data throughput in comparison to the more advanced
x86 bus systems. While this is not an inherent drawback of the PowerPC
architecture itself, it still affected the performance of the existing PowerPC
systems in comparison to x86 systems.
Still, it hasn't been the technical differences that made x86 conquer the
world; a PowerPC is not - necessarily - less powerful than an x86 processor.
In the end, it has much rather been the market position. Not only do several
producers of x86 processors compete with each other, they are also forced to
expand the market and enlarge the customer base by attracting users that
would otherwise have bought high-end workstations or servers. Motorola and IBM
were not only not competing with each other, they also only needed to serve one
customer, Apple. Because the PowerPC is easy to introduce to
embedded systems thanks to low power consumption, better scalability and
high reliability under rough conditions, Motorola soon focussed on
the embedded market. IBM first lost interest and sold part of its PowerPC
business to AMCC, but now supplies Nintendo, Microsoft and even Sony. The high
interest in PowerPC has led to the foundation of Power.org, which gives
other companies the option of taking part in the PowerPC development.
So, while the PowerPC has failed to establish itself in desktop computers, it
runs in many embedded applications without the user even noticing the 32-bit
superscalar RISC core. And one thing is for certain: Even though the world
will probably not see desktop computers employing PowerPC processors on a
large scale anymore, the story of the PowerPC is far from being finished.
The Paranoid of Paradox
for Dead Hackers Society in 2007
References
References used in the WWW:
- Wikipedia PowerPC article (http://en.wikipedia.org/wiki/PowerPC) and related,
various authors
- "A history of chipmaking at IBM", developerWorks, IBM, Dec. 15th, 2005,
http://www-128.ibm.com/developerworks/power/library/pa-powerppl/index.html?ca=drs-
- "A developer's guide to the PowerPC architecture", Brett Olson,
Anthony Marsala, IBM, Mar. 30th 2004,
http://www.ibm.com/developerworks/linux/library/l-powarch/
- PPC overview, Peter Perlso, Oct. 2006,
http://titancity.com/articles/ppc.html
- PowerPC FAQ, Derek Noonburg, Jan 16th, 1997 (discontinued),
http://www.microprocessor.sscc.ru/powerpc-faq/
- Analysis: x86 vs PPC, Nicholas Blachford, OSNews, Jul. 9th, 2003,
http://www.osnews.com/story.php/3997/Analysis-x86-Vs-PPC
References used in written form:
- MPC5554/MPC5554 Reference Manual, Rev. 3.1, Freescale
- MPC5566 Reference Manual, Rev 0.1, Freescale
- e200z6 User's Manual, Rev. 0.1, Freescale
- PowerPC Book E, Third Revision, 2002
- Motorola RISC CPU (RCPU) Reference Manual, Freescale
- MPC7450 Reference Manual, Rev. 5, 2005, Freescale
Tools, Soft- and Hardware used:
- Metrowerks CodeWarrior v1.5 for MPC55xx, Metrowerks 2004
- Freescale CodeWarrior v2.1 for MPC55xx, Freescale 2007
- Lauterbach Trace32 On-Chip Debugger
- Motorola MPC5554 Evaluation Board