The new Innovatic microprocessor architecture combines the best of the
CISC and RISC worlds. The basic idea is to make an extremely efficient
processor architecture with an extremely high code density: a processor that is
usable in everything from the smallest 8-bit smartcard and embedded applications to
very powerful 64-bit PCs. This enables reuse of software and hardware, and with that, very big savings
in development costs. No microprocessor family today has such a wide span.
The new architecture has the following properties:
- Simple and cheap to produce.
- A simple, easy-to-learn 8-bit instruction set.
- Stack-based architecture. This makes it very easy for a compiler to
utilize the processor 100 %, so that very efficient code is generated. And
because there is nothing to save and restore during subroutine/procedure
calls, the performance loss with interrupts and object-oriented
programming is smaller than on traditional register-based computers.
- A code density even better than the best CISC processors and
typically 2-6 times better than most RISC processors! This is the
key to the new, lightweight computer world.
The high code density makes it economically possible to build part of
the memory system in a PC from fast but rather expensive static RAM
(SRAM) as a supplement to the usual cheap but slow dynamic RAM (DRAM).
In this way, the most critical parts of the programs and data may be loaded
into SRAM. This increases the speed considerably compared to today's
PCs, because such a human-controlled split is much more efficient than
the automatic cache mechanisms used on today's PCs, which hold a
limited amount (typically 128-512 kbyte) of the most recently fetched
instructions and data. Besides, it is much simpler, and there are no problems
in keeping the memory and the data cache synchronized. An SRAM uses 6 (or 8)
transistors per bit, whereas a DRAM uses only one transistor and one capacitor.
Therefore, the die size of an SRAM is approximately 3-4 times bigger than that
of a DRAM of the same capacity, and the SRAM is therefore more expensive, but if
the code density is increased correspondingly, so that less memory is needed,
this compensates for it!
Besides, SRAMs do not need any refresh. This makes it possible to let the
computer "go to sleep" with a very low power consumption and still
maintain some data in the SRAMs. Data in DRAMs is lost a few
milliseconds after the refresh stops, so it must be saved to, e.g., a
hard disk before power-down.
The real bottleneck in a modern computer system is the memory. A 1.4
GHz RISC processor needs, e.g., all data and instructions within 0.7 ns (one
clock cycle) to maintain full speed. However, with DRAMs it takes approximately
40-60 ns to get a word if it is not available in any memory pipeline or in a cache
memory (a cache miss). This is the case regardless of the type of
memory and the bus speed!
The advantage of modern RAM types like burst RAM,
synchronous DRAM (SDRAM), double-data-rate DRAM (DDR DRAM), RAMBUS DRAM (RDRAM), etc.
is just that they contain a pipeline system, which makes it possible
to fetch the following words extremely fast, e.g. eight 16-bit words in
10 ns for RDRAM, but this is of course only important if these words are
actually needed! For data access, which is usually quite random, this
is often not the case. However, to try to utilize the pipeline, a PC
always fetches 4-8 words at a time, which causes a 25-75 % overhead, so
with random access over big data areas the real speed may be as low as
10-15 MHz! The high speed of today's DRAMs, like 800 MHz for RDRAM, is just a
pseudo speed, which can only be utilized under very special circumstances.
On the other hand, switching to SRAMs gives a real performance gain, as the access
time is at least 3 times shorter. The only type of data access that can really
benefit from fast DRAM memory architectures is transfer of big data areas,
e.g. by means of DMA (Direct Memory Access).
Regarding the program, it is of course much more likely
that the following words are needed, provided the program is not written in
such a way that there are too many jumps and subroutine calls! However,
a high code density is far more important than a high bus speed
and exotic memory types. With e.g. a 64-bit bus and 8-bit instructions, 8
instructions are fetched at a time, and if on average every 1.5
instructions load or store data in the memory, as with the Innovatic
architecture, the overhead for fetching instructions is only about 19 %
(1.5 bytes of instructions per 8-byte bus access). This is
so little that it may be ignored. Even with infinitely fast instruction
fetch, the user may not notice the difference. However, many of the modern
RISC processors have a fixed instruction length of 32 bits, and because of
the load/store architecture, which makes it necessary to load all data
into the internal registers before they may be used in calculations, they
typically use 2 instructions for each data access. This means that with a
64-bit bus, the overhead for fetching instructions will be approximately
100 % (2 x 4 bytes = one full 8-byte bus access per data access), so unless
the computer has an extremely big and expensive cache that contains the whole
program, the performance loss caused by the low code density is almost a factor of 2!
- Efficiency and speed as good as traditional 32-bit
RISC processors.
- A digital filter performance close to that of many dedicated digital
signal processors (DSP). This is especially important for many of
today's embedded applications.
- High efficiency and speed even at low clock frequencies and
without a cache. This makes the architecture perfect for low-cost
and power-saving applications and for processing of very big data
areas like 3D graphics, high-resolution image processing, computer
simulations etc., where the data cannot be contained in a cache.
- Relative addressing over the full address range, so that it is
no longer necessary to relocate a program before it can be executed.
This saves the space needed for the relocation table as well as the relocation
time, so that programs may be started as fast as they can be fetched from,
e.g., a flash disk or a hard disk!
- Usable with both Von Neumann architecture (common data and program memory), which is used on a PC,
and Harvard architecture (separate data and program memory), which is common in embedded applications.
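The stack-based property in the list above can be illustrated with a toy evaluator. This is a minimal sketch, not the Innovatic instruction set: the opcode names (`LOAD`, `STORE`, `ADD`, `MUL`), the tuple encoding and the function `run` are all hypothetical. It computes `x = (a + b) * c` with every intermediate value kept on an operand stack, so no register is ever named and there is no register file to save on a call or interrupt.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy instruction: (opcode, operand); not the real Innovatic ISA.
Instr = Tuple[str, Optional[str]]


def run(program: List[Instr], memory: Dict[str, int]) -> Dict[str, int]:
    """Execute a tiny stack program; intermediate values live only on the stack."""
    stack: List[int] = []
    for op, arg in program:
        if op == "LOAD":      # push a variable from memory
            stack.append(memory[str(arg)])
        elif op == "STORE":   # pop the top of the stack into memory
            memory[str(arg)] = stack.pop()
        elif op == "ADD":     # replace the two top entries by their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":     # replace the two top entries by their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


# x = (a + b) * c in postfix order: a b + c *
program: List[Instr] = [
    ("LOAD", "a"), ("LOAD", "b"), ("ADD", None),
    ("LOAD", "c"), ("MUL", None), ("STORE", "x"),
]
print(run(program, {"a": 2, "b": 3, "c": 4})["x"])  # prints 20
```

Because the operands of `ADD` and `MUL` are implicit, these instructions need no register fields at all, which is one reason a stack machine can get by with short opcodes.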
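The memory-bottleneck figures quoted earlier translate directly into lost clock cycles. The small calculation below uses only the numbers from the text (a 1.4 GHz clock and a 40-60 ns DRAM access on a cache miss); the function names are illustrative. It shows that one cycle lasts about 0.71 ns and that a single miss stalls such a processor for roughly 56-84 cycles.

```python
def cycle_time_ns(clock_ghz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Clock cycles lost while waiting for one memory access of latency_ns."""
    return latency_ns * clock_ghz


CLOCK_GHZ = 1.4  # processor clock from the example in the text
print(f"cycle time: {cycle_time_ns(CLOCK_GHZ):.2f} ns")
for latency_ns in (40.0, 60.0):  # DRAM cache-miss latency range from the text
    cycles = stall_cycles(latency_ns, CLOCK_GHZ)
    print(f"{latency_ns:.0f} ns miss -> about {cycles:.0f} cycles stalled")
```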
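The instruction-fetch overhead comparison for the code-density property can be reproduced with one line of arithmetic. Following the text, overhead is measured as instruction-fetch bus traffic relative to the number of data accesses on a 64-bit (8-byte) bus; the function name `fetch_overhead` is illustrative, and the inputs (1.5 one-byte instructions per data access versus 2 four-byte RISC instructions) are the text's own figures.

```python
def fetch_overhead(instr_bytes: float, instrs_per_data_access: float,
                   bus_bytes: int = 8) -> float:
    """Instruction-fetch bus accesses per data access, as a fraction.

    Each data access is accompanied by instrs_per_data_access instructions of
    instr_bytes each, and instructions arrive bus_bytes at a time.
    """
    return instr_bytes * instrs_per_data_access / bus_bytes


# 8-bit instructions, one memory access per 1.5 instructions (Innovatic figure)
stack_cpu = fetch_overhead(instr_bytes=1, instrs_per_data_access=1.5)
# 32-bit load/store RISC, about 2 instructions per data access
risc_cpu = fetch_overhead(instr_bytes=4, instrs_per_data_access=2)
print(f"8-bit stack ISA: {stack_cpu:.0%}")  # 19%
print(f"32-bit RISC:     {risc_cpu:.0%}")   # 100%
```

Changing `bus_bytes` shows how the picture shifts on a narrower bus: on a 32-bit bus the 8-bit case rises to about 38 %.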
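The relative-addressing property can be illustrated by comparing how absolute and PC-relative jumps behave when a program image is loaded at an address other than the one it was linked for. This is a minimal sketch with a hypothetical program image (`JUMPS`), link address (`LINK_BASE`) and helper functions; it does not model any real loader or object-file format. Absolute operands only work after the loader patches them (the job a relocation table describes), whereas PC-relative displacements land correctly at any load address.

```python
from typing import List, Tuple

# Hypothetical program image: (offset of a jump instruction, offset of its target).
JUMPS: List[Tuple[int, int]] = [(2, 10), (10, 4), (12, 0)]
LINK_BASE = 0x0000  # load address assumed when absolute operands were written


def encode_absolute() -> List[int]:
    """Absolute jumps store the final target address, fixed at link time."""
    return [LINK_BASE + target for _, target in JUMPS]


def encode_relative() -> List[int]:
    """PC-relative jumps store a displacement from the jump instruction."""
    return [target - pos for pos, target in JUMPS]


def relocate(operands: List[int], load_base: int) -> List[int]:
    """What a loader must do for absolute code: patch every operand."""
    return [op + (load_base - LINK_BASE) for op in operands]


def jump_addresses(load_base: int, operands: List[int], relative: bool) -> List[int]:
    """Addresses the CPU actually jumps to when the image is loaded at load_base."""
    if relative:
        return [load_base + pos + disp for (pos, _), disp in zip(JUMPS, operands)]
    return list(operands)  # absolute operands are used unchanged


def intended(load_base: int) -> List[int]:
    """Where each jump should land for the given load address."""
    return [load_base + target for _, target in JUMPS]


for base in (0x0000, 0x4000):
    abs_ok = jump_addresses(base, encode_absolute(), relative=False) == intended(base)
    rel_ok = jump_addresses(base, encode_relative(), relative=True) == intended(base)
    print(f"loaded at {base:#06x}: absolute ok={abs_ok}, relative ok={rel_ok}")
```

Only the absolute version needs `relocate` before it can run at `0x4000`; the relative version can be executed straight from wherever it was loaded.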
Some RISC processors use a very long instruction length of 64 or 128
bits, the so-called VLIW (Very Long Instruction Word) processors. The
purpose is to be able to do more operations simultaneously by means of
more parallel execution units, where each unit uses its own part of the
instruction word.
The disadvantage is that the instructions need to be grouped together in
a precise order that fits the processor architecture. This requires an
extremely advanced compiler and makes assembler programming impossible.
Besides, it puts extremely high demands on the memory, because a new, long
instruction word must be available on every clock cycle, or else nothing
is gained. In fact, the VLIW architecture is only suited for systems with a
very small amount of code and data, e.g. digital signal processors
(DSP), where both instructions and data may be contained in an on-chip
level 1 cache.
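Why VLIW code depends so heavily on the compiler finding parallel work can be sketched with a simple packing model. The sketch assumes a hypothetical word of four interchangeable operation slots and a one-word latency for every result; real VLIW machines have typed slots and varying latencies. A greedy packer places each operation in the first word after all of its inputs, and any slot left empty must be filled with a no-operation that still occupies memory and fetch bandwidth.

```python
from typing import Dict, List

SLOTS = 4  # hypothetical: 4 operation slots per long instruction word


def pack(ops: Dict[str, List[str]]) -> List[List[str]]:
    """Greedily pack operations into long instruction words.

    ops maps an operation name to the operations it depends on; dict order is
    program order, and every dependency must appear before the op using it.
    """
    words: List[List[str]] = []
    placed: Dict[str, int] = {}
    for op, deps in ops.items():
        i = max((placed[d] + 1 for d in deps), default=0)  # first legal word
        while i < len(words) and len(words[i]) >= SLOTS:   # skip full words
            i += 1
        while len(words) <= i:
            words.append([])
        words[i].append(op)
        placed[op] = i
    return words


def nop_fraction(words: List[List[str]]) -> float:
    """Share of all slots that must be filled with no-operations."""
    total = len(words) * SLOTS
    return (total - sum(len(w) for w in words)) / total


chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}  # each step needs the last
independent = {"a": [], "b": [], "c": [], "d": []}      # no dependencies at all
for name, ops in (("chain", chain), ("independent", independent)):
    words = pack(ops)
    print(f"{name}: words={len(words)}, NOP slots={nop_fraction(words):.0%}")
```

In this model the serial chain needs four full-width words of which 75 % of the slots are empty, while the independent operations fit in one word.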
In practice, the extremely complicated architectures used in many of
today's microprocessors often enhance the performance by only a few
percent, at the expense of power consumption, reliability and price!
If the complexity of a processor is e.g. increased
10 times and the chip is heated to 40 °C above the ambient temperature,
which is very common for today's PC processors due to the high complexity
and clock speed, the total reliability of the processor is reduced
160 times!
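The factor of 160 is consistent with a common rule of thumb that the text does not state explicitly: that the failure rate of electronics roughly doubles for every 10 °C rise in temperature, so 40 °C gives 2^4 = 16, and 10 x 16 = 160. The sketch below assumes that rule together with a failure rate proportional to complexity; both are simplifications, and the function name is illustrative.

```python
def reliability_reduction(complexity_factor: float, temp_rise_c: float,
                          doubling_step_c: float = 10.0) -> float:
    """Factor by which the failure rate grows (i.e. reliability drops).

    Assumes the failure rate scales linearly with complexity and doubles for
    every doubling_step_c degrees of temperature rise (a rule of thumb only).
    """
    return complexity_factor * 2 ** (temp_rise_c / doubling_step_c)


print(reliability_reduction(10, 40))  # 160.0
```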
This page is updated March 13th 2006