Instruction Map

A rough map of the instruction set is shown below. Note that it is not a fully worked-out instruction set but only a first draft, which shows that it is possible to implement the suggested architecture and make an extremely efficient 8-bit instruction set.

  • Conditional Branch (PC + Disp8), 16 types
  • Branch (PC + Disp24)
  • 6   Call (PC/R1/R3/R5/R7/A + Disp24)
  • 25   8-bit instructions

Operations combined with the 16 address modes:

  • Store
  • Load
  • LoadFloat
  • LoadVector
  • Div (divide)
  • Mul (multiply)
  • Mul#Add (multiply and add)
  • Add
  • Sub (subtract)
  • Cmp (compare)
  • And (logical And)
  • Or (logical Or)
  • XOr (logical exclusive Or)

16 address modes:

  • 4   (Rb) + (Rd) + Disp8
  • 4   (Rb) + -(Rd)
  • 4   (Rb) + (Rd)+
  • (B)
  • B
  • Rn, n = 0-255
  • # data, (PC)+

Both displacements (Disp8 and Disp24) are signed integers. If Disp8 = 0 in a conditional branch, the content of the accumulator is used as the displacement instead. This is utilized in e.g. Case statements, where the jump destination is determined by a value.

The address mode Rn is a two-byte instruction in which the number of the register is specified in the following byte. An instruction with immediate data is at least two bytes long.

Except for pixel operations, all arithmetic operations are signed. If an overflow occurs, the result is set to the maximum negative or positive number (with the correct sign), and the overflow flag is set.
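The saturating behaviour described above can be illustrated with a minimal Python sketch. The function name `saturating_add`, the 16-bit default width and the returned flag are assumptions made for illustration only; they are not part of the draft instruction set.

```python
from typing import Tuple


def saturating_add(a: int, b: int, width: int = 16) -> Tuple[int, bool]:
    """Signed add that clamps on overflow.

    Returns (result, overflow_flag). The width is an assumed example value.
    """
    lowest = -(1 << (width - 1))       # maximum negative number
    highest = (1 << (width - 1)) - 1   # maximum positive number
    total = a + b
    if total > highest:
        return highest, True           # clamp with the correct (positive) sign
    if total < lowest:
        return lowest, True            # clamp with the correct (negative) sign
    return total, False
```

With 16-bit operands, adding 30000 and 10000 yields the maximum positive number with the overflow flag set, instead of wrapping around to a negative value.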

The most common operations with two operands (the accumulator and one more) may be performed directly; that is, besides the lowest stack level (B), they may also take their data directly from memory, from a register (Rn) or as immediate (#) data. The other operations with two operands can only be performed with the accumulator (A) and the lowest stack level (B) as the two operands. Instructions that involve B or a register Rn are always carried out in full width.


Load

The Load operation loads data into the accumulator. Usually all data are regarded as signed values except for bytes, which are regarded as unsigned. 8-bit data are usually used for pixel data, short counters, ASCII letters etc., and all these types are usually unsigned. However, 16, 32 or 64-bit data are rather seldom unsigned. These data have such a big range that not much is gained by also supporting unsigned data. As all arithmetic operations are signed, bytes are loaded into bit 1-8 and in this way converted to signed values. The opposite shift is done at Store.

All data are loaded left shifted. With Load operations narrower than the width of the processor, the least significant bits are set to zero. When several Load operations follow each other without any Store or Push instruction in between, the stack is automatically pushed up. This is utilized when parentheses occur in an expression.

(R0+R1)*(R2+R3)+((R4+R5)*4000H) is translated to:

   Load (R0)
   Mul (R2)
   Load (R4)
   Mul 4000H
   Add
   
Note that it is only necessary to specify the displacement register - the base register is implied. It is also not necessary to specify a displacement of zero.

When an instruction has no parameters, like the Add above, the operation is carried out in full width with the accumulator (A) and, where applicable, the lowest stack level (B) as the operands. If the operation involves B, as with the Add above, the stack is simultaneously pulled down. This overwrites the old value in B unless the stack size before the operation is only 1. In this case, the value is retained instead of being overwritten with an undefined value. This feature makes it possible to use a stack with only one value as a constant.
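The automatic push at Load and the pull-down at a parameterless Add can be sketched in Python as follows. This is only a simplified model under stated assumptions: the class `DataStackSketch` and its method names are hypothetical, operands are passed as plain integers instead of being fetched through an address mode, and left shifting and saturation are ignored.

```python
class DataStackSketch:
    """Hypothetical model of the accumulator (A) and the data stack."""

    def __init__(self):
        self.a = 0
        self.stack = []       # stack[-1] plays the role of B
        self.a_live = False   # True after a Load until the next Store

    def load(self, value):
        # A second Load without a Store in between pushes A on the stack.
        if self.a_live:
            self.stack.append(self.a)
        self.a = value
        self.a_live = True

    def mul(self, value):
        self.a *= value

    def add(self, value=None):
        # Without a parameter the operand is B and the stack is pulled down.
        if value is None:
            value = self._take_b()
        self.a += value

    def store(self):
        self.a_live = False
        return self.a

    def _take_b(self):
        b = self.stack[-1]
        # With only one value on the stack, B is retained (usable as a constant).
        if len(self.stack) > 1:
            self.stack.pop()
        return b


# The expression above with assumed memory contents 2, 3 and 5 and the
# constant 4 in place of 4000H: 2*3 + 5*4.
m = DataStackSketch()
m.load(2)
m.mul(3)
m.load(5)   # pushes the intermediate result 6
m.mul(4)
m.add()     # A = 20 + B
```

After the sequence, A holds 26 and the single stack entry 6 is still present, which corresponds to the retained-constant case described above.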

Of course the instruction Store # corresponding to Load # has no meaning. Instead this instruction code is used for Store (#), which saves the contents of the accumulator in the specified absolute address.
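The left-shifted loading described in this section can be sketched as below. The sketch assumes that bit 0 denotes the most significant bit, as the flag description suggests, and it returns the raw accumulator bit pattern as an unsigned Python integer. The function names and the 32-bit default width are assumptions for illustration.

```python
def load_left_shifted(raw: int, data_bits: int, width: int = 32) -> int:
    """Return the accumulator bit pattern after a Load of `data_bits` bits."""
    raw &= (1 << data_bits) - 1
    if data_bits == 8:
        # Bytes are unsigned: they go into bit 1-8, so the sign bit stays 0.
        return raw << (width - 9)
    # Wider data are left aligned; the least significant bits become zero.
    return raw << (width - data_bits)


def store_left_shifted(acc: int, data_bits: int, width: int = 32) -> int:
    """Reverse shift performed at Store."""
    shift = width - 9 if data_bits == 8 else width - data_bits
    return (acc >> shift) & ((1 << data_bits) - 1)
```

For example, the byte FFh loaded on a 32-bit processor becomes 7F800000H, a positive number, whereas the 16-bit word 8000H becomes 80000000H and keeps its negative sign.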


LoadFloat

LoadFloat corresponds to Load, but tells the ALU that all subsequent operations until the next Load should be carried out in floating point format. When 8 or 16-bit data are loaded, they are automatically converted to 32-bit floating point format. The data are shifted left until they are in the range from +/-0.5 to +/-1, and the exponent is then added. The exponent is calculated from the number of left shifts needed.
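The normalisation step can be sketched in Python as follows. The text does not specify the bit layout of the 32-bit floating point format, so this sketch only returns a (mantissa, exponent) pair such that value = mantissa * 2**exponent. The function name `normalise` and the handling of zero are assumptions.

```python
from typing import Tuple


def normalise(value: int, data_bits: int = 16) -> Tuple[float, int]:
    """Shift a signed integer left until its magnitude reaches 0.5..1.

    The integer is viewed as a left-aligned fraction of `data_bits` bits.
    Returns (mantissa, exponent) with value == mantissa * 2**exponent.
    """
    if value == 0:
        return 0.0, 0                  # assumption: zero needs no shifting
    scale = 1 << (data_bits - 1)       # weight of the sign bit
    shifts = 0
    # Keep shifting while the magnitude is still below 0.5.
    while (abs(value) << (shifts + 1)) < scale:
        shifts += 1
    mantissa = (value << shifts) / scale
    exponent = (data_bits - 1) - shifts
    return mantissa, exponent
```

The 16-bit value 3 needs 13 left shifts, which gives the mantissa 0.75 and the exponent 2.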


LoadVector

For the moment, two different types of vectors are defined: stereo and pixels. These types are set by the Stereo and Pixel instructions. In the future many more vector types may be defined, e.g. vector types that use floating point.

Stereo

In stereo mode, LoadVector corresponds to Load, but the ALU is split into two parts. In this way, the ALU is able to process both channels at the same time. All data that are loaded are split in two, and the two parts are loaded left shifted into the two parts of the ALU. If e.g. a 32-bit stereo signal is loaded on a 64-bit processor, bit 0-15 is loaded into bit 0-15 of the accumulator, but bit 16-31 is loaded into bit 32-47. All other bits are cleared.

At Store the reverse process happens.
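The stereo split can be sketched as below, again assuming that bit 0 is the most significant bit. The function names `stereo_load` and `stereo_store` are hypothetical, and the defaults correspond to the 32-bit signal on a 64-bit processor used in the example above.

```python
def stereo_load(raw: int, data_bits: int = 32, width: int = 64) -> int:
    """Place the two halves of `raw` left shifted in the two ALU halves."""
    half = data_bits // 2
    lane = width // 2
    mask = (1 << half) - 1
    first = (raw >> half) & mask    # bit 0-15 of the 32-bit signal
    second = raw & mask             # bit 16-31 of the 32-bit signal
    # First half goes to the top of the upper lane, second half to the
    # top of the lower lane; all other bits are zero.
    return (first << (width - half)) | (second << (lane - half))


def stereo_store(acc: int, data_bits: int = 32, width: int = 64) -> int:
    """Reverse process performed at Store."""
    half = data_bits // 2
    lane = width // 2
    mask = (1 << half) - 1
    first = (acc >> (width - half)) & mask
    second = (acc >> (lane - half)) & mask
    return (first << half) | second
```

For the input 1234ABCDH the accumulator becomes 12340000ABCD0000H, and `stereo_store` returns the original value.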

Pixel

In pixel mode, LoadVector corresponds to Load, but splits the ALU into 3 or 4 separate parts: 3 on 16-bit processors and 4 on 32 or 64-bit processors. In this way, the ALU is able to process an entire pixel at a time. All pixel operations are unsigned. Depending on the data size, the load is carried out in the following way:

  • A byte is loaded into bit 0-7. The other bits are not changed. This may be utilized to load single-byte pixels in the Netscape color palette (6 levels for RGB - 00h, 33h, 66h, 99h, CCh, FFh - plus 40 standard colors). Because these are fixed, standard colors, they must not be changed, so no expansion is performed. The function may also be used to change the alpha channel of a 32-bit AlphaRGB pixel. The alpha channel may e.g. be used for transparency, color key compare etc.

    Under the name LoadByte the instruction code is also used to load ASCII letters etc., which should not be converted to signed numbers.

  • 16-bit words are regarded as "Strong Colors", that is, 5-bit red, 6-bit green and 5-bit blue - XGA 5-6-5 mode. On a 16-bit processor they are loaded directly, and the ALU is split into 3 parts corresponding to the 3 colors. On a 32-bit processor they are expanded to "True Colors" (4 x 8-bit AlphaRGB - 8-8-8-8 mode) so that bit 0-4 (red) are loaded into bit 8-12, bit 5-10 (green) into bit 16-21 and bit 11-15 (blue) into bit 24-28. On a 64-bit processor the ALU is split into 4 16-bit parts, and the 3 colors are loaded left shifted into the 3 least significant words.

    At Store, data are converted back. On a 32 or 64-bit processor it may be desirable to round the data first. On a 32-bit processor this may e.g. be done by adding a register that contains the constant 00040204H (Add Rn). As the ALU is split into 4 separate units, carry is not taken out of bit 8, 16 and 24, but if an overflow occurs in a calculation, the result is set to the maximum value as usual, that is, to 1...1.

  • 32-bit long words are regarded as "True Colors" and loaded directly on a 32-bit processor. On a 64-bit processor they are loaded left shifted in the 4 parts and in this way expanded to 64 bit.

  • 64-bit very long word data are regarded as 4 x 16 bit AlphaRGB and can only be loaded on a 64-bit processor.
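The 5-6-5 expansion on a 32-bit processor and the rounding at Store can be sketched in Python as follows. Bit 0 is taken to be the most significant bit, so red occupies the top five bits of the 16-bit word. The function names are hypothetical; the rounding constant 00040204H is the one given in the text.

```python
ROUND_565 = 0x00040204  # half an LSB for red (5 bit), green (6 bit), blue (5 bit)


def expand_565(word: int) -> int:
    """Expand a 16-bit 5-6-5 pixel to a 32-bit 8-8-8-8 pixel (alpha = 0)."""
    red = (word >> 11) & 0x1F
    green = (word >> 5) & 0x3F
    blue = word & 0x1F
    # Each color is left aligned within its own 8-bit part.
    return (red << 19) | (green << 10) | (blue << 3)


def add_lanes_saturating(x: int, y: int) -> int:
    """Add four 8-bit parts without carry between them, clamping at FFh."""
    result = 0
    for shift in (24, 16, 8, 0):
        lane = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF)
        result |= min(lane, 0xFF) << shift
    return result


def pack_565(pixel: int, rounding: bool = True) -> int:
    """Convert a 32-bit pixel back to 5-6-5, optionally rounding first."""
    if rounding:
        pixel = add_lanes_saturating(pixel, ROUND_565)
    red = (pixel >> 19) & 0x1F
    green = (pixel >> 10) & 0x3F
    blue = (pixel >> 3) & 0x1F
    return (red << 11) | (green << 5) | blue
```

A red part of 84H is rounded up to the 5-bit value 17 when the constant is added, whereas plain truncation would give 16.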

Mul#Add

The Mul#Add instruction is special in that it combines data from the lowest stack level, the memory or a register with immediate data. The addressed data is multiplied by the immediate data, and the result is then added to the accumulator. Unlike the usual Mul instruction, Mul#Add uses the same multiplier (the filter coefficient, called the filter quotient in the listing below, given as immediate data) in all multiplications when vectors are processed. The Mul#Add instruction is primarily intended to be used together with -(Rn) and (Rn)+ addressing in digital filters. If you want to make e.g. a 96-stage FIR (Finite Impulse Response) filter for standard 16-bit 44.1 kHz stereo applications, the length of the displacement part of the pointer to the cyclic buffer is first programmed (in bit 0-4) to the nearest length that is longer than or equal to the filter length. In this case that is 128, so the cyclic buffer wraps around after this count. The interrupt routine, which is activated 44100 times per second, may look like this on a 64-bit processor:

   Stereo           ; Set vector type to stereo.
   Medium#          ; Set length of filter quotients and
                    ;  I/O addresses to 24 bit.
   Load Rm          ; Get pointer to cyclic buffer.
   PushR            ;  R1 = base address.
   Load Rn          ;   R0 = displacement/offset.
   PushR
   Load #InputAddr  ; Load address of memory mapped ADC
                    ;  input. The data size is programmed
                    ;   to 32 bits (two 16 bit values) in
                    ;    bit 0 and 1 (#00x...xB).
   Load #OutputAddr ; Load address of memory mapped DAC
                    ;  output. The data size is programmed
                    ;   to 32 bits.
   Load (B)         ; Load the new 32-bit input value from
                    ;  the two 16 bit ADC's. Note that
                    ;   before the operation B
                    ;    contains the input address.
                    ;     After the operation B contains
                    ;      the output address and A
                    ;       contains the input value.
   Store -(R0)      ; Point to the next 32-bit entry in the
                    ;  cyclic buffer and save the new value
                    ;   in this.
   LoadVector zero  ; Split the ALU until next load and
                    ;  clear A by loading a register
                    ;   (LoadVector Rx) with the fixed
                    ;    content 0.
   (NOp)            ; 0-3 NOp operations to adjust for
                    ;  32-bit boundary (optional)
                    ;   (Mul#Add + 24-bit #data = 32 bit).
   Mul#Add (R0)+, #FilterQuotient01
   Mul#Add (R0)+, #FilterQuotient02
   .
   .
   Mul#Add (R0)+, #FilterQuotient96
   Add Round16      ; Round the result by adding a register
                    ;  with the fixed content 00800080H
                    ;  (less bytes than Add #00800080H).
   Store (B)        ; Readout the two left shifted 16 bit
                    ;  parts of the two 32 bit results on
                    ;   the two DAC's. Note that the data
                    ;    type is still vector and B is
                    ;     programmed to 32 bits.
   PopR             ; Make correction for the difference in
   Add (128-96)     ;  length. Note that because the length
   Store Rn         ;   of R0 is programmed to 128 any
                    ;    overflow in the Add operation will
                    ;     be truncated the next time the
                    ;      value is pushed on the register
                    ;       stack so that also this
                    ;        operation wraps around.
                    ;         It does not matter that the
                    ;          data type is still vector.
                    ;           The operation just takes
                    ;            place in bit 31-47.
   PopD             ; Clean-up the stacks.
   PopD
   PopR
   RetI             ; Return and restore SR
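The pointer handling of the interrupt routine above can be mirrored in a short Python sketch. It is only an illustration under stated assumptions: the class `StereoFir` and its attribute names are hypothetical, samples are plain integer pairs, and left shifting, rounding and saturation are left out. What it keeps is the cyclic buffer of 128 entries, the pre-decrement at the store, the 96 post-increments and the final correction of 128-96.

```python
from typing import List, Sequence, Tuple

BUFFER_LEN = 128   # programmed length of the cyclic buffer
TAPS = 96          # number of Mul#Add instructions


class StereoFir:
    """Hypothetical model of the FIR interrupt routine."""

    def __init__(self, coefficients: Sequence[int]):
        if len(coefficients) != TAPS:
            raise ValueError("expected %d coefficients" % TAPS)
        self.coefficients: List[int] = list(coefficients)
        self.buffer: List[Tuple[int, int]] = [(0, 0)] * BUFFER_LEN
        self.offset = 0   # plays the role of the displacement in R0

    def process(self, sample: Tuple[int, int]) -> Tuple[int, int]:
        # Store -(R0): step back one entry and save the new sample.
        self.offset = (self.offset - 1) % BUFFER_LEN
        self.buffer[self.offset] = sample
        left = right = 0
        # Mul#Add (R0)+: the same coefficient is used for both channels.
        for coefficient in self.coefficients:
            sample_left, sample_right = self.buffer[self.offset]
            left += sample_left * coefficient
            right += sample_right * coefficient
            self.offset = (self.offset + 1) % BUFFER_LEN
        # Add (128-96): complete the lap so the pointer is back at the
        # newest sample.
        self.offset = (self.offset + (BUFFER_LEN - TAPS)) % BUFFER_LEN
        return left, right
```

Feeding a single impulse followed by zeros returns the coefficients one by one on both channels, which is a quick way to check that the wrap-around correction is right.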
   
Flags and conditional branches

The processor contains the following 6 flags:

C X V N Z D

C = Carry.

X = Extension. It is used in split arithmetic to transfer Carry from a previous operation. The X-bit is always included in all arithmetic operations, but the bit is usually reset. The instruction SetX sets X = C. The X-bit is automatically cleared after all arithmetic operations.

V = 2's complement overflow.

N = Negative. This bit is a copy of bit 0 (MSb).

Z = Zero. Set if the accumulator = 0.

D = Data Type. This bit is read in together with the data from the memory or peripheral units. The bit is used to distinguish between e.g. addresses and data. This is very practical for e.g. communication purposes.

Except for the X-bit, the flags are used in conditional branches. The following 16 conditional branch types exist:

  • DBZ, DBNZ
  • B (Branch Always), BO (V = 1)
  • BNZ (Z = 0), BZ/BE (Z = 1)
  • BD (D = 0), BA (D = 1)
  • BP (N = 0), BN (N = 1)
  • BNC (C = 0), BC (C = 1)
  • BGE - Greater-than-or-Equal, BL - Less-than, (uses N and V)
  • BG - Greater-than, BLE - Less-than-or-Equal, (uses N, V and Z)

The instructions DBZ (Decrement Branch on Zero) and DBNZ (Decrement Branch on Non Zero) decrement the lowest level of the general purpose stack and then test the result before the conditional branch.
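The flag tests behind the branch types can be sketched as below. The text only states which flags each signed comparison uses; the exact combinations here follow the conventional two's complement rules and are therefore an assumption, as is the helper `compare_flags`, which models a wrapping 16-bit subtraction rather than the saturating arithmetic described earlier. DBZ and DBNZ are left out because they also modify the general purpose stack.

```python
def compare_flags(a: int, b: int, width: int = 16) -> dict:
    """Flags N, Z and V after a conventional wrapping compare of a with b."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    raw = (a - b) & mask
    return {
        "N": bool(raw & sign),
        "Z": raw == 0,
        "V": not (-sign <= a - b < sign),   # true result does not fit
    }


# Assumed conditions; f is a dict of flag name -> bool.
CONDITIONS = {
    "B":   lambda f: True,
    "BO":  lambda f: f["V"],
    "BNZ": lambda f: not f["Z"],
    "BZ":  lambda f: f["Z"],
    "BD":  lambda f: not f["D"],
    "BA":  lambda f: f["D"],
    "BP":  lambda f: not f["N"],
    "BN":  lambda f: f["N"],
    "BNC": lambda f: not f["C"],
    "BC":  lambda f: f["C"],
    "BGE": lambda f: f["N"] == f["V"],
    "BL":  lambda f: f["N"] != f["V"],
    "BG":  lambda f: not f["Z"] and f["N"] == f["V"],
    "BLE": lambda f: f["Z"] or f["N"] != f["V"],
}
```

Combining N with V is what keeps the signed comparisons correct when the subtraction overflows, for example when 30000 is compared with -30000 in 16 bits.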


Remaining instructions that involve the lowest data stack level (B)

  • PushD. Pushes A on the data stack. A is not changed. The instruction code is actually just another name for the Store B instruction.
  • PopD. Sets A = B and then pulls the data stack down. The instruction code is actually just another name for the Load B instruction.
  • Exch - Exchange. Exchanges A and B.

Instructions that involve only the accumulator (A)

All these instructions are carried out in full width.

  • SL - Shift Left one bit. C = MSb. LSb = 0.
  • SR - Shift logical Right one bit. C = LSb. MSb = 0.
  • SAR - Shift Arithmetic Right one bit. C = LSb. Bit 0 and 1 = Bit 0.
  • Not - Logical Not. Inverts all bits.
  • Com - 2's Complement. Changes the sign.
  • Abs - Absolute value.
  • PushS. Pushes A on the general purpose stack.
  • PushR. Pushes A on the address register stack.
  • PopS. Pops a value off the general purpose stack and saves it in A.
  • PopR. Pops a value off the address register stack and saves it in A.

Traditional operations such as increment, decrement, clear etc. are carried out by adding or loading registers with a fixed content.
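The three shift instructions can be sketched in Python as follows. The sketch works on an unsigned bit pattern with an assumed width of 16 bits and returns the result together with the carry. The function names are lower-case versions of the mnemonics, chosen for illustration.

```python
from typing import Tuple


def sl(a: int, width: int = 16) -> Tuple[int, int]:
    """Shift Left one bit: C = old MSb, LSb = 0."""
    mask = (1 << width) - 1
    carry = (a >> (width - 1)) & 1
    return (a << 1) & mask, carry


def sr(a: int, width: int = 16) -> Tuple[int, int]:
    """Shift logical Right one bit: C = old LSb, MSb = 0."""
    a &= (1 << width) - 1
    return a >> 1, a & 1


def sar(a: int, width: int = 16) -> Tuple[int, int]:
    """Shift Arithmetic Right one bit: C = old LSb, the sign bit is copied."""
    a &= (1 << width) - 1
    sign = a & (1 << (width - 1))
    # The old sign bit ends up in both of the two most significant bits.
    return (a >> 1) | sign, a & 1
```

For the pattern 8001H, SR gives 4000H while SAR gives C000H; both set the carry.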


Miscellaneous Instructions
  • Branch (PC + Disp24).
  • Branch (A). Jump to the absolute address specified in A. The instruction is actually just another name for the Store PC instruction and therefore a two byte instruction with the register number of the PC as the second byte.
  • Call (PC/R1/R3/R5/R7/A + Disp24). Call of subroutine (relative).
  • Ret - Return from Subroutine.
  • RetI - Return from Interrupt.
  • SetX - Set X-bit equal to Carry.
  • DI - Disable Interrupt. Increments the interrupt disable counter.
  • EI - Enable Interrupt. Decrements the interrupt disable counter. The interrupt is enabled when the counter reaches zero.
  • HoldF - Hold Flags. Hold the flags until the next conditional branch.
  • Byte# - Byte immediate. Sets immediate data to byte size.
  • Word#. Sets immediate data to 16-bit words.
  • Medium#. Sets immediate data to 24-bit words.
  • Long#. Sets immediate data to 32-bit long words.
  • BrkPt - Break Point.
  • NOp - No Operation.
  • Sec - Second. Tells the instruction decoder that the next byte should be interpreted by means of another instruction map.
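The interrupt disable counter used by DI and EI allows nested critical sections, which the following Python sketch illustrates. The class name `InterruptGate` is hypothetical, and the choice not to let the counter go below zero is an assumption; the text does not say what happens when EI is executed with the counter already at zero.

```python
class InterruptGate:
    """Hypothetical model of the interrupt disable counter."""

    def __init__(self):
        self.disable_count = 0

    def di(self):
        # DI increments the counter; interrupts are now disabled.
        self.disable_count += 1

    def ei(self):
        # EI decrements the counter (assumption: never below zero).
        if self.disable_count > 0:
            self.disable_count -= 1

    @property
    def enabled(self) -> bool:
        # Interrupts are enabled only when the counter is zero.
        return self.disable_count == 0
```

A routine that executes DI and then calls a subroutine with its own DI/EI pair stays protected until its own EI brings the counter back to zero.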

Second Instructions

Second instructions are instructions that are decoded by means of another instruction map. Because the instructions are prefaced with a Sec code, they are all at least 2-byte instructions. The function is used to extend the instruction set with less common instructions and special instructions for e.g. graphics, data encryption, data compression etc.

  • Stereo. Two byte instruction, which programs the vector type to two normal left shifted numbers.
  • Pixel. Two byte instruction, which programs the vector type to 3-4 unsigned numbers.
  • Slow - Slow down. 3-byte instruction. To save power, the processor speed is reduced by the factor in the following byte (immediate data) until the next interrupt. The factor is rounded to a 2**N factor within the possible range of the clock divider.
  • TwistByte. 2-byte instruction, which twists bit 0-7 in such a way that bit 0 is saved in bit 7, bit 1 is saved in bit 6 and so on. This function may be used for communication purposes where the least significant bits are transmitted first.
  • TwistWord. As TwistByte but works on a 16 bit word (bit 0-15).
  • TwistLong. Twists a 32 bit word (bit 0-31).
  • TwistVLong. Twists a 64 bit word (bit 0-63).
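The Twist instructions reverse the bit order of a byte, word, long word or very long word. A minimal Python sketch of this bit reversal is shown below. The function name `twist` is hypothetical, and the sketch operates on the value alone; where the twisted bits sit inside a wider accumulator is not modelled.

```python
def twist(value: int, bits: int = 8) -> int:
    """Reverse the order of the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        # Move the current lowest bit of value to the bottom of result,
        # pushing the earlier bits upwards.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

Twisting a value twice with the same size returns the original value, so the same instruction serves both the transmitting and the receiving side.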