Instruction Map
Below is a rough map of the instruction set. It is not a fully worked-out instruction set, only a first draft, but it shows that the suggested architecture can be implemented with an extremely efficient 8-bit instruction set.
Both displacements (Disp8 and Disp24) are signed integers. If Disp8 = 0 in a conditional branch, the content of the accumulator is used as the displacement instead. This is used in e.g. Case statements, where the jump destination is determined by a value.
The address mode Rn is a two-byte instruction where the number of the register is specified in the following byte. An instruction with immediate data is at least two bytes long.
Except for pixel operations, all arithmetic operations are signed. If an overflow occurs, the result is set to the maximum negative or positive number (with the right sign), and the overflow flag is set.
The most common operations with two operands (the accumulator and one more) may be performed directly; that is, besides the lowest stack level (B), they may also get their data directly from memory, from a register (Rn) or as immediate (#) data. The other operations with two operands can only be performed with the accumulator (A) and the lowest stack level (B) as the two operands. Instructions which involve B or a register Rn are always carried out in full width.
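The overflow rule described above (clamp to the largest positive or negative number and set the overflow flag) can be illustrated with a minimal Python sketch. The function name `saturating_add` and the default width of 32 bits are assumptions made for this illustration; they are not part of the instruction set draft.

```python
def saturating_add(a: int, b: int, width: int = 32) -> tuple:
    """Add two signed integers of the given width, clamping on overflow.

    Returns (result, overflow_flag). The width is an assumed parameter.
    """
    max_pos = (1 << (width - 1)) - 1   # e.g. 0x7FFFFFFF for 32 bits
    max_neg = -(1 << (width - 1))      # e.g. -0x80000000 for 32 bits
    total = a + b                      # Python integers never wrap around
    if total > max_pos:
        return max_pos, True           # clamp with the right (positive) sign
    if total < max_neg:
        return max_neg, True           # clamp with the right (negative) sign
    return total, False


print(saturating_add(5, 7))            # no overflow
print(saturating_add(2**31 - 1, 1))    # clamps at the positive limit
```

The same clamping would apply to the other signed arithmetic operations; only addition is shown here.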
The Load operation loads data into the accumulator. Usually all data are regarded as signed values except for bytes, which are regarded as unsigned. 8-bit data are usually used for pixel data, short counters, ASCII letters etc., and all these types are usually unsigned. 16, 32 or 64-bit data, however, are seldom unsigned: these data have such a big range that not much is gained by also using unsigned data. As all arithmetic operations are signed, bytes are loaded into bit 1-8 and in this way converted to signed. The opposite shift is done at Store.
All data are loaded left shifted. With Load operations narrower than the width of the processor, the least significant bits are set to zero.
If several Load operations follow each other without any Store or Push instruction in between, the stack is automatically pushed up. This is utilized when parentheses occur in an expression. (R0+R1)*(R2+R3)+((R4+R5)*4000H) is translated to:
Load (R0)
Mul (R2)
Load (R4)
Mul 4000H
Add
Note that it is only necessary to specify the displacement register; the base register is implied. It is also not necessary to specify a displacement of zero.
When an instruction has no parameters, like the Add above, the operation is carried out in full width with the accumulator (A) and, where needed, the lowest stack level (B) as the operands. If the operation involves B, as with the Add above, the stack is simultaneously pulled down. This overwrites the old value in B, except if the stack size before the operation is only 1. In this case the value is retained instead of being overwritten with an undefined value. This feature makes it possible to use a stack with only one value as a constant.
Of course the instruction Store # corresponding to Load # has no meaning. Instead this instruction code is used for Store (#), which saves the contents of the accumulator at the specified absolute address.
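The left-shifted loading described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the accumulator is taken to be 32 bits wide, bit 0 is taken to be the most significant bit (as in the description of the N flag), values are modelled as ordinary Python integers rather than bit patterns, and the helper names are hypothetical.

```python
WIDTH = 32  # assumed processor width in bits


def load_signed(value: int, size: int, width: int = WIDTH) -> int:
    """Load a signed size-bit value left shifted; the low bits become zero."""
    return value << (width - size)


def store_signed(acc: int, size: int, width: int = WIDTH) -> int:
    """Reverse of load_signed: shift the value back down."""
    return acc >> (width - size)


def load_byte(value: int, width: int = WIDTH) -> int:
    """Load an unsigned byte into bit 1-8, leaving the sign bit (bit 0) clear."""
    assert 0 <= value <= 0xFF
    return value << (width - 9)


def store_byte(acc: int, width: int = WIDTH) -> int:
    """Reverse of load_byte: take bit 1-8 back out as an unsigned byte."""
    return (acc >> (width - 9)) & 0xFF


print(hex(load_byte(0xFF)))      # the byte sits just below the sign bit
print(load_signed(-1, 16))       # a 16-bit value in the upper half
```

Because the byte lands one position below the sign bit, even 0xFF becomes a positive signed number, which is the conversion the text describes.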
LoadFloat corresponds to Load, but tells the ALU that all subsequent
operations until the next Load should be carried out in floating point format.
When 8 or 16-bit data are loaded, they are automatically converted to 32-bit
floating point format. The data are shifted left until they are in the
range from +/-0.5 to +/-1, and the exponent is then added. The exponent
is calculated from the number of left shifts that were necessary.
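The normalization step can be sketched in Python as below. This is a simplified model under assumptions: the name `normalize` is hypothetical, the layout of the 32-bit floating point format is not modelled (the source does not specify it), and zero is treated as a special case because it can never be shifted into range.

```python
def normalize(value: int, size: int = 16) -> tuple:
    """Return (mantissa, exponent) so that value == mantissa * 2**exponent.

    The magnitude of the mantissa ends up in the range 0.5 to 1.
    """
    if value == 0:
        return 0.0, 0                # assumed special case for zero
    limit = 1 << (size - 1)          # integer magnitude that corresponds to 1.0
    shifts = 0
    while abs(value) < limit // 2:   # still below 0.5: shift left once more
        value <<= 1
        shifts += 1
    mantissa = value / limit
    exponent = (size - 1) - shifts   # derived from the number of left shifts
    return mantissa, exponent


m, e = normalize(3)
print(m, e, m * 2**e)                # the product reproduces the input value
```

For the most negative input (for example -32768 with 16 bits) this sketch returns a mantissa of exactly -1.0 without shifting, which still lies inside the stated range.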
For the moment, two different types of vectors are defined: stereo and pixels. These types are set by the Stereo and Pixel instructions. In the future many more vector types may be defined, e.g. vector types which use floating point.
Stereo
In stereo mode, LoadVector corresponds to Load, but the ALU is split up into two parts, so that it can process the two channels at the same time. All data which are loaded are split in two, and the two parts are loaded left shifted into the two parts of the ALU. If e.g. a 32-bit stereo signal is loaded on a 64-bit processor, bit 0-15 is loaded into bit 0-15 of the accumulator, but bit 16-31 is loaded into bit 32-47. All other bits are cleared. At Store the reverse process happens.
Pixel
In pixel mode, LoadVector corresponds to Load, but splits the ALU into 3 or 4 separate parts: 3 on 16-bit processors and 4 on 32 or 64-bit processors. In this way, the ALU is able to process an entire pixel at a time. All pixel operations are unsigned. According to the data size, the load is carried out in the following way:
Mul#Add
The instruction Mul#Add is special in that it uses data from the lowest stack level, the memory or a register together with immediate data. The addressed data are multiplied by the immediate data and the result is then added to the accumulator. Unlike the usual Mul instruction, Mul#Add uses the same multiplier/filter quotient (immediate data) in all multiplications in the case of vector processing. The Mul#Add instruction is primarily intended to be used together with -(Rn) and (Rn)+ addressing in digital filters. If you e.g. want to make a 96-stage FIR (Finite Impulse Response) filter for standard 16-bit 44.1 kHz stereo applications, the length of the displacement part of the pointer to the cyclic buffer is first programmed (in bit 0-4) to the nearest available length that is longer than or equal to the filter length. In this case that is 128, so the cyclic buffer wraps around after this count. The interrupt routine, which is activated 44100 times per second, may look like this on a 64-bit processor:
Stereo ; Set vector type to stereo.
Medium# ; Set length of filter quotients and
; I/O addresses to 24 bit.
Load Rm ; Get pointer to cyclic buffer.
PushR ; R1 = base address.
Load Rn ; R0 = displacement/offset.
PushR
Load #InputAddr ; Load address of memory mapped ADC
; input. The data size is programmed
; to 32 bits (two 16 bit values) in
; bit 0 and 1 (#00x...xB).
Load #OutputAddr ; Load address of memory mapped DAC
; output. The data size is programmed
; to 32 bits.
Load (B) ; Load the new 32-bit input value from
; the two 16 bit ADC's. Note that
; before the operation B
; contains the input address.
; After the operation B contains
; the output address and A
; contains the input value.
Store -(R0) ; Point to the next 32-bit entry in the
; cyclic buffer and save the new value
; in this.
LoadVector zero ; Split the ALU until next load and
; clear A by loading a register
; (LoadVector Rx) with the fixed
; content 0.
(NOp) ; 0-3 NOp operations to adjust for
; 32-bit boundary (optional)
; (Mul#Add + 24-bit #data = 32 bit).
Mul#Add (R0)+, #FilterQuotient01
Mul#Add (R0)+, #FilterQuotient02
.
.
Mul#Add (R0)+, #FilterQuotient96
Add Round16 ; Round the result by adding a register
; with the fixed content 00800080H
; (less bytes than Add #00800080H).
Store (B) ; Readout the two left shifted 16 bit
; parts of the two 32 bit results on
; the two DAC's. Note that the data
; type is still vector and B is
; programmed to 32 bits.
PopR ; Make correction for the difference in
Add (128-96) ; length. Note that because the length
Store Rn ; of R0 is programmed to 128 any
; overflow in the Add operation will
; be truncated the next time the
; value is pushed on the register
; stack so that also this
; operation wraps around.
; It does not matter that the
; data type is still vector.
; The operation just takes
; place in bit 31-47.
PopD ; Clean-up the stacks.
PopD
PopR
RTI ; Return and restore SR
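The pointer handling in the routine above may be easier to follow in a high-level model. The Python sketch below is an assumption-laden illustration, not a description of the hardware: the class name `StereoFir` is hypothetical, the buffer is indexed in whole stereo entries instead of bytes, and rounding, left shifting and saturation are left out.

```python
BUF_LEN = 128   # programmed length of the cyclic buffer
TAPS = 96       # number of Mul#Add steps (filter quotients)


class StereoFir:
    def __init__(self, coeffs):
        assert len(coeffs) == TAPS
        self.coeffs = list(coeffs)
        self.buf = [(0, 0)] * BUF_LEN   # (left, right) samples
        self.pos = 0                    # displacement part of the pointer

    def interrupt(self, left, right):
        # Store -(R0): pre-decrement with wrap-around, then save the new sample.
        self.pos = (self.pos - 1) % BUF_LEN
        self.buf[self.pos] = (left, right)
        acc_l = acc_r = 0               # LoadVector zero
        # 96 x Mul#Add (R0)+: the same quotient is used for both channels.
        for c in self.coeffs:
            sample_l, sample_r = self.buf[self.pos]
            acc_l += sample_l * c
            acc_r += sample_r * c
            self.pos = (self.pos + 1) % BUF_LEN
        # Add (128-96): move the pointer back to the newest sample.
        self.pos = (self.pos + (BUF_LEN - TAPS)) % BUF_LEN
        return acc_l, acc_r


fir = StereoFir([0, 0, 1] + [0] * 93)   # pure delay of two samples
for sample in [(10, 20), (30, 40), (50, 60)]:
    print(fir.interrupt(*sample))
```

After the 96 post-increments and the correction of 32, the pointer is back at the newest sample, so the next pre-decrement selects the slot that holds the oldest data.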
Flags and conditional branches
The processor contains the following 6 flags:
C = Carry.
X = Extension. Used in split arithmetic to transfer the Carry from a previous operation. The X-bit is always included in all arithmetical operations, but the bit is usually reset. The instruction SetX sets X = C. The X-bit is automatically cleared after all arithmetical operations.
V = 2's complement overflow.
N = Negative. This bit is a copy of bit 0 (MSb).
Z = Zero. Set if the accumulator = 0.
D = Data Type. This bit is read in together with the data from the memory or peripheral units. The bit is used to distinguish between e.g. addresses and data. This is very practical for e.g. communication purposes.
Except for the X-bit, the flags are used in conditional branches. The following 16 conditional branch types exist:
The instructions DBZ (Decrement Branch on Zero) and DBNZ (Decrement
Branch on Non Zero) decrement the lowest level of the general purpose
stack and then test the result before the conditional branch.
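The decrement-then-test order of DBNZ can be illustrated with a small Python model of a counted loop. The function name `run_dbnz_loop` is hypothetical, and the model assumes the loop counter sits in the lowest stack level (B) and starts at 1 or more.

```python
def run_dbnz_loop(count: int) -> int:
    """Count how often a loop body runs when the loop is closed by DBNZ."""
    assert count >= 1          # assumption: the counter starts at 1 or more
    b = count                  # lowest stack level holds the loop counter
    iterations = 0
    while True:
        iterations += 1        # the loop body
        b -= 1                 # DBNZ: decrement first ...
        if b == 0:             # ... then test; fall through when zero
            break
    return iterations


print(run_dbnz_loop(3))
```

Because the test comes after the decrement, the body of such a loop always runs at least once.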
Instructions which only involve the accumulator (A)
All these instructions are carried out in full width.
Miscellaneous Instructions
Second Instructions
Second instructions are instructions which are decoded by means of another instruction map. Because the instructions are prefaced with a Sec code, they are all at least 2-byte instructions. The function is used to extend the instruction set with less common instructions and special instructions for e.g. graphics, data encryption, data compression etc.
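A two-level decode of this kind can be sketched in Python as follows. Everything concrete here is hypothetical: the source gives neither the value of the Sec code nor the contents of either map, so the opcode values and mnemonics below are placeholders, and operand bytes are ignored.

```python
SEC = 0xFF  # hypothetical prefix value; the source does not specify it

PRIMARY_MAP = {0x01: "Load", 0x02: "Add"}          # placeholder entries
SECOND_MAP = {0x01: "Encrypt", 0x02: "Compress"}   # placeholder entries


def decode(code: bytes) -> list:
    """Return a list of (mnemonic, length_in_bytes) for a byte sequence."""
    decoded = []
    i = 0
    while i < len(code):
        if code[i] == SEC:
            # The byte after the prefix is looked up in the second map.
            decoded.append((SECOND_MAP[code[i + 1]], 2))
            i += 2
        else:
            decoded.append((PRIMARY_MAP[code[i]], 1))
            i += 1
    return decoded


print(decode(bytes([0x01, 0xFF, 0x02, 0x02])))
```

The prefix costs one extra byte per instruction, which is why it suits the less common operations.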