BYTEC’s hardware does not support integer multiplication and division. It was a design choice I made very early in the project in order to keep the hardware simple. Fortunately, both multiplication and division can be implemented in software using simple arithmetic (addition/subtraction and bit shifting). Until now I never got around to implementing it, but finally decided to give it a try and hand code the missing routines in assembly. I could have done the same in C but in this case I wanted to focus on short and fast runtime code.
So how does it all work? Whenever LCC compiles code with multiplication or division, it emits something like:
ld a, -5 ld x, 7 call __muli16
In total, there are six routines for 16-bit signed and unsigned multiplication, division and modulo:
__mulu16 - multiplication; 16-bit unsigned operands; 16-bit unsigned product __diviu16 - division; 16-bit unsigned operands; 16-bit unsigned quotient __modu16 - modulo; 16-bit unsigned operands; 16-bit unsigned reminder __muli16 - multiplication; 16-bit signed operands; 16-bit signed product __divii16 - division; 16-bit signed operands; 16-bit signed quotient __modi16 - modulo; 16-bit signed operands; 16-bit signed reminder
Note that the exact same approach is used for 32-bit long, 64-bit long long, 32-bit float, and 64-bit double, all of which are not directly supported by my hardware. In these cases LCC also emits arguments (or their addresses) and then calls a runtime function. In this case the runtime functions are yet to be implemented, though.
There are multiple ways how to implement multiplication and division in software, ranging from very simple (e.g. multiplication by repeated addition), through slightly more complex (e.g. Booth’s division), to very complex (e.g. SRT). I chose something from the lower range of scale, easy to understand yet of reasonable performance. There are hundreds of course materials with examples on how these algorithms work on the web so I won’t even try to describe them here. Just for the record, I used algorithms as summarized below.
I used a very simple shift and add algorithm, really trivial. One of the descriptions is here. All that needs to be understood is that multiplication can be implemented by repeatedly right shifting the multiplier, examining its least significant bit and adding a multiplicand to the product if the bit is set.
Using the above algorithm for signed operands would produce incorrect results, but there is a simple trick. The idea is to check the sign of the multiplier. If it is negative, both multiplier and multiplicand should be inverted (in sense of finding their two’s complements). Then the unsigned multiplication algorithm (shift and add) can be used directly (as it only needs positive multiplier), and the final result will have a correct sign.
Unsigned division / modulo
Division is more complex only in theory. Practically, it can be implemented using a very simple algorithm known from school, often referred to as pen-and-paper algorithm or long division, only in binary. Wikipedia has a very good description of division algorithms, including a binary version of long division. The same algorithm is used to calculate both division quotient and remainder (modulo).
Signed division / modulo
As you may expect, the idea for signed division and modulo is also to play with signs. In this case both operands are converted to positive values, and the above long division is used directly. Then, quotient and reminder are inverted (again, two’s complement) depending on what were the signs in the operands. The convention is that quotient is negative if signs of operands do not match, and remainder is negative when multiplicand is negative.
With the above in place, here is how happily BYTEC/16 does the calculations:
The above routines (as well as future implementation for longs and floats) will become part of the binutils package and will reside in crtx.as files (C runtime) which should be assembled and linked to programs which need to use them. Multiplication and division will take the crt1.as file, longs will be in crt2.as and floats in crt3.as, so that they (or their .o objects to be more precise) can be linked separately, to save code space. File crt0.as is reserved for lowest level runtime which sets stack and data pointers and calls main() and is always required, unless you are writing and compiling an operating system, or a boot record, or alike.
While implementing this new binary arithmetic I have come across and fixed numerous bugs in the linker and in my LCC machine description. Since I still have a list of things to so, I am not going to publish the full software package today and release it only when I am happier with its shape. However, if you are really curious how the BYTEC/16 assembly looks for shift and add or long division, here is the full source code of current crt1.as.
; crt1 ; integer arithmetic .extern _main .global __mulu16 .global __divu16 .global __modu16 .global __muli16 .global __divi16 .global __modi16 .text ; implement integer arithmetics ; 16-bit unsigned * unsigned - shift and add __mulu16: ld y, 0 ; product __mulu16_1: push a and a, 1 jz __mulu16_2 add y, x __mulu16_2: mov a, x shl a mov x, a pop a shr a jnz __mulu16_1 mov a, y ret ; 16-bit unsigned / unsigned - long division __divu16: push a ; N (sp:8) push x ; D (sp:6) ld a, 0 push a ; Q (sp:4) push a ; R (sp:2) ld a, 0b1000000000000000 push a ; mask (sp:0) ld x, 16 __divu16_1: ld a, (sp:2) shl a mov y, a ld a, (sp:8) shl a st (sp:8), a mov a, y ld y, 0 adc a, y cmp a, (sp:6) jlu __divu16_2 sub a, (sp:6) st (sp:2), a ld a, (sp:0) or a, (sp:4) st (sp:4), a jmp __divu16_3 __divu16_2: st (sp:2), a __divu16_3: ld a, (sp:0) shr a st (sp:0), a sub x, 1 jnz __divu16_1 ld a, (sp:4) add sp, 10 ret ; 16-bit unsigned / unsigned - long division ; simplified for remainder, quotiend discarded __modu16: push a ; N (sp:4) push x ; D (sp:2) ld a, 0 push a ; R (sp:0) ld x, 16 __modu16_1: ld a, (sp:0) shl a mov y, a ld a, (sp:4) shl a st (sp:4), a mov a, y ld y, 0 adc a, y cmp a, (sp:2) jlu __modu16_2 sub a, (sp:2) __modu16_2: st (sp:0), a sub x, 1 jnz __modu16_1 add sp, 6 ret ; 16-bit signed * signed - check sign, shift and add __muli16: ; check if multiplier is negative, if yes ; negate both numbers and do unsigned mul cmp a, 0 jg __muli16_1 ld y, -1 xor a, y add a, 1 push a mov a, x xor a, y add a, 1 mov x, a pop a __muli16_1: call __mulu16 ret __divi16: ; convert to positive, divide as signed numbers ; fix quotient sign afterwards push a push a xor a, x and a, 0b10000000 st (sp:2), a ; remember sign pop a ld y, -1 cmp a, 0 jg __divi16_1 xor a, y add a, 1 __divi16_1: push a mov a, x cmp a, 0 jg __divi16_2 xor a, y add a, 1 mov x, a __divi16_2: pop a call __divu16 push a ld a, (sp:2) cmp a, 0 pop a je __divi16_3 ld y, -1 xor a, y add a, 1 __divi16_3: pop x ; discard sign ret __modi16: ; convert to positive, divide as signed numbers ; fix reminder sign afterwards push a ; remember dividend sign ld y, -1 cmp a, 0 jg __modi16_1 xor a, y add a, 1 __modi16_1: push a mov a, x cmp a, 0 jg __modi16_2 xor a, y add a, 1 mov x, a __modi16_2: pop a call __modu16 push a ld a, (sp:2) cmp a, 0 pop a jg __modi16_3 ld y, -1 xor a, y add a, 1 __modi16_3: pop x ; discard dividend sign ret