Multiplication & Division

BYTEC’s hardware does not support integer multiplication and division. It was a design choice I made very early in the project in order to keep the hardware simple. Fortunately, both multiplication and division can be implemented in software using simple arithmetic (addition/subtraction and bit shifting). Until now I never got around to implementing it, but finally decided to give it a try and hand code the missing routines in assembly. I could have done the same in C but in this case I wanted to focus on short and fast runtime code.

So how does it all work? Whenever LCC compiles code with multiplication or division, it emits something like:

ld a, -5
ld x, 7
call __muli16

In total, there are six routines for 16-bit signed and unsigned multiplication, division and modulo:

__mulu16 - multiplication; 16-bit unsigned operands; 16-bit unsigned product
__diviu16 - division; 16-bit unsigned operands; 16-bit unsigned quotient
__modu16 - modulo; 16-bit unsigned operands; 16-bit unsigned reminder
__muli16 - multiplication; 16-bit signed operands; 16-bit signed product
__divii16 - division; 16-bit signed operands; 16-bit signed quotient
__modi16 - modulo; 16-bit signed operands; 16-bit signed reminder

Note that the exact same approach is used for 32-bit long, 64-bit long long, 32-bit float, and 64-bit double, all of which are not directly supported by my hardware. In these cases LCC also emits arguments (or their addresses) and then calls a runtime function. In this case the runtime functions are yet to be implemented, though.

There are multiple ways how to implement multiplication and division in software, ranging from very simple (e.g. multiplication by repeated addition), through slightly more complex (e.g. Booth’s division), to very complex (e.g. SRT). I chose something from the lower range of scale, easy to understand yet of reasonable performance. There are hundreds of course materials with examples on how these algorithms work on the web so I won’t even try to describe them here. Just for the record, I used algorithms as summarized below.

Unsigned multiplication

I used a very simple shift and add algorithm, really trivial. One of the descriptions is here. All that needs to be understood is that multiplication can be implemented by repeatedly right shifting the multiplier, examining its least significant bit and adding a multiplicand to the product if the bit is set.

Signed multiplication

Using the above algorithm for signed operands would produce incorrect results, but there is a simple trick. The idea is to check the sign of the multiplier. If it is negative, both multiplier and multiplicand should be inverted (in sense of finding their two’s complements). Then the unsigned multiplication algorithm (shift and add) can be used directly (as it only needs positive multiplier), and the final result will have a correct sign.

Unsigned division / modulo

Division is more complex only in theory. Practically, it can be implemented using a very simple algorithm known from school, often referred to as pen-and-paper algorithm or long division, only in binary. Wikipedia has a very good description of division algorithms, including a binary version of long division. The same algorithm is used to calculate both division quotient and remainder (modulo).

Signed division / modulo

As you may expect, the idea for signed division and modulo is also to play with signs. In this case both operands are converted to positive values, and the above long division is used directly. Then, quotient and reminder are inverted (again, two’s complement) depending on what were the signs in the operands. The convention is that quotient is negative if signs of operands do not match, and remainder is negative when multiplicand is negative.

With the above in place, here is how happily BYTEC/16 does the calculations:

muldiv

The above routines (as well as future implementation for longs and floats) will become part of the binutils package and will reside in crtx.as files (C runtime) which should be assembled and linked to programs which need to use them. Multiplication and division will take the crt1.as file, longs will be in crt2.as and floats in crt3.as, so that they (or their .o objects to be more precise) can be linked separately, to save code space. File crt0.as is reserved for lowest level runtime which sets stack and data pointers and calls main() and is always required, unless you are writing and compiling an operating system, or a boot record, or alike.

While implementing this new binary arithmetic I have come across and fixed numerous bugs in the linker and in my LCC machine description. Since I still have a list of things to so, I am not going to publish the full software package today and release it only when I am happier with its shape. However, if you are really curious how the BYTEC/16 assembly looks for shift and add or long division, here is the full source code of current crt1.as.

; crt1
; integer arithmetic

    .extern _main

    .global __mulu16
    .global __divu16
    .global __modu16
    .global __muli16
    .global __divi16
    .global __modi16

	.text

; implement integer arithmetics

; 16-bit unsigned * unsigned - shift and add
__mulu16:
    ld y, 0     ; product
__mulu16_1:
    push a
    and a, 1
    jz __mulu16_2
    add y, x
__mulu16_2:
    mov a, x
    shl a
    mov x, a
    pop a
    shr a
    jnz __mulu16_1
    mov a, y
    ret

; 16-bit unsigned / unsigned - long division
__divu16:
    push a      ; N (sp:8)
    push x      ; D (sp:6)
    ld a, 0
    push a      ; Q (sp:4)
    push a      ; R (sp:2)
    ld a, 0b1000000000000000
    push a      ; mask (sp:0)
    ld x, 16
__divu16_1:
    ld a, (sp:2)
    shl a
    mov y, a
    ld a, (sp:8)
    shl a
    st (sp:8), a
    mov a, y
    ld y, 0
    adc a, y
    cmp a, (sp:6)
    jlu __divu16_2
    sub a, (sp:6)
    st (sp:2), a
    ld a, (sp:0)
    or a, (sp:4)
    st (sp:4), a
    jmp __divu16_3
__divu16_2:
    st (sp:2), a
__divu16_3:
    ld a, (sp:0)
    shr a
    st (sp:0), a
    sub x, 1
    jnz __divu16_1
    ld a, (sp:4)
    add sp, 10
    ret

; 16-bit unsigned / unsigned - long division
; simplified for remainder, quotiend discarded
__modu16:
    push a      ; N (sp:4)
    push x      ; D (sp:2)
    ld a, 0
    push a      ; R (sp:0)
    ld x, 16
__modu16_1:
    ld a, (sp:0)
    shl a
    mov y, a
    ld a, (sp:4)
    shl a
    st (sp:4), a
    mov a, y
    ld y, 0
    adc a, y
    cmp a, (sp:2)
    jlu __modu16_2
    sub a, (sp:2)
__modu16_2:
    st (sp:0), a
    sub x, 1
    jnz __modu16_1
    add sp, 6
    ret

; 16-bit signed * signed - check sign, shift and add
__muli16:
    ; check if multiplier is negative, if yes
    ; negate both numbers and do unsigned mul
    cmp a, 0
    jg __muli16_1
    ld y, -1
    xor a, y
    add a, 1
    push a
    mov a, x
    xor a, y
    add a, 1
    mov x, a
    pop a
__muli16_1:
    call __mulu16
    ret

__divi16:
    ; convert to positive, divide as signed numbers
    ; fix quotient sign afterwards
    push a
    push a
    xor a, x
    and a, 0b10000000
    st (sp:2), a ; remember sign
    pop a
    ld y, -1
    cmp a, 0
    jg __divi16_1
    xor a, y
    add a, 1
__divi16_1:
    push a
    mov a, x
    cmp a, 0
    jg __divi16_2
    xor a, y
    add a, 1
    mov x, a
__divi16_2:
    pop a
    call __divu16
    push a
    ld a, (sp:2)
    cmp a, 0
    pop a
    je __divi16_3
    ld y, -1
    xor a, y
    add a, 1
__divi16_3:
    pop x ; discard sign
    ret

__modi16:
    ; convert to positive, divide as signed numbers
    ; fix reminder sign afterwards
    push a ; remember dividend sign
    ld y, -1
    cmp a, 0
    jg __modi16_1
    xor a, y
    add a, 1
__modi16_1:
    push a
    mov a, x
    cmp a, 0
    jg __modi16_2
    xor a, y
    add a, 1
    mov x, a
__modi16_2:
    pop a
    call __modu16
    push a
    ld a, (sp:2)
    cmp a, 0
    pop a
    jg __modi16_3
    ld y, -1
    xor a, y
    add a, 1
__modi16_3:
    pop x ; discard dividend sign
    ret

Leave a Reply

  

  

  

Time limit is exhausted. Please reload the CAPTCHA.