ANSI C compiler for BYTEC/16

It’s been over six months since I last posted, but I haven’t dropped out of the project completely. I have been slowly and quietly working on an initial revision of the LCC port in short weekend sessions (no more than 2 hours each week). The reason for losing momentum a bit is the birth of my daughter Viktoria on June 11. Needless to say, spending time with her is even more exciting than building a custom CPU.

Nevertheless, I managed to reach another project milestone, too. Even though it has not been thoroughly tested yet, my computer finally has a full ANSI C compiler. There is still no linker and no standard C library, and many run-time routines are missing (like 32/64-bit arithmetic and binary operations, and 16-bit mul/div), but these should now be fun to develop. It opens up an exciting range of possibilities for writing more interesting demos than just “Hello World”.

In the meantime I also picked a name for the machine. The “DIY Computer” will be called BYTEC/16 from now on. I decided it finally deserves its own identity: DIY was simply too generic and was never meant to be a proper name for the computer. BYTEC is a name that brings back good memories from the 80s and 90s, when I used to read a popular Polish computer magazine of the same name (“Bajtek”, pronounced “bytec”), and its ATARI section in particular.

The compiler

The C compiler for BYTEC/16 is based on the LCC 4.2 retargetable compiler. LCC is a cross-compiler and runs on a PC. I prepared a machine description (.md file) for the back-end and made only one simple change to LCC’s front-end (namely, a small change to gen.c). Most of the time was spent on understanding how LCC works and preparing the machine description. When writing the port, I faced the same problems and design choices as other LCC ports for similar machines. Below is a summary of the most important ones.

Support for 8-bit registers and opcodes

LCC stubbornly assumes that 8 bits are not enough for decent computing and promotes operands to 16 bits for most arithmetic and bitwise operations. For BYTEC this effectively meant that its 8-bit opcodes would never be used. I ended up with two solutions for 8-bit ops.

1) Whenever LCC’s rule engine detects conversions (promotions from 8-bit to 16-bit) whose results are demoted back to 8-bit soon afterwards, I treat this as a case for an 8-bit op. I then bypass LCC’s template engine and generate the code “by hand” in LCC’s emit2() function. For example, these are the rules for 8-bit ADD:

reg8: ADDI2(reg8,CVII2(INDIRI1(addr)))         "# emit2 (use AH/AL instead of A)\n" 1
reg8: ADDI2(CVII2(reg8),CVII2(INDIRI1(addr)))  "# emit2 (use AH/AL instead of A)\n" 1
reg8: ADDI2(CVUI2(reg8),CVII2(INDIRI1(addr)))  "# emit2 (use AH/AL instead of A)\n" 1
reg8: ADDI2(reg8,con)                          "# emit2 (use AH/AL instead of A)\n" 1
reg8: ADDI2(CVII2(reg8),con)                   "# emit2 (use AH/AL instead of A)\n" 1
reg8: ADDI2(CVUI2(reg8),con)                   "# emit2 (use AH/AL instead of A)\n" 1

The emit2() function takes care to use the AH or AL register for the operands and the result, suppressing the promotions. Of course, other rules exist for cases in which promotions are necessary and the result is really expected in a 16-bit register, like these:

reg:  ADDI2(reg,reg)                           "?\tmov\t%c,%0\n\tadd\t%c,%1\n" 2
reg:  CVII2(reg8)                              "\tsex\t%c\n" 2

2) Rules that use 8-bit operands but are executed for their side effects (i.e. statements) and do not pass a result in a register are handled by LCC’s rule engine (there is no need to override the target register, so templates are enough to do the job), e.g.:

stmt: EQI2(CVII2(reg8),CVII2(INDIRI1(addr)))   "\tcmp\t%0,%1\n\tje\t%a\n" 1
stmt: EQI2(CVUI2(reg8),CVII2(INDIRI1(addr)))   "\tcmp\t%0,%1\n\tje\t%a\n" 1

All in all, the solution is not very elegant, but it works well.
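Why is it safe to drop the promotion in case 1)? The low 8 bits of a sum depend only on the low 8 bits of its operands. Adding two promoted bytes in 16 bits and then keeping the low byte therefore gives exactly the result of a native 8-bit add. The host-side C sketch below checks this exhaustively. The names sext8(), zext8() and the other functions are hypothetical. They only model LCC's CVII2/CVUI2 and ADDI2/EQI2 operators and are not part of the BYTEC/16 port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends a byte to 16 bits (models CVII2 on an 8-bit operand). */
static int16_t sext8(uint8_t v)
{
    return (int16_t)(v < 0x80 ? v : v - 0x100);
}

/* Zero-extends a byte to 16 bits (models CVUI2). */
static int16_t zext8(uint8_t v)
{
    return (int16_t)v;
}

/* ADDI2 on two promoted bytes, then demotion back to 8 bits (low byte). */
static uint8_t add_promoted(uint8_t a, uint8_t b)
{
    int16_t wide = (int16_t)(sext8(a) + sext8(b));
    return (uint8_t)wide;            /* conversion keeps the value mod 256 */
}

/* Native 8-bit add with wrap-around, as an 8-bit "add" on AL/AH would do. */
static uint8_t add_native8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* True if promote-add-demote equals the 8-bit add for every operand pair. */
static bool add8_shortcut_is_exact(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (add_promoted((uint8_t)a, (uint8_t)b) !=
                add_native8((uint8_t)a, (uint8_t)b))
                return false;
    return true;
}

/* EQI2(CVII2(a), CVII2(b)) versus an 8-bit cmp: sign extension is
   one-to-one, so both must agree for every pair. */
static bool signed_eq_shortcut_is_exact(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if ((sext8((uint8_t)a) == sext8((uint8_t)b)) != (a == b))
                return false;
    return true;
}

/* EQI2(CVUI2(a), CVII2(b)) versus an 8-bit cmp, for one operand pair. */
static bool mixed_eq_matches_cmp8(uint8_t a, uint8_t b)
{
    return (zext8(a) == sext8(b)) == (a == b);
}
```

The same kind of check exposes a caveat for comparisons. When one byte is zero-extended (CVUI2) and the other is sign-extended (CVII2), equality after promotion is no longer the same as an 8-bit cmp once the top bit is set. For example, 0x80 becomes 128 on one side and -128 on the other. If LCC can produce EQI2(CVUI2(reg8),CVII2(INDIRI1(addr))) for such operands, that rule probably deserves a closer look.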

Support for 32/64-bit opcodes

A similar problem exists for operations beyond the native bus/register size of BYTEC, which are required by the ANSI C types long, float and double (32-, 32- and 64-bit wide, respectively). Here, I used the approach (and pieces of code) proposed by Bill Buzbee in his LCC port for Magic-1. The idea is to make LCC believe it has a bunch of wide registers, let it use them as necessary, and then map them onto frame space (if they are used). For example, here is a rule for 32-bit add:

reg32: ADDI4(reg32,reg32)                      "?\tCOPY32\t(%c,%0)\n\tADDI32\t(%c,%1)\n" 4

It generates the following sample interim code:

    COPY32 ((sp:4),(sp:0))
    ADDI32 ((sp:4),(sp:-8+20))

This code is later passed through a preprocessor, which expands each macro into a run-time function call (here, one for the 32-bit copy and one for the 32-bit addition):

   lea  x,(sp:4)
   lea  y,(sp:0)
   call $__copy32
   lea  x,(sp:4)
   lea  y,(sp:-8+20)
   call $__addi32
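The expansion step can be pictured with a small C sketch. It assumes that every interim line has the form MNEMONIC (dst,src), as in the two examples above. It also assumes that each macro maps to two LEAs into x and y plus a call to $__ followed by the lower-cased mnemonic. expand_macro() is a hypothetical name. The real preprocessor used for BYTEC/16 may work quite differently, for example as plain textual macros.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expander: turns one interim line such as
       COPY32 ((sp:4),(sp:0))
   into "lea x,<dst> / lea y,<src> / call $__<mnemonic in lower case>".
   Returns 0 on success, -1 if the line is not a two-operand macro. */
static int expand_macro(const char *line, char *out, size_t outsz)
{
    char name[16];
    size_t n = 0;
    const char *p = line;

    assert(line != NULL && out != NULL && outsz > 0);

    while (isspace((unsigned char)*p)) p++;                /* indentation */
    while (isalnum((unsigned char)*p) && n < sizeof name - 1)
        name[n++] = (char)tolower((unsigned char)*p++);    /* COPY32 -> copy32 */
    name[n] = '\0';
    if (n == 0) return -1;

    while (isspace((unsigned char)*p)) p++;
    if (*p++ != '(') return -1;                            /* outer '(' */

    /* Split "(sp:4),(sp:0))" at the comma that sits at nesting depth 0. */
    const char *start = p, *comma = NULL, *end = NULL;
    int depth = 0;
    for (; *p != '\0'; p++) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (depth == 0) { end = p; break; }            /* outer ')' */
            depth--;
        } else if (*p == ',' && depth == 0 && comma == NULL) {
            comma = p;
        }
    }
    if (comma == NULL || end == NULL) return -1;

    int w = snprintf(out, outsz,
                     "\tlea\tx,%.*s\n\tlea\ty,%.*s\n\tcall\t$__%s\n",
                     (int)(comma - start), start,
                     (int)(end - comma - 1), comma + 1,
                     name);
    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

Fed the two interim lines above, this sketch produces the lea/lea/call sequences shown, using tabs instead of spaces. A line that is not a macro, such as a plain add, is rejected.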

The reason for using preprocessor macros instead of generating run-time calls directly becomes clear from the COPY32/ADDI32 example. The COPY32 macro is only generated when the source and target registers of the rule differ, thanks to the question mark at the beginning of the rule's template. LCC implements this conditional register copy by skipping the first line of the template, so it only works as long as the copy operation is a single instruction. When unrolled, COPY32 itself requires two LEAs and one subroutine call, which would defeat the ‘?’ modifier in this case. Hence the interim pseudo-code (macros), which keeps each copy on a single line.
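The ‘?’ mechanism can be modelled in a few lines of C. This is a simplified, hypothetical model of how LCC's emitter treats such templates: only %c, %0 and %1 are handled, and registers are plain strings. The last case in the tests shows why an unrolled three-instruction copy would not work. Only its first line is dropped, which leaves a stray lea and call behind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified template expander.  %c is the result register,
   %0/%1 the registers of the two kids.  If the template starts with '?'
   and the result register equals kid 0's register, everything up to and
   including the first newline (the copy instruction) is skipped. */
static void expand_template(const char *tmpl, const char *c, const char *r0,
                            const char *r1, char *out, size_t outsz)
{
    size_t n = 0;
    assert(tmpl && c && r0 && r1 && out && outsz > 0);

    if (*tmpl == '?') {
        tmpl++;
        if (strcmp(c, r0) == 0) {               /* copy would be a no-op */
            const char *nl = strchr(tmpl, '\n');
            tmpl = nl ? nl + 1 : tmpl + strlen(tmpl);
        }
    }
    for (; *tmpl && n + 1 < outsz; tmpl++) {
        const char *sub = NULL;
        if (*tmpl == '%' && tmpl[1]) {
            switch (tmpl[1]) {
            case 'c': sub = c;  break;
            case '0': sub = r0; break;
            case '1': sub = r1; break;
            }
        }
        if (sub) {
            while (*sub && n + 1 < outsz) out[n++] = *sub++;
            tmpl++;                             /* skip the selector char */
        } else {
            out[n++] = *tmpl;
        }
    }
    out[n] = '\0';
}
```

With the 16-bit ADD template, the mov disappears when the result register is already kid 0's register. A hypothetical unrolled COPY32 would lose only its first lea.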

Activation frame

One of the important decisions is the structure of the activation frame. The problem I realized when porting LCC was that BYTEC/16 lacks a frame pointer register. Other CPUs typically use one to reference a fixed position in the activation frame, which leaves the stack pointer free to move. With a frame pointer set, locals and function parameters are referenced relative to it. Without a frame pointer, all you have is the stack pointer (SP), and its use for other purposes (such as pushing and popping temporaries) becomes limited. When designing the hardware I completely ignored this, which in retrospect was an omission. I am not saying it is impossible to design an activation frame with only SP, but the set of design choices regarding the structure of the frame and the way it is used is somewhat limited. Moreover, care must be taken when generating code for a function not to change SP before referencing a local or a formal parameter.

BYTEC/16 uses a solution in which:

Locals are stored in the activation frame and referenced relative to SP
Called function arguments are stored in the caller’s frame and referenced relative to SP by both the caller and the callee (although with different offsets)

The latter has one drawback: the caller’s activation frame has to accommodate the longest parameter list of all its callees. This space stays allocated for as long as the caller function lives, not only while a callee is active. I am not sure if it’s really the best practice.
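The sizing rule for the outgoing-argument area can be sketched in C. This is a minimal model that assumes 16-bit ints and no alignment padding. It also assumes that the prologue's sub sp,N covers locals, spill area and outgoing arguments, but not the return PC. outgoing_area() and frame_size() are hypothetical helpers, not functions from the port.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reserved for outgoing arguments: the largest argument list among
   all calls made by one function.  In the scheme described above this
   area stays in the caller's frame for the caller's whole activation. */
static size_t outgoing_area(const size_t *call_arg_bytes, size_t ncalls)
{
    size_t max = 0;
    for (size_t i = 0; i < ncalls; i++)
        if (call_arg_bytes[i] > max)
            max = call_arg_bytes[i];
    return max;
}

/* N in the prologue's "sub sp,N".  The return PC is excluded because the
   call instruction pushes it. */
static size_t frame_size(size_t locals, size_t spill,
                         const size_t *call_arg_bytes, size_t ncalls)
{
    return locals + spill + outgoing_area(call_arg_bytes, ncalls);
}
```

For main() in the example that follows, this gives 6 bytes of locals (x, y, z) plus 4 bytes of outgoing arguments for sum(). That matches the sub sp,10 in the generated code. A leaf function like sum() needs only its 2-byte local.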

Here is a picture of the BYTEC activation frames (with the stack initialized at 0xFFFF) for a simple C program:

int sum(int a, int b) {
    int s = a+b;
    return s;
}

void main() {
    int x,y,z;

    z = sum(x,y);
}

Compiled to:

; BYTEC/16 assembly, generated by lcc 4.2

.global _sum
.code
_sum:
   sub  sp,2
   ld   a,(sp:0+2+2)
   add  a,(sp:2+2+2)
   st   (sp:-2+2),a
   ld   a,(sp:-2+2)
L1:
   add  sp,2
   ret

.global _main
_main:
   sub  sp,10
   ld   a,(sp:-2+10)
   st   (sp:0),a
   ld   a,(sp:-4+10)
   st   (sp:2),a
   call _sum
   st   (sp:-6+10),a
L2:
   add  sp,10
   ret

With the following structure of activation frames:

        |---------------------|
        | ...                 |    sum() frame:  spill area (here empty)
        |---------------------|
 0xFFF0 | s                   |    sum() frame:  locals
 0xFFF1 |                     |    
        |---------------------|
 0xFFF2 | Return PC           |    sum() frame:  return address (PC)
 0xFFF3 |                     |
        |---------------------| 
 0xFFF4 | x                   |    main() frame: outgoing params for sum()
 0xFFF5 |                     |
 0xFFF6 | y                   |
 0xFFF7 |                     |    
        |---------------------|
        | ...                 |    main() frame: spill area (here empty)
        |---------------------|
 0xFFF8 | z                   |    main() frame: locals
 0xFFF9 |                     |
 0xFFFA | y                   |
 0xFFFB |                     |    
 0xFFFC | x                   |
 0xFFFD |                     |       
        |---------------------|
 0xFFFE | Return PC           |    main() frame: return address (PC)
 0xFFFF |                     |    
        +---------------------+
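The SP-relative operands in the listing can be checked against the addresses in the picture with a bit of arithmetic. The sketch assumes, as the picture suggests, that main() is entered with SP = 0xFFFE, its return PC occupying 0xFFFE and 0xFFFF. It also assumes that call pushes a 2-byte return address. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RET_PC_BYTES 2   /* assumed: "call" pushes a 16-bit return address */

/* SP after a function's prologue "sub sp,frame", given SP on entry
   (pointing at the pushed return address). */
static uint16_t after_prologue(uint16_t entry_sp, unsigned frame)
{
    return (uint16_t)(entry_sp - frame);
}

/* Address of a local written as (sp:off+frame), with off negative. */
static uint16_t local_addr(uint16_t sp, int off, unsigned frame)
{
    return (uint16_t)(sp + off + (int)frame);
}

/* Address of an outgoing argument slot written as (sp:off). */
static uint16_t outgoing_arg_addr(uint16_t sp, unsigned off)
{
    return (uint16_t)(sp + off);
}

/* Address of a formal parameter as seen by the callee, written as
   (sp:off+RET_PC_BYTES+frame), e.g. (sp:0+2+2) for a in sum(). */
static uint16_t param_addr(uint16_t callee_sp, unsigned off, unsigned frame)
{
    return (uint16_t)(callee_sp + off + RET_PC_BYTES + frame);
}
```

With main()'s SP at 0xFFF4, the first outgoing slot (where main() stores x) is 0xFFF4. That is exactly the address sum() reads as a through (sp:0+2+2).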

As far as return values are concerned, 8/16-bit function results are always returned in register A, and 32-bit results in the X and Y register pair. 64-bit results (double) are returned in a dedicated position in the caller’s frame, whose address the caller passes to the callee in register A.
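These conventions can be illustrated in plain C on the host. This is only a model. The post does not say which half of a 32-bit result goes into X and which into Y, so X is assumed here to carry the high word. The functions ret32_to_xy(), half_lowered() and call_half() are hypothetical. The double case shows the general hidden-result-pointer idea, with the caller reserving the slot and passing its address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit result travelling in the X/Y pair.
   Assumption: X holds the high word, Y the low word. */
typedef struct { uint16_t x, y; } xy_pair;

static xy_pair ret32_to_xy(uint32_t v)
{
    xy_pair r = { (uint16_t)(v >> 16), (uint16_t)(v & 0xFFFFu) };
    return r;
}

static uint32_t xy_to_u32(xy_pair r)
{
    return ((uint32_t)r.x << 16) | r.y;
}

/* A double result does not fit in registers, so the callee writes it
   through a pointer supplied by the caller (passed in A on BYTEC/16).
   Source form: double half(double v) { return v / 2.0; } */
static void half_lowered(double *result, double v)
{
    *result = v / 2.0;
}

static double call_half(double v)
{
    double slot;               /* dedicated position in the caller's frame */
    half_lowered(&slot, v);    /* &slot is what would travel in register A */
    return slot;
}
```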

Quick build instructions

In order to build LCC with BYTEC support on Linux/Unix, one must:

Download off-the-shelf LCC ver. 4.2. It is available here.
Download the most recent BYTEC/16 patches from the downloads page.
Unzip LCC to a dedicated directory, e.g. lcc.
Place the patch file in the LCC root directory and decompress it by invoking gunzip patchfilename.gz.
Apply the patches by invoking patch -p0 < patchfilename. This will update src/gen.c, src/bind.c and the makefile, and create src/bytec16.md (the machine description).
Create the LCC build directory, e.g. lcc/bin, by invoking mkdir -p bin from within the LCC root directory.
Set the BUILDDIR environment variable to point to the build directory by invoking export BUILDDIR=`pwd`/bin from within the LCC root directory.
Invoke make HOSTFILE=etc/linux.c lcc to build the LCC driver. Choose another host file if you are compiling for a different host (e.g. Windows).
Invoke make all to build the rest of LCC, including the compiler proper.

With a bit of luck, the build directory should now contain the executables. The compiler proper is the `rcc` executable, which should be invoked as follows to output BYTEC/16 code:

./rcc -target=bytec16 source.c

Other changes

The compiler triggered other design changes, too. Once again, I cleaned up the instruction set quite a bit. I decided to give up some redundant instructions and addressing modes (which were of no use for code generation) and added some that seemed absolutely necessary to me (e.g. unsigned comparisons, load effective address ops, 16- to 8-bit truncating register moves, etc.). This required changes to the microcode and the assembler. I will publish updated specs and source code as soon as I clean up the files, which should be no later than next week.
