Assembler and instruction set update

I am slowly progressing on my simplistic OS/monitor development. It is only a little fun, mainly because I do not have a functional simulator (i.e. an emulator) of the computer and I have to test every bit of assembly code on real hardware. For that I have to constantly switch from Linux (where my development toolchain resides) to Windows (where my EPROM programmer software is), remove the ROM chip, burn it, plug it back into the memory card, and boot back to Linux to run the VT100 terminal emulator before I can finally give it a try. Because I am mainly dealing with the RS-232 driver for the 16550 chip and other low-level kernel routines, the success ratio is rather low. I really need to think about an emulator; otherwise the development pace will be too slow, and I am afraid I may get discouraged and the project may lose momentum. At this point it is difficult to imagine doing more serious programming (e.g. C compiler retargeting, Minix porting) without a decent emulator.

The OS is still too immature to share. Nevertheless, I have some progress to report. As a useful side effect of my recent work, I enhanced the assembler to make it more usable. It is the same assembler built using flex and yacc, but I have added support for constants, expressions, multiple memory segments, and more. The assembler still generates absolute binary objects (flat binaries with fixed memory references) in Intel HEX format, but at this point that is sufficient for my needs and rather convenient (this way I need no linker and can burn the binaries directly to the ROM).
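For readers unfamiliar with Intel HEX, here is a minimal Python sketch of how a flat binary image turns into records. It is only an illustration of the file format, not the assembler's actual code; the function names and the 16-byte chunk size are assumptions made for the example.

```python
def ihex_record(address, data, record_type=0x00):
    """Encode one Intel HEX record (type 0x00 = data, 0x01 = end of file)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if len(data) > 0xFF:
        raise ValueError("a record holds at most 255 data bytes")
    # Byte count, 16-bit address (high byte first), record type, then the data.
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + bytes(data)
    # The checksum is the two's complement of the sum of all preceding bytes.
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def ihex_dump(origin, image, chunk=16):
    """Split a flat image loaded at 'origin' into data records plus an EOF record."""
    lines = [ihex_record(origin + i, image[i:i + chunk])
             for i in range(0, len(image), chunk)]
    lines.append(ihex_record(0, b"", 0x01))
    return lines
```

Calling `ihex_dump(0x0030, bytes([0x02, 0x33, 0x7A]))` yields the two lines `:0300300002337A1E` and `:00000001FF`.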

Also, I updated the instruction set and the corresponding microcode by adding a few instructions which I found particularly handy for current development. I have already used up all 256 opcodes, so from now on any new instruction will have to replace one of the less useful ones. I initially designed the instruction set quite expansively, assuming orthogonality for most load and store instructions, so this should not be a big problem. I am assuming that porting a C compiler will result in a big ISA rework anyway.

Check the downloads page for updated software and microcode. Below is a “what’s new” report.


The assembler currently supports the following items (labels, expressions, literal types and directives):

Item Description
label A label is an alphanumeric string used to define references to locations in the code and data segments. Labels that precede instruction mnemonics are defined with a trailing colon (:). They are composed of letters and digits (the first character of the label must be a letter). Labels are case sensitive.


_label1: ld a, 0x0001

To reference a label, use its name without a colon, e.g.:

jmp _label1

expression Expressions are used to build a value and were added for the programmer’s convenience. The following are valid expressions:


literal (any of the literals presented in the table below)


expression + expression

expression - expression

expression * expression




ld dp, IVEC_BASE+0x0f*2

<label> equ <expression> Defines a constant. Constants may be used in expressions. Constant names are case sensitive.


REG_DATAPAGE0 equ 0x2400

db <expression>[,<expression>]* Dumps a series of 8-bit values to the current memory segment. Each expression value is cast to 8 bits. String literals may be used with the ‘db’ directive; in that case the string literal is emitted directly. The ‘db’ directive is usually labeled for easy referencing; such labels are written without the colon.


msg1 db 'Hello world', ENDL, 0x00

dw <expression>[,<expression>]* Equivalent to ‘db’, but values are cast to 16 bits. String literals are illegal with ‘dw’.


buf dw 0x0001, 0x12af, 0xfdff

8-bit hex literal Hex literals are prefixed with ‘0x’. Literals are used in expressions (a literal is itself an expression).


ld ah, 0x1f

16-bit hex literal Same as above but word size.


cmp a, 0x0ab7

8-bit binary literal Binary literals are prefixed with ‘0b’.


and al, 0b00110101

16-bit binary literal Same as above but word size.


or a, 0b0011010101111100

decimal literal Decimal literals may be used in expressions, too. Their value is cast to 8 or 16 bits depending on the instruction’s argument size. Decimal casting is signed.


cmp a, 2   ;this is cast to word

cmp al, -7 ;this is cast to a byte

8-bit char literal An 8-bit char literal is a string literal one byte in length. It may be used in expressions as 8-bit (ASCII) value or 16-bit value (it is then cast to a word).


st (dp), 'a'

string literal Used to define a string. It may be used only in the ‘db’ directive, and not in expressions. Strings are enclosed in apostrophes (use a double apostrophe to include an apostrophe in a string):


prompt db 'Ready>', 0

hello db 'What''s up?', 0x10, 0x00

.code Switches the code emitter to the code segment. The assembler generates output for both the code and data segments.
.data Switches to the data segment.
.org <16-bit hex literal> Defines the segment’s entry (load) address. It may be used only once per segment and takes a 16-bit hex literal as its argument (an expression is not allowed).

The assembler’s current grammar (autogenerated by yacc) may be found here.
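The expression and literal rules from the table above can be sketched in Python as follows. This is only an illustration, not the yacc grammar itself: the operator precedence (‘*’ binding tighter than ‘+’ and ‘-’) and the unary minus are assumptions, and the value given to IVEC_BASE in the usage note is made up.

```python
import re

# One token: hex, binary or decimal literal, one-character literal, name, operator.
TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|0b[01]+|\d+|'[^']'|[A-Za-z_]\w*|[-+*])")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, constants=None):
    """Evaluate an expression; '*' binds tighter than '+' and '-' (assumed)."""
    constants = constants or {}
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "-":                      # unary minus, e.g. -7 (assumed)
            return -factor()
        if tok.startswith("0x"):
            return int(tok, 16)
        if tok.startswith("0b"):
            return int(tok, 2)
        if tok[0].isdigit():
            return int(tok, 10)
        if tok[0] == "'":                   # char literal -> ASCII code
            return ord(tok[1])
        if tok in constants:                # names defined with 'equ'
            return constants[tok]
        raise ValueError(f"unknown symbol or misplaced operator: {tok}")

    def term():
        value = factor()
        while peek() == "*":
            take()
            value *= factor()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


def cast(value, bits):
    """Truncate to the operand width; negative values wrap to two's complement."""
    return value & ((1 << bits) - 1)
```

With a hypothetical `IVEC_BASE` of 0x2000, `evaluate("IVEC_BASE+0x0f*2", {"IVEC_BASE": 0x2000})` gives 0x201E, and `cast(evaluate("-7"), 8)` gives 0xF9, which is the byte that `cmp al, -7` would carry under signed casting.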


Most of the changes are new instructions added for convenience and are self-explanatory. The only thing really worth documenting here is the new behavior of the SYSCALL instruction, which is used to call OS kernel routines. In the previous microcode version, SYSCALL took no argument, and the code of the kernel function to be invoked had to be passed in the A register. The ISR for SYSCALL would then map the code in A to the proper function address and branch to it. I decided to move this branching to microcode, in the hope that it will ultimately make calling kernel routines faster and free the A register for kernel function arguments.

Now SYSCALL takes an 8-bit argument (meaning that at most 256 kernel routines can be exported to user-mode programs). In the previous version, SYSCALL would read the ISR address from the interrupt vector (IVEC+0x1f) and branch to the ISR. Now it still takes the address from the interrupt vector, but instead of branching to it, it treats this address as the base address of a map of kernel routine addresses (of course, this map must be constructed by the kernel, along with the interrupt vector). By adding the 8-bit argument (multiplied by two) to the base, the microcode obtains the address of the map entry that holds the kernel routine’s address, reads it, and branches to the routine. This way, there is no need for any ISR for SYSCALL whatsoever; the ISR is in fact implemented in microcode. Here is the current microcode source for SYSCALL:

// SYSCALL #i8
0, *, MDR <- MDR ^ MDR
1, *, LO(MDR) <- MEM(PC); CODE              // read function code
2, *, PC <- MDR + MDR                       // multiply by 2 to get map offset and store in PC
3, *, MDR <- MSW                            // back up MSW before switching to supervisor (to store original CPU mode)
4, *, MAR <- SP                             // back up SP
5, *, MDR <- -1 + 1; LATCH_S                // enable supervisor mode (from this point on SP denotes KSP), MDR not latched
6, *, SP--                                  // this is KSP
7, *, MEM(SP) <- LO(MDR); DATA; SP--        // store MSW
8, *, MDR <- MAR                            // store SP
9, *, MEM(SP) <- LO(MDR); DATA; SP--
10, *, MEM(SP) <- HI(MDR); DATA; SP--
11, *, MAR <- IPTR                          // retrieve base address of syscall functions map from interrupt vector (0x1f)
12, *, HI(MDR) <- MEM(MAR); DATA; MAR++
13, *, LO(MDR) <- MEM(MAR); DATA
14, *, MAR <- MDR + PC                      // add previously computed offset to map base address, store in MAR
15, *, HI(MDR) <- MEM(MAR); DATA; MAR++     // retrieve function address
16, *, LO(MDR) <- MEM(MAR); DATA
17, *, PC <- MDR
18, *, MEM(SP) <- LO(A); DATA; SP--         // store A
19, *, MEM(SP) <- HI(A); DATA; SP--
20, *, MEM(SP) <- LO(X); DATA; SP--         // store X
21, *, MEM(SP) <- HI(X); DATA; SP--
22, *, MEM(SP) <- LO(Y); DATA; SP--         // store Y
23, *, MEM(SP) <- HI(Y); DATA; SP--
24, *, MDR <- DP                            // store DP
25, *, MEM(SP) <- LO(MDR); DATA; SP--
26, *, MEM(SP) <- HI(MDR); DATA; SP--
27, *, MDR <- PPC                           // store PC (the next instruction's starting address)
28, *, MDR <- MDR + 1
29, *, MEM(SP) <- LO(MDR + 1); DATA; SP--
30, *, MEM(SP) <- HI(MDR + 1); DATA
31, *, fetch                                // fetch at PC

It is 32 cycles in total (the maximum my microcode can take), but compared to the number of cycles the ISR would cost, that’s a profitable change. And the A register is released for the OS programmer’s use.
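The address lookup performed by steps 11 to 17 of the microcode can be modeled with a short Python sketch. It is a simplified model under stated assumptions: memory is a flat 64 KB byte array, addresses wrap at 16 bits, and the location of the vector slot is passed in as a parameter. The concrete addresses in the usage note are hypothetical, and the sketch ignores the register and mode saving done by the other steps.

```python
def read_word(mem, addr):
    """Read a 16-bit value stored high byte first, as steps 12-13 and 15-16 do."""
    return (mem[addr & 0xFFFF] << 8) | mem[(addr + 1) & 0xFFFF]


def syscall_target(mem, vector_addr, code):
    """Return the kernel routine address that 'SYSCALL #code' would branch to."""
    if not 0 <= code <= 0xFF:
        raise ValueError("SYSCALL takes an 8-bit function code")
    map_base = read_word(mem, vector_addr)       # base of the kernel routine map
    entry = (map_base + 2 * code) & 0xFFFF       # each map entry is two bytes
    return read_word(mem, entry)                 # address stored in that entry
```

For example, with `mem = bytearray(0x10000)`, a vector slot at 0x0100 holding 0x3000, and the map entry for function 5 (at 0x300A) holding 0x4123, `syscall_target(mem, 0x0100, 5)` returns 0x4123.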

I think what I will do next is go back to hardware for some time. I want to add another device card with an RTC and an IDE controller, and build a clock slowdown circuit for accessing slow devices. Stay tuned.

2 thoughts on “Assembler and instruction set update”

  1. Armando Acosta Jul 31, 2011 10:21 pm

    Good to hear about you again. I was missing your posts.

    Slow development process… well, I guess it is at this point where a traditional minicomputer console would help, but that is not your case anyways. It occurred to me once that I could create a simple program that dumps binaries directly from RS232 port into memory; that would replace the tedious EPROM burning process… after RS232 support exists and it is stable, of course.

    Another idea that I wanted to share with you is VTAPE (a virtual tape simulated with a floppy disk). The advantage of this over an IDE hard drive is simplicity (especially in the early development phase), since a file system is not required.

  2. dawid Aug 10, 2011 10:46 am

    Loading programs over serial – that is exactly what I want to do. I will have my OS load user program binaries in Intel HEX format via serial and execute them. I believe my RS232 routines are now stable enough to handle this. Regardless of this, I think an emulator is a must so I am mentally getting ready for this task, too.

    Your idea of VTAPE is an option I might give a try one day. It is actually an interesting concept, and gives a flavour of how it was in the past. I still have some old PC floppy drives I might use for the job.
