The simulator is ready and running the first simulations of my CPU, proving for the first time that the envisioned mesh of wires and chips may actually work. Writing the simulator was not easy, but it was a great exercise, and skipping Verilog for this purpose turned out not to be a bad idea at all.
The CPU simulator is a custom C++ application, now slightly above 100 kB of source code. It models the design down to the level of a single logic gate and wire, taking timing into account (device propagation delays, to be more specific). It is based on a very simple recursive principle: whenever a signal is driven onto a wire, all devices whose inputs are tied to that wire are notified to do their job (synchronous devices are only notified on clock edges). The devices’ job is to perform some logic and assert their outputs after a propagation delay. Since other wires are tied to these outputs, signals are driven onto them and propagated down the path. The cycle starts with a clock tick on the clock wire, and finishes when there are no more devices to notify (no new values are asserted on any wire). You may be thinking that this must be awfully slow, and you are right: in the most complex cases it takes about one real-time second to compute a single machine clock cycle, but at least it works. At some point I may replace the gate-level descriptions with higher-order representations (e.g. the behavior of a whole control module, or at least its major components, could be captured by a behavioral description while still exposing the same “pins” to the rest of the machine). This would certainly speed up the simulation, yet at the expense of model fidelity.
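The propagation principle above can be sketched as a delay-queue simulation loop. This is a heavily simplified illustration, not the simulator’s actual code — the names `Wire`, `Scheduler`, `Inverter` and the structure are mine:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Three-state signal value: low, high, or high-impedance/unknown.
enum class Level { L, H, Z };

struct Device;

// A wire remembers its current value and the devices whose inputs
// are tied to it (the devices to notify when the value changes).
struct Wire {
    Level value = Level::Z;
    std::vector<Device*> sinks;
};

struct Event {
    Wire* wire;
    Level value;
};

// Driving a wire with a propagation delay just schedules a future
// event; events are processed in time order.
struct Scheduler {
    std::multimap<long, Event> queue;  // key: simulation time in ns
    long now = 0;
    void drive(Wire* w, Level v, long delay) {
        queue.insert({now + delay, {w, v}});
    }
    void run();
};

struct Device {
    virtual void evaluate(Scheduler& s) = 0;
    virtual ~Device() = default;
};

void Scheduler::run() {
    // Terminates when no new values are asserted on any wire --
    // the same stop condition described in the text.
    while (!queue.empty()) {
        auto it = queue.begin();
        now = it->first;
        Event e = it->second;
        queue.erase(it);
        if (e.wire->value != e.value) {   // only real changes propagate
            e.wire->value = e.value;
            for (Device* d : e.wire->sinks) d->evaluate(*this);
        }
    }
}

// Example asynchronous device: an inverter with a fixed delay.
struct Inverter : Device {
    Wire *in, *out;
    long delay;
    Inverter(Wire* i, Wire* o, long d) : in(i), out(o), delay(d) {
        in->sinks.push_back(this);
    }
    void evaluate(Scheduler& s) override {
        s.drive(out, in->value == Level::H ? Level::L : Level::H, delay);
    }
};
```

Chaining two such inverters and driving the input high makes the change ripple through with accumulated delay, which is exactly why a full machine cycle takes so long to compute.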
While constructing the simulator, I first developed a class framework for three-state logic, wires, and synchronous and asynchronous logic devices. My mistake (resulting from laziness, as I cannot find any other explanation at this moment) was not implementing buses similar to those in Verilog (collections of wires), which later forced me to pass lengthy lists of wires as arguments. Maybe I will correct that mistake one day, maybe not. The framework also provides a logic probe class, which lets me tap into one or more wires in the machine description and display signal values at arbitrary simulation times for debugging purposes.
Having completed the framework, I implemented a couple of dozen specific 74-series TTL devices (selected gates, encoders, decoders, multiplexers, the ALU), memories (SRAM, EPROMs) and other components (a power source providing VCC and GND wires for hardwiring, the oscillator, and the reset signal generator).
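To give a flavor of what one of those device models looks like, here is a pin-level sketch in the style of a 74LS157 quad 2-to-1 multiplexer. It is a simplification of my approach, not code from the simulator — in the real model the outputs would be asserted through wires after the device’s propagation delay rather than computed instantly:

```cpp
#include <array>
#include <cassert>

// Combinational model of a 74LS157-style quad 2-to-1 multiplexer.
// select low:  Y = A;  select high: Y = B;
// strobe (active-low enable) high forces all outputs low.
struct Mux74157 {
    std::array<bool, 4> a{}, b{}, y{};
    bool select = false;
    bool strobe = false;

    void evaluate() {
        for (int i = 0; i < 4; ++i)
            y[i] = strobe ? false : (select ? b[i] : a[i]);
    }
};
```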
Finally, I took the devices and the framework and modeled the machine down to the lowest component detail. This was the hard part. I was constantly finding bugs in the framework, in the code of specific devices, and in the model itself. There were (and I am pretty sure there still are) bugs in the microcode. It took some tough debugging until I finally obtained the first simple computation. Fortunately, apart from some minor issues, I have not encountered any serious conceptual flaws in the overall design.
The most obvious conclusion from the initial simulations is that the machine actually has a chance to work when built. I am pretty sure the control subsystem works, and the registers, ALU and data paths all seem to be fine at this point. I also used the simulator’s timing information to measure the critical path, which comes to about 600 ns under worst-case propagation delay assumptions for the slowest 74LS devices. With this result I should theoretically be able to run the machine at about 1/600 ns ≈ 1.6 MHz. By using some 74F-type chips on the critical path I hope to reach 2 MHz or faster, which is what I had expected.
To prove that I am not making things up, below is my first simulation example. A simple program computes the integer sum 1+2+…+5, so we expect a result of decimal 15, or 0x0F in hex. First, the source code along with the hand-assembled binary:
ld x, 1        // 2b 01
xor y, y       // b0
loop: mov a, x // 02
add a, y       // 6f
mov y, a       // 08
add x, 1       // c1 01
mov a, x       // 02
cmp a, 5       // c5 05
jle loop       // d7 ff f5
halt           // 01
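Before running it in the simulator, the loop’s intended effect can be checked with a direct C++ transcription of the register semantics. This is my hand-written reading of the listing (register names as above), not output generated by the simulator:

```cpp
#include <cassert>
#include <cstdint>

// C++ transcription of the summation loop: y accumulates x for
// x = 1..5, giving the triangular number 15 (0x0F).
uint8_t sum_1_to_5() {
    uint8_t x = 1;          // ld  x, 1
    uint8_t y = 0;          // xor y, y
    uint8_t a;
    do {
        a = x;              // mov a, x
        a += y;             // add a, y
        y = a;              // mov y, a
        x += 1;             // add x, 1
        a = x;              // mov a, x
    } while (a <= 5);       // cmp a, 5 / jle loop
    return y;               // expected 0x0F
}
```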
After each clock cycle the simulator outputs debug information: the values of all active logic probes, all machine registers, and the status flags. Here is a screenshot of the simulation’s final cycles, just before the machine comes to a halt (opcode 01) in cycle #73. The value in register Y clearly shows the correct sum, hexadecimal 0x000F.
The full simulation dump is here.
Now I need to think about getting some kind of assembler, because hand-assembling test routines is not really my hobby. I will look for something readily available and see what it would take to port it. I don’t want to spend much time on this at the moment, so if it turns out to be a complex task, I will probably write my own. A simple absolute, one-pass assembler generating a plain binary will do for the beginning, and should be fairly easy to implement. Ultimately, I will definitely need a full-blown assembler (maybe a macro assembler) and a linker.
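The core of such a one-pass absolute assembler is small: emit bytes as lines are read, record label addresses, and backpatch forward references at the end. A skeleton of the idea, with a deliberately made-up two-instruction subset (the `jmp` opcode 0xE0 and its encoding are invented for illustration and are not my machine’s real instruction set; only `halt` = 01 comes from the listing above):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One-pass absolute assembler skeleton: bytes are emitted immediately,
// label definitions are recorded, and forward references are patched
// in a final fixup pass over the finished image.
struct Assembler {
    std::vector<uint8_t> image;                           // output binary
    std::map<std::string, uint16_t> labels;               // label -> address
    std::vector<std::pair<uint16_t, std::string>> fixups; // patch site, label

    void assemble_line(const std::string& line) {
        std::istringstream in(line);
        std::string tok;
        if (!(in >> tok)) return;                    // blank line
        if (tok.back() == ':') {                     // label definition
            labels[tok.substr(0, tok.size() - 1)] = (uint16_t)image.size();
            if (!(in >> tok)) return;                // label-only line
        }
        if (tok == "halt") {
            image.push_back(0x01);                   // opcode from the listing
        } else if (tok == "jmp") {                   // hypothetical instruction
            std::string target;
            in >> target;
            image.push_back(0xE0);                   // made-up opcode
            fixups.push_back({(uint16_t)image.size(), target});
            image.push_back(0);                      // abs16 placeholder
            image.push_back(0);
        }
    }

    void finish() {                                  // resolve forward refs
        for (auto& [site, name] : fixups) {
            uint16_t addr = labels.at(name);
            image[site]     = addr & 0xFF;           // little-endian abs16
            image[site + 1] = addr >> 8;
        }
    }
};
```

A real version would of course need the full mnemonic table, expression evaluation for operands, and error reporting, but nothing in it is conceptually difficult.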
I will also prepare a test covering each instruction, assemble it, and run it back and forth to eliminate as many microcode bugs as possible. After all, that is one of the reasons I developed the simulator.
Then I will return to the design and make final decisions on the two remaining aspects: the memory subsystem organization for virtual memory, and the fault and interrupt mechanisms. Of course, these need to be simulated, too.
The simulator in its current version is available for download here (this time it is a Visual C++ 6.0 project). I have also published the new microcode source, with several bugs fixed, and the new microcode assembler (which fixes a nasty bug in the Intel HEX file generation routine).