By dawid, on May 5th, 2012
The UART board from BatchPCB must be cursed. First, it (and the accompanying new memory board) took over four months to get here; then it silently refused to cooperate when I finally soldered it. I was surprised, as it was really an easy board to design and solder, with only a few components and lots of breathing room for everything. I thought there was nothing that could go wrong, but still, something did.
Gender mismatch
 Eagle's DB-9 footprints for male (left) and female (right) connectors
Immediately after soldering, I beeped out all the connections on the board, checked for shorts between power and ground with a multimeter (as I always do before plugging anything new into the computer), connected it to my laptop with a null-modem serial cable and… I saw a black screen. The board was deaf and mute.
It took me an hour or two of fiddling before I realized I had made a silly mistake. Before that, I replaced all the chips, beeped out the connections again, and swapped the UART ICs between the PCB and my previous wire-wrapped prototype, but to no avail. So I decided to break out the oscilloscope and probe the signals on the DB-9 connectors to see what was happening.
It wasn’t long before I realized that the signals I was looking for were all there, but not on the pins where I was expecting them to be. In fact, the layout of the connector was reversed. I made the mistake during one of the last cleanups of the schematics when I changed the version of a connector footprint to explicitly indicate that its metal shield was grounded (G1 and G2 on the diagram). In doing that, I mistakenly took a female connector, even though Eagle CAD indicates quite clearly which is which. If I recall correctly, I even noticed the different shape of the footprint back then, but somehow decided not to bother and simply neglected that fact without verifying.
Incorrect connector gender was the reason why I observed the signals in reversed order on the scope. Here is how the male and female DB-9 connector pins are ordered so that they match when connected (note the location of pin 1 in both cases):
 DB-9 connector layouts - male (left) and female (right)
My very first thought was that I had wasted the board and that it had to go in the trash. On second thought, though, I realized that there was a quick and dirty hack that would solve the problem and save the board: I simply needed to solder the connectors on the bottom side of the board. The final result looks a bit awkward, but it works perfectly fine and is as solid as in the original plan.
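The mirroring can be sketched in a few lines of Python. This is only an illustration of the standard DB-9 front-view numbering (male: 1 to 5 on the top row, 6 to 9 on the bottom row; female: the same rows reversed); the function name `mirrored_pin` is made up for this example.

```python
# Front-view pin order of a DB-9 connector, row by row, left to right.
MALE_FRONT = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
# The female connector uses the same rows, numbered in the opposite direction.
FEMALE_FRONT = [list(reversed(row)) for row in MALE_FRONT]


def mirrored_pin(pin):
    """Return the pin number found at the same physical position
    on a connector of the other gender."""
    for male_row, female_row in zip(MALE_FRONT, FEMALE_FRONT):
        if pin in male_row:
            return female_row[male_row.index(pin)]
    raise ValueError(f"DB-9 has no pin {pin}")


# With the wrong footprint, the signal meant for pin 2 lands on pin 4,
# pin 1 swaps with pin 5, pin 6 with pin 9, and pin 3 stays in place.
swaps = {pin: mirrored_pin(pin) for pin in range(1, 10)}
```

Applying the mirror twice gives back the original pin, which is why mounting the connector on the opposite side of the board (a second left-right flip) puts every signal back where it belongs.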

Having fixed the connection problem, I fired up the whole thing again, and… I still saw nothing on the terminal screen.
Ground spill
The second problem was not my fault at all, and as such was a bit more difficult to track down. It also suggests that BatchPCB’s policy of manufacturing more boards than you actually order is not really a special offer, but their way of avoiding excessive returns.
My board had a manufacturing fault. There was a ground spill shorting one of my address bus bits to ground, effectively making this bit always zero. Below is a close-up photo of how BatchPCB messed up my board. Compare it to a screenshot of the original design in Eagle CAD. The red arrow points at the place where the ground layer went a bit beyond its designed boundary.

When I finally discovered the short, I fixed it by cutting the ground plane with a precision knife (this can be seen in the photo as well). After that, the board greeted me with a nice “Hello world!” from my UART test code.
Out of curiosity, I checked the other UART board I got from BatchPCB for free. It was OK, at least in this board region (no short to ground on the pin). I was unlucky to pick the wrong board. Nevertheless, there is a valuable lesson learned in this. BatchPCB may say that their minimum spacing requirement is 8 mils, but one should not trust that. My board was within the specs, but ground spill happened. From now on, whenever possible, I will leave more space between the polygons (ground or power) and the traces on my boards.
The schematics package and Eagle files I am posting in downloads today already have the connector gender problem fixed and the ‘isolation’ parameter for the ground planes set to 16 mils.
By dawid, on April 30th, 2012
Some time ago, when fiddling with my Monitor/OS program I decided to make a few changes to the instruction set. I guess that’s the real flexibility when you build your own CPU – if you don’t like the instruction set, and you suddenly come up with a need for an instruction that is missing, you simply add it, hardware permitting. I know this is not the right way to do CPU design, and instruction set architecture (ISA) should be a cornerstone to CPU design, not the other way around. On the other hand, what I am doing is not a real CPU design, but a cool hobby project, so I am free to do whatever I want, within reason.
I don’t have any spare opcodes anymore, so the change involved removing instructions with fancy indexed addressing modes with 16-bit offsets, which were (or seemed) not very useful because they have more useful 8-bit versions. In their place I added four instructions to load and store bytes in code memory, and a couple of 8-bit arithmetic instructions whose absence was quite painful. Now it seems strange to me that I hadn’t added them to the instruction set in the first place (POP AH and POP AL really seem to be must-have instructions at the very first thought). All in all, here is what I removed:
| Opcode | Mnemonic | Description |
| 40 | LD AH, (SP:#i16) | load high byte of A with memory at address in SP plus 16-bit signed offset |
| 41 | LD AL, (SP:#i16) | load low byte of A with memory at address in SP plus 16-bit signed offset |
| 63 | ST (SP:#i16), AH | store high byte of A at address in SP plus 16-bit signed offset |
| 64 | ST (SP:#i16), AL | store low byte of A at address in SP plus 16-bit signed offset |
| 92 | AND A, (SP:#i16) | bitwise AND on A and memory at address in SP plus 16-bit signed offset, result in A |
| 96 | AND AH, (SP:#i16) | bitwise AND on high byte of A and memory at address in SP plus 16-bit signed offset, result in high byte of A |
| 9A | AND AL, (SP:#i16) | bitwise AND on low byte of A and memory at address in SP plus 16-bit signed offset, result in low byte of A |
| A1 | OR A, (SP:#i16) | bitwise OR on A and memory at address in SP plus 16-bit signed offset, result in A |
| A5 | OR AH, (SP:#i16) | bitwise OR on high byte of A and memory at address in SP plus 16-bit signed offset, result in high byte of A |
| A9 | OR AL, (SP:#i16) | bitwise OR on low byte of A and memory at address in SP plus 16-bit signed offset, result in low byte of A |
| CB | CMP A, (SP:#i16) | compare A to memory at address in SP plus 16-bit signed offset |
| CF | CMP AH, (SP:#i16) | compare high byte of A to memory at address in SP plus 16-bit signed offset |
| D3 | CMP AL, (SP:#i16) | compare low byte of A to memory at address in SP plus 16-bit signed offset |
And here’s what was added instead (note the different opcodes used: apart from replacing instructions, I also made an overall layout change):
| Opcode | Mnemonic | Description |
| 40 | LDC AH, (DP:X) | load high byte of A with code memory at address in DP plus 16-bit signed offset in X |
| 41 | LDC AL, (DP:X) | load low byte of A with code memory at address in DP plus 16-bit signed offset in X |
| 63 | STC (DP:X), AH | store high byte of A at code memory address in DP plus 16-bit signed offset in X |
| 64 | STC (DP:X), AL | store low byte of A at code memory address in DP plus 16-bit signed offset in X |
| BD | SUB SP, #i8 | subtract 8-bit signed immediate from SP |
| BE | SUB DP, #i8 | subtract 8-bit signed immediate from DP |
| BF | SUB X, #i8 | subtract 8-bit signed immediate from X |
| C0 | SUB Y, #i8 | subtract 8-bit signed immediate from Y |
| C3 | POP AH | pop high byte of A from stack |
| C4 | POP AL | pop low byte of A from stack |
| F8 | XOR AH, #i8 | bitwise XOR on high byte of A and 8-bit immediate, result in high byte of A |
| F9 | XOR AL, #i8 | bitwise XOR on low byte of A and 8-bit immediate, result in low byte of A |
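To make the byte-wise semantics concrete, here is a rough Python model of a few of the new instructions. It is a sketch under assumptions: it treats A, SP, DP, X and Y as 16-bit registers that wrap around on overflow and ignores flags entirely, neither of which is spelled out above. The helper names are made up for this example.

```python
MASK16 = 0xFFFF


def sign_extend8(value):
    """Interpret the low 8 bits of value as a signed number (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def xor_ah(a, imm8):
    """XOR AH, #i8: only the high byte of A changes."""
    return (a ^ ((imm8 & 0xFF) << 8)) & MASK16


def xor_al(a, imm8):
    """XOR AL, #i8: only the low byte of A changes."""
    return (a ^ (imm8 & 0xFF)) & MASK16


def sub_reg_i8(reg, imm8):
    """SUB SP/DP/X/Y, #i8: subtract the sign-extended immediate.
    Wrapping at 16 bits is an assumption of this model."""
    return (reg - sign_extend8(imm8)) & MASK16


a = xor_ah(0x1234, 0xFF)       # high byte 0x12 becomes 0xED, low byte untouched
sp = sub_reg_i8(0x1000, 0x02)  # reserve two bytes on the stack
```

Because the immediate is signed, `SUB SP, #i8` with a negative operand moves the pointer upwards, so in this model the same opcode can both reserve and release a small stack frame.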
Of course, any ISA update requires rebuilding the microcode, so today I am publishing a complete instruction set description as an Excel spreadsheet and a new revision of the microcode source. All new files are downloadable from here. The complete instruction set reference has also been updated on the instruction set page.
Although I am releasing this update today, I am sure that it is not the last revision of the instruction set. Recently, I have started reading the LCC book (see this post) in preparation for porting a C compiler. It is very probable that this process will reveal further flaws in my ISA and I will need to rework it to streamline code generation.
Another issue I have realized is a bit more serious and is related to how user mode programs call (or will call) OS routines (or software interrupts) by using SYSCALL instructions. Here is how I currently call software interrupts when in supervisor mode:
.text
ld dp, msg_hello
syscall KRN_UART_PSTRNL
.data
msg_hello db 'Hello World!',0
This code snippet prints a string, followed by a newline character, to a VT100 serial terminal. As input, the programmer must provide a pointer to a zero-terminated string in the DP register (global data pointer). It works great within supervisor mode, because SYSCALL does not have to switch CPU modes. However, when executed from user mode, SYSCALL switches context to supervisor mode in order to start executing the OS routine KRN_UART_PSTRNL, effectively losing access to the string referenced by the DP register (because data memory at that point is already supervisor data memory, not user data memory). I could solve this by detecting in the OS service routine that it is being called by a user mode program (by looking at the MSW mode bit on the stack) and performing memory page switching to access the user mode data. Or I could add new instructions to the instruction set that allow supervisor mode code to access user mode data segments without having to swap supervisor mode memory banks. The latter seems more appealing to me now, but it would require not only an ISA update but also a small change to the hardware: I would need a new signal to override the machine mode flag of the MSW. This signal would only be asserted by the new instruction’s microcode and would force the memory system to access user mode banks and data even though the CPU is in supervisor mode. I will set this aside for a while to see if I come up with a better solution.
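The problem and the second proposed fix can be illustrated with a toy Python model. Everything here is hypothetical: the `BankedMemory` class, the `force_user` flag standing in for the proposed override signal, and the address $2000 are made up for the example and do not describe the real memory board.

```python
class BankedMemory:
    """Toy model: separate 64 KB data banks for supervisor and user mode."""

    def __init__(self, size=0x10000):
        self.banks = {"supervisor": bytearray(size), "user": bytearray(size)}

    def _bank(self, supervisor_mode, force_user):
        # force_user models the proposed signal that overrides the MSW mode bit.
        if supervisor_mode and not force_user:
            return self.banks["supervisor"]
        return self.banks["user"]

    def read(self, addr, supervisor_mode, force_user=False):
        return self._bank(supervisor_mode, force_user)[addr & 0xFFFF]

    def write(self, addr, value, supervisor_mode, force_user=False):
        self._bank(supervisor_mode, force_user)[addr & 0xFFFF] = value & 0xFF


def read_cstring(mem, dp, supervisor_mode, force_user=False, limit=256):
    """Read a zero-terminated string starting at address dp."""
    out = bytearray()
    for offset in range(limit):
        byte = mem.read(dp + offset, supervisor_mode, force_user)
        if byte == 0:
            break
        out.append(byte)
    return out.decode("ascii")


mem = BankedMemory()
# A user mode program stores its message in user data memory.
for i, ch in enumerate(b"Hello World!\0"):
    mem.write(0x2000 + i, ch, supervisor_mode=False)

# After SYSCALL the CPU is in supervisor mode, so DP points into the wrong bank.
lost = read_cstring(mem, 0x2000, supervisor_mode=True)
# With the override asserted, the same pointer reaches the user data again.
found = read_cstring(mem, 0x2000, supervisor_mode=True, force_user=True)
```

In this model `lost` is an empty string and `found` is the original message, which is the behaviour the override signal is meant to provide without swapping memory banks in the service routine.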
By dawid, on March 14th, 2012
Finally, after a very long wait, I have found an envelope from BatchPCB in my mailbox. I placed an order on November 1, 2011 for two euro-sized boards in the hope that I would receive them after three to four weeks. I received them last week, that is, after more than four months. That’s way too long. The fact that my first board shipment was lost by the carrier contributed a lot to the order-to-delivery time. Eventually, BatchPCB offered to redo my boards (although not without exchanging several emails in which I expressed my disappointment with increasing intensity), but they did not seem to be in a hurry to panelize my design again. It is quite probable that I will use BatchPCB again anyway, because they are one of the least expensive choices for prototypes, but from now on I will always use the more expensive expedited international shipping (with insurance and tracking). Always, with no exceptions.

Unfortunately, the extended lead time was not the only problem I encountered with this order. I noticed that the silkscreen quality of the boards is inferior to what it was some time ago. I am not sure whether BatchPCB has changed its vendor in China or it is just a temporary problem, but it is noticeable at the very first glance. The photos below compare silkscreen from July 2010 and February 2012 (note the difference in the crispness and continuity of the lines and text).
 July 2010 |
 February 2012 |
Apart from the messy silkscreen, the rest seems to be fine. The quality of the electrical traces, board edges, pads and drill hits is exactly what I expected and, more importantly, what I designed. Another point on the positive side: I received not two but four boards (two of each). Apparently, BatchPCB had some spare panel space, and I was among the lucky ones to receive a surplus.
Time to break out the soldering iron!
By dawid, on February 17th, 2012
Believe it or not, I am still waiting for the memory and UART boards from BatchPCB. I am waiting for a redo after my last shipment was lost, but it looks like these guys assign lower priority to redo jobs. All in all, I have been waiting for two boards since November 1 last year, and my design has only just been panelized. It sucks.
Meanwhile, I received the IDE/RTC board I ordered at a local fab house (board described here). I worried that the quality would not be as good as BatchPCB’s, but at first glance I was positively surprised. Traces and ground planes showed no visible flaws, the silkscreen was crisp, the soldermask and drills were all well aligned, and the board edges were nice and smooth. All this cheaper and faster than at BatchPCB. This is what the bare board looked like:

The only quality-related problem I encountered when assembling this board was a pad that detached from the board when I used a solder sucker on it during desoldering. I may have overheated the pad, but this may equally indicate that these boards do not tolerate rework well. I don’t recall this kind of trouble with BatchPCB’s boards, although I am pretty sure I have desoldered components from them.
I also struggled a bit soldering the 3V battery holder, but this time it was my own mistake, not a board quality issue. I was unable to find a footprint in Eagle matching the holder I had, so I created my own. Unfortunately, I chose a drill size a bit too small and it took some force to push the leads through the holes. Other than that, everything went rather smoothly. Here’s the final result:

The piggyback board sticking out of the right side of the board is an IDE to CompactFlash adapter I got from eBay for 4 euros. I haven’t tried it yet, but ultimately I will be using CompactFlash rather than real IDE hard drives, which are big and heavy and may easily fail considering their age (although I salvaged four old drives from two PCs I had in my basement and they all still seem to work fine). Still, CompactFlash is simply handier for development purposes, and it can be easily swapped between my computer and a PC (connected via a USB reader) to exchange files.
Bring-up of RTC
Immediately after the board assembly, my first attempts to read real-time clock ticks failed. The clock seemed dead: all its registers read $FF as if it were unconnected, and even though I studied the datasheet again and verified all the connections, it just would not work. I replaced the chip and the 32kHz crystal and even checked the battery, but none of this helped. Surprisingly, the same chip connected on a breadboard immediately produced good results (i.e. the seconds register started outputting changing values). When I connected its least significant bit to an LED, it generated one-second pulses. I was stuck.
At some point I changed the power supply from a desktop lab power supply to an old AT power supply. I only did this in order to power an IDE drive, not because I suspected problems with my power lines. All of a sudden, the RTC started ticking. I was really baffled.
Now that I knew that the problem was related to power or ground, I started playing around. I tried connecting my lab power supply here and there to the boards and observed the RTC work or fail, depending on where I connected the ground. I also observed ground noise on a scope. Ground was never perfectly flat, but the noise was not really that bad. At least, it had never caused problems before.
I started googling this problem and came across this excellent article on using crystal oscillators with Maxim’s RTCs. It was an eye opener. I had suspected the crystal oscillator of malfunctioning, but only after reading this paper did I realize how ignoring the simple physical characteristics of a circuit had got me into trouble.
A quick look at the PCB photos above reveals that I placed a ground plane below the crystal (the crystal is the little metal cylinder right next to the big RTC chip). The ground plane in fact runs on both sides of the board, acting like a capacitor (two conductors separated by a dielectric). For the low-power clock crystal to work, the circuit must present it with the correct capacitive load. If the capacitive load is greater than the crystal’s specified load capacitance, the oscillator will slow down or fail to oscillate at all. I guess this is exactly what happened in my case: the planes around the crystal added stray capacitance to the board layout, which sometimes prevented the crystal from starting (depending on how and where I connected the wires to the boards, although I don’t really understand that part).
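For a feeling of the magnitudes involved, here is a back-of-the-envelope Python sketch using the ideal parallel-plate formula C = ε0·εr·A/d and the usual load formula C_L = C1·C2/(C1+C2) + C_stray. All the numbers are illustrative assumptions (FR-4 with εr of about 4.5, a 1.6 mm board, 24 mm² of copper facing the plane, 12 pF per oscillator pin); none of them were measured on my board.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def plate_capacitance_pf(area_mm2, gap_mm, relative_permittivity=4.5):
    """Ideal parallel-plate capacitance in pF (fringing fields ignored)."""
    area_m2 = area_mm2 * 1e-6
    gap_m = gap_mm * 1e-3
    return EPSILON_0 * relative_permittivity * area_m2 / gap_m * 1e12


def load_capacitance_pf(c1_pf, c2_pf, stray_pf):
    """Load seen by the crystal: the two pin capacitances in series, plus stray."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + stray_pf


# Assumed geometry: 24 mm^2 of crystal pads and traces over a plane
# on the far side of a 1.6 mm FR-4 board.
stray = plate_capacitance_pf(area_mm2=24, gap_mm=1.6)   # roughly 0.6 pF
# Assumed 12 pF at each oscillator pin.
total = load_capacitance_pf(12, 12, stray)              # roughly 6.6 pF
```

Under these assumptions the plane adds well under a picofarad, but the figure grows linearly with copper area and inversely with the dielectric thickness, so copper close to the crystal leads counts for much more than copper on the far side of the board.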
From the Maxim paper I learned that the following guidelines should be followed for board layouts with extremely low power crystals:
- crystal should be as close as possible to the chip inputs (this condition was met on my board)
- trace width should be kept minimal
- no other signals should run directly below the crystal or its traces to the chip
- a guard ring may be routed around the crystal to protect it from noise from neighbouring chips or traces
- for the same reason a local ground plane may be placed on a layer immediately below the crystal (but only on one layer, to limit stray capacitance)
The board layout I am publishing today already has all the above recommendations implemented. I have not tested yet how it improves clock crystal performance, but I am pretty sure that stray capacitance was a primary cause for failures. For now, as a dirty workaround, I have learned how to connect power so that the RTC works.
Bring-up of IDE
IDE bring-up, on the other hand, went smoothly. I was worried that I had made mistakes in the IDE circuitry, but it worked on the very first try. I connected an old 13GB IBM hard disk to the controller and was immediately able to send IDE commands to it. It responded to spin-up, spin-down and read sector commands. At the same time the HDD activity LED blinked nicely to indicate drive activity. One of the first exciting things to do was reading the drive identifier, which every IDE device exposes (IDE command $EC). The full list of ATA/ATAPI commands is here.
The sequence of steps to set up IDE and read the IDE identifier block (512 bytes) is the following:
- reset IDE (write to control register)
- spin-up drive (send command $E1 to command register)
- wait for RDY=1 and BUSY=0 flags (by polling status register or using interrupts)
- issue drive identify command (send $EC to command register)
- wait for DRQ=1 and BUSY=0 flags (again, by polling status register or using interrupts)
- read 256 words from 16-bit data I/O port
- interpret data just read (e.g. drive label – 40 bytes at offset $36, drive size in sectors – 4 bytes at $78)
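The last two steps can be sketched in Python for anyone who wants to check a dumped identifier block on a PC. This is not my Monitor/OS code, just an illustration: it assumes the 512 bytes were stored with the low byte of each 16-bit word first, and the function names are made up. The status bit masks (BUSY = $80, RDY = $40, DRQ = $08) and the two-characters-per-word string packing come from the ATA specification.

```python
import struct

STATUS_BUSY = 0x80  # drive is busy, other status bits are not valid
STATUS_RDY = 0x40   # drive is ready to accept commands
STATUS_DRQ = 0x08   # drive has data ready to be transferred


def data_ready(status):
    """True when the status register shows BUSY=0 and DRQ=1."""
    return (status & STATUS_BUSY) == 0 and (status & STATUS_DRQ) != 0


def ata_string(raw):
    """ATA strings keep the first character of each pair in the high byte
    of a word, so adjacent bytes must be swapped before decoding."""
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]
    swapped[1::2] = raw[0::2]
    return swapped.decode("ascii", errors="replace").strip()


def parse_identify(block):
    """Extract the drive label and size in sectors from an identifier block."""
    if len(block) != 512:
        raise ValueError("identifier block must be exactly 512 bytes")
    label = ata_string(block[0x36:0x36 + 40])            # 40 bytes at offset $36
    (sectors,) = struct.unpack_from("<I", block, 0x78)   # 4 bytes at offset $78
    return label, sectors
```

With 512-byte sectors, the drive capacity in bytes is simply the sector count multiplied by 512.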
The final result of my recent experiments is below and shows that both IDE and RTC are alive. The line starting with ‘IDE:’ presents selected information retrieved from the drive identifier block (in this case my old 13GB IBM hard disk was connected). The ‘time’ shell command simply displays the current time read directly from the RTC registers.

It is now time to think of implementing a filesystem, like read-only FAT16 or something custom, simpler but with write support. I am thinking of leaving only a simple bootloader program in ROM and booting a Monitor/OS from IDE device completely. This would allow me to give up using the EPROM emulator which, although extremely helpful, is not very handy and increases setup/cleanup time when I need to free up my desk for other activities.
What next?
As usual, I have uploaded the updated schematics and board layouts here. I think I am getting close to declaring the basic hardware complete. It is high time to make this project more software-oriented and move on to things I have been promising for a long time, like a C compiler, a linker and a real multitasking OS.
By dawid, on January 8th, 2012
My recent order of boards from BatchPCB got lost somewhere on its way. I am not sure if it was lost by USPS, at customs or at my local post office here in Warsaw, but the fact is I am still waiting for the new memory and UART boards. BatchPCB is redoing my order now at no extra cost, but it will take another 2 or 3 weeks for the boards to arrive. The IDE interface board which I ordered at a local fab house is not here either, so I used the little spare time for a simple but important hardware tweak.
I implemented wait state logic in my clock generation circuit. Wait states are intentional delays in the CPU/bus clock cycle, introduced to allow more time for slow external devices to complete their work. Wait states are not necessary for a CPU to operate with devices, but without them the maximum clock rate is limited by the slowest device’s timing constraints. Until now, I was able to live without them, but since I am about to build and test an IDE interface (which has quite rigorous timing characteristics) I decided to implement them. Otherwise, my maximum clock speed (currently 4MHz) would have to be reduced quite significantly.
There are many ways to implement wait states. The most elegant approach would be to inhibit the clock whenever a slow device is accessed and let the device inform the CPU when it completed. In this scenario, wait states are device controlled. This way, wait states duration would be dependent on the accessed device type and speed and CPU time would be used most effectively. The drawback of this approach is increased circuit complexity (wait states logic has to be implemented for each peripheral).
Another way is to introduce uniform wait states into the clock cycle, controlled by the CPU. This is the approach I took. There is only one wait state circuit (in the CPU) and the idle duration is always the same. My CPU’s clock generation circuit is built using a simple Johnson counter generating two clock signals (one offset by a quarter of a cycle with respect to the other) at a quarter of the frequency of the generator. I described it a while back in this post. Call these signals Q0 and Q1 (the generator signal is Q).
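For readers who have not met a Johnson counter before, here is a tiny Python simulation of the two-stage version. It is a generic textbook model (the first stage loads the inverted output of the second stage, the second stage loads the output of the first), not a transcription of my schematics, and it leaves the wait state logic out.

```python
def johnson_step(q0, q1):
    """One generator clock edge of a 2-stage Johnson (twisted ring) counter."""
    return 1 - q1, q0


def johnson_trace(ticks):
    """Return the (Q0, Q1) pairs seen over the given number of generator ticks."""
    q0 = q1 = 0
    trace = []
    for _ in range(ticks):
        trace.append((q0, q1))
        q0, q1 = johnson_step(q0, q1)
    return trace


trace = johnson_trace(8)
# Q0 over time: 0 1 1 0 0 1 1 0
# Q1 over time: 0 0 1 1 0 0 1 1
```

Both outputs repeat every four generator ticks with a 50% duty cycle, and Q1 is a copy of Q0 delayed by one tick, i.e. by a quarter of the output cycle.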

I am using the /IO signal generated by the memory module whenever the device control memory range is accessed (addresses $1000-$1FFF). When this signal becomes active (indicating that a device is being accessed), it toggles a dedicated wait state JK flip-flop on a falling edge of Q1 to generate a stop signal that inhibits the clock. This is achieved by activating the /CLR input of a D flip-flop that forms the master clock generation Johnson counter. The JK flip-flop holds this D flip-flop input low until the next falling edge of Q1, effectively keeping Q0 and Q1 low for an entire extra cycle. The duty cycle of the modified clock in this case is 25% (75% of the extended clock cycle is low), but since I am generating the /ENMEM and /LDMEM signals during the low clock period, it is perfectly fine. You may ask: how is the JK flip-flop toggled out of the stop clock state when there is no clock to unfreeze the system? That’s solved by adding a copy of the clock generation Johnson counter, which is non-maskable, is used only for controlling the wait state circuit, and keeps running even when the stop clock signal is asserted. Please see the updated schematics in the downloads section for details.
The logic analyzer screenshot shows how this works:

Signals C0A and C1A are the master clock signals, whereas C0B and C1B are the non-maskable clock. SLOW is the active-low stop-clock signal, which is generated by the JK flip-flop from the /IO signal on a falling edge of C1B. When this happens, both C0A and C1A are kept low for one full clock cycle until C1B (which remains operational at all times) toggles the JK flip-flop again and brings the system back to normal operation. At 4MHz clock speed the low phase of the clock when a wait state is introduced lasts 750ns, which should be enough even for the slowest devices.