Architecture update

At some point while microcoding, I realized that my architecture was obviously wrong in one respect. Frankly, it was so wrong that I am embarrassed I did not notice it earlier.

The problem was that all address registers were connected to the 8-bit data bus. With such a layout, every address load consumed two clock cycles, and addressing modes that required multiple address loads were needlessly slow. The solution was as obvious as the problem itself – I relocated all address registers so that they still drive the address bus directly, but can be loaded from the 16-bit ALU bus. This way it costs only one clock cycle to store an address value in them. I am not sure now what I was thinking when I drew the initial diagram. Anyway, here is its current revision.
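The cycle counts above come down to simple arithmetic. Here is a minimal sketch in Python, assuming one bus transfer per clock cycle; the function names are made up for illustration and are not part of the design.

```python
ADDRESS_BITS = 16  # width of MAR, SP and PC


def load_cycles(bus_width_bits: int) -> int:
    """Clock cycles needed to move one 16-bit address over a bus of the
    given width, assuming one transfer per clock cycle."""
    return -(-ADDRESS_BITS // bus_width_bits)  # ceiling division


def addressing_cost(address_loads: int, bus_width_bits: int) -> int:
    """Cycles spent on address loads alone for an addressing mode that
    needs the given number of loads."""
    return address_loads * load_cycles(bus_width_bits)


if __name__ == "__main__":
    for width in (8, 16):
        print(f"{width}-bit bus: {load_cycles(width)} cycle(s) per address load")
```

Under that assumption, an addressing mode with two address loads spends four cycles on them over the 8-bit data bus and two over the 16-bit ALU bus.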

 

[Figure: cpu3, the current revision of the architecture diagram]

Here is a full list of all recent improvements:

Address registers. The address registers were relocated to the ALUBUS, but their functionality was kept. MAR, SP and PC are still counters (they can increment without the ALU, and SP can both increment and decrement). MAR (like MDR) can be loaded into its lower byte, its upper byte, or both.
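To make the register behaviour concrete, here is a rough software model of a 16-bit register with byte-wise loading and counting. It is only a sketch: the class and method names are hypothetical, and the wrap-around at 16 bits is my assumption rather than something stated in the design.

```python
class Register16:
    """Toy model of a 16-bit register that can be loaded a byte at a time
    and can count up or down without going through the ALU."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFF

    def load(self, word: int) -> None:
        """Load both bytes at once."""
        self.value = word & 0xFFFF

    def load_low(self, byte: int) -> None:
        """Replace the lower byte, keep the upper byte."""
        self.value = (self.value & 0xFF00) | (byte & 0xFF)

    def load_high(self, byte: int) -> None:
        """Replace the upper byte, keep the lower byte."""
        self.value = (self.value & 0x00FF) | ((byte & 0xFF) << 8)

    def increment(self) -> None:
        # Assumed to wrap around at 16 bits.
        self.value = (self.value + 1) & 0xFFFF

    def decrement(self) -> None:
        # Only SP needs this in the design described above.
        self.value = (self.value - 1) & 0xFFFF


if __name__ == "__main__":
    mar = Register16()
    mar.load_high(0x12)
    mar.load_low(0x34)
    mar.increment()
    print(hex(mar.value))
```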

The MSW. A couple of days ago I also decided to move the MSW (machine status word) from the DATABUS to the ALUBUS and connect it to the RBUS (which now has 3 possible sources). The new MSW is 16 bits wide (previously I had only 1 unused status bit left) and may be used like any other register. Instructions like MOV A, MSW or MOV MSW, A no longer require any bus interfacing and no longer block two buses, so the fetch of the next instruction can happen on the same clock cycle.

Improved accumulator register. I decided to add the ability to load the 16-bit accumulator register (A) a byte at a time (like MDR or MAR). Without this, some 8-bit operations needed as many cycles as their 16-bit counterparts, which was quite bizarre. For example, in order to compute the 16-bit operation SUB A, (DP) we need to:

  • Copy DP to MAR
  • Store byte from memory location at MAR to high byte of MDR; increment MAR
  • Store byte from memory location at MAR to low byte of MDR
  • Store result of operation (A – MDR) in A, set flags in MSW and fetch the next instruction

This takes four cycles. Now, in order to compute 8-bit SUB AL, (DP) we needed to:

  • Store byte from memory location at DP to low byte of MDR
  • Store result of operation (A – MDR) in MDR (it cannot go straight to A, because a borrow from the high byte during the subtraction would corrupt A's high byte), set flags in MSW
  • Store high byte of A in high byte of MDR (to restore the high byte in case a borrow occurred)
  • Copy MDR to A, fetch the next instruction

That’s also four cycles. If we could store the result of the second step directly in the low byte of A, we would be done at that point (the second step does not read memory, so the fetch can happen in the same cycle):

  • Store byte from memory location at DP to low byte of MDR
  • Store result of operation (A – MDR) to low byte of A, store flags in MSW, fetch the next instruction

Now the operation takes just two cycles. I think that is worth the little extra hardware it requires.
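The two 8-bit sequences above can be checked against each other in software. The sketch below is only an approximation of the microcode: each statement stands for one step from the lists, flags and the instruction fetch are left out, and the function names and the stale MDR parameter are invented for this example.

```python
def sub_al_old(a: int, mem_byte: int, mdr: int = 0) -> tuple[int, int]:
    """Old four-step sequence: A can only be loaded as a full 16-bit word."""
    cycles = 0
    mdr = (mdr & 0xFF00) | (mem_byte & 0xFF)   # memory byte -> low byte of MDR
    cycles += 1
    mdr = (a - mdr) & 0xFFFF                   # A - MDR -> MDR (high byte may be wrong)
    cycles += 1
    mdr = (a & 0xFF00) | (mdr & 0x00FF)        # high byte of A -> high byte of MDR
    cycles += 1
    a = mdr                                    # MDR -> A
    cycles += 1
    return a, cycles


def sub_al_new(a: int, mem_byte: int, mdr: int = 0) -> tuple[int, int]:
    """New two-step sequence: the result goes straight to the low byte of A."""
    cycles = 0
    mdr = (mdr & 0xFF00) | (mem_byte & 0xFF)   # memory byte -> low byte of MDR
    cycles += 1
    a = (a & 0xFF00) | ((a - mdr) & 0x00FF)    # low byte of (A - MDR) -> low byte of A
    cycles += 1
    return a, cycles


if __name__ == "__main__":
    # 0x05 - 0x07 borrows, but the high byte 0x12 of A must survive.
    print(sub_al_old(0x1205, 0x07))
    print(sub_al_new(0x1205, 0x07))
```

Both functions leave 0x12FE in A for this input, the first in four steps and the second in two. A stale high byte in MDR does not change the outcome, because only the low byte of the difference is kept.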
