[CST-2] Comp Arch

Martin Harper mcnh2@cam.ac.uk
Sat, 19 May 2001 17:41:10 +0100


Fou dans la noix de coco wrote:

> Hi
> 
> Does anyone have a good explanation for what is going on on page 56 or
> comparch as far as aligned sub-word load logic is concerned.  What is
> there seems to make sense but no real answer for exactly how and why...
> 
> Garan

The diagrams on pages 56 and 57 make no sense whatsoever: just a bunch 
of random trapezoids with letters in them. As I understand the idea, 
you have several different paths for the data, each gated by the bottom 
bits of the address to be loaded/stored.

EG, toy hardware with 4 bit words and 2 bit subwords: we want an 
"aligned subword load", which means loading 2 bits of info from an 
address that is a multiple of 2. The entire memory is (EG) 16 bits, 
addressed one bit at a time, so we have 4 bit addresses. By convention, 
the least significant bit is bit 0, while the most significant is bit 3.

So there are several stages in the hardware:
1) load the whole 4 bit word from memory or cache - do this by just 
setting bit 1 of the address to zero (bit 0 is already zero for an 
aligned subword) and then using the standard aligned word load hardware.
2) in parallel with (1), check that the load we're being asked for is 
aligned: IE check bit 0 of the address is zero.
3) select the two bits of the four that we want to load. We just create 
a path from bits 0-1 of the data, enabled when bit 1 of the address is 
set, and another path from bits 2-3 of the data, enabled when bit 1 of 
the address is clear. That's for big endian, where the lower address 
holds the more significant half: for little endian it's the other way 
round. You could use AND and OR gates too, but that would be SLOW...
4) optionally sign extend from the more significant of the two bits, 
and place the result into your four bit destination register.
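
For what it's worth, here is a rough Python sketch of the four stages 
above, using the same toy sizes. It is only my reading of the idea, not 
what the notes actually draw: the names (load_subword, memory, signed, 
big_endian) are made up for illustration, and the 16 bit memory is 
assumed to be held as a list of four 4 bit words. Real hardware does 
stages 1 and 2 in parallel; the code just runs them one after the other.

```python
def load_subword(memory, addr, signed=False, big_endian=True):
    """Aligned 2 bit load from a 16 bit memory held as four 4 bit words.

    Hypothetical helper for illustration; addr is a 4 bit bit-address.
    """
    # Stage 2: alignment check - bit 0 of the address must be zero.
    if addr & 0b0001:
        raise ValueError("unaligned subword load")

    # Stage 1: clear bit 1 (bit 0 is already zero) and do an ordinary
    # aligned word load. Address bits 3-2 pick one of the four words.
    word_addr = addr & 0b1100
    word = memory[word_addr >> 2]

    # Stage 3: bit 1 of the address decides which half is passed through.
    # Big endian: bit 1 set -> data bits 0-1, bit 1 clear -> data bits 2-3.
    # Little endian: the other way round.
    addr_bit1 = (addr >> 1) & 1
    use_low_half = (addr_bit1 == 1) if big_endian else (addr_bit1 == 0)
    sub = (word & 0b0011) if use_low_half else ((word >> 2) & 0b0011)

    # Stage 4: optionally copy the top bit of the subword into bits 2-3
    # of the 4 bit destination register.
    if signed and (sub & 0b10):
        sub |= 0b1100
    return sub
```

With memory = [0b1001, 0, 0, 0], load_subword(memory, 0) returns 0b10 
and load_subword(memory, 2) returns 0b01 in the big endian case; an odd 
address raises an error instead of loading anything.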

The whole thing is pretty quick, and shouldn't take significantly more 
time than an aligned word load.

Aligned subword stores are simpler, provided your memory system can 
handle writing only one subword: ie provided it has "byte store 
hardware". Otherwise, you have to load the whole word, merge in the 
subword you want to store, and store the word back. That is a load and 
a store, so it's going to take an extra pipeline stage. Or two 
instructions, more likely.
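
The load-merge-store fallback described above might look something like 
this in Python, again for the toy 4 bit word / 2 bit subword machine. 
The function name and arguments are hypothetical, and memory is assumed 
to be a list of four 4 bit words; it is a sketch of the idea rather 
than how any particular chip does it.

```python
def store_subword_rmw(memory, addr, value, big_endian=True):
    """Aligned 2 bit store done as load word, merge, store word.

    Hypothetical helper for illustration; addr is a 4 bit bit-address.
    """
    # The store must be subword aligned: bit 0 of the address is zero.
    if addr & 0b0001:
        raise ValueError("unaligned subword store")

    # Load: fetch the whole word containing the subword.
    index = (addr & 0b1100) >> 2
    word = memory[index]

    # Work out which half of the word the subword lives in.
    # Big endian: address bit 1 set -> data bits 0-1, clear -> bits 2-3.
    addr_bit1 = (addr >> 1) & 1
    low_half = (addr_bit1 == 1) if big_endian else (addr_bit1 == 0)
    shift = 0 if low_half else 2
    mask = 0b11 << shift

    # Merge: clear the old subword, then drop the new one into place.
    word = (word & ~mask & 0b1111) | ((value & 0b11) << shift)

    # Store: write the whole word back.
    memory[index] = word
```

Starting from memory = [0b1001, 0, 0, 0], store_subword_rmw(memory, 0, 
0b11) leaves memory[0] as 0b1101: the top half is replaced and the 
bottom half is untouched.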

Reading between the lines, it seems that most chips have the ability to 
do this merging of bytes into words ("byte store hardware"), so it's 
only unaligned loads and stores which require multiple instructions.

Martin
[disclaimer: I always get Big and Little Endianness mixed up...]