Wikipedia says what sign extension is, but it doesn't say what it's used for or when it is used. I read Hennessy's book "Computer Organization and Design", and it mentions a sign-extension mechanism on page 124. As far as I understand, sign extension is used to align numbers that have different lengths, i.e. 16-bit numbers, 24-bit numbers, etc., all to the same length, e.g. 32 bits. But why do these numbers have different lengths to begin with? Is it because these numbers can be different data types and therefore have different lengths? For instance, a number can be the 16-bit immediate part of a 32-bit assembly instruction, and then the 16-bit number would need sign extension. Is this true?
How is sign extension used in practice
computer-architecture computers
Related Solutions
The term UART is an acronym for Universal Asynchronous Receiver/Transmitter. The term Asynchronous means something like, "not at the same rate", or not synchronized. In digital communications, asynchronous usually refers to two systems which are not sharing a common clock. The type of communication that a UART does is generically referred to as "asynchronous serial communication". UART is the device, async serial communication is what it does.
Digital logic requires a clock to function. More correctly, most digital logic of any interesting complexity requires a clock to function. And a UART is no different, and requires a clock to drive the internal logic. UARTs also have to connect to a controlling device, usually a CPU of some sort, and that connection usually requires a clock (the bus clock). Sometimes the internal clock and the bus clock are the same clock, but usually not.
Now let's talk about how the async communication works. Let's say that you and a friend are on a pier and you synchronize your watches. You agree that once a minute you will use a flashlight to send a binary 1 or 0 to your friend. You then get on a boat and go out onto the water. The once-a-minute thing works great for a while, but your watch is a little fast compared to your friend's watch, and soon the two of you are not communicating correctly. As more and more time goes on, the two watches drift further out of sync.
The next time you try that with your friend, you modify your "communication protocol". Instead of syncing your watches on the pier, you say that the next flashlight pulse will happen one minute after the previous pulse, plus or minus 10 seconds. So as long as your watches are not out of sync by more than 10 seconds every minute, your communications will happen without error. Each flashlight pulse provides the synchronization for the next pulse. The timing errors between the two watches are not allowed to accumulate, but get "zeroed out" every time there is a flashlight pulse.
A UART does the same thing. But in this case the "watches" are synchronized at the beginning of each byte (at the "start bit"), and not re-synchronized for the remainder of the byte.
The clock that is used for the UART's internal logic drives the logic, but is also used to keep time during the byte. The UART detects the start of the byte and resets a digital timer to help it keep track of time until the end of the byte.
A UART also has something called a "baud rate generator", which is essentially like a stopwatch used to keep track of the elapsed time for each bit. It is this stopwatch, or digital timer, that gets reset at the start of each byte. Most UARTs have a register setting that configures the communication speed, and thus the speed of this stopwatch.
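The stopwatch idea can be modeled in software. Below is a hypothetical sketch (not any real UART's logic) of a receiver that oversamples the line at 16x the baud rate, waits for the falling edge of the start bit, and then reads each data bit at its midpoint, assuming 8N1 framing (one start bit, 8 data bits LSB-first, one stop bit):

```c
#include <stdint.h>

#define OVERSAMPLE 16  /* samples per bit period; 16x is a common choice */

/* Decode one byte from a stream of line-level samples taken at
 * 16x the baud rate.  The "stopwatch" is reset at the start bit's
 * falling edge; every later sample point is timed from that edge. */
static uint8_t uart_rx(const uint8_t *samples) {
    int t = 0;
    while (samples[t] == 1)               /* idle line is high; wait for */
        t++;                              /* the start bit's falling edge */
    t += OVERSAMPLE + OVERSAMPLE / 2;     /* skip start bit, land mid-bit 0 */
    uint8_t byte = 0;
    for (int bit = 0; bit < 8; bit++) {
        byte |= (uint8_t)(samples[t] << bit);  /* sample at bit midpoint */
        t += OVERSAMPLE;                       /* advance one bit period */
    }
    return byte;
}
```

Because every sample point is measured from the most recent start bit, a small clock mismatch between transmitter and receiver cannot accumulate beyond one frame, which is exactly the flashlight trick above.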
The difference between a UART and a USART is that the S stands for Synchronous. These devices also support a synchronous communications method. In other words, they can be configured to communicate asynchronously or synchronously. When in sync mode, the communications cable has a clock signal on it which is shared by both the receiver and transmitter. The use of the sync mode of a USART is largely obsolete these days.
UARTs/USARTs also have to communicate with the CPU (or something similar). This is normally done over a bus. Not all buses are synchronous, but most modern ones are. For this to work, both the CPU and the UART use the same clock for communications, so there is no need to sense the start of a byte (or word, or whatever) and time things the way asynchronous communication does.
I agree that illustration is confusing.
The top half of the page is intended to describe the TLB. It sounds like you understand TLB stuff pretty well.
The entire bottom half of the page is intended to describe the data cache. (The label "cache" on the left is intended to apply to the entire bottom half of the page. How could it be redrawn to make it more obvious that it applies not only to the cache metadata valid+tag bits, but also all the data all the way to the right edge of the page?).
It suddenly splits up the physical address and uses it to index the cache, I guess.
Yes. The bottom half of that page, as you just said, and like most large caches, is a physically-indexed, physically-tagged data cache.
But why is it showing the cache and data separately?
That part of the illustration is unnecessarily confusing.
While in principle each word of memory could have its own valid+tag bits, most data caches share the valid+tag bits for a much larger block of data copied from main memory -- a block called a cache line. Loading more data than the program specifically asked for in a single instruction is often helpful, because practically all programs have some spatial locality.
The resulting cache entry structure looks something like
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
where the 'v' indicates the valid bit, and each 'w' represents a word of data.
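In C-like terms, one such cache entry might be declared as follows; the field widths are illustrative only, not taken from the book:

```c
#include <stdint.h>

/* One cache entry matching the sketch above: a valid bit, a tag,
 * and a block of 16 data words (64 bytes), all sharing the same
 * valid+tag metadata.  Widths here are illustrative assumptions. */
struct cache_line {
    uint8_t  valid;     /* the 'v' bit */
    uint32_t tag;       /* the 'tag' field */
    uint32_t word[16];  /* the 16 'w' data words */
};
```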
Inexplicably, the book's illustration only shows one of the many blocks of data in the cache:
v tag
v tag
v tag
v tag
v tag
v tag w w w w w w w w w w w w w w w w  <-- hit on this cache line.
v tag
v tag
and then the book's illustration inexplicably rotates the words in that cache line to show all the words of that one cache line stacked on top of each other.
When the data cache detects a hit -- when the cache tag matches the tag part of the desired address, and the valid bit is set -- then the "block offset" part of the address indicates one particular word of that one particular cache line.
Perhaps the illustrator ran out of room drawing an extremely wide cache line, and arbitrarily decided to rotate that line to make it fit on the page without considering how confusing that would be?
The data cache’s block size is 128 Bytes.
So for any physical byte address, the bottom 7 bits indicate some particular byte within a cache line, and all the upper bits of that address are used to select some particular cache line.
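As a concrete sketch of that split: the 7-bit offset comes from the 128-byte block size above, while the 8-bit index width (256 lines) is an assumption for illustration, since the text doesn't give the number of lines:

```c
#include <stdint.h>

#define OFFSET_BITS 7   /* 2^7 = 128-byte cache line */
#define INDEX_BITS  8   /* 2^8 = 256 lines: an assumed figure */

static uint32_t line_offset(uint32_t addr) {   /* byte within the line */
    return addr & ((1u << OFFSET_BITS) - 1);
}
static uint32_t line_index(uint32_t addr) {    /* which line to check */
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}
static uint32_t line_tag(uint32_t addr) {      /* compared with stored tag */
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

For example, physical address 0x12345678 splits into offset 0x78, index 0xAC, and tag 0x2468 under these assumed widths.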
why is the byte offset just left floating?
The byte offset is left floating in this illustration because it is not used by the TLB or by the data cache. A typical TLB and data cache, like the ones illustrated, only deal with aligned 32-bit words. The 2 bits of the address that select one of the 4 bytes within a 32-bit word are handled elsewhere.
Some simple CPUs only have hardware for aligned whole-word access. (I call them "Neither Endian" in "DAV's Endian FAQ"). Compiler writers for such CPUs must add padding to ensure that every instruction is aligned and every data value is aligned. (The two-bit byte offset should always be zeros on these machines).
Many CPUs have a LOAD instruction that can load unaligned 32-bit values into a 32-bit register. Such CPUs have special hardware elsewhere (not part of the cache) that sometimes does 2 reads from the data cache for a single LOAD instruction -- the unaligned 32-bit value can overlap 2 different cache lines, and either or both reads may cause a cache miss. The 2 bits of the address that select one of the 4 bytes within an (aligned) 32-bit word are used internally by the CPU to select the relevant bytes that the cache returns for those reads and to re-assemble those bytes into the (unaligned) 32-bit value that the programmer expects. Even though such instructions give the correct results no matter how things are aligned or mis-aligned in memory, assembly language programmers, compiler writers, and other programmers obsessed with optimization sometimes add padding anyway to get (some) instructions aligned or (some) data aligned or both. ("How and when to align to cache line size?"; "Aligning to cache line and knowing the cache line size"; etc.) They try to justify this padding by claiming it "optimizes" the program to "run faster". Recent tests seem to indicate that data alignment for speed is a myth.
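A small sketch of the "can overlap 2 different cache lines" condition, using the 128-byte block size of the cache discussed above:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 128u   /* block size of the cache discussed above */

/* True if an unaligned 4-byte load at addr spans two cache lines,
 * forcing the hardware to perform two separate cache reads
 * (either or both of which may miss). */
static bool load_crosses_line(uint32_t addr) {
    return (addr / LINE_SIZE) != ((addr + 3u) / LINE_SIZE);
}
```

Only loads starting in the last 3 bytes of a line trigger the second read; an aligned load never does, which is one reason the two low address bits can stay out of the cache's index/tag path entirely.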
the relationship between a TLB and cache
Conceptually the only connection between the TLB and a (physically-indexed, physically-tagged) data cache is the bundle of wires carrying the physical-address output of the TLB to the physical-address input of the data cache.
One person can design a data cache for a simple CPU without virtual memory that caches physical addresses. Another person can design a TLB for a simple CPU that has no data cache (A CPU with a TLB but no data cache was once a common arrangement for mainframe computers).
In principle, a third person can splice that TLB and that data cache together, wiring the physical-address output of the TLB to the physical-address input of the data cache. The TLB neither knows nor cares that it is now connected to the data cache rather than the main memory address bus. The data cache neither knows nor cares that it is now connected to the TLB rather than directly to the CPU address register(s).
Best Answer
You sign-extend any time you need to increase the number of bits in a signed value. Usually, this is done right before you do mathematical operations on two values. For example, when you add two signed integers, the number of bits in both numbers needs to be identical.
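As a minimal sketch in C, widening a 16-bit value to 32 bits means replicating bit 15 (the sign bit) into the new upper bits -- which is also exactly what a cast from int16_t to int32_t does for you:

```c
#include <stdint.h>

/* Sign-extend a 16-bit value to 32 bits by copying bit 15
 * (the sign bit) into bits 16..31, as the hardware does. */
static int32_t sign_extend_16(uint16_t x) {
    return (x & 0x8000u) ? (int32_t)(x | 0xFFFF0000u)  /* negative: pad with 1s */
                         : (int32_t)x;                 /* positive: pad with 0s */
}
```

Adding a 16-bit immediate to a 32-bit register, as in the instruction-encoding example from the question, first passes the immediate through exactly this kind of widening so that the two operands have the same width.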
There are many reasons why you might not have the same number of bits in two values. One value might be a 16-bit integer, while another might be a 32- or even 64-bit int. The number doesn't have to be a multiple of 8 bits, either -- there are many analog-to-digital converters that do 10, 12, or 20 bits.