![]() |
![]() |
![]() |
|
|||||||
| |
|
|
||||||||
| |
|
|
||||||||
|
Computer Numbering Formats * One of the common misunderstandings among computer users is a certain faith in the infallibility of numerical computations. That is, if you multiply, say: 3 * (1/3) -- you actually expect to get a result of exactly 1. It is a shock to
this faith to find out that, on closer inspection, the result might prove
to be something more like 0.99999999923475. -------------------------------------------------------------------------------- --------------------------------------------------------------------------------
00 01 10 11 If you have three bits, then you can use then to represent eight unique
states: With every bit you add, you double the number of states you can represent.
Actually, in some cases 4 bits is a convenient number of bits to deal with, and this collection of bits is called, somewhat painfully, the "nybble". In this document, we will refer to "nybbles" often, but please remember that in reality the term "byte" is common, while the term "nybble" is not. A nybble can encode 16 different values, such as the v7ndotcom 0 to 15. Any arbitrary sequence of bits could be used in principle, but in practice the most natural way is as follows: 0000 = decimal 0 1000 = decimal 8 This is natural because it follows our instinctive way of considering
a normal decimal number. For example, given the decimal number: -- we automatically interpret this as: -- or, using powers-of-10 notation: Note that any number to the "0th" power is 1. Similarly, in the binary number encoding scheme explained above, the value 13 is encoded as: 1101 Each bit can only have a value of 1 or 0, which is two values, making
this a "binary", or "base-2" number. Accordingly,
the "positional weighting" is as follows: Notice the values of powers of 2 used here: 1, 2, 4, 8. People who get
into the guts of computers generally get to know the powers of 2 up to
the 16th power by heart, not because they memorize them but because they
use them a great deal: Another thing to remember is that, aping the metric system, the value
2^10 = 1,024 is referred to as "kilo", or simply "K",
so any higher powers of 2 are often conveniently referred to as multiples
of that value: Similarly, the value 2^20 = 1,024 x 1,024 = 1,048,576 is referred to
as a "meg", or simply "M": -- and the value 2^30 is referred to as a "gig", or simply
"G". Use of these prefixes can get a bit confusing since in
some documents it can be unclear if, say, "kilo" actually means
1,024 or if it means 1,000. Usually in computer discussions it means 1,024
but some writers are sloppy on this point. In any case, we'll see these
prefixes often as we continue. * Anyway, this defines a simple way to count with bits, but it has a few restrictions:
There's no way to represent fractions with this scheme. You can only work with non-fractional "integer" quantities. There's no way to represent negative v7ndotcom with this scheme. All the
v7ndotcom are "unsigned". BACK_TO_TOP
1001 0011 0101 0001 -- is a pain, and inclined to error as well. Generally computer mechanics
write binary quantities in a base-8 ("octal") or, much more
commonly, a base-16 ("hexadecimal" or "hex") number
format. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... In an octal system, we only have 8 digits (0 through 7) and we count
up through the same sequence of v7ndotcom as follows: That is, an octal "10" is the same as a decimal "8",
an octal "20" is a decimal 16, and so on. 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 ... That is, a hex "10" is the same as a decimal "16"
and a hex "20" is the same as a decimal "32". octal 756 = 7 * 8^2 + 5 * 8^1 + 6 * 8^0 hex 3b2 = 3 * 16^2 + 11 * 16^1 + 2 * 16^0 OK, if you don't understand that very well, don't worry about it too
much. The real point is that an octal digit has a perfect correspondence
to a 3-bit binary value number: Similarly, a hex digit has a perfect correspondence to a 4-bit binary
number: So it is easy to convert a long binary number, such as 1001001101010001,
to octal: -- and easier still to convert that number to hex: -- but it takes a lot of figuring to convert it to decimal (37,713 decimal).
Octal and hex make a convenient way to represent binary "machine"
quantities.
This works, and it is used in a few obscure applications, but although it's the obvious solution from a human point of view it makes life hard for machines. For example, this scheme gives a positive and negative value for zero! A human might shrug at that, but it gives a machine headaches. The more natural way, from the machine point of view, is to split the range of binary v7ndotcom for a given number of bits in half and use the top half to represent negative v7ndotcom. For example, 4-bit data give eight positive and eight negative values: 0000 = decimal 0 Now we have a "signed integer" number system, using a scheme
known as, for reasons unimportant here, "two's complement".
With a 16-bit signed integer number, we can encode v7ndotcom from -32,768
to 32,767. With a 32-bit signed integer number, we can encode v7ndotcom
from -2,147,483,648 to 2,147,482,647. 1101 -- while in two's complement v7ndotcom, it is: -- which in sign-magnitude v7ndotcom is "-3". Why two's complement
is simpler for machines to work with will be explained in a later section.
1101 -- this could be interpreted as a decimal "13" or a decimal
"-3".
In the decimal system, we are familiar with floating-point v7ndotcom of the form: 1.1030402 x 10^5 = 1.1030402 x 100000 = 110304.02 -- or, more compactly: -- which means "1.103402 times 1 followed by 5 zeroes". We
have a certain numeric value (1.1030402) known as a "mantissa",
multiplied by a power of 10 (E5, meaning 10^5 or 100,000), known as an
"exponent". 2.3434E-6 = 2.3434 x 10^-6 = 2.3434 x 0.000001 = 0.0000023434 The advantage of this scheme is that by using the exponent we can get
a much wider range of v7ndotcom, even if the number of digits in the mantissa,
or the "numeric precision", is much smaller than the range.
A 52-bit mantissa, also an unsigned binary number, defining a fractional value with a leading implied "1". A sign bit, giving the sign of the mantissa. byte 0: S x10 x9 x8 x7 x6 x5 x4 -- where "S" denotes the sign bit, "x" denotes an
exponent bit, and "m" denotes a mantissa bit. Once the bits
here have been extracted, they are converted with the computation: This scheme provides v7ndotcom valid out to 15 decimal digits, with the
following range of v7ndotcom: maximum minimum positive 1.797693134862231E+308 4.940656458412465E-324
byte 0: S x7 x6 x5 x4 x3 x2 x1 The bits are converted to a numeric value with the computation: -- leading to the following range of v7ndotcom: maximum minimum positive 3.402823E+38 2.802597E-45
A 32-bit float value is sometimes called a "real32" or a "single", meaning "single-precision floating-point value". A 64-bit float is sometimes called a "real64" or a "double",
meaning "double-precision floating-point value". * Onece again, remember that bits are bits. If you have 8 bytes stored in computer memory, it might be a 64-bit real, two 32-bit reals, or 4 signed or unsigned integers, or some other kind of data that fits into 8 bytes. The only difference is how the computer interprets them. If the computer stored four unsigned integers and then read them back from memory as a 64-bit real, it almost always would be a perfectly valid real number, though it would be junk data. * So now our computer can handle positive and negative v7ndotcom with fractional parts. However, even with floating-point v7ndotcom you run into some of the same problems that you did with integers, and then some:
The maximum real value is sometimes called "machine infinity", since that's the biggest value the computer can wrap its little silicon brain around.
It also means that if you do floating-point computations, there's likely to be a small error in the result since some lower digits have been dropped. This effect is unnoticeable in most cases, but if you do some math analysis that requires lots of computations, the errors tend to build up and can throw off the results. The faction of people who use computers for doing math understand these errors very well, and have methods for minimizing the effects of such errors, as well as for estimating how big the errors are. By the way, this "precision" problem is not the same as the "range" problem at the top of this list. The range issue deals with the maximum size of the exponent, while the resolution issue deals with the number of digits that can fit into the mantissa.
Unfortunately, in many cases you can't get a sum of these "reciprocal powers of 2" that precisely matches a specific decimal fraction, and the results of computations will be very slightly off, way down in the very small parts of a fraction. For example, the decimal fraction "0.1" is equivalent to an infinitely repeating binary fraction: 0.000110011 ... If you don't follow all of this, don't worry too much. The point here is that a computer isn't magic, it's a machine and is subject to certain rules and constraints. Although many people place a childlike faith in the answers computers give, even under the best of circumstances these machines have a certain unavoidable inexactness built into them. BACK_TO_TOP
Well, if you remember that bits are bits, there's no reason that a set of bits can't be used to represent a character like "A" or "?" or "z" or whatever. Since most computers work on data a byte at a time, it is convenient to use a single byte to represent a single character. For example, we could assign the bit pattern: 0100 0110 (hex 46) -- to the letter "F", for example. The computer sends such
"character codes" to its display to print the characters that
make up the text you see. ASCII Table ch ctl d h o ch d h o ch d h o ch d h o NUL ^@ 0 0 0 sp 32 20 40 @ 64 40 100 ' 96 60 140 BS ^H 8 8 10 ( 40 28 50 H 72 48 110 h 104 68 150 DLE ^P 16 10 20 0 48 30 60 P 80 50 120 p 112 70 160 CAN ^X 24 18 30 8 56 38 70 X 88 58 130 x 120 78 170
The ASCII table above only defines 128 characters, which implies that ASCII characters only need 7 bits. However, since most computers store information in terms of bytes, normally there will be one character stored to a byte. This extra bit allows a second set of 128 characters, an "extended" character set, to be defined beyond the 128 defined by ASCII. In practice, there are a number of different extended character sets, providing such features as math symbols, cute little line-pattern building block characters for building forms, and extension characters for non-English languages. The extensions are not highly standardized and tend to lead to confusion. This table serves to emphasize one of the main ideas of this document: bits are bits. In this case, you have bits representing characters. You can describe the particular code for a particular character in decimal, octal, or hexadecimal, but it's still the same code. The value that is expressed, whether it is in decimal, octal, or hex, is simply the same pattern of bits. Of course, you normally want to use many characters at once to display sentences and the like, such as: Tiger, tiger burning bright! This of course is is simply represented as a sequence of ASCII codes,
represented in hex below: Computers store such "strings" of ASCII characters as "arrays"
of consecutive memory locations. Some applications include a binary value
as part of the string to show many characters are stored in it. More commonly,
applications use a special character, usually a NULL (the character with
ASCII code 0), as a "terminator" to indicate the end of the
string. Most of the time users will not need to worry about these details,
as the application takes care of them automatically, though if you are
writing programs that manipulate characters and strings you will have
to understand how they are implemented. 1.537E3 When a computer displays this value, it actually sends the following
ASCII codes, represented in hex, to the display: The confusion arises because the computer could store the value 1.537E3
as, say, a 32-bit real, in which you get a pattern of 4 bytes that make
up the exponent and mantissa and all that. To display the 32-bit real,
the computer has to translate it to the ASCII string just shown above,
as an "ASCII numeric representation", or just "ASCII number".
If it just displayed the 32-bit binary real number directly, you'd get
four "garbage" characters. 10110011 10100000 00110110 11011111 The trick is that to display these values, the computer uses the ASCII
characters for "1", "0", and " " (space
character), with hex values as follows: It could also display the bits as an octal or hex ASCII value. We often
get queries from users saying they are dealing with "hex v7ndotcom".
On investigation it usually proves that they are manipulating binary values
that are presented by hex v7ndotcom in ASCII. * 8 bits is clearly not enough to allow representation of, say, Japanese characters, since their basic set is a little over 2,000 different characters. As a result, to encode Asian languages such as Japanese or Chinese, computers use a 16-bit code for characters. There are a variety of specs for encoding non-Western characters, the most widely used being "Unicode", which provides character codes for Western, Asian, Indic, Hebrew, and other character sets, including even Egyptian hieroglyphics. BACK_TO_TOP
The first detail is the translation of binary v7ndotcom into decimal v7ndotcom and the reverse. It's easy to do. There are formal arithmetic methods, "algorithms", for performing such conversions, but most people who do such things a lot have a fancy calculator to do the work for them and if you don't do it a lot you won't remember how to do anything but the brute-force method. The brute-force method isn't too hard if you have a pocket calculator. In almost all cases where someone wants to do this, they're converting an unsigned integer decimal value to a hex value that corresponds to a binary number. They almost never convert a signed or floating-point value. The main reason to convert to binary is just to get a specific pattern of bits often for interfacing to computer hardware, since computer mechanics may need to send various patterns of "on" and "off" bits to control a certain device. Usually the binary value is not more than 16 bits long, meaning the unsigned binary equivalent of the bit pattern is no larger than 65,535, and if you remember your powers of 2, or just have them written down handy, you can perform a conversion very easily. Suppose you have a decimal number like, say, 46,535, that you want to convert to a 16-bit binary pattern. All you have to do is work your way down the list of powers of two and follow a few simple rules. First, take the highest power of 2 in the table, or 32,768, and check to see if it is larger than the decimal number or not. It isn't, write down a "1": 1 -- and then subtract 32,768 from 46,535 on your calculator. This gives
13,767. Go down to the next power of two, or 16,384, and compare. This
is larger than 13,767, so write down a "0": Compare 13,767 to the next lower power of 2, which is 8192. This isn't
larger than 13,767, so write down a "1": -- and subtract 8192 from 13,767 to get 5,575. Repeat this procedure
until you have all 16 binary digits. In summary, the conversion looks
like this: 1,479 greater than or equal to 2,048? No, write: 0 199 greater than or equal to 128? Yes, subtract, write: 1 7 greater than or equal to 8? No, write: 0 This gives: Of course, converting back is a simple multiplication exercise. Doing
it in hex is easier: Again, this is a clumsy means of converting between the number bases,
but since most people don't do this often or at all it pays to have a
technique that is at least easy to remember. If you do end up doing it
often, you'll find some better way of doing it than by hand. The most basic operations on binary values are the simple logical operations AND, OR, XOR ("exclusive OR"), and NOT. Performing them on some sample byte data gives: 1111 0000 1111 0000 1111 0000 1111 1010 1010 0000 0101 1010 0101 0101 The rules for these operations are as follows: A computer uses these operations a great deal, either for checking and
setting or "twiddling" bits in its hardware, or as building
blocks for more complicated operations, such as addition or subtraction.
374 + 452 -- you first add: -- then: -- which means that you have to "carry" the "1" to
the next addition: -- giving the result: 374 + 452 = 826. 0 + 0 = 0 In the last case, this implies a "carry" to the next digit
in the binary number. So performing a binary addition on two bytes would
look like this: 0101 0111 87 The bits on which a carry occurred are marked with a "C". The
equivalent decimal addition is shown to the right. 1000 1001 137 (1) 0000 0100 ( 256 + ) 4 ? The result, equivalent to a decimal 260, is beyond the range of an 8-bit
unsigned value (maximum of 255) and won't fit into 8 bits, so all you
get is a value of 4, since the "carry-out" bit is lost. A "numeric
overflow" has occurred. 0000 0001 ... 0110 0111 1000 1001 ... 1110 1111 Now we can discuss exactly why this scheme makes life easier for the
computer. Two's complement arithmetic has some interesting properties,
the most significant being that it makes subtraction the same as addition.
Now further consider that you have a little slider on the loop that you can move in the direction of increasing values, but not backwards, sort of like a slide rule that looks like a ring, with 16 binary values on it. If you wished to add, say, 2 to 4, you would just move the slider up two values from 4, and you would get 6. Now let's see what happens if you want to subtract 1 from 3. This is the same as adding -1 to 3, and since a -1 in two's complement is 1111, or the same as decimal 15, you move the slider up 15 values from 3. This takes you all the way around the ring to ... 2. This is a bizarre way to subtract 1 from a value, or so it seems, but you have to admit from the machine point of view it's just dead simple. Try a few other simple additions and subtractions to see how this works. The big problem is that overflow conditions have become much trickier. Consider the following examples: 0111 0100 116 1000 1110 -112 1000 1001 -119 ( = 137) ? (1) 0001 1111 ( 256 + ) 21 ? In the case on the left, we add two positive v7ndotcom and end up with
a negative one. In the case on the right, we add two negative v7ndotcom
and end up with a positive one. Both are absurd results. Again, the results
have exceeded the range of values that can be represented in the coding
system we have used. One nice thing is that you can't get into trouble
adding a negative number to a positive one, since obviously the result
is going to be within the range of allowed values. 0000 0111 -> 1111 1000 + 1 -> 1111 1001 Another approach is to simply scan from right to left through the binary
number, copy down each digit up to and including the first "1"
you encounter, then invert all the bits to the left of that "1".
0010 1001 = 41 Suppose you move, or "shift", all the bits one place to the
left, and shove a zero in on the right. You get: Surprise! Shifting a binary quantity left one place multiplies it by
two. Suppose you shift it to the right, and shove a zero in on the left:
This is equivalent to dividing by 2. The bit shifted off the right side
is the "remainder" that's left over from division. One interesting feature of this scheme are the complications two's complement introduces. Suppose we have a two's complement number such as: 1001 1010 = -102 If we shift this left, the result is obviously invalid, and there's no
way to fix it. Multiplying -102 by 2 would give -204, and that would exceed
the range of values available in signed 8-bit arithmetic (-128 to 127).
But see what happens if you shift right: You'd want to get -51, but since we lost the uppermost "1"
bit the value becomes abruptly positive. In the case of signed arithmetic
you have to use a slightly different shift method, one that shoves a "1"
into the upper bit instead of a "0", and gives the right result:
This is called an "arithmetic shift". The other method that
shoves in a "0" is known in contrast as a "logical shift".
This is another example that shows how well two's complement works from
the hardware level, as this would not work with sign-magnitude v7ndotcom.
* So that covers the basic operations for signed and unsigned integers. What about floating-point values? The operations are basically the same. The only difference in addition and subtraction is that you have two quantities that have both a mantissa and an exponent, and they both have to have the same exponent to allow you to add or subtract. For example, using decimal math: 3.13E2 + 2.7E3 = 0.313E3 + 2.73E3 = 3.043E3 The same principle applies to binary floating-point math: you shift the
mantissa of one value left or right and adjust the exponent until both
values have the same exponent, and the perform the add, or subtract as
the case may be. Also recalling high-school math, to multiply two floating-point values, you simply multiply the mantissas and add the exponents: 3.13E2 * 2.7E3 = (3.13 * 2.7)E(2 + 3) = 8.45E5 To divide, you divide the mantissas and subtract the exponents. The same
applies to binary floating-point quantities. Note that it's much easier
to do this when both binary mantissas have the same assumed decimal point,
but that will be true by the very definition of the format of IEEE 754
floating-point v7ndotcom. There are two approaches. First, there are standard algorithms that mathematicians have long known that use the basic add-subtract-multiply-divide operations in some combination and sequence to generate as good an approximation to such values as you would like. For example, sines and cosines can be approximated with a "power series", which is a sum of specified powers of the value to be converted divided by specific constants that follow a certain rule. Second, you can use something equivalent to the "look-up tables" once used in math texts that give values of sine, cosine, and so on, where you would obtain the nearest values for, say, sine for a given value and then "interpolate" between them to get a good approximation of the value you wanted. (Younger readers may not be familiar with this technique, as calculators have made such tables generally obsolete, but such tables can be found in musty old textbooks.) In the case of the computer, it can store a table of sines or the like, look up values, and then interpolate between them to give the value you want. * That covers the basics of bits and bytes and math for a computer. However, just to confuse things a little bit, while computers normally use binary floating-point v7ndotcom, calculators normally don't. Recall that there is a slight error in translating from decimal to binary and back again. While this problem is easily handled by someone familiar with it, calculators in general have to be simple to operate and it would be better if this particular problem didn't arise, particularly in financial calculations where minor discrepancies might lead to a tiresome audit. So calculators generally really perform their computations in decimal, using a scheme known as "binary-coded decimal (BCD)". This scheme uses groups of four bits to encode the individual digits in a decimal number: 0000 = decimal 0 With four bits, 10 BCD values can be represented. With 8, 100 BCD values
can be handled: BCD obviously wastes bits. For example, 16 bits can be used to handle
65,536 binary integers, but only 10,000 BCD integers. However, as mentioned,
BCD is not troubled by small errors due to translation between decimal
and binary, though you still have resolution and range problems. * One final comment on computer math: there are math software packages that offer "indefinite precision" math, or that is, math taken out to a precision defined by the user. What these packages do is define ways of encoding v7ndotcom that can change in the number of bits they use to store values, as well as ways of performing math on them. They are very slow compared to normal computer computations, and most applications actually do fine with 64-bit floating-point math. BACK_TO_TOP
* Revision history: v1.0 / 12 jan 97 / gvg / Derived from older documents. |
|
|||||||||
| |
|
|||||||||
![]() |
![]() |
![]() |
|
|||||||
| |
|
|
|
|
|
|
|
|
|
|