
Saturday, December 17, 2016

On the numeric precision provided by programming languages

Probably the most common floating-point format supported by modern programming languages is what C calls a double, that is, a 64-bit data type consisting of a 53-bit mantissa, an 11-bit exponent and a sign bit (yes, I know that's 65 bits in all, but as is common with floating-point formats the most-significant bit of the mantissa is assumed to be a 1 and isn't stored). More formally, this format is standardised in IEEE 754 as binary64.

Some languages provide this as one of a number of different options; in C, for example, both float (32-bit) and double (64-bit) floating-point formats are commonly supported. In others double is the only floating-point format available; Liberty BASIC is an example of this. The precision provided by this format is sufficient for a wide range of real-world applications; to put it in perspective, it corresponds to an uncertainty of about one millimetre in the distance between the Earth and Neptune!

However, there is of course a big difference between precision and accuracy. One particularly important issue is that a typical calculation consists of many cascaded steps, and (depending on the kinds of operations performed) at each of these steps an error of half an LSB or more may be introduced. After a large number of steps the accumulated error may be very much greater than the precision of a double might lead you to expect. For this reason it is common for the floating-point unit (FPU) of a modern CPU to work internally to an even higher precision; for example, a format with a 64-bit mantissa is often used. So long as intermediate values are held to this higher precision, there is a greater likelihood of the final result, represented as a double, having an accuracy commensurate with its precision.

When the highest precision floating-point data type provided by the programming language itself is a double, this desirable outcome can be achieved only when the programmer specifies the calculation in his source code in such a way that it can take place entirely within the FPU, and when the language interpreter/compiler is designed to achieve that. That may be the case in a language like C, when the computation is described in the form of a single expression and the compiler generates code that keeps intermediate values on the FPU stack, but that will not always be the case.

If the programmer's source code splits the calculation into a number of discrete steps, the compiler may not be able to recognise that it could combine them into one computation (and anyway the programmer may want to see the intermediate values to help him debug the code). Even more significantly, an interpreted programming language may simply not be capable of keeping intermediate values in the FPU and may always transfer them temporarily to whatever representation it uses internally for floating-point values. If this is a double a truncation may occur after each step of the calculation.

Of course this may not be important. Many calculations won't need anything like the precision of a 53-bit mantissa and even the loss of several bits of accuracy as a result of multiple truncations may not matter. But sometimes it may, and indeed the programmer may not even know what accuracy is required (if you're not a mathematician - and perhaps even if you are! - it's only too easy to perform a computation in such a way that it is unduly sensitive to the accuracy of an intermediate term, without even realising it).

So ideally there needs to be a way of overcoming the limitation that intermediate values must be held in the FPU, so that the full accuracy of which the CPU is capable can be exploited even when those values are transferred elsewhere (e.g. to a 'variable'). Some C compilers (notably GCC, but not Microsoft's) support the 80-bit 'temporary real' (extended-precision) format as a native data type, when the underlying CPU provides access to it, which solves the problem: by declaring a variable as a long double it can hold an intermediate value with the same precision as that of the FPU's internal registers.

Because the problem is potentially even more acute with an interpreted language, I was very keen that version 6 of BBC BASIC for Windows should have that capability too. Indeed, the default numeric data type is a variant which can contain either an 80-bit floating-point value or a 64-bit integer (it is noteworthy that an 80-bit float can represent a 64-bit integer value precisely). So you can perform cascaded floating-point calculations in BB4W safe in the knowledge that the final result is as accurate as it could be given the capabilities of the CPU/FPU. I don't think BBC BASIC is unique in this respect, but it is certainly very unusual for an interpreted BASIC to provide native support for 80-bit floats and it is something that I like to point to as a valuable feature.

Given what I have said above, it is irritating that Microsoft decided not to support 80-bit extended-precision floats in their compiler family. You can declare a variable to be of long double type, but in fact that is treated by the compiler as a synonym of double and the precision is no different. That can make it difficult or impossible to produce results which are as accurate as they theoretically could be. Even more surprisingly the ARM CPU family provides no access at all to 80-bit float values, even if they are used temporarily by the FPU (and I'm not sure whether that is the case or not). Unfortunately that means even GCC treats long double as a synonym of double when targeting ARM.

On the efficiency of word-extraction functions

A number of BASIC dialects provide a function which returns a single word from a string, selected according to its ordinal position. Typically the string is split either at a space character or at a specified delimiter. One such dialect is Liberty BASIC, in which the word$() function does this. However, BBC BASIC has never supported this functionality, either natively or (in the case of BBC BASIC for Windows) in a library. This raises the question: why not?

It all comes down to efficiency. BBC BASIC, despite being a traditional interpreted language, has a reputation for relatively fast execution. Whilst in part this results from careful design of the interpreter (which is coded in assembly language), just as important is the choice of what functionality to provide. Consider this piece of Liberty BASIC code:

t$ = "The quick brown fox jumps over the lazy dog."
for i = 1 to 9
  print word$(t$, i)
next i

This works fine, but when you think about exactly what is going on under the hood not everything in the garden is lovely. Each time round the loop (except the first) the word$() function has to repeat everything it did on every previous iteration!

So on the second time round, when it is finding the 2nd word in the string, it can only do that by first finding the first word. And on the third time round, it is having to do again everything it did on both the first and second iterations!

By the time it's got to the 9th time round it's had to find the 1st word 9 times, the 2nd word 8 times and the 3rd word 7 times (and so on): 45 word-searches in all, where 9 would have sufficed. In general, extracting all n words this way takes n(n+1)/2 searches rather than n. This is hopelessly inefficient: just imagine trying to extract every word in an 80,000-word novel this way.

It makes much more sense to split the string into its individual words in one operation, then there is no wasteful repetition. And that's what the function that BBC BASIC does provide (FN_split in the STRINGLIB library) is designed to do.

If for some reason you cannot live without the functionality of LB's word$() there is a way of improving its efficiency. The code listed below splits the entire string into its component parts on the first call; on subsequent calls it checks whether the same string has been supplied, and if it has it doesn't repeat the splitting operation but simply returns the word from the previously created array:

INSTALL @lib$+"stringlib"

t$ = "The quick brown fox jumps over the lazy dog."
FOR i% = 1 TO 9
  PRINT FNword$(t$, i%)
NEXT
END

DEF FNword$(a$, n%)
PRIVATE a$(), p$, c%
IF p$<>a$ p$ = a$ : c% = FN_split(p$, " ", a$())
IF n%<1 OR n%>c% THEN = ""
= a$(n%-1) 

The price you pay for the improved speed is memory usage, because the array must be retained from one call to the next. What's more, the array's memory is leaked each time the function is called with a string containing more words than the previous maximum.