My Photo
Location: Downham Market, Norfolk, United Kingdom

Sunday, May 19, 2013

Arguably, the most serious incompatibility between LB Booster and 'genuine' Liberty BASIC is the lack of support for huge strings. The maximum string length in LB4 isn't quoted anywhere, as far as I know, but is probably limited only by the memory available to BASIC, which is usually stated as being in the region of 70 Mbytes (although presumably one shouldn't expect it to work properly if a single string uses a significant proportion of that memory). In any case it is a lot bigger than BBC BASIC's 64 Kbytes (65535 characters) limit.

The ability to have multi-megabyte strings isn't necessarily desirable, because it can encourage the use of inappropriate algorithms. For example suppose one wants to count the number of lines in a text file. The traditional way, and the 'natural' way in a language which doesn't support huge strings, is to read the file sequentially, incrementing a counter for each line read. Only one line needs to be buffered in memory at a time, and the process is quite efficient.

In a language like Liberty BASIC, however, there is a temptation to read the entire file into a string - because you can - and then to scan the string for line delimiters (for example using the INSTR function) in order to count the lines. That will work, but it is highly inefficient of memory usage and CPU time. Firstly it involves holding the entire file in memory, which may be many megabytes, and secondly the INSTR function may need to scan the string from the beginning every time (or at least make a copy of the string first).

In a simple benchmark using LB 4.04, counting the lines in a 1 Megabyte text file with the first (sequential read) approach took about one second and with the second (whole file loaded into a string) approach took about 130 seconds. Repeating the test using LBB (for how that was possible, read on!) gave timings of 0.2 seconds and 20 seconds respectively. This illustrates the kind of speed improvement possible by using LBB, but in both cases loading the entire file into a string was about one hundred times slower than the sequential approach!

However, whilst it is possible to argue that supporting huge strings isn't always desirable, a significant proportion of LB programs do rely on that capability. To improve compatibility there is no alternative but for LB Booster to do the same. In investigating how that might be possible I first looked at the practicality of adapting the HUGESLIB library, which provides limited support for huge strings in BBC BASIC for Windows. Although in principle having the required functionality, it would involve identifying which string variables need to be able to hold huge strings (in itself non-trivial) and then substituting calls to library functions whenever those strings are manipulated.

I eventually decided that such an approach would be too complicated and too likely to suffer from obscure bugs. The only other possibility was to adapt BB4W itself to handle huge strings natively. Whilst clearly far from trivial, in fact such a modification wasn't desparately difficult for a couple of reasons. Firstly, I had already tackled increasing the maximum string length once before, when it was increased from 255 bytes (the limit in Acorn versions of BBC BASIC) to 65535 bytes, and the techniques used on that occasion are, in principle at least, fairly readily extended to support even longer strings. Secondly, because the 64K limit implied the string length being stored as a 16-bit quantity, the fact that the CPU's native registers are in fact 32-bits made it quite easy to adapt existing code (typically where the string length had previously been stored in the 16-bit CX register it could instead be stored in the 32-bit ECX register).

So it was that I modified the BB4W interpreter to support huge strings, specifically for the benefit of LBB. It's quite likely that the resulting performance is not as good as it might have been had huge strings been anticipated from the start, since in some cases the algorithms don't scale too well, but assuming that most strings will still be shorter than 64 Kbytes a typical program shouldn't experience much impairment. The version of LB Booster with this modification is, at the time of writing, in a beta test phase and is what was used to run the benchmark program.

Of course this raises a particular difficulty in respect of BB4W - should I make the 'huge string' version available to BBC BASIC programmers as well as to Liberty BASIC programmers? That is a much more tricky proposition, because unlike LBB - where the huge strings improve compatibility - the change from 16-bit to 32-bit lengths will inevitably impair compatibility and break some existing BBC BASIC programs. For example programs which assume the 'string descriptor' is 6 bytes long won't work when it increases to 8 bytes, and programs which include assembler code to manipulate strings almost certainly would need to be changed.

So this is quite a dilemma for which I haven't found a satisfactory resolution. I could simply not make the huge-string version available to BBC BASIC programmers at all, but that would seem rather unfair. Or I could upgrade BB4W and document the program modifications necessary (it's happened before, after all, when the strings were changed from 255 to 65535 characters). Or I could somehow make both versions available, but that greatly complicates version control, maintenance, automatic upgrades etc. There's no perfect answer!

Sunday, September 11, 2011

"If you can't beat them, join them" is what they say, and that's exactly what I've done in respect of Liberty BASIC! In an earlier blog post I compared BBC BASIC for Windows and Liberty BASIC, and explained my reasoning for believing BBC BASIC to be the better of the two. But that's not to say I can't see the attraction of LB to some people, particularly in having support for Windows GUI features as a standard part of the language rather than being provided in a library.

I've argued before, and still very much believe, that building in a rich set of GUI features as standard can result in a bloated language. This is especially true in the case of an interpreted language when every 'compiled' executable needs to include support for all those features, whether the user's program needs them or not. I've always suspected that this explains, at least in part, the huge size of Liberty BASIC executables (not far short of 3 Megabytes, however small the program).

But whilst the principle of providing GUI support by means of libraries (which is what BBC BASIC does) is sound, in practice there hasn't actually been support for the full range of facilities provided by Liberty BASIC, at least not in a straightforward way. Certainly there are BB4W libraries for the main Windows GUI elements, but LB goes somewhat further, for example you can change the colours of standard Windows controls (whether that's a good idea is arguable) and you can arrange that buttons are automatically relocated when a window is resized.

So I've recently remedied this by writing a BBC BASIC for Windows library LBLIB which makes available the full range of LB features to a BBC BASIC program. Indeed, it does so as far as possible in a way that is fully compatible with Liberty BASIC, and this has made possible a brand-new product: Liberty BASIC Booster!

But I'm getting ahead of myself in the story. Some while ago I wrote an automated translator from QBASIC (or Quick Basic) to BBC BASIC - QB2BBC. Unlike many translators this attempts to perform a 'perfect' translation that requires no user tweaking to achieve identical functionality to the original. The down-side of producing such an accurate translation is that the code it generates is often quite unlike what a human would write, and it doesn't give a good indication of what well-written BBC BASIC code looks like.

At the heart of QB2BBC is a parsing engine which breaks down the QBASIC program to its fundamental elements and then constructs the BBC BASIC program from them. It occurred to me some time ago that this parsing engine could be adapted quite easily to translate other languages into BBC BASIC, one obvious candidate being Liberty BASIC. So earlier this year I started to look quite seriously at the possibility of writing a Liberty BASIC to BBC BASIC translator (tentatively to be called LB2BBC).

But of course there was a big problem. With QBASIC originating from the 1980s, in the era of CP/M and MS-DOS, pretty much every feature it provides has a direct or near equivalent in BBC BASIC. Whilst not always straightforward, the task of QB2BBC was primarily one of translation rather than emulation (the only major exception being QBASIC's music capabilities, which are emulated at run-time by the QBLIB library). With Liberty BASIC however, translating the language syntax to BBC BASIC was only a small part of the challenge; it left the GUI facilties (principally windows, controls, graphics and sprites) completely unsupported.

This is where we complete the circle. By combining the translation capabilities of LB2BBC with the emulation capabilities of LBLIB it becomes possible to run most Liberty BASIC programs, without actually needing Liberty BASIC itself! In consultation with some leading Liberty BASIC advocates it became clear that there might be a market for this, not as as translator to BBC BASIC but as a 'Liberty BASIC Booster'. To the user it would appear as a way to speed up LB programs (up to ten times in some cases) and to create compact single-file executables, without it being obviously apparant that BBC BASIC was at work 'under the hood'.

The compatibility of LBB with LB isn't perfect. As I noted at the end of my article comparing the two BASICs there are two areas in which LB provides a capabilility that BBC BASIC doesn't: support for huge strings and arbitrary-precision integer arithmetic. Therefore programs relying on either of these features cannot run under LBB.

Some may find it superficially strange that I should be supporting Liberty BASIC in this way. One reason is that whilst LBB doesn't directly benefit BBC BASIC, neither does it significantly damage it: I wouldn't expect that any users of LBB might have been persuaded to convert to BB4W instead. Secondly, LBB demonstrates the amazing power and flexibility of BBC BASIC for Windows; with how many other high-level programming languages could it have been achieved? But mostly it's been a fascinating technical challenge, pushing my programming skills to the limit.

Thursday, July 22, 2010

On the same topic as last time, there's another good example of how assembling code at run-time can provide a worthwhile advantage. The x86 processor's division instructions are particularly slow, even on the newest CPUs, and it's common for modern compilers to convert division by a constant to a multiplication followed by a shift. If the multiplicand (which is effectively the reciprocal of the divisor, multiplied by a power of two) and the shift count are chosen carefully - and in some cases a final adjustment made - the result is identical to the division instruction but the speed is much faster.

The rub, of course, is that this technique is normally only applicable to division by a constant which is known at compile time. If you don't know what value you want to divide by until run-time you've got to use a conventional division instruction (strictly speaking if you know the divisor is one of a set of predetermined values you could select from a matching set of routines, or look up the multiplicand and shift count from a pair of tables, but that could eat up most if not all the speed advantage).

There's no such limitation in the case of BBC BASIC. Because the code is assembled at run-time the multiplication-plus-shift trick can be used for any divisor, so long as it remains unchanged for that particular run. For example you might want to divide by the screen width, a value which is constant at run-time but usually not known at compile time. The application would need to contain code for calculating the multiplicand and shift count from the divisor, but that is used only once during the initialisation phase of the program (in practice one would probably use an 'assembler macro' to do it).

Wednesday, December 30, 2009

BBC BASIC allows you to incorporate assembly language code in your BASIC program, typically for use when interpreted BASIC isn't fast enough or for specialised purposes such as coding interrupt service routines. Using a hybrid of BASIC and assembler code is a powerful technique, allowing you to use BASIC for the bulk of the program (typically including the user interface) and assembler for the speed-critical parts (which are likely to be a relatively small proportion of the whole).

Needless to say the assembler provided in BBC BASIC is specific to the particular processor on which it runs. So the original BBC Micro BASIC incorporates a 6502 assembler, Acorn's BASIC for RISC OS machines incorporates an ARM assembler, BBC BASIC (Z80) (e.g. for CP/M or on the Z88) incorporates a Z80 assembler and BBC BASIC (86) for MS-DOS incorporates a 16-bit x86 assembler. BBC BASIC for Windows incorporates a full 32-bit x86 (IA-32) assembler, including support for floating point and MMX instructions.

Whilst support for 'inline' assembly language code is not unusual in programming languages (for example it's a standard feature of C), BBC BASIC provides it in a unique way. Conventionally, any assembler code included in a program is converted to machine code (i.e. assembled) at 'compile' time and incorporated in the final executable as binary data. In contrast, BBC BASIC incorporates the assembler 'source' code in the executable (usually in a compressed format) and assembles it to machine code when the program is run.

Let that sink in for a moment. If the source code is assembled at run-time, it means the BBC BASIC executable must itself contain an assembler - which indeed it does (albeit a small and very fast one). As far as I know no other programming language creates executables which contain an assembler, let alone a full IA-32 assembler.

In many cases this very different way of dealing with embedded assembler code won't have any practical consequences - the end result will be equivalent to the more conventional method. However it makes possible some things which would otherwise be very difficult or messy to achieve. In particular it allows the assembled machine code to vary according to some condition which is determined at run-time.

Why might that be valuable? Suppose, for example, that the assembler code accesses one or more memory locations, which means that the addresses of those locations must be known. Typically, those addresses will vary from one run of the program to the next, or between one PC and another, so they aren't known when the executable is created (i.e. when the program is 'compiled'). Conventionally this means that the addresses can't be incorporated as 'constants' within the assembler code, instead they must be passed to the code as parameters (e.g. on the stack) when it is run.

However in the case of BBC BASIC things are very different. The assembler source code is converted to machine code at run time, so it can usually be arranged that the addresses are known by then. As a result, the 'variable' memory addresses (which change from one run to the next) can be incorporated in the assembled machine code as 'constants', making the code both simpler and faster.

To take a practical example, suppose the assembler code needs to load the eax register from a memory address that is known only at run-time. In a conventional programming language the assembler source code will look something like this (where the address is passed to the assembler code on the stack):

mov ebp,esp ; get pointer to stack frame
mov ebx,[ebp+offset] ; get address
mov eax,[ebx] ; load eax from memory

But in the case of BBC BASIC the code can simply be as follows, because the address is already known:

mov eax,[address] ; load eax from memory

If there are many such memory references the impact on execution speed can be considerable. Similar considerations may apply whenever the code can be simplified by incorporating values as 'immediate constants' rather than 'variables'.

There's another valuable benefit of BBC BASIC assembling source code at run-time. In Microsoft Windows different processes run in separate (virtual) address spaces and are effectively quite well isolated from each other (one process cannot see another process's memory directly). However it is possible for one process to 'inject' code into another process and then run it there. The snag is that the address in the remote process at which the code must be loaded, and executed, isn't known until run-time.

This poses quite a problem for a conventional language, as IA-32 machine code isn't (in general) 'position independent' and so must be assembled (and linked, if appropriate) to run at a specific address. Typically the assembler code has to be written in such a way that it can be 'relocated' at run time - by 'patching' it using, for example, C code - according to the address at which it will eventually be loaded in the remote process. This is potentially very messy, prone to error and difficult to debug.

With BBC BASIC this is all very much easier. Because, by the time the code is assembled, the address at which it must eventually run is known, there is no requirement for it to be 'patched' or 'relocated' (the BBC BASIC assembler has a specific feature whereby the code can be assembled so that it will run at an address different from that at which it is initially stored). This makes it exceptionally well-suited to injecting code into another process.

Saturday, November 08, 2008

There's little doubt that the main competitor to 'BBC BASIC for Windows' is Carl Gundel's 'Liberty Basic'. Superficially the two products have a lot in common:

* They are both 'Windows only'.
* Their IDEs have very similar interfaces and capabilities.
* Both provide direct access to the Windows API and other DLLs.
* Both are interpreters (neither compiles to native code), in particular this allows them to support an EVAL function.
* Both use a 'run-time engine' to execute a tokenised version of the user's BASIC program.
* Both allow BASIC programs either to be executed directly from the IDE, or to be converted to a self-contained executable.

I have even been described in one blog as "a sort of British Carl Gundel"!

However when one looks under the surface there are in fact major differences between the two BASICs. Perhaps the most important is their contrasting approaches to providing capabilities beyond those of 'standard' (i.e. traditional) BASICs. Such capabilities include support for Windows GUI components (menus, dialogue boxes and so on), sorting of data arrays, sprite commands etc.

Liberty Basic builds these capabilities into the language itself. It therefore needs significantly more keywords than, and is more complex than, BBC BASIC. By contrast 'BBC BASIC for Windows' is a 'lean and mean' language: the interpreter provides only the core set of BASIC language features. Support for capabilities such as those mentioned above is provided by means of libraries (surprisingly, Liberty Basic doesn't support code libraries at all).

In this respect you can say that BBC BASIC has a lot in common with the C language, which also relies greatly on libraries (there are other ways in which BBC BASIC has similarities to C, which I will touch on later).

Perhaps the most striking consequence of this fundamental difference between the BASICs is in the size of their 'compiled' executables. The 'BBC BASIC for Windows' run-time engine is tiny (less than 64 kBytes) and its standalone executables - consisting of a single EXE file - are often less than 100 kBytes in size. Liberty Basic's executables, on the other hand, are never smaller than about 2.7 Mbytes (uncompressed) and consist of around ten separate files.

Carl Gundel, perhaps not surprisingly, defends the Liberty Basic executables on the grounds that modern PCs have so much memory and so much disk space that even 3 Mbytes isn't "large". However, compact executables really can have significant benefits, even on such systems.

For a start a small interpreter will tend to run more quickly, because it makes more efficient use of the CPU's instruction cache. Indeed BB4W's interpreter is so small that it will fit in its entirety within the instruction cache of some modern processors (e.g. the 64K cache of the AMD Athlon 64). This contributes to the marked speed difference between BBC BASIC and Liberty Basic (using a Sieve of Eratosthenes benchmark 'BBC BASIC for Windows' took 1.1 seconds and Liberty Basic took 16.6 seconds: fifteen times slower!).

The other big advantage of compact executables is speed of download via an internet connection. BBC BASIC's small, single-file EXEs can be placed on a web site and, with a couple of clicks of the mouse, can be running on the user's PC within a fraction of a second. There's no need to download them (manually) to the PC, nor to decide where to store them; Windows does it all behind the scenes.

I mentioned earlier that BBC BASIC has other similarities to C. These include its ability to access memory directly, at an arbitrary address, and the incorporation of an 'inline' assembler. Of course these are 'advanced' features, and most users will neither need nor want to utilise them, but in the hands of an expert they make it possible for BBC BASIC for Windows to do things that Liberty Basic could never do on its own.

An example is accessing 'object-based' APIs such as Direct3D, COM automation (ActiveX) etc. BBC BASIC can manage this quite easily, but Liberty Basic needs to rely on external DLLs (typically written in C or Visual Basic) to achieve it. Even things as simple as accessing an API function exported by ordinal rather than name are beyond Liberty Basic's capabilities.

Although BBC BASIC for Windows is much faster than Liberty Basic, there can still be occasions when it is not fast enough. This is where the built-in assembler can be invaluable; programs can be fairly easily written in which the few time-critical parts are written in assembler and the rest in BASIC, a hybrid approach which is extremely powerful and flexible.

To be fair, there are a couple of respects in which BBC BASIC is less capable than Liberty Basic: these are the maximum sizes of integer and string variables. In common with most modern languages BBC BASIC uses 32-bit integers whereas Liberty Basic supports 'arbitrary precision' integer arithmetic. BBC BASIC's strings are limited to 65535 characters, whereas Liberty Basic's strings can be much longer. In those very rare circumstances when these are important there are workarounds for both issues in BBC BASIC.

In conclusion I am quite convinced that BBC BASIC for Windows is a better language than Liberty Basic. It creates smaller, faster executables. It allows you to do far more things without recourse to using an external DLL. BBC BASIC is a real enthusiasts' language; whatever you want to do it will rise to the challenge.

Wednesday, September 26, 2007

I've recently discovered something fascinating about the inner workings of Windows which, whilst known by the cognoscenti, seems not at all well understood and even less well documented.

It is well known that messages posted to a window, e.g. by calling PostMessage, are queued and only get dispatched to the Window Procedure by means of a message loop (or message pump) calling the GetMessage (or PeekMessage with the PM_REMOVE option) and DispatchMessage functions. It is also quite well known that messages sent to a window from the same thread, e.g. by calling SendMessage, are not queued and call the Window Procedure directly. Note that this can easily result in the Window Procedure being called re-entrantly, for example when a message handler itself calls SendMessage. It also means that sent messages are never filtered by the GetMessage wMsgFilterMin and wMsgFilterMax parameters.

But what happens to messages sent to a window from another thread? They neither get queued (in the sense of the posted messages) nor do they call the Window Procedure directly, but instead they are dispatched to the Window Procedure within a GetMessage, PeekMessage or SendMessage function. This dispatch happens invisibly and automatically within the functions; GetMessage and PeekMessage will never return sent messages, but they nevertheless dispatch them!

This knowledge is invaluable for preventing deadlocks. Suppose Thread A receives a message, and within the message handler it waits on some event happening in Thread B. Suppose also that Thread B sends a message to Thread A and does not action the event until the SendMessage function returns. This is a recipe for a deadlock: Thread A is waiting for Thread B's event and Thread B is waiting for Thread A to respond to its message, which it cannot do because it is waiting for Thread B....

With our new-found knowledge the solution is simple. Whilst Thread A is waiting for Thread B's event, it calls PeekMessage. It needn't remove any messages from the queue, nor take any notice of what is returned from PeekMessage, simply calling the function is sufficient! Now the message sent from Thread B is dispatched (within PeekMessage), Thread A responds, and Thread B continues execution and actions the event. Result: no deadlock!

Sunday, November 05, 2006

Some more examples of shortcomings in the Windows API documentation have been causing me grief recently. The first concerns the WM_CREATE notification and what happens when you return -1 from it. According to the docs "the window is destroyed and the CreateWindowEx or CreateWindow function returns a NULL handle"; what it doesn't say is that the WM_DESTROY handler is called in the process! This is not a trivial omission because you are very likely to include in the WM_DESTROY handler some 'cleanup' code which you may not necessarily want to execute if WM_CREATE has been aborted. For example - and this is what affected me - WM_CREATE might launch a worker thread that WM_DESTROY terminates; if the thread was never created (because WM_CREATE was aborted beforehand) then WM_DESTROY might wait forever for the non-existent thread to exit.

Fortunately a solution is to hand in the form of the WM_NCCREATE notification: this is called before WM_CREATE and again can be aborted (this time by returning FALSE) with the result that "the CreateWindow or CreateWindowEx function will return a NULL handle". Note that it doesn't say "the window is destroyed" and indeed the WM_DESTROY handler is not called in this case. So by moving some of the initialisation from WM_CREATE to WM_NCCREATE you can abort it without activating WM_DESTROY. But yet again the docs let you down: there is no hint that processing WM_NCCREATE will have any untoward side-effects but in fact it causes the window title not to appear! To get the normal window title you must exit from WM_NCCREATE via DefWindowProc.

The second case where the API documentation leaves a lot to be desired is the MenuItemFromPoint function. This "determines which menu item, if any, is at the specified location" and must be passed the "Handle to the window containing the menu", the handle of the menu itself and the location to test: "If hMenu specifies a menu bar, this parameter is in window coordinates. Otherwise, it is in client coordinates". Already we are led astray, at least in the case of a popup menu. Firstly, the "window containing the menu" isn't the main window containing the menu bar - it's the actual menu itself (you'd be forgiven for not knowing that a popup menu is a window!). Secondly you have to pass screen coordinates always, not "window coordinates" (whatever they are) or client coordinates!

This is a pretty comprehensive set of errors, but it gets worse. How do we find the window handle for a popup menu? This is quite an elusive beast and there are no direct ways of discovering it as far as I know. The only methods I am aware of are to use "WindowFromPoint" if you know a point within the popup menu (which you are likely to if you intend to use MenuItemFromPoint!) or "GetTopWindow" (with a NULL parameter) if you know that the menu has just been displayed. This latter method isn't reliable if the menu item you are interested in opens a sub-menu - which will become the 'top' window if you pause long enough with the mouse over the item.

At least if you do go to the trouble of finding the menu's window handle you can use it to find the menu handle itself via the MN_GETHMENU message, although according to Microsoft that only works on Windows 2000 or later. But that's wrong too - did you guess? It actually works fine right back to Windows 95 according to my tests. So once you've gleaned all this hard-to-find information you can discover the menu item just from the screen coordinates:

hwnd = WindowFromPoint(x, y) ;
hmenu = SendMessage(hwnd, MN_GETHMENU, 0, 0);
item = MenuItemFromPoint(hwnd, hmenu, x, y);

Microsoft takes pity on us in recent versions of Windows by allowing us to pass NULL as the "Handle to the window containing the menu", thereby making it unnecessary to discover it by nefarious means; but just to rub salt in the wounds they get the documentation of this wrong too by stating that it works in "Windows 98/Me and Windows 2000/XP" whereas in fact it only works in Windows 2000/XP. Don't you just love it?

The sting in the tail, just for good measure, is that the return values from MenuItemFromPoint are incorrectly documented too. According to the docs it "Returns the zero-based position of the menu item at the specified location or -1 if no menu item is at the specified location" but I know for a fact that values -3 and -4 can be returned, although precisely what they signify remains a mystery.