My Photo
Location: Downham Market, Norfolk, United Kingdom

Wednesday, December 30, 2009

BBC BASIC allows you to incorporate assembly language code in your BASIC program, typically for use when interpreted BASIC isn't fast enough or for specialised purposes such as coding interrupt service routines. Using a hybrid of BASIC and assembler code is a powerful technique, allowing you to use BASIC for the bulk of the program (typically including the user interface) and assembler for the speed-critical parts (which are likely to be a relatively small proportion of the whole).

Needless to say the assembler provided in BBC BASIC is specific to the particular processor on which it runs. So the original BBC Micro BASIC incorporates a 6502 assembler, Acorn's BASIC for RISC OS machines incorporates an ARM assembler, BBC BASIC (Z80) (e.g. for CP/M or on the Z88) incorporates a Z80 assembler and BBC BASIC (86) for MS-DOS incorporates a 16-bit x86 assembler. BBC BASIC for Windows incorporates a full 32-bit x86 (IA-32) assembler, including support for floating point and MMX instructions.

Whilst support for 'inline' assembly language code is not unusual in programming languages (for example it's a standard feature of C), BBC BASIC provides it in a unique way. Conventionally, any assembler code included in a program is converted to machine code (i.e. assembled) at 'compile' time and incorporated in the final executable as binary data. In contrast, BBC BASIC incorporates the assembler 'source' code in the executable (usually in a compressed format) and assembles it to machine code when the program is run.

Let that sink in for a moment. If the source code is assembled at run-time, it means the BBC BASIC executable must itself contain an assembler - which indeed it does (albeit a small and very fast one). As far as I know no other programming language creates executables which contain an assembler, let alone a full IA-32 assembler.

In many cases this very different way of dealing with embedded assembler code won't have any practical consequences - the end result will be equivalent to the more conventional method. However it makes possible some things which would otherwise be very difficult or messy to achieve. In particular it allows the assembled machine code to vary according to some condition which is determined at run-time.

Why might that be valuable? Suppose, for example, that the assembler code accesses one or more memory locations, which means that the addresses of those locations must be known. Typically, those addresses will vary from one run of the program to the next, or between one PC and another, so they aren't known when the executable is created (i.e. when the program is 'compiled'). Conventionally this means that the addresses can't be incorporated as 'constants' within the assembler code, instead they must be passed to the code as parameters (e.g. on the stack) when it is run.

However in the case of BBC BASIC things are very different. The assembler source code is converted to machine code at run time, so it can usually be arranged that the addresses are known by then. As a result, the 'variable' memory addresses (which change from one run to the next) can be incorporated in the assembled machine code as 'constants', making the code both simpler and faster.

To take a practical example, suppose the assembler code needs to load the eax register from a memory address that is known only at run-time. In a conventional programming language the assembler source code will look something like this (where the address is passed to the assembler code on the stack):

mov ebp,esp ; get pointer to stack frame
mov ebx,[ebp+offset] ; get address
mov eax,[ebx] ; load eax from memory

But in the case of BBC BASIC the code can simply be as follows, because the address is already known:

mov eax,[address] ; load eax from memory

If there are many such memory references the impact on execution speed can be considerable. Similar considerations may apply whenever the code can be simplified by incorporating values as 'immediate constants' rather than 'variables'.

There's another valuable benefit of BBC BASIC assembling source code at run-time. In Microsoft Windows different processes run in separate (virtual) address spaces and are effectively quite well isolated from each other (one process cannot see another process's memory directly). However it is possible for one process to 'inject' code into another process and then run it there. The snag is that the address in the remote process at which the code must be loaded, and executed, isn't known until run-time.

This poses quite a problem for a conventional language, as IA-32 machine code isn't (in general) 'position independent' and so must be assembled (and linked, if appropriate) to run at a specific address. Typically the assembler code has to be written in such a way that it can be 'relocated' at run time - by 'patching' it using, for example, C code - according to the address at which it will eventually be loaded in the remote process. This is potentially very messy, prone to error and difficult to debug.

With BBC BASIC this is all very much easier. Because, by the time the code is assembled, the address at which it must eventually run is known, there is no requirement for it to be 'patched' or 'relocated' (the BBC BASIC assembler has a specific feature whereby the code can be assembled so that it will run at an address different from that at which it is initially stored). This makes it exceptionally well-suited to injecting code into another process.