Note from Fuzz: The Ultimate 64 can replicate the SuperCPU registers listed below, so 6510-based software written to use the SuperCPU’s turbo mode can run on the U64 as well. The timing and speed fundamentals discussed here also apply (note, however, that memory DOES end at $FFFF on the U64). Following these guidelines keeps your software compatible with as broad a range of systems as possible.
Learn to program in a SuperCPU-compatible way. This is important even if you don’t own a SuperCPU yet. Make your programs benefit from the accelerator when it is present. Here is how it’s done.
By ThunderBlade/Protovision
Whether you own a SuperCPU or not, your programs should work with it. And while the engineers at CMD managed to make their accelerator as compatible as was technically possible, there are still some things to take care of when writing programs that are meant to be SuperCPU compatible.
Don’t use illegal opcodes
The first point is obvious and therefore well known: you must not use any of the so-called “illegal” opcodes of the 6510. These opcodes usually save two or three cycles by performing several operations that would normally require more than one instruction, and thus more cycles. Such opcodes do not work on the 65816. The reason is simple: the “illegal” opcodes are undocumented side effects of the way the NMOS 6502 family decodes instructions, and the designers of the processor never intended them to exist. So the people at Western Design Center, makers of the 65816, assigned all of the opcodes not already used by the standard 6502/6510 processors to new, more powerful and more useful instructions. Thus, all possible opcodes from $00 to $FF are used. The values of the “illegals” now have documented meanings which are, of course, completely different. Using illegal opcodes therefore usually leads to a program crash on a SuperCPU, so please avoid them to keep your program compatible. You probably don’t need to save those two cycles anyway.
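To check an existing program for illegal opcodes, a lookup table of the 151 documented NMOS 6502/6510 opcodes is enough. The following is a minimal C sketch, not a tool from this article. The function names are hypothetical, and it assumes the bytes you pass in are known to sit at instruction boundaries (for example taken from a disassembler listing), since it does not decode operands itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The 151 documented NMOS 6502/6510 opcodes. Every other byte value is an
   "illegal" opcode on the 6510 and a different, documented instruction on
   the 65816 (e.g. $A7 is LAX zp on a 6510 but LDA [dp] on a 65816). */
static const uint8_t documented_6510[] = {
    0x00,0x01,0x05,0x06,0x08,0x09,0x0A,0x0D,0x0E,
    0x10,0x11,0x15,0x16,0x18,0x19,0x1D,0x1E,
    0x20,0x21,0x24,0x25,0x26,0x28,0x29,0x2A,0x2C,0x2D,0x2E,
    0x30,0x31,0x35,0x36,0x38,0x39,0x3D,0x3E,
    0x40,0x41,0x45,0x46,0x48,0x49,0x4A,0x4C,0x4D,0x4E,
    0x50,0x51,0x55,0x56,0x58,0x59,0x5D,0x5E,
    0x60,0x61,0x65,0x66,0x68,0x69,0x6A,0x6C,0x6D,0x6E,
    0x70,0x71,0x75,0x76,0x78,0x79,0x7D,0x7E,
    0x81,0x84,0x85,0x86,0x88,0x8A,0x8C,0x8D,0x8E,
    0x90,0x91,0x94,0x95,0x96,0x98,0x99,0x9A,0x9D,
    0xA0,0xA1,0xA2,0xA4,0xA5,0xA6,0xA8,0xA9,0xAA,0xAC,0xAD,0xAE,
    0xB0,0xB1,0xB4,0xB5,0xB6,0xB8,0xB9,0xBA,0xBC,0xBD,0xBE,
    0xC0,0xC1,0xC4,0xC5,0xC6,0xC8,0xC9,0xCA,0xCC,0xCD,0xCE,
    0xD0,0xD1,0xD5,0xD6,0xD8,0xD9,0xDD,0xDE,
    0xE0,0xE1,0xE4,0xE5,0xE6,0xE8,0xE9,0xEA,0xEC,0xED,0xEE,
    0xF0,0xF1,0xF5,0xF6,0xF8,0xF9,0xFD,0xFE,
};

/* Returns true if `op` is one of the documented 6510 opcodes. */
static bool is_documented_6510_opcode(uint8_t op) {
    for (size_t i = 0; i < sizeof documented_6510; i++)
        if (documented_6510[i] == op)
            return true;
    return false;
}

/* Counts undocumented opcodes in a list of bytes that are known to be
   opcode bytes (operands must already have been skipped by the caller). */
static size_t count_illegal(const uint8_t *opcodes, size_t n) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_documented_6510_opcode(opcodes[i]))
            bad++;
    return bad;
}
```

For example, the opcode sequence LDA #, LAX zp, STA abs, ANC # ($A9, $A7, $8D, $0B) contains two illegal opcodes.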
Load with 1 MHz
Loading routines are timing-dependent. Small waiting loops or NOPs are used to create exact timing, for example to eliminate the need for an extra clock line during data transfer. This enables specially written loading routines to perform disk operations much faster than the stock routines Commodore supplied in ROM. JiffyDOS also relies on the timing of a plain C64; this is why the SuperCPU switches down to 1 MHz during disk access. Note that it is not the SuperCPU itself (or some hardware component inside it) that does the switching, but the kernal, as soon as its disk routines are used. The conclusion: if your own fastload routine or IRQ-driven loader doesn’t use the kernal, you must switch down to 1 MHz manually. This is easily done with:
STA $D07A
After loading, switch back to turbo mode with a simple:
STA $D07B
The value of the accumulator is unimportant here, because these registers are write-sensitive: any write to the register triggers its function, regardless of the value written.
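The write-sensitive behavior can be sketched as a tiny C model of the two speed registers, written in the style of an emulator's I/O write handler. This is a hypothetical illustration, not the SuperCPU's actual implementation: struct scpu, scpu_io_write, run_at_1mhz and demo_loader are invented names, and only $D07A and $D07B are modeled. The run_at_1mhz helper also shows the bracket pattern (slow down, run timing-critical code, return to turbo) used for loaders.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCPU_SW_SLOW 0xD07A  /* any write selects 1 MHz */
#define SCPU_SW_FAST 0xD07B  /* any write selects turbo (20 MHz) */

/* Hypothetical model of the SuperCPU speed state (only these two registers). */
struct scpu {
    bool turbo;
};

/* Write handler: the stored value is ignored and only the address matters,
   which is why "STA $D07A" works no matter what the accumulator holds. */
static void scpu_io_write(struct scpu *s, uint16_t addr, uint8_t value) {
    (void)value;
    if (addr == SCPU_SW_SLOW)
        s->turbo = false;
    else if (addr == SCPU_SW_FAST)
        s->turbo = true;
}

/* Hypothetical stand-in for a fastloader: records the speed it ran at. */
static bool demo_loader_saw_turbo;

static void demo_loader(struct scpu *s) {
    demo_loader_saw_turbo = s->turbo;
}

/* Slow down, run the timing-critical routine, then return to turbo. */
static void run_at_1mhz(struct scpu *s, void (*critical)(struct scpu *)) {
    scpu_io_write(s, SCPU_SW_SLOW, 0x00);   /* STA $D07A */
    critical(s);
    scpu_io_write(s, SCPU_SW_FAST, 0x00);   /* STA $D07B */
}
```

Like the sequence in the text, the helper always ends in turbo mode.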
Perform timing critical stuff in 1 MHz mode
What applies to loading routines also applies to everything else that depends on 1 MHz timing. For example, if you want to display a picture in FLI mode and cannot (or don’t want to) write a routine that generates the FLI in turbo mode, switch to 1 MHz before starting the display routine and switch back to turbo afterwards. The same goes for similar effects such as opening the side borders or even just creating color rasterbars: the original 1 MHz 6510 timing is the base you can always return to if needed. However, all processor-intensive routines should of course be executed in turbo mode whenever possible. Special routines used only when a SuperCPU is present, for example an FLI routine designed to run at 20 MHz, free a LOT of processor time for other tasks. See the section on loading above for how to switch between 1 MHz and 20 MHz.
Don’t calculate in the IRQ routine
SuperCPU owners know the experience: some programs simply don’t get faster with the SuperCPU! One of the main reasons, especially when a program contains vector calculations or similar routines, is that the main work is done within an IRQ routine. A frame-oriented style of programming is very common among demo coders, but it is not very efficient, especially on the SuperCPU. In 20 MHz mode, a heavy calculation simply takes less raster time to execute, leaving a lot of unused time until the VIC (or, in some cases, a CIA) triggers the next IRQ, so the program still produces only one result per IRQ, exactly as before. So whatever you calculate, scroll or move, do it in the main program. Do not “split” the calculations across frames. Let your routine calculate as much as it can in a given time; it will then manage to do a lot more with a SuperCPU.
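A rough arithmetic model shows why frame-locked code gains nothing. The C sketch below counts the calculation steps completed in a number of frames. It assumes a PAL C64 frame of 63 × 312 = 19656 cycles, a hypothetical step cost of 15000 cycles, and a turbo mode that provides exactly 20 times as many cycles per frame. Real throughput on a SuperCPU is typically lower (for example when code accesses I/O registers), so treat the numbers as illustrative only; the function names are invented.

```c
#include <stdint.h>

#define PAL_CYCLES_PER_FRAME 19656u  /* 63 cycles x 312 lines, PAL C64 at 1 MHz */

/* IRQ-driven: one step per IRQ. A step longer than a frame occupies several
   frames (rounded up); spare cycles at the end of a frame are wasted waiting.
   cycles_per_step must be greater than zero. */
static uint32_t steps_irq_driven(uint32_t frames, uint32_t cycles_per_frame,
                                 uint32_t cycles_per_step) {
    uint32_t frames_per_step =
        (cycles_per_step + cycles_per_frame - 1) / cycles_per_frame;
    return frames / frames_per_step;
}

/* Main loop: every cycle of every frame goes into calculation
   (IRQ overhead such as music playback is ignored in this model). */
static uint32_t steps_main_loop(uint32_t frames, uint32_t cycles_per_frame,
                                uint32_t cycles_per_step) {
    return (uint32_t)((uint64_t)frames * cycles_per_frame / cycles_per_step);
}
```

Under these assumptions, 50 frames yield 50 steps in the IRQ-driven version at either speed, whereas the main-loop version completes 65 steps at 1 MHz and 1310 steps in turbo mode.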
The memory doesn’t end at $FFFF
Some programmers assume that the memory is like a globe. They leave Europe to sail westwards and expect to reach India. In other words, it is assumed that after $FFFF, the next address is $0000 again. Consider this: a program initializes the IRQ/NMI vectors (located at $FFFx) and some zero page addresses with a single loop like:
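        LDX #$00
loop    LDA data,x
        STA $FFFA,x     ; x=$00-$05: vectors at $FFFA-$FFFF
                        ; x=$06-$29: $0000-$0023 on a 6510 (address wraps),
                        ; but $010000-$010023 on a 65816!
        INX
        CPX #$2A        ; $2A = 42 bytes: 6 vector bytes + 36 zero page bytes
        BNE loop
data    .byte $00,$c0,$00,$c0,$00,$c1,$2f,$37,$a5,$76,$a1,$98 …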
The loop starts with $00 in the X register. The NMI vector ($FFFA/FB), the RESET vector ($FFFC/FD) and the IRQ vector ($FFFE/FF) are filled with the desired values. When X reaches $06, the effective address $FFFA+X exceeds $FFFF. On a plain C64, whose 6510 has only a 16-bit address bus, the address wraps around to $0000. The 65816, however, can address up to 16 MB directly, so the carry goes into the bank byte instead. Since the SuperCPU has a second 64K bank of SRAM (located at $010000 upwards!) where it keeps the ROM images for fastest execution, the little loop above stores the values intended for the zero page at $010000+ instead. Your program then assumes that the right values are in the right zero page addresses, which is not the case when a SuperCPU is active, and because of this it will crash or at least not work as it should! The solution is simple: a bit more “proper” programming. Fill the two areas ($FFFx and the zero page) with the desired values in two separate initialization loops.
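The difference between the two CPUs can be sketched with a small C model of absolute indexed addressing. It is a simplification under stated assumptions: a 128 KB memory array, data bank 0, and hypothetical function names (abs_x, init_single_loop, init_two_loops). init_single_loop mirrors the loop above; init_two_loops models the suggested fix, where the second loop corresponds to STA $00,X in 6510 code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128 KB memory, so that bank 1 ($010000-$01FFFF), where the
   SuperCPU keeps its ROM copies, can be observed. */
enum { MEM_SIZE = 0x20000 };
static uint8_t mem[MEM_SIZE];

static void reset_mem(void) { memset(mem, 0, sizeof mem); }

/* Effective address of "STA abs,X" with data bank 0. The 6510 has a 16-bit
   address bus, so the sum wraps; on the 65816 the carry goes into the bank. */
static uint32_t abs_x(uint16_t base, uint8_t x, int is_65816) {
    uint32_t ea = (uint32_t)base + x;
    return is_65816 ? ea : (ea & 0xFFFFu);
}

/* The single loop from the text: 6 vector bytes plus 36 zero page bytes. */
static void init_single_loop(const uint8_t data[0x2A], int is_65816) {
    for (uint8_t x = 0; x != 0x2A; x++)
        mem[abs_x(0xFFFA, x, is_65816)] = data[x];
}

/* The fix: two separate loops, neither of which crosses $FFFF. */
static void init_two_loops(const uint8_t data[0x2A], int is_65816) {
    for (uint8_t x = 0; x < 6; x++)              /* vectors $FFFA-$FFFF */
        mem[abs_x(0xFFFA, x, is_65816)] = data[x];
    for (uint8_t x = 0; x < 0x24; x++)           /* zero page $00-$23 */
        mem[abs_x(0x0000, x, is_65816)] = data[6 + x];
}
```

With the single loop, the zero page receives its values only in 6510 mode; in 65816 mode they land at $010000 upwards. The two-loop version fills the zero page correctly on both CPUs.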
There is no 2 MHz mode
Some programs use $D030, the C128’s clock speed register, to switch a C128 in C64 mode to 2 MHz (with the screen turned off) and gain speed. Unfortunately, this doesn’t work with the SuperCPU. So if you want to offer 20 MHz to SuperCPU users instead of 2 MHz to C128 owners, don’t use things like INC $D030.
Use only documented locations
Illegal opcodes and the other compatibility problems outlined above affect only a small number of programs, but these are usually exactly the applications and games we would most like to run on a turbo-enabled SuperCPU.
However, there is one point left to mention: please only use documented locations in the I/O area! Although it may sound unbelievable to some, there are programmers who write to $D220 instead of $D020, $D212 instead of $D012 and so on! On a plain C64 these locations do work, because the VIC-II decodes only the low six address bits, so its registers repeat every 64 bytes from $D000 to $D3FF. But on the SuperCPU, methods like this cannot work. The SuperCPU handles I/O accesses in a very special way and was not designed to support reads or writes to undocumented mirror locations. In addition, the SuperCPU has its own little portion of RAM in the I/O area: $D200-$D2FF is used by the system, while $D300-$D3FF is an extra block of RAM available to SuperCPU-supporting programs. The conclusion is simple: only use documented address locations. This applies not only to the I/O area, but also to Kernal routines.
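A simple check for the mirror problem can be expressed in C. This is a hypothetical lint-style helper, not part of any existing tool, and the function names are invented. It only covers the VIC-II range $D000-$D3FF; since the SuperCPU's own registers and RAM (e.g. $D07A, $D0BC, $D200-$D3FF) lie in that same range, a real checker would need an allow-list for them.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the address lies in the VIC-II chip-select range of a stock C64. */
static bool in_vic_area(uint16_t addr) {
    return addr >= 0xD000 && addr <= 0xD3FF;
}

/* The VIC-II only sees the low six address bits, so every address in its
   range maps to the documented register $D000 + (addr & $3F). */
static uint16_t vic_canonical(uint16_t addr) {
    return (uint16_t)(0xD000 | (addr & 0x3F));
}

/* True for an access that only works because of mirroring on a plain C64. */
static bool is_vic_mirror(uint16_t addr) {
    return in_vic_area(addr) && vic_canonical(addr) != addr;
}
```

For the examples in the text, vic_canonical maps $D220 to $D020 and $D212 to $D012, and is_vic_mirror flags both.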
Detecting a SuperCPU
With the knowledge gained so far, we can write programs that are 100% compatible with CMD’s accelerator board and can even benefit from the presence of a SuperCPU. If you want your program to “know” whether it is running on a SuperCPU or a plain Commodore system, there is a simple way: check whether bit 7 of $D0BC (a special SuperCPU register) is zero. On a plain C64, $D0BC is a mirror of the unused VIC-II register $D03C, which, like the other unused VIC-II registers, always reads $FF, so bit 7 is one. On a SuperCPU, this bit is zero, which makes detection easy. Once a SuperCPU is detected, we can select alternative routines for SuperCPU owners, or even run SuperCPU-only code that takes advantage of all the powerful new opcodes and addressing modes the 65816 offers.
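The detection test itself is a single bit check. The C sketch below models it on a value read from $D0BC and adds a hypothetical routine selection; the routine names are invented for illustration. On the 6510 side the equivalent is BIT $D0BC followed by BMI, which branches when bit 7 is set, that is, when no SuperCPU is present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of $D0BC is 0 when a SuperCPU is present and 1 on a plain C64.
   In 6510 code:   BIT $D0BC
                   BMI no_scpu   ; N = bit 7 = 1 -> plain C64 */
static bool scpu_present(uint8_t d0bc) {
    return (d0bc & 0x80) == 0;
}

/* Hypothetical routine selection: use the SuperCPU-specific variant only
   when the accelerator was detected. */
typedef int (*routine_fn)(void);

static int plot_standard(void) { return 1; }
static int plot_scpu(void)     { return 2; }

static routine_fn select_plot_routine(uint8_t d0bc) {
    return scpu_present(d0bc) ? plot_scpu : plot_standard;
}
```

Detection would typically run once at startup, with the chosen routine pointer (or a patched JMP target in 6510 code) used from then on.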
(C) Copyright Malte Mundt in 1999. Reproduction only with permission of the author.