I can't say I've ever run into this. In your position I'd try removing the wait loops altogether and inserting a few NOPs in their place, just to give the LMC time to do its job. If you're sure your code will only ever run on one kind of machine (a plain 8 MHz STE, for example), you can time the wait loop and convert the clock cycles it takes into an equivalent number of NOPs.
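
To make the cycles-to-NOPs conversion concrete, here is a rough Python sketch. It assumes a 68000 NOP costs 4 clock cycles and a nominal 8 MHz clock. The function names and the 130-cycle figure are made up for illustration, so measure your own loop before trusting any number.

```python
NOP_CYCLES = 4  # assumption: a 68000 NOP takes 4 clock cycles


def nops_for_cycles(loop_cycles: int) -> int:
    """Smallest number of NOPs that lasts at least loop_cycles clock cycles."""
    if loop_cycles < 0:
        raise ValueError("cycle count must not be negative")
    # Round up so the NOP run is never shorter than the loop it replaces.
    return (loop_cycles + NOP_CYCLES - 1) // NOP_CYCLES


def delay_microseconds(nop_count: int, clock_hz: int = 8_000_000) -> float:
    """Time spent in nop_count NOPs at the given CPU clock (nominal 8 MHz)."""
    return nop_count * NOP_CYCLES / clock_hz * 1_000_000


if __name__ == "__main__":
    measured_cycles = 130  # hypothetical timing of the original wait loop
    count = nops_for_cycles(measured_cycles)
    print(count, "NOPs, about", delay_microseconds(count), "microseconds")
```

Rounding up is deliberate: waiting a little too long is harmless, while cutting the delay short would bring the original problem back. The result only holds for the machine you timed, so a faster CPU or an accelerator would need a different count.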