After several very frustrating days, I have found the root cause of my bad readings. It was not in the sensor or in its setup. It was in fact the TWI/I2C driver code that I took from the ATMEL SAM7 libraries. So far every bug that has taken me more than 2 days to find has been in the ATMEL SAM7 libraries. They are riddled with bugs and simplifications that are not mentioned.
The issue here is that the TWI/I2C peripheral is rather different from something like a UART or SPI bus. You start the thing and then you need to hang on for the ride as the peripheral clocks in data. As a byte arrives, you need to be right there to pull it out of the receive buffer. When the last byte arrives, you then need to set the STOP condition bit to have the peripheral stop clocking in/out data.
Most of the time this works just fine. You activate the peripheral and sit in a tight loop watching the status register for new bytes. But what happens when an interrupt occurs in the middle of the polling loop? Perhaps an interrupt that takes more than 23 microseconds (1 byte at 400kHz) to execute? Perhaps the RTOS tick timer, which might need to swap a task in and out and might take several hundred microseconds? Well, the TWI peripheral continues to march along clocking in bytes, and they get overrun because you are not there to pull them out of the receive register. And because the LSM303 has an auto-incrementing address scheme, it wraps around back to the start of the data registers. This is why I would seem to get all the bytes, but just in funky orders.
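To make the numbers and the failure mode concrete, here is a small host-side toy model in C. It is only a sketch of the behaviour described above, not the SAM7 peripheral itself: the names twi_byte_time_ns and simulate_polled_read are made up for illustration, and the model simply assumes that every byte clocked in while the reader is away is lost and that the sensor's register pointer wraps after the last data register.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte on the bus costs 9 SCL clocks: 8 data bits plus the ACK/NACK bit. */
#define TWI_CLOCKS_PER_BYTE 9u

/* Time to clock one byte, in nanoseconds, for a given SCL frequency in Hz. */
static uint32_t twi_byte_time_ns(uint32_t scl_hz)
{
    return (uint32_t)((TWI_CLOCKS_PER_BYTE * 1000000000ull) / scl_hz);
}

/*
 * Toy model of a polled read (hypothetical, for illustration only).
 * The sensor streams regs[0..n_regs-1] over and over (auto-increment
 * with wraparound). After the reader has collected stall_after bytes,
 * it is away for `lost` byte times, and the bytes clocked in during
 * that gap are overwritten before anyone reads them.
 */
static void simulate_polled_read(const uint8_t *regs, size_t n_regs,
                                 size_t stall_after, size_t lost,
                                 uint8_t *out, size_t count)
{
    size_t slot = 0; /* index of the byte currently arriving from the sensor */

    for (size_t i = 0; i < count; i++) {
        if (i == stall_after) {
            slot += lost; /* bytes that came and went while nobody was reading */
        }
        out[i] = regs[slot % n_regs];
        slot++;
    }
}
```

At 400kHz, twi_byte_time_ns gives 22500 ns per byte, so six bytes take 135 microseconds. With six registers and a stall that swallows two bytes after the second read, the model hands back registers 0, 1, 4, 5, 0, 1: still six plausible-looking bytes, just not the right ones in the right places.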
It certainly would have been nice had ATMEL included the interrupt disabling code, or at least MENTIONED that any interrupts could screw you over royally. From now on no ATMEL supplied SAM7 code is going to be used without a major scrubbing.
So the solution was simply to disable interrupts for the short amount of time needed to clock in the 6 bytes of data (135 microseconds). A better approach, should I feel like being an overachiever, would be to make the whole thing interrupt driven with a higher priority interrupt than the RTOS tick timer.
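The shape of that fix can be sketched in C roughly as follows. This is not the actual driver code: twi_port_t, twi_read_atomic and the fake_* functions are hypothetical names, the hardware is hidden behind function pointers so the sketch can be compiled and tried on a PC, and the point at which STOP is requested is an assumption that should be checked against the datasheet. The poll limit is my own addition, so that a dead bus cannot hang the system while interrupts are off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks; a real port would map these onto the
 * TWI registers and the interrupt controller or RTOS critical section. */
typedef struct {
    void    (*irq_disable)(void); /* mask interrupts */
    void    (*irq_restore)(void); /* unmask interrupts */
    void    (*start_read)(void);  /* issue START and begin clocking in data */
    bool    (*rx_ready)(void);    /* true when a byte waits in the receive register */
    uint8_t (*read_rx)(void);     /* pull that byte out */
    void    (*set_stop)(void);    /* request the STOP condition */
} twi_port_t;

/* Read len bytes with interrupts masked for the whole transfer.
 * max_polls bounds each wait. Returns true on success. */
static bool twi_read_atomic(const twi_port_t *port, uint8_t *buf,
                            size_t len, uint32_t max_polls)
{
    bool ok = true;

    if (len == 0) {
        return false;
    }

    port->irq_disable();
    port->start_read();

    for (size_t i = 0; i < len && ok; i++) {
        if (i == len - 1) {
            /* Assumption: STOP is requested before the final byte is
             * collected; check the datasheet for the exact timing. */
            port->set_stop();
        }
        uint32_t polls = 0;
        while (!port->rx_ready()) {
            if (++polls >= max_polls) {
                ok = false; /* give up rather than spin forever */
                break;
            }
        }
        if (ok) {
            buf[i] = port->read_rx();
        }
    }

    port->irq_restore(); /* always unmask, even after a timeout */
    return ok;
}

/* Host-side fake so the sketch can be exercised without hardware. */
static int     fake_irq_depth;      /* > 0 while "interrupts" are masked */
static int     fake_stops;          /* number of STOP requests seen */
static int     fake_unmasked_reads; /* reads that happened with interrupts on */
static bool    fake_ready = true;   /* what rx_ready reports */
static uint8_t fake_next;           /* next byte the fake sensor returns */

static void    fake_irq_disable(void) { fake_irq_depth++; }
static void    fake_irq_restore(void) { fake_irq_depth--; }
static void    fake_start_read(void)  { fake_next = 0; }
static bool    fake_rx_ready(void)    { return fake_ready; }
static void    fake_set_stop(void)    { fake_stops++; }
static uint8_t fake_read_rx(void)
{
    if (fake_irq_depth == 0) {
        fake_unmasked_reads++;
    }
    return fake_next++;
}

static const twi_port_t fake_port = {
    fake_irq_disable, fake_irq_restore, fake_start_read,
    fake_rx_ready, fake_read_rx, fake_set_stop
};
```

Calling twi_read_atomic(&fake_port, buf, 6, 1000) fills buf with 0 through 5 and leaves interrupts unmasked afterwards. The price is that everything else in the system waits for the full 135 microseconds, which is why the interrupt-driven version is the better long-term answer.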
I ran the system for four hours today without a single fault. Before the fix, the problem would phase in and out (as the timer tick overlapped the TWI transfer) and would happen at least every 20 minutes.
“Thank you ATMEL”