I have been working on a custom DataAbortException handler and have noticed that the instruction reported as the one at fault is inconsistent. Ideally, the reported instruction would be the load or store that triggered the exception. In practice this is not always the case: the reported instruction can be anywhere from roughly 1 to 10 or more instructions after the faulting load or store. Because this offset is not deterministic, it is hard to write a handler that allows the software to continue executing. The variation occurs not only between different memory addresses but even for repeated accesses to the same address (e.g. reading from 0xE0007100 repeatedly reports different instructions at fault, while reads from 0x40003000 consistently report the load instruction correctly).
This is a problem because, to continue execution, the handler must modify the stack so that it does not return to the instruction that caused the fault in the first place; otherwise it ends up in an infinite loop. However, since the reported instruction is not always the one actually at fault, we cannot simply skip to the instruction after it.
To work around this problem, I have created two functions, unsafe_read and unsafe_write, which are used whenever a read/write request is received from outside the system. These functions pad extra NOP instructions after the load/store to give the exception time to trigger. This does not fully prevent the exception from triggering outside the NOP padding, as in some cases it was delayed by 20+ instructions. Because of this, the handler ignores the first exception reported at a particular instruction and lets execution proceed from it. If the next exception is reported at that same instruction, the instruction is skipped. Even then, it is possible for an instruction to be incorrectly reported twice in a row, in which case it is wrongly skipped.
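For reference, the "ignore the first report, skip on the second" policy could be sketched roughly as below. This is only the decision logic, not my actual handler: the names abort_state_t and abort_resume_pc are made up for illustration, and the sketch assumes fixed 4-byte ARM-state instructions and that the entry code has already worked out the reported fault address.

```c
#include <stdint.h>

#define INSN_BYTES 4u  /* assumption: fixed-width 4-byte ARM-state instructions */

/* Hypothetical handler state: the address reported by the previous abort. */
typedef struct {
    uint32_t last_fault_pc;
    int      have_last;    /* 0 until an abort has been recorded */
} abort_state_t;

/*
 * Return the address to resume at for an abort reported at fault_pc.
 * First report at an address: resume at that same address (retry).
 * Second consecutive report at the same address: resume just past it.
 */
static uint32_t abort_resume_pc(abort_state_t *st, uint32_t fault_pc)
{
    if (st->have_last && st->last_fault_pc == fault_pc) {
        st->have_last = 0;              /* start fresh after a skip */
        return fault_pc + INSN_BYTES;   /* skip the repeat offender */
    }
    st->last_fault_pc = fault_pc;       /* remember it for the next abort */
    st->have_last = 1;
    return fault_pc;                    /* let execution proceed once */
}
```

The returned value would then be written into the saved return address before leaving the handler. Note that this sketch never clears the remembered address on its own, so two unrelated aborts at the same instruction a long time apart would still count as "twice in a row".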
Does anyone know of a better solution to this problem? Checking whether the address is within a valid range is more time-consuming than the padded NOPs, as there are many staggered valid regions, which may also change between revisions.
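To make the range-check alternative concrete, here is a minimal sketch of what it might look like with a sorted table and a binary search. The table entries are placeholders invented for the example, not the real memory map of the part, and addr_range_t, valid_ranges and addr_is_valid are hypothetical names. The table would have to be kept in sync with each hardware revision.

```c
#include <stddef.h>
#include <stdint.h>

/* One valid region, inclusive at both ends. */
typedef struct {
    uint32_t first;
    uint32_t last;
} addr_range_t;

/* Placeholder values only. Must be sorted by 'first' and non-overlapping. */
static const addr_range_t valid_ranges[] = {
    { 0x40000000u, 0x40001FFFu },
    { 0xE0000000u, 0xE0003FFFu },
    { 0xE0008000u, 0xE000BFFFu },
};

/* Binary search over the table: returns 1 if addr lies in a valid region. */
static int addr_is_valid(uint32_t addr)
{
    size_t lo = 0;
    size_t hi = sizeof valid_ranges / sizeof valid_ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < valid_ranges[mid].first) {
            hi = mid;          /* addr is below this region */
        } else if (addr > valid_ranges[mid].last) {
            lo = mid + 1;      /* addr is above this region */
        } else {
            return 1;          /* addr falls inside this region */
        }
    }
    return 0;
}
```

The lookup cost grows with the logarithm of the number of regions, but it is still paid on every access, valid or not, which is the overhead I was hoping to avoid.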