The inconsistent results come from AE reusing its cached results. So the best tests are those that start fresh, with no cached data. Purging the cache before starting a preview is a good first step when running tests.
BTW, do your Expressions use the valueAtTime() method? And is there a reason not to bake the Expressions into keyframes, or even to just use keyframes instead of Expressions? I’m asking because I’m working on a script that works well with lots of layers and keyframes.
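In case it helps to see what I mean by baking, here is a rough sketch in plain JavaScript. It is not the After Effects API: `exampleExpression` and `bakeToKeyframes` are hypothetical names I made up for illustration, and the function simply stands in for whatever an Expression would compute at a given time.

```javascript
// Hypothetical stand-in for an Expression: returns a value for a time in seconds.
// This is a plain function for illustration, not an After Effects call.
function exampleExpression(time) {
  return 100 + 50 * Math.sin(2 * Math.PI * time);
}

// "Bake" the expression: sample it once per frame and store explicit keyframes.
function bakeToKeyframes(expression, durationSeconds, fps) {
  const keyframes = [];
  const frameCount = Math.round(durationSeconds * fps);
  for (let frame = 0; frame <= frameCount; frame++) {
    // Derive time from the frame index so rounding error does not accumulate.
    const time = frame / fps;
    keyframes.push({ time: time, value: expression(time) });
  }
  return keyframes;
}

// One second at 24 fps gives one keyframe per frame, including both endpoints.
const baked = bakeToKeyframes(exampleExpression, 1, 24);
console.log(baked.length, baked[0], baked[baked.length - 1]);
```

Once the values are stored like this, nothing has to be recomputed per frame, which is why I would expect baked keyframes to behave more predictably than live Expressions in a test like yours.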