Hey Dion, you should post your solution for the community so others can learn from it!
I don’t think this needs to be done with cameras at all. I can see it as just keyframing the layer’s scale, rotation, and position, with a wiggle expression thrown in.
The key is that the initial comp, the one that holds the text, the “scribbling,” and the BG, is much larger than the final comp it will be nested into. Then you just keyframe the animation!
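If it helps to put a number on "much larger", here is a rough sketch of the arithmetic, not anything built into After Effects. It assumes the nested layer is centered and not rotated, and the comp sizes and the function name `min_cover_scale` are made up for illustration. It works out the smallest Scale value at which the big comp still fills the final frame.

```python
def min_cover_scale(big_size, final_size):
    """Smallest scale (in percent) at which a centered, unrotated nested
    layer still covers the final comp with no empty edges showing."""
    big_w, big_h = big_size
    final_w, final_h = final_size
    # The layer must cover both dimensions, so the larger ratio wins.
    return 100.0 * max(final_w / big_w, final_h / big_h)


# Hypothetical sizes: a 6000x4000 initial comp nested into a 1920x1080 final comp.
scale = min_cover_scale((6000, 4000), (1920, 1080))
print(f"Keep Scale at or above {scale:.1f}%")  # prints "Keep Scale at or above 32.0%"
```

Rotation, position moves, and the wiggle all eat into that margin, so in practice you would want to stay comfortably above the number this gives you.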