Does the camera get really close to the picture?
If it doesn’t, I would use a virtual camera focused on the picture, so everything else is blurred (and the pixelation isn’t too obvious). I would center the camera on the frame and use a subtle transition from the picture to the footage once the image fills the whole frame.
I see that the picture is tilted slightly backwards, so you will have to adjust the perspective to match the picture to the footage.
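If you're curious what that perspective adjustment does under the hood, here is a rough sketch (not how any particular app implements it). A corner-pin style adjustment is essentially a homography: a 3×3 mapping that sends the four corners of the tilted picture onto the four corners of the frame. The corner coordinates below are made-up example values for a 1920×1080 frame; in practice you would read them off your own shot. The sketch uses plain Python so it runs without extra libraries.

```python
def solve_linear(a, b):
    """Solve a*x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        # Pick the row with the largest value in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography(src, dst):
    """3x3 perspective matrix mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(hm, x, y):
    """Apply the perspective matrix to one pixel position."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    u = (hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w
    v = (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w
    return u, v


# Hypothetical corners of the tilted picture in the shot (top edge is
# narrower because the picture leans back): TL, TR, BR, BL.
picture_corners = [(720, 300), (1200, 300), (1260, 800), (660, 800)]
frame_corners = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
H = homography(picture_corners, frame_corners)
```

Every pixel of the picture passed through `warp_point(H, x, y)` lands where it should in the straightened, full-frame version, which is what lets the picture line up with the footage at the moment of the transition.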
I hope that works.