There’s no need for 2-pass encoding, because with a mostly still image there’s really nothing for the second pass to improve. You didn’t mention it, but also turn off Maximum Render Quality if you have it on. You should add a value to the keyframe field, though: the encoder doesn’t need to encode and write every frame in full (and you don’t have to let it decide the spacing for you), so you can set a longer interval before it adds the next keyframe.
For example, I rendered an hour-long Full HD sequence with a still image and a music track:
10 Mbps, no keyframe value set: file size a bit over 2 GB
10 Mbps, keyframe every 300 frames: file size a bit less than 400 MB
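
To put those two results in perspective, here is a rough back-of-the-envelope sketch in Python. It is only arithmetic, not anything the encoder runs. The 30 fps frame rate is an assumption (I didn't state the sequence frame rate above), the file sizes are rounded to 2000 MB and 400 MB, and the function names are made up for illustration.

```python
def keyframe_count(duration_s, fps, interval_frames):
    """Number of keyframes if one is written every `interval_frames` frames,
    counting the first frame as a keyframe."""
    total_frames = int(duration_s * fps)
    # Ceiling division: keyframes land on frames 0, interval, 2*interval, ...
    return (total_frames + interval_frames - 1) // interval_frames


def average_bitrate_mbps(file_size_mb, duration_s):
    """Average bitrate implied by a file size (decimal megabytes) and duration."""
    return file_size_mb * 8 / duration_s


DURATION_S = 60 * 60   # one hour, as in the test render
ASSUMED_FPS = 30       # assumption: the actual frame rate was not stated

# Keyframes written over the hour with a 300-frame interval.
print(keyframe_count(DURATION_S, ASSUMED_FPS, 300))  # 360 at the assumed 30 fps

# Average bitrate each file actually ended up with (sizes are rounded).
print(round(average_bitrate_mbps(2000, DURATION_S), 2))  # no keyframe value set
print(round(average_bitrate_mbps(400, DURATION_S), 2))   # keyframe every 300 frames
```

Under those assumptions, the 300-frame setting means only a few hundred full frames get written across the whole hour, and the implied average bitrate of the smaller file comes out well under 1 Mbps, which is plenty for a still image plus audio.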