Yes, I think debe is right when it comes to DV: it’s more efficient and faster to keep DV video and audio together, especially when you know that you won’t be rendering. FCP will open the QT mov files and have immediate access to both video and audio.
However, I’m not certain it’s better to keep audio and video together when you know that large portions of the video side of your sequence will be replaced by render files. In that case, FCP has to open (at least) two chunky streams at once: the render files and the QT mov files that combine audio and video. If the audio were separated beforehand, FCP would only have to open one chunky render stream and a much lighter audio stream, which should reduce the chance of dropped frames.
At least, that’s what I surmise is happening, in theory.
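To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures are my assumptions, not measurements: about 3.6 MB/s for a DV stream (both the captured mov and a DV render file) and 48 kHz, 16-bit stereo for the separated audio. It also assumes FCP really does pull the whole combined file off the disk just to get at the audio, which is the part I'm guessing at.

```python
# Rough disk-throughput comparison for the two layouts discussed above.
# All figures are assumptions for illustration, not measured values.

# Assumed data rate of one DV stream (captured mov or DV render file).
DV_STREAM_BYTES_PER_SEC = 3_600_000


def pcm_bytes_per_sec(sample_rate_hz=48_000, bits_per_sample=16, channels=2):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


audio_rate = pcm_bytes_per_sec()

# Audio kept inside the DV mov: render stream plus the full combined stream.
together = DV_STREAM_BYTES_PER_SEC + DV_STREAM_BYTES_PER_SEC

# Audio split out beforehand: render stream plus the audio-only stream.
separate = DV_STREAM_BYTES_PER_SEC + audio_rate

print(f"separated audio stream: {audio_rate / 1e6:.3f} MB/s")
print(f"audio kept with video:  {together / 1e6:.3f} MB/s total")
print(f"audio separated:        {separate / 1e6:.3f} MB/s total")
```

If those assumptions hold, the separated layout asks the disk for roughly half the sustained throughput. Seek behaviour and caching aren't modelled here, so treat it as a sanity check on the theory rather than proof.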