This is how it could be set up.
On the Audio Amplitude layer (created from the music), put this on the Both Channels slider:
smooth(.1, 5) // average 5 samples over a .1 second window
More samples over a longer time give you progressively smoother results.
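If it helps to see what the smoothing is doing, here's a rough plain-JavaScript sketch of a box-filter average. It's only an illustration to run outside After Effects, not a copy of how smooth() is implemented; `boxSmooth`, `valueAtTime` and `spike` are made-up names for the example.

```javascript
// Average `samples` evenly spaced values in a window of `width` seconds
// centered on time t. `valueAtTime` is a hypothetical stand-in for a property.
function boxSmooth(valueAtTime, t, width, samples) {
  if (samples < 2) return valueAtTime(t);
  const start = t - width / 2;
  const step = width / (samples - 1);
  let sum = 0;
  for (let i = 0; i < samples; i++) {
    sum += valueAtTime(start + i * step);
  }
  return sum / samples;
}

// A very brief spike at t = 1 second in an otherwise silent signal.
const spike = (t) => (Math.abs(t - 1.0) < 0.01 ? 10 : 0);

const raw = spike(1.0);                          // 10
const smoothed = boxSmooth(spike, 1.0, 0.1, 5);  // 2: only 1 of the 5 samples hits the spike
console.log(raw, smoothed);
```

An odd sample count keeps the current time in the average, which is why 5 works well here.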
For the Audio Levels of your voice layer:
var minvol = -6; // minimum volume level (dB)
var offset = -6; // offset added to the scaled amplitude (dB)
var mult = .5; // multiplier for voice volume
var amp = thisComp.layer("Audio Amplitude").effect("Both Channels")("Slider");
var vol = Math.max(minvol, amp * mult + offset);
[vol, vol] // same level for left and right channels
Doesn’t have to be that complex, but this’ll give you some control over how much the volume gets changed.
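To get a feel for what the three numbers do, here's the same formula as a plain-JavaScript function you can run outside After Effects. `voiceLevel` is a made-up name, and the offset of -12 is just an assumed value chosen so the minvol floor actually kicks in.

```javascript
// Same math as the expression: scale the music amplitude, add the offset,
// and never go below minvol. All levels are in dB.
function voiceLevel(amplitude, minvol, offset, mult) {
  return Math.max(minvol, amplitude * mult + offset);
}

// Assumed settings for illustration: floor at -6 dB, offset at -12 dB.
console.log(voiceLevel(0, -6, -12, 0.5));  // -6 (floor applies, raw result is -12)
console.log(voiceLevel(8, -6, -12, 0.5));  // -6 (floor applies, raw result is -8)
console.log(voiceLevel(20, -6, -12, 0.5)); // -2 (loud music pushes the voice up)
```

With offset equal to minvol, as in the expression above, the floor never changes anything because the amplitude is never negative.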
—
Once you’ve tried that, here’s what is likely the more workable option: ducking the music when there is voice.
Create the Audio Amplitude from the voice, put the smooth expression on it, then use this on the music layer’s Audio Levels:
var maxvol = -6; // maximum volume level (dB)
var offset = -6; // music level when the voice is silent (dB)
var mult = .5; // multiplier for music volume ducking amount
var amp = thisComp.layer("Audio Amplitude").effect("Both Channels")("Slider");
var vol = Math.min(maxvol, offset - amp * mult);
[vol, vol] // same level for left and right channels
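Here's a quick plain-JavaScript sketch of how the ducking formula behaves, run outside After Effects. `duckedMusicLevel` is a made-up name and the amplitude values are just sample inputs, not measurements from real audio.

```javascript
// Same math as the ducking expression: the louder the voice,
// the further the music drops below the offset. Levels are in dB.
function duckedMusicLevel(voiceAmplitude, maxvol = -6, offset = -6, mult = 0.5) {
  return Math.min(maxvol, offset - voiceAmplitude * mult);
}

console.log(duckedMusicLevel(0));  // -6  (no voice, music at full setting)
console.log(duckedMusicLevel(10)); // -11 (moderate voice, music ducks 5 dB)
console.log(duckedMusicLevel(30)); // -21 (loud voice, music ducks 15 dB)
```

Raising mult makes the music duck harder; lowering offset drops the music overall.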