I’m actually working on a project that’s supposed to look like a Zoom call, and I’ve experimented with a couple of different things.
If I’m not mistaken, what Zoom and similar software does is use the depth data from the camera (in combination with some sort of difference matte) to build a mask that removes anything that isn’t in focus; the camera can work this out live because it knows, more or less, what it’s focusing on. After the footage is shot, of course, AE can’t tell what the camera was doing, so it’s much harder to reverse engineer. I suspect what you’d need is some way for the camera to record focus info in the metadata that AE could then interpret, but I’m not sure that’s something either piece of software can do.
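Just to illustrate the idea (this is a sketch of the concept, not how Zoom actually does it under the hood): if you somehow did have a per-pixel depth map for your footage, the matte step would look roughly like this. The file names and the 1.5 m cutoff are made up for the example.

```python
import cv2
import numpy as np

# Hypothetical inputs: a video frame and a matching per-pixel depth map
# (in meters). Real cameras rarely hand this to you after the fact,
# which is the whole problem.
frame = cv2.imread("frame.png")
depth = np.load("depth.npy")  # float32, same height/width as the frame

# Keep anything closer than ~1.5 m (the subject), drop the rest.
NEAR_CUTOFF_M = 1.5  # made-up threshold, just for the example
mask = (depth < NEAR_CUTOFF_M).astype(np.uint8) * 255

# Soften the edge a little so it doesn't look like a cardboard cutout.
mask = cv2.GaussianBlur(mask, (9, 9), 0)

# Composite the subject over a flat dark background, virtual-background style.
background = np.full_like(frame, 40)
alpha = (mask.astype(np.float32) / 255.0)[..., None]
composite = (frame * alpha + background * (1.0 - alpha)).astype(np.uint8)
cv2.imwrite("composite.png", composite)
```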
The tough part is that, unless you have raw, locked-off footage with zero grain or compression, a difference matte is essentially worthless for the vast majority of real-world shots.
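For anyone wondering why: a difference matte is basically "subtract the clean plate, keep whatever changed," so grain and compression noise read as change and end up in the matte. A rough sketch of the idea (file names and the threshold are placeholders):

```python
import cv2
import numpy as np

# Hypothetical inputs: a frame with the subject in it, and a "clean plate"
# of the same locked-off shot with nobody in frame.
frame = cv2.imread("frame.png").astype(np.int16)
clean_plate = cv2.imread("clean_plate.png").astype(np.int16)

# Per-pixel difference, collapsed to a single channel.
diff = np.abs(frame - clean_plate).max(axis=2).astype(np.uint8)

# Anything that changed "enough" gets called foreground. With grain or
# heavy compression, noise alone clears this threshold all over the frame,
# which is exactly why the matte falls apart on most real footage.
THRESHOLD = 20  # made-up value; you'd tune it, and it would still struggle
_, matte = cv2.threshold(diff, THRESHOLD, 255, cv2.THRESH_BINARY)
cv2.imwrite("difference_matte.png", matte)
```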
Fortunately, if you’re actually going for a Zoom look, then the Roto Brush (don’t forget to extend the span so you don’t have to keep messing with it), or a tracked matte (position/scale/rotation/skew, with adjustments as needed), should get you about as close as Zoom would – which is to say, not close at all, but it will have a similar look. If it needs to look genuinely good, you’ll probably need to do a more precise roto with the brush or Mocha.