No matter how much you try to have computers do this job for you (and I have tried many times), there is still no substitute for hand-rotoing these things. You need the mediation of the human artist’s eye to eliminate a lot of the extraneous detail and keep the lines smooth, gestural, and natural-looking. No automation can really do it as well as humans can.
But humans are also inconsistent. At one point, I think, Richard Linklater hired every unemployed artist, hipster, and grandma in Austin to work on that film of his, each on one little segment, and it was a challenge to keep them all true to one set visual “look” for each character. I think the best way, though the slowest, is for a single roto artist to do the whole thing, which tends to keep it all “of a piece”.
In the sample you show, some of the shots have elements that look like a layered AE comp. I can’t imagine how tedious this would be to do in AE. I read somewhere that Linklater used some custom software for his film that was a hybrid of compositing and paint tools.
The last time I tried to do this kind of effect myself in a serious way, I used Synthetik StudioArtist to hand-color and treat each individual frame for a 30-second spot. Before I could work with the frames, I first ran them through several batch processes in Photoshop to reduce their color depth, do an initial garbage-mask roto, and create the proper contrast. I also deliberately lit things harshly, blowing out details. The less detail the source footage has, the better the final effect is. That’s been my experience of it, anyway.
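For anyone who would rather script that prep pass than set up batch actions, here is a rough Python sketch of two of the steps, the contrast push and the color-depth reduction, on a row of 8-bit grayscale values. It is not the Photoshop process described above, only an illustration of the idea. The function names, the 64/192 cutoffs, the choice of 4 levels, and doing contrast before the depth reduction are all my assumptions, and the garbage-mask step is left out entirely.

```python
# Hypothetical prep pass for roto source frames (illustrative only).
# Works on plain lists of 8-bit grayscale values so it needs no libraries.

def stretch_contrast(pixels, low, high):
    """Linearly map [low, high] onto [0, 255], clipping everything outside.

    Values at or below `low` go to black and values at or above `high`
    blow out to white, which throws away shadow and highlight detail.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    out = []
    for p in pixels:
        scaled = (p - low) * 255 / (high - low)
        clipped = min(255.0, max(0.0, scaled))
        out.append(int(clipped + 0.5))  # round half up
    return out


def posterize(pixels, levels):
    """Snap each value to the nearest of `levels` evenly spaced gray levels."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = 255 / (levels - 1)
    return [int(int(p / step + 0.5) * step + 0.5) for p in pixels]


def prep_frame(pixels, low=64, high=192, levels=4):
    """Contrast push first, then depth reduction (an assumed ordering)."""
    return posterize(stretch_contrast(pixels, low, high), levels)


if __name__ == "__main__":
    row = [0, 64, 128, 192, 255]
    print(prep_frame(row))
```

With those assumed defaults, the sample row `[0, 64, 128, 192, 255]` comes out as `[0, 0, 170, 255, 255]`: the darker values collapse to black, the brighter ones blow out to white, and only one midtone survives. For real frames you would apply the same per-pixel math with an imaging library of your choice.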
And don’t even start this project unless you already have a good graphics tablet… and a LOT of patience :-)