The problem with most of these vocal removal plug-ins is the anomalies the process creates. There are a number of reasons why this process cannot be accomplished faithfully and completely.
However, before I go into a diatribe about why this method is not too great, it might help if I described the process.
1. Open the stereo file in an audio editor like Sound Forge, Wavelab etc.
2. Select (highlight) one of the two channels (usually the right channel).
3. Select 'Flip or Invert' from the edit menu.
4. Now 'Sum' the file to Mono.
This process is also called the Karaoke Effect.
Basically, what is happening here is that you are inverting the polarity of one channel (often called phase reversal) and then summing it with the other channel into a single mono channel. Anything that is identical in both channels cancels out, and this is why the vocals disappear.
In most recordings the lead vocal, the kick drum and the bass are panned to the centre, so these will disappear through the process above.
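To make the cancellation concrete, here is a minimal sketch of the invert-and-sum steps in plain Python. The function name `karaoke_mono`, the test signals and the decision to halve the sum are all my own assumptions for illustration; an audio editor may scale the mono sum differently.

```python
import math

def karaoke_mono(left, right):
    """Invert the polarity of the right channel, then sum to mono.

    Halving the sum is an assumption; some editors sum without scaling.
    """
    return [(l + (-r)) / 2.0 for l, r in zip(left, right)]

# Hypothetical test signals, 8 samples each.
n = 8
vocal = [math.sin(2 * math.pi * i / n) for i in range(n)]   # centre: identical in both channels
guitar = [math.cos(2 * math.pi * i / n) for i in range(n)]  # hard left: left channel only

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

mono = karaoke_mono(left, right)

# The vocal has cancelled; what is left is the guitar at half level.
for m, g in zip(mono, guitar):
    print(round(m, 6), round(g / 2.0, 6))
```

Note that the same sum would also cancel a centred kick drum or bass, since the arithmetic cannot tell one centred source from another.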
The problem with this method is that you cannot remove panned sounds, particularly backing vocals that are panned across the stereo field. Only the sounds in the centre of the field will be removed, so you might be left with remnants of the recorded material. Additionally, you need to remember that effects are often panned across the stereo field, particularly when dealing with vocals; reverbs come to mind.
Sadly, the process and its inherent algorithms could have more of a destructive effect than a useful one. Even more importantly, removing vocals from a final mastered stereo mix would still not work properly, as the effects and dynamics used to create the mix have their own transients. If you were to use a hardware mixer with a global reverb running on the master stereo buss, and then muted the source signals, you would still hear the wet signal, i.e. the processed reverb signal.
In a final mix the algorithms would have to take into account all the effects and dynamics used in order to extrapolate the vocal frequencies. This cannot be done without a destructive outcome for the other frequencies in the mix. The whole essence of effects is to give the illusion of space and spread. This colouration would have to be accommodated when coding, as would any dynamics used. So your coding would now have to accommodate almost every type of process available, and recognise the artifacts created during the process, on top of the coding for the removal itself.
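The limitation can be shown with a rough sketch in plain Python. Everything here is made up for illustration: the sample values, the 80/20 pan on the backing part, and the `delayed` helper, which stands in for a stereo reverb by giving each channel a different delayed, quieter copy of the vocal. A real reverb is far more complex.

```python
def invert_and_sum(left, right):
    """Invert the right channel and sum to mono (halving is an assumption)."""
    return [(l - r) / 2.0 for l, r in zip(left, right)]

def delayed(signal, delay, gain):
    """Hypothetical stand-in for a reverb: one delayed, attenuated copy."""
    return [0.0] * delay + [gain * s for s in signal[:len(signal) - delay]]

dry_vocal = [0.0, 0.8, -0.6, 0.4, -0.2, 0.0]   # centre, identical in both channels
backing = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]    # panned 80% left, 20% right

# The "reverb" differs between the channels, as a stereo effect would.
reverb_left = delayed(dry_vocal, 1, 0.5)
reverb_right = delayed(dry_vocal, 2, 0.3)

left = [v + 0.8 * b + rl for v, b, rl in zip(dry_vocal, backing, reverb_left)]
right = [v + 0.2 * b + rr for v, b, rr in zip(dry_vocal, backing, reverb_right)]

residue = invert_and_sum(left, right)

# The dry vocal cancels, but the panned backing part and the
# difference between the two reverb returns are still present.
print(residue)
```

The dry vocal drops out of the result entirely, yet the output is not silent: the panned part and the wet signal remain, which is the effect described above.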
However, ignore my whinging and try the process for yourself, but please bear in mind that the reverse, 'keeping the vocals' and removing all else, cannot be done this way because of the stereo panning of the other sources. For that to work, all the other sound sources would need to be identical in both channels, with the vocals panned to one side.
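For completeness, here is a small sketch of that unlikely special case in plain Python, using made-up sample values: a backing track that is identical in both channels and a vocal that sits in the left channel only. The halving of the sum is again an assumption.

```python
def invert_and_sum(left, right):
    """Invert the right channel and sum to mono (halving is an assumption)."""
    return [(l - r) / 2.0 for l, r in zip(left, right)]

band = [0.5, -0.25, 0.75, -0.5]     # hypothetical backing, identical in both channels
vocal = [0.25, 0.5, -0.25, 0.125]   # hypothetical vocal, hard left only

left = [b + v for b, v in zip(band, vocal)]
right = list(band)

isolated = invert_and_sum(left, right)

# Only the vocal survives, at half its original level.
print(isolated)
```

As soon as any backing instrument differs between the channels, it will leak into the result alongside the vocal.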