music collections far exceed 100,000 songs and users expect instant performance. I've done a lot of work with natural language processing as part of speech recognition, but it adds overhead, so that probably won't be the direction we go.
Windows is based on DirectShow, which provides support for all of those. The nice part about the modular design is that you can add support for whichever backend you prefer.
1) Visualizations are added as part of the AV player plugin and respond to the "visualize" and "stop visualize" commands
2) Visualizations are their own type of plugin
3) Visualizations fall into the "other" category (i.e. they implement IOther) as an unofficial part of the framework
Any of these would require only very minor changes to support. My personal preference is option 1, but that would slightly limit the diversity of standalone visualization engines.
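
To make the three options a bit more concrete, here is a rough sketch of how each could look. It is written in Python purely for illustration; the framework itself is not Python, and every name here except IOther (the handle method, IVisualization, the class names, dispatch) is made up for the example and is not the actual API.

```python
from abc import ABC, abstractmethod


class IOther(ABC):
    """Hypothetical stand-in for the framework's catch-all plugin interface."""

    @abstractmethod
    def handle(self, command: str) -> bool:
        """Return True if this plugin consumed the command."""


class IVisualization(ABC):
    """Option 2 (assumed shape): a dedicated visualization plugin type."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class AVPlayerPlugin:
    """Option 1: the AV player plugin owns the visualization itself."""

    def __init__(self) -> None:
        self.visualizing = False

    def handle(self, command: str) -> bool:
        if command == "visualize":
            self.visualizing = True
            return True
        if command == "stop visualize":
            self.visualizing = False
            return True
        return False  # not a visualization command


class StandaloneVisualizer(IVisualization, IOther):
    """Options 2 and 3: a standalone engine, usable through either interface."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False

    def handle(self, command: str) -> bool:
        # Option 3: map the same two commands onto start/stop.
        if command == "visualize":
            self.start()
            return True
        if command == "stop visualize":
            self.stop()
            return True
        return False


def dispatch(command: str, plugins: list) -> bool:
    """Offer the command to each plugin in order; stop at the first taker."""
    return any(plugin.handle(command) for plugin in plugins)
```

With option 1 only the player ever sees the two commands, which is why it is the simplest change; options 2 and 3 let a separate engine take them instead, at the cost of one more plugin to register.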