I believe that it is already technically possible, but I don't think it is practical yet. As far as I can recall, the data rate for GSM phones is 9600 bps.
Therefore it would take you about an hour to download a 4.5 minute song encoded at 128,000 bps. As well as being time-consuming, this method would also be rather expensive.
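
In case the arithmetic is useful, here is a rough sketch in Python. The function name is just something I made up for illustration, and it assumes the full 9600 bps is available for the whole transfer with no protocol overhead or dropped connections, so a real download would probably be slower.

```python
def download_seconds(duration_s, encode_bps, link_bps):
    """Seconds needed to transfer a track, ignoring all protocol overhead."""
    total_bits = duration_s * encode_bps  # size of the encoded song in bits
    return total_bits / link_bps          # time to push those bits over the link


song_seconds = 4.5 * 60                   # a 4.5 minute song
t = download_seconds(song_seconds, 128_000, 9_600)
print(t / 60)                             # download time in minutes; prints 60.0
```

Put another way, the link is about 13 times slower than the song's bit rate (128000 / 9600), so every minute of music costs roughly 13 minutes of connection time.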
I have no doubt that future mobile phone systems will have higher data rates, and this will make your idea more practical. However, the telephone companies will probably charge more for high-rate data connections than they do for voice calls.