ARM processors have multiply-accumulate instructions too. I also wonder whether the DSP is really faster for many tasks, or whether the advantage is just that the CPU can be doing other stuff at the same time. It is hard enough finding tutorial-type information for NEON, never mind the DSP.
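
For anyone unsure what "multiply-accumulate" means in practice, here is a minimal sketch in plain C. The function name `dot_s16` is just something I made up for illustration. Each loop iteration is one multiply-accumulate, and as far as I know a compiler targeting ARM can usually map this onto the core's MLA-style instructions, or onto NEON if auto-vectorization is enabled, but that depends on the compiler and flags, so check the generated assembly rather than taking my word for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two buffers of signed 16-bit samples.
 * Hypothetical helper for illustration only, not from any library.
 * Each iteration performs one multiply-accumulate: acc += a[i] * b[i].
 * The accumulator is 32 bits wide; the caller is assumed to keep n and
 * the sample magnitudes small enough that the sum does not overflow. */
int32_t dot_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

Whether the DSP beats this kind of loop on the CPU is exactly the open question above; the sketch only shows the operation being compared.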