I'd be curious to hear about profiling results.
What I've done in the past to profile my startup is to comment out the gtk.main() call, so the program exits as soon as startup finishes, and then run it through a profiler.
Code:
$ python -m cProfile -o .profile PROGRAM
$ python -m pstats .profile
> sort tottime
> stats
(From memory, so forgive me if I messed up on some pstats commands).
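If you'd rather do the same thing from inside a script, here is a rough sketch of the equivalent using cProfile and pstats directly. It assumes Python 3, and startup() and profile_startup() are hypothetical names standing in for whatever your program does before gtk.main(); they are not from fmms.

```python
import cProfile
import io
import pstats


def startup():
    # Hypothetical stand-in for the program's startup work
    # (everything that runs before gtk.main()).
    return sum(i * i for i in range(100000))


def profile_startup(func, limit=10):
    """Profile func() and return a text report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Same idea as "sort tottime" followed by "stats" in the pstats browser.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_startup(startup))
```

Sorting by tottime shows where time is spent inside each function itself; sorting by "cumulative" instead can make it easier to spot a slow import or constructor near the top of the call tree.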
I was looking through the fmms source code. I have a feeling that some things might be slow (ctypes.CDLL(...), for example), but without profiling numbers I can't judge whether these calls should be made lazily (either in a thread or upon first use). Dialcentral has historically tried to start up as quickly as possible and then push a lot of initialization off to a thread, including importing various components. Sadly, I can't say how much that helps apart from the network communication.
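To make "lazily, either in a thread or upon first use" a bit more concrete, here is a minimal sketch of the pattern. LazyResource is a hypothetical helper I'm making up for illustration; it is not how fmms or Dialcentral actually do it.

```python
import threading


class LazyResource:
    """Defer an expensive call until first use, optionally warming it in a thread."""

    def __init__(self, loader):
        self._loader = loader          # zero-argument callable doing the slow work
        self._lock = threading.Lock()  # ensures the loader runs only once
        self._loaded = False
        self._value = None

    def get(self):
        # The first caller pays the cost; later callers get the cached value.
        with self._lock:
            if not self._loaded:
                self._value = self._loader()
                self._loaded = True
            return self._value

    def preload(self):
        # Start loading in the background so startup is not blocked.
        thread = threading.Thread(target=self.get)
        thread.daemon = True
        thread.start()
        return thread
```

The idea would be to wrap the slow call in a loader, for example LazyResource(lambda: ctypes.CDLL(...)), call preload() once the UI is up, and call get() wherever the library is actually needed. Whether that is worth the added complexity depends on what the profiler says about those calls.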
Also, something I just remembered: how you handle calls to "show"/"show_all" in your code can make a performance difference (as judged by a profiler).