Llama is right. We use a custom kernel and we've compiled almost everything as modules.
We load our UI first (from a custom startup script) and then insert the modules sequentially. We're using tinylogin (compiled for ARM7) to log in automatically as root on the box. We also rewrote the startup scripts from scratch. We're on LFS 5.1, so we're not using any off-the-shelf distro; we built our own.
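Here's a rough sketch of that ordering (UI first, then modules one at a time). It's in Python purely for illustration; on our box this lives in the shell startup script. It assumes modules are inserted with the standard insmod utility. The UI path and module names are hypothetical placeholders, not what we actually ship. The launcher and loader are passed in as functions so the ordering can be checked without root or real hardware.

```python
import subprocess
from typing import Callable, List

# Hypothetical paths: substitute your own UI binary and module list.
UI_COMMAND = ["/opt/ui/carui"]
MODULES = ["audio.ko", "gps.ko", "usb-storage.ko"]


def start_ui(cmd: List[str]) -> subprocess.Popen:
    """Launch the UI in the background so it is on screen before drivers load."""
    return subprocess.Popen(cmd)


def insmod(path: str) -> None:
    """Insert one module with insmod; raises CalledProcessError if it fails."""
    subprocess.run(["insmod", path], check=True)


def boot_sequence(
    ui_cmd: List[str],
    modules: List[str],
    launch: Callable[[List[str]], object] = start_ui,
    load: Callable[[str], None] = insmod,
) -> List[str]:
    """Start the UI first, then insert modules strictly one at a time.

    Returns the modules that loaded successfully, in load order.
    """
    launch(ui_cmd)
    loaded: List[str] = []
    for mod in modules:
        try:
            load(mod)
        except (OSError, subprocess.CalledProcessError) as exc:
            # A missing optional driver should not stop the remaining modules.
            print(f"failed to load {mod}: {exc}")
            continue
        loaded.append(mod)
    return loaded
```

On a real box you'd call boot_sequence(UI_COMMAND, MODULES) after swapping in your own paths. Loading one module at a time keeps the order predictable, and starting the UI first means the user sees something while the drivers come up.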
As for tweaking your kernel, there are TONS of ways to speed up its startup. One is XIP (eXecute In Place): the kernel is stored uncompressed in flash and runs directly from there instead of first being decompressed into RAM. The net result is that the kernel doesn't need to decompress, and it boots about 1100 ms faster. You can't do this on the x86 architecture because of hardware limitations (ahem, 640K limit; thanks Bill!). Here's a list of some great kernel tweaks:
Apply a preset "loops per jiffy" patch, so the kernel doesn't have to calibrate its delay loop (the BogoMIPS measurement) on every boot.
Apply a patch that eliminates IDE probing: you can preset your IDE data after the kernel enumerates the device for the first time, so subsequent boots never have to probe again.
Remove all the printk() calls from the kernel source. This eliminates all the startup text, but you can still debug via serial.
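If you'd rather not patch, kernels that accept the lpj= boot parameter let you preset loops-per-jiffy from the command line (not every older kernel has it, hence the patch). One way to get the number: boot once normally, read the lpj value the kernel prints during delay-loop calibration, and add it to the command line from then on. The Python sketch below does that. The message wording varies between kernel versions, so it only relies on the "lpj=NNNN" part; the sample log line and command line are made up for illustration. The bogomips() helper is a sanity check using the relation the kernel uses when printing, BogoMIPS = lpj * HZ / 500000.

```python
import re
from typing import Optional

# Only the "(lpj=NNNN)" part of the calibration message is relied on,
# since the surrounding wording differs between kernel versions.
LPJ_RE = re.compile(r"lpj=(\d+)")


def lpj_from_dmesg(log: str) -> Optional[int]:
    """Return the first loops-per-jiffy value found in a kernel log, or None."""
    m = LPJ_RE.search(log)
    return int(m.group(1)) if m else None


def bogomips(lpj: int, hz: int) -> float:
    """BogoMIPS for a given lpj and HZ (before the kernel truncates it to 2 decimals)."""
    return lpj * hz / 500000


def with_lpj(cmdline: str, lpj: int) -> str:
    """Append lpj=N to a kernel command line, replacing any existing lpj= entry."""
    parts = [p for p in cmdline.split() if not p.startswith("lpj=")]
    parts.append(f"lpj={lpj}")
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up sample line in the 2.6-era format, with HZ=100 assumed.
    sample = "Calibrating delay loop... 398.13 BogoMIPS (lpj=1990656)"
    value = lpj_from_dmesg(sample)
    print(value, round(bogomips(value, 100), 2))  # → 1990656 398.13
    # Hypothetical command line; use whatever your bootloader passes.
    print(with_lpj("console=ttyS0,115200 root=/dev/mtdblock2", value))
```

The value is only valid for the CPU and clock speed it was measured on, so re-read it if either changes.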
We've had NUMEROUS people ask for a copy of the distro. Here's our standard reply: our distro is not for x86 computers. It won't work on an EPIA or anything like that. We're on an embedded, BIOS-less platform, so we can't provide you with a CD distro for your carputer. Sorry!
By tweaking the hell out of our kernel, we're down to about a 5-second boot, measured from power-on to prompt. Our app loads in about 2-3 seconds, so a cold boot takes roughly 8 seconds from power-on to UI.
We have hardware suspend, and we can resume from it in about 2500 ms.
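For anyone comparing against their own box, here's a small illustrative Python sketch of how the figures above add up, using only the numbers from this post (the stage names are just labels). If you want to measure your own timings, one option is to log the first field of /proc/uptime at the moment your UI is ready; keep in mind it counts from kernel start, so time spent in the bootloader isn't included.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (best case, worst case) in seconds

# Figures quoted in this post; stage names are just labels.
COLD_BOOT: Dict[str, Range] = {
    "power-on to prompt": (5.0, 5.0),
    "app load": (2.0, 3.0),
}
RESUME_S = 2.5  # resume from hardware suspend, about 2500 ms


def total(stages: Dict[str, Range]) -> Range:
    """Add up the best-case and worst-case bounds of every stage."""
    return (sum(r[0] for r in stages.values()),
            sum(r[1] for r in stages.values()))


def uptime_seconds(path: str = "/proc/uptime") -> float:
    """Seconds since the kernel started: the first field of /proc/uptime.

    Time spent in the bootloader before the kernel runs is not included.
    """
    with open(path) as f:
        return float(f.read().split()[0])


if __name__ == "__main__":
    lo, hi = total(COLD_BOOT)
    print(f"cold boot: {lo:.0f}-{hi:.0f} s, resume: {RESUME_S} s")
    # → cold boot: 7-8 s, resume: 2.5 s
```

That's where the "roughly 8 seconds" comes from, and why resume from suspend (about a third of a worst-case cold boot) is the faster path when the hardware supports it.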
Sorry for the super long post - I just wanted to provide as much info as possible.