Posts: 2,102 | Thanked: 1,309 times | Joined on Sep 2006
#11
So, to clear up the confusion re NEON instructions, these are an extension of the main CPU instruction set (actually handled by a coprocessor iirc, like the VFP instructions) which perform some SIMD operations.

The DSP is a separate chip, with its own compiler and assembly language and indeed its own kernel (that runs on the DSP and performs IO access, multi-tasking of DSP threads, etc.)

So, there is an obvious advantage that the DSP can run in parallel with the CPU. That's the first thing.

The second advantage, and the reason you have a DSP rather than another CPU, is that the DSP is optimised for arithmetic operations in various ways. It can perform SIMD and MAC (multiply-accumulate) operations efficiently, both in terms of time and power, and its instructions generally inform you if they have had problems (e.g. overflow, underflow), so explicit checks do not need to be coded in-line.
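To make the MAC and overflow point concrete, here is a minimal sketch in plain C of what a CPU without such hardware has to spell out by hand. The function names (sat_add32, mac16) are made up for illustration and are not part of any TI toolchain; a DSP would typically do the multiply, the accumulate and the saturation in hardware and report the overflow through a status flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating 32-bit add (hypothetical helper). Sets *overflowed when the
 * true sum does not fit in 32 bits; this is the explicit in-line check
 * that a DSP's status flags make unnecessary. */
int32_t sat_add32(int32_t a, int32_t b, bool *overflowed)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) { *overflowed = true; return INT32_MAX; }
    if (wide < INT32_MIN) { *overflowed = true; return INT32_MIN; }
    return (int32_t)wide;
}

/* Multiply-accumulate over two arrays of 16-bit samples (a dot product),
 * clamping the accumulator instead of letting it wrap around. */
int32_t mac16(const int16_t *x, const int16_t *y, size_t n, bool *overflowed)
{
    int32_t acc = 0;
    *overflowed = false;
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)x[i] * (int32_t)y[i];
        acc = sat_add32(acc, product, overflowed);
    }
    return acc;
}
```

Calling mac16 on {1, 2, 3} and {4, 5, 6} gives 32 with the overflow flag clear, while three full-scale samples multiplied by themselves clamp at INT32_MAX and set the flag.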

Another advantage of the DSP is the speed of the memory that it can access and the number of cycles required to access said memory.

I should say that I've not done any DSP programming for the C6x which we have in the N900/beagleboard, as I've been too busy, but I can comment on the disadvantages of the C55 which we had in the N8x0 and say how things have improved with the C6x which we have in the N900.

With the C55 there was no 8bit type (i.e. a char was 16bits wide) which made porting code a real pita. Thankfully the C6x does have an 8bit type which should make things much easier.
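As a rough illustration of why a 16-bit char hurts when porting, the sketch below pulls individual octets out of a stream that is stored two octets per 16-bit word. The function name get_octet and the high-octet-first ordering are assumptions for the example, not anything taken from the TI tools; the point is only that code written around unsigned char buffers needs this sort of helper everywhere once char is no longer 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Return octet number `index` from a stream packed two octets per 16-bit
 * word, high octet first (the ordering is an assumption). On a target
 * where char is 16 bits wide, an octet cannot be addressed directly, so
 * every byte access in ported code turns into a shift and a mask. */
unsigned get_octet(const uint16_t *words, size_t index)
{
    uint16_t word = words[index / 2];
    if (index % 2 == 0)
        return ((unsigned)word >> 8) & 0xFFu;   /* high half of the word */
    return (unsigned)word & 0xFFu;              /* low half of the word */
}
```

With the words {0x1234, 0xABCD}, octets 0 to 3 come out as 0x12, 0x34, 0xAB and 0xCD.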

My understanding is that the DSP bridge (which is a Linux kernel side link to the DSP's own kernel, and allows data and object code to be passed back and forth) is also nicer to use now than the DSPgateway we used to use on the N8x0.

Originally Posted by S0urcerr0r
Now to the questions:

1. What's so special about the DSP? Is it really worthwhile developing for a 430 MHz DSP rather than a 600 MHz CPU? I know MHz vs. MHz can be a whole science, so to keep the question short: where does the DSP shine, from a system-level perspective? (...Floating point calculations? MOVs?)
Not normally floating point, but certainly fixed point. In fact many DSPs (C55 in the N8x0 and I think the C6x in our N900s included) don't have fp hardware anyway so it's all done using fixed point.
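For anyone who has not met fixed point before, here is a minimal sketch of the common Q15 format, where a 16-bit integer stands for a fraction in [-1, 1) scaled by 2^15. The names q15_from_double and q15_mul are invented for this example, and Q15 itself is an assumption about which format you would pick; it is just the usual choice for 16-bit audio samples.

```c
#include <stdint.h>

#define Q15_ONE 32768   /* scale factor: 1.0 corresponds to 2^15 */

/* Convert a real value to Q15, rounding to nearest and clamping to the
 * representable range of roughly [-1.0, 1.0). */
int16_t q15_from_double(double v)
{
    double scaled = v * Q15_ONE;
    if (scaled >= 32767.0)  return 32767;
    if (scaled <= -32768.0) return -32768;
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Multiply two Q15 values. The 32-bit product is in Q30, so it is shifted
 * right by 15 to get back to Q15. Right-shifting a negative value is
 * implementation-defined in C; mainstream compilers shift arithmetically. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q30 */
    product += 1 << 14;                          /* round to nearest */
    int32_t result = product >> 15;              /* back to Q15 */
    if (result > 32767)                          /* only -1.0 * -1.0 hits this */
        result = 32767;
    return (int16_t)result;
}
```

So 0.5 is stored as 16384, and q15_mul(16384, 16384) returns 8192, i.e. 0.25, without any floating point being involved in the multiply.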

Originally Posted by S0urcerr0r
2. Can the DSP be used in conjunction/simultaneously with the CPU on the N900? Or does it generate too much overhead on the bus?
Yes, it can be, although how this is done may be troublesome.

The DSP runs a multitasking kernel (which is separate to the Linux kernel) and includes an MMU. Data transfer is carried out by mapping shared memory on both the DSP and CPU sides (this was done by the DSP gateway kernel-side code.)

On the 770 the DSP was used to decode video and on the N8x0 it was used only for audio decoding. With the N900 we're back to using the DSP for video.

I ported the SBC codec to the C55 DSP on the N8x0 in the hope that it would offload some CPU load and allow better quality movies to be played in mplayer while using A2DP headphones (the next step being to implement an mp3->PCM decoder that could keep the PCM data on the DSP rather than passing it back to the CPU via the shared memory buffer), but this was not to be.

While the Nokia supplied DSP kernel included an audio hw codec driver, no api docs were available, so we would have had to rewrite the audio hw codec driver and replace the kernel in order to directly output audio from the DSP (not the end of the world, but it would stop the existing mp3, aac decoders, etc., from working). Therefore I used the shared memory buffer both to pass PCM data to the DSP and to retrieve it from the DSP. It worked, but some sort of memory contention issue stopped it from being used with mplayer playing video files at the same time.
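To give an idea of what passing PCM data through a shared buffer looks like, here is a rough sketch of a single-producer, single-consumer ring buffer in plain C. This is not the DSP gateway or DSP bridge API, and the struct layout, the names and the capacity are all made up for illustration. A real CPU/DSP version would live in the mapped shared region and would also need memory barriers and cache maintenance, which are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 1024u   /* samples; a power of two so wrapping is a mask */

/* Hypothetical layout of a PCM ring placed in shared memory. */
struct pcm_ring {
    uint32_t head;                     /* total samples written so far */
    uint32_t tail;                     /* total samples read so far */
    int16_t  samples[RING_CAPACITY];
};

/* Producer side: copy up to n samples in, return how many actually fit. */
size_t ring_write(struct pcm_ring *r, const int16_t *src, size_t n)
{
    size_t free_slots = RING_CAPACITY - (r->head - r->tail);
    if (n > free_slots)
        n = free_slots;
    for (size_t i = 0; i < n; i++)
        r->samples[(r->head + i) & (RING_CAPACITY - 1u)] = src[i];
    r->head += (uint32_t)n;
    return n;
}

/* Consumer side: copy up to n samples out, return how many were available. */
size_t ring_read(struct pcm_ring *r, int16_t *dst, size_t n)
{
    size_t available = r->head - r->tail;
    if (n > available)
        n = available;
    for (size_t i = 0; i < n; i++)
        dst[i] = r->samples[(r->tail + i) & (RING_CAPACITY - 1u)];
    r->tail += (uint32_t)n;
    return n;
}
```

Because only the producer updates head and only the consumer updates tail, neither side has to take a lock, which is one reason this layout is a common choice for this kind of hand-off.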

So yes, you may have troubles. I'm afraid that I don't know how the N900's DSP outputs video data, whether it's passed back to a shared framebuffer or output directly.

Originally Posted by S0urcerr0r
3. Which applications can benefit from using the DSP?
I would guess all kinds of sound and video encoding/decoding is where it shines? But why is that? Can it also be used for regular file compression (ZIP/RAR)?
Anything repetitive, basically. I don't know about the internals of file compression offhand and would have to sit down and look at some pseudocode to decide either way.

Originally Posted by S0urcerr0r
4. Are there any tools available to compile and build code? Will it cause any software conflicts running such code builds in Maemo?
Yes there are tools for the N900 which were originally here: http://omapzoom.org/wiki/DSPBridge_Project

There's a wiki page called "DSP Programming" on wiki.maemo.org with links for the N8x0 tools.

Originally Posted by S0urcerr0r
#
I would guess it isn't a good idea to try using the camera video recording while simultaneously using the DSP for other activities. There would hardly be any resource handling or multitasking kernel running for the DSP...
Actually there is resource handling and multitasking, so it might work (quickly enough).

HTH
 

The Following 6 Users Say Thank You to lardman For This Useful Post:
Posts: 124 | Thanked: 52 times | Joined on May 2010 @ Sweden
#12
[lardman:]

Thanks legendary one!!!

It's an honor to receive such an impressively detailed answer from a god.

I will take a good look at those tools you linked me to as well. Right now I don't know where to start with DSP asm, but when I get more used to coding x86 asm for the 8088 CPU, I hope I will be experienced enough to logically understand why the DSP performs arithmetic better. At the moment I'm still learning the basics of x86 asm, and get confused about when to use bit-shifts (SHR/SHL) for multiplication/division and when a MUL/DIV may be more appropriate... In other words, I still have a very long way to go. I just hope that I will soon reach a level where I handle basic instructions and syntax by routine (automatically knowing where the results of different instructions end up without mental effort, e.g. AL/AH or AX). I've looked at the instruction list for the Intel 8086 and many of the instruction descriptions don't even make sense at the level I'm currently at. It will require much patience and some studying of mathematics in English.
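On the shift versus MUL/DIV question, the usual rule of thumb is that shifts replace multiplication or division by powers of two, and that other constants can sometimes be built from a couple of shifts and an add. A small sketch in C (the function names are invented for the example, and it sticks to unsigned values, where the shift results are fully defined):

```c
#include <stdint.h>

/* Multiply by 2^k with a left shift, which is what SHL does. The result
 * is truncated to 16 bits, just as it would be in a 16-bit register. */
uint16_t mul_pow2(uint16_t x, unsigned k)
{
    return (uint16_t)((uint32_t)x << k);
}

/* Divide by 2^k with a right shift, which is what SHR does. For unsigned
 * values this rounds down, the same as integer division. */
uint16_t div_pow2(uint16_t x, unsigned k)
{
    return (uint16_t)(x >> k);
}

/* Multiply by 10, which is not a power of two: x*10 = x*8 + x*2. */
uint16_t mul10(uint16_t x)
{
    return (uint16_t)(((uint32_t)x << 3) + ((uint32_t)x << 1));
}
```

For example mul_pow2(5, 3) is 40, div_pow2(41, 3) is 5, and mul10(123) is 1230. For anything that is not a constant, or for signed values that need proper rounding, a real MUL/DIV is usually the simpler choice.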


I imagine that one key to logically understanding the DSP's effectiveness is learning how to properly use those 64 registers of the DSP simultaneously. But that will probably be the hardest part to learn as well (considering all the rules that apply when different registers are used simultaneously with different instructions that take different numbers of cycles).

Your post (from beginning to end) has been really helpful and I feel more encouraged that learning asm for the C64x DSP is actually possible. When I reach a certain level, I will start out with DSP asm by making some simple calculation apps for it, almost like a "hello world" app (without display capabilities). They should be small enough to fit in the DSP's cache (if the DSP has a cache), so I hope I won't need to use shared memory during the execution of the app until output is needed from the DSP.

Thank you very much again!




[To the other people who have replied:]

Thank you as well. It has been very interesting to read your answers, particularly the info about where the DSP is applicable.

Last edited by S0urcerr0r; 2010-09-17 at 00:51.
 

The Following User Says Thank You to S0urcerr0r For This Useful Post:
Posts: 2,102 | Thanked: 1,309 times | Joined on Sep 2006
#13
Glad to help, please feel free to pick my brain here (and if I don't see the post, send me a PM/email and I'll respond).

One good thing about the tools is that you can write in C (they include a C compiler), then optimise those parts that particularly need it by writing them in ASM after you've got your code running. This means the learning curve is not all that steep (at least to get something running, as opposed to getting something running as fast as possible!)

 

The Following User Says Thank You to lardman For This Useful Post:
Posts: 2,102 | Thanked: 1,309 times | Joined on Sep 2006
#14
Originally Posted by S0urcerr0r
[lardman:]
When I reach a certain level, I will start out with DSP asm by making some simple calculation apps for it, almost like a "hello world" app (without display capabilities). They should be small enough to fit in the DSP's cache (if the DSP has a cache), so I hope I won't need to use shared memory during the execution of the app until output is needed from the DSP.
I also never looked at the cache size of my code, though I'm sure if someone was doing this professionally they would do so (and so much more). What you may find a more immediate limitation of the DSP is that there is not much memory accessible to the chip (on the C55 there was some double access memory, some single access and some slow access shared memory). The C6x has more memory afaiu, but certainly on the C55 this started to become a problem when looking at porting code (though mainly because I wasn't an expert with Vorbis).

In any case I'd say go for it. Being able to code in C, then optimise what needs it, is a great way to get going easily, and the fact that the C6x has 8bit types removes that significant porting hurdle. Do let us know how you get on (and if I eventually get some free time I may even try some DSP hacking on this device myself).
 

The Following User Says Thank You to lardman For This Useful Post:
Posts: 124 | Thanked: 52 times | Joined on May 2010 @ Sweden
#15
Thank you very much again!
The "final" goals I've set up for the future are to get enough experience to write some encoders for formats like MP3, MPEG2-video, JPEG (incl. MJPEG) and maybe also file compression (zip/rar). I'll start out with the format that I consider easiest and most well-documented.

...I have an old 10 MHz Intel 8088 with 768KB RAM and a super-slow 30MB HDD in the basement that could handle playback of videos at a resolution of 160x100 / 8-bit color (no sound), with a nice framerate and almost no compression artefacts, on a regular VGA card with no acceleration...
That's why I think the current encoders/decoders for the "newly" developed OMAP3430 (430 MHz DSP) can't be properly optimised.
I will study all those different video formats and choose the one that puts as little strain on the DSP as possible.
Phone manufacturers seem to prioritise small file sizes over output quality.
Many regular digital camera manufacturers instead seem to put output quality first, and use older formats like MJPEG (I think), which make bigger output files but fewer compression artefacts.
If I can get 800x600 (or even 1024x768) video encoded from the N900 camera on the fly in a HQ format like MJPEG, I will be very satisfied with that.
Hopefully the system bus/storage will be able to cope with outputting video at ~1-2 MB/s (+ all other memory transfers for encoding).
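As a back-of-the-envelope check on those numbers, the sketch below works out the raw data rate and the compression ratio the encoder would have to reach. The 2 bytes per pixel (as in a YUV 4:2:2 layout), the 25 frames per second and 1 MB = 1,000,000 bytes are all assumptions for the sake of the arithmetic, not figures for the N900 camera, and the function names are made up.

```c
#include <stdint.h>

/* Raw (uncompressed) video data rate in bytes per second. */
uint64_t raw_rate_bytes(uint32_t width, uint32_t height,
                        uint32_t bytes_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}

/* Compression ratio needed to squeeze the raw rate into the target rate. */
double required_ratio(uint64_t raw_bytes_per_s, uint64_t target_bytes_per_s)
{
    return (double)raw_bytes_per_s / (double)target_bytes_per_s;
}
```

Under those assumptions 800x600 works out to 24,000,000 bytes/s raw, so writing 1-2 MB/s means compressing by somewhere between 24:1 and 12:1, and the raw frames still have to cross the memory bus before they are encoded.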

This is the major reason I'm studying low-level programming, but like you said: maybe it's a better idea to first make the encoder in C, and while doing that, I can decide which algorithms will need asm optimization... If I write portable C code I will also try it out on the Cortex-A8 to compare execution speed.

PS. Sorry if this reply looks like a mess. I've been studying asm for 10 hours straight (the whole night) and can barely write proper English now...

I will probably make the next post in this thread when I get a "hello world" app working (like a calculator / Pi engine) in C on the DSP.
If I encounter problems worthy of a god's attention, I'll get in touch. Thanks again.

Last edited by S0urcerr0r; 2010-09-18 at 05:35.
 

The Following 2 Users Say Thank You to S0urcerr0r For This Useful Post: