I think you should be using the standard NMEA sentences: they are multi-platform, so there will be no more talk of the warring Windows vs. Unix factions.
The other plus is that it is a documented industry standard. Also, as discussed, there is a risk that if this data is filtered by the user, a year from now you will want to use data that you no longer have.
The other advantage of using NMEA is that users could start contributing right now, before the server is in place, simply by enabling logging in Xport, because as you say almost everybody on the Windows platform uses C's software...
Once the server is in place, they could start uploading all the data they have.
I already have Xport logging turned on so that I have an evidence trail to refute a speed camera fine if one ever turns up in the mail, so I could give you quite a bit of data straight away if you ever extend your reach to Australia!
You might also look into how the US government manages place names. In Australia, there is a central government-controlled database of the place names that appear on topographic maps (towns, buildings, mountains, creeks and rivers, etc.) and some road names. This database includes the latitude and longitude of all these places, and I have the file sitting on my Car PC, where it is accessible from my topographic mapping application (and the 20 GB of topo maps I have installed).
So guys, stop bickering about this, turn on logging and start collecting data straight away, so there is some real data available for upload as soon as your server comes online.
Oh, and there is no reason why your server can't support multiple input file formats, just like we do here: http://www.4x4earth.com.au/
2007 Toyota Hilux with a CarPC.
OziExplorer GPS Embedded in RR: http://www.mp3car.com/vbulletin/sb-s...iexplorer.html
Sicarius: we could compress, or we could offload some of the pre-processing work onto the data collector; you'd probably see about the same amount of space savings either way. And you could always zip up the binary data and see if it's any smaller then.
NMEA is expensive to process in large amounts, cycle-wise.
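To make that cost concrete, here's a minimal sketch of the per-sentence work a server would face on raw NMEA: checksum verification and field extraction for an RMC fix. The function names and the degrees/minutes conversion are my own illustration, not anyone's actual code.

```python
import functools
import operator


def nmea_checksum_ok(sentence: str) -> bool:
    """Verify an NMEA sentence's XOR checksum ("$body*HH")."""
    body, _, checksum = sentence.strip().lstrip("$").partition("*")
    calc = functools.reduce(operator.xor, (ord(c) for c in body), 0)
    return f"{calc:02X}" == checksum.upper()


def parse_rmc(sentence: str) -> dict:
    """Extract time, latitude and longitude from a GPRMC sentence.

    NMEA encodes position as ddmm.mmmm / dddmm.mmmm, so the minutes
    part has to be divided by 60 to get decimal degrees.
    """
    fields = sentence.split(",")
    lat = float(fields[3][:2]) + float(fields[3][2:]) / 60.0
    lon = float(fields[5][:3]) + float(fields[5][3:]) / 60.0
    if fields[4] == "S":
        lat = -lat
    if fields[6] == "W":
        lon = -lon
    return {"time": fields[1], "lat": lat, "lon": lon}
```

Multiply that string-splitting and float conversion by ten sentences a second per logger and a few hundred loggers, and the cycle cost adds up, which is the point being made above.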
Another question to answer is how many points per unit time you want. Most people are going to be producing one point per second, but mine is going to be putting out ten per second.
If you're offloading pre-processing to the client before submitting, you could also do some smart stuff like decreasing the report rate the slower you're moving, and culling points entirely when stationary.
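A minimal sketch of that kind of client-side thinning, assuming points have already been projected into a local planar frame in metres (the thresholds are arbitrary illustrative values, not a recommendation):

```python
def thin_track(points, min_move_m=2.0, max_gap_s=10.0):
    """Keep a point only if we've moved far enough, or enough time has
    passed since the last kept point.

    points: iterable of (t_seconds, x_metres, y_metres) tuples in a
    local planar frame.  Slow movement naturally yields fewer points,
    and a stationary logger emits roughly one point per max_gap_s.
    """
    kept = []
    for t, x, y in points:
        if not kept:
            kept.append((t, x, y))
            continue
        t0, x0, y0 = kept[-1]
        moved = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
        if moved >= min_move_m or t - t0 >= max_gap_s:
            kept.append((t, x, y))
    return kept
```

Note this drops data, which the next post argues against, so treat it as an illustration of the proposal rather than a settled design.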
We do want to know when someone is stationary; that could be used to figure out (for example) where stop signs are, stoplights vs. stop signs, which turn lanes have lights, etc. Like I said, I don't think removing any data at all is a good idea. Space and bandwidth are relatively cheap, but going back and re-collecting a couple of months' worth of data from a couple of hundred people because we forgot something... not so much.
I also think that getting as many samples per second as possible should be key. As long as the samples are timestamped, they can be of value. Until we start actually processing this data, we really have no idea what is useful and what isn't, so what data to cull out is a discussion for much later.
The sentences are standards... Therefore, I vote they be stored intact, line by line, in sequence on the server. From there, any further information can be derived. Surely the server will have some mildly idle time in which it can perform daily "builds" of the data.
A user uploads the formatted GPS output (like the .txt file attached above) to the FTP server. A parser application runs, which opens each new file, inserts each sentence into a table, and then moves the text file into a save folder under a formatted name.
This is the most basic of the tasks, but it will also be an incredibly processor-intensive one. All other data could easily (though not quickly) be derived from a raw data table.
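That parse-and-archive step might be sketched like this, assuming SQLite and made-up folder names and table schema (the thread doesn't specify a database, so all of those are placeholders):

```python
import shutil
import sqlite3
from datetime import datetime
from pathlib import Path


def ingest(db: sqlite3.Connection, incoming: Path, archive: Path) -> None:
    """Load each new upload into a raw-sentence table, then archive it.

    Table name, columns and the archive naming scheme are invented for
    this sketch; adapt them to whatever the server actually uses.
    """
    db.execute("CREATE TABLE IF NOT EXISTS raw_sentences "
               "(upload TEXT, line_no INTEGER, sentence TEXT)")
    for txt in sorted(incoming.glob("*.txt")):
        with txt.open() as f:
            rows = [(txt.name, i, line.strip())
                    for i, line in enumerate(f, 1)
                    if line.startswith("$")]   # keep only NMEA sentences
        db.executemany("INSERT INTO raw_sentences VALUES (?, ?, ?)", rows)
        db.commit()
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        shutil.move(str(txt), str(archive / f"{stamp}_{txt.name}"))
```

Keeping the original file name and line number alongside each sentence preserves the "intact, in sequence" property argued for above, so later builds can always be re-run from the raw table.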
Play with it, 'til it's broke.
@ FO and ecog - How about we do some experiments with NMEA data?
Why don't we take a couple of datasets and compare the results of the following:
Pick some track data from a GPS covering 1 minute, 10 minutes, 20 minutes, 30 minutes and 1 hour.
Figure out the data sizes for all of the options and graph them to get an idea of how big the storage requirements would be. Remember to take backups of the data into consideration, both in time and in size/cost.
Then, we should see what kind of processing has to take place to turn the data into OpenStreetMap track information, and run it against the datasets. You'll have to do this processing eventually, and it would be a good idea to see whether it scales.
Then, we can calculate storage sizes and processing times for 1, 10, 100, 1,000, and 10,000 users.
No sense starting it off one way and then finding out it won't scale.
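As a starting point for the storage side of that experiment, a back-of-the-envelope calculator; every constant here (bytes per sentence, sentences per fix, hours logged per day) is an assumption to vary, not a measurement:

```python
def daily_storage_mb(users: int, points_per_sec: float,
                     sentences_per_point: int = 2,
                     bytes_per_sentence: int = 70,
                     hours_logged: float = 1.0) -> float:
    """Rough raw-NMEA storage per day, in MB, for a given user count.

    Defaults assume two sentences (e.g. RMC + GGA) of about 70 bytes
    per fix and one hour of driving a day; all are guesses to refine
    against the real datasets proposed above.
    """
    total_bytes = (users * points_per_sec * sentences_per_point
                   * bytes_per_sentence * hours_logged * 3600)
    return total_bytes / (1024 * 1024)


for users in (1, 10, 100, 1000, 10000):
    print(f"{users:>6} users: {daily_storage_mb(users, 1):10.1f} MB/day")
```

Running the real 1-minute-to-1-hour tracks through the same arithmetic, and repeating it at 10 points per second, would give the graph suggested above.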
For now we have created this account with the following permissions:
- can upload files to the server
- can view all the uploaded files in that directory
- can't delete files
- can't download files
ftp port 21:
Get your ticket id: http://www.mp3car.com/ticket.php
The forum username field on the ticket.php page is optional. It will only be used down the road for credit purposes (who uploaded how many tracks, etc.).
More on the data format:
- files should contain NMEA sentences
- files should be in text format
- a collection of text files in a zip archive is OK
- files without the proper format will be given the lowest processing priority
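A client could sanity-check an archive before uploading, so malformed files never end up in the low-priority queue. This heuristic (sample size and ratio threshold are arbitrary choices of mine) just checks that the sampled lines look like "$...*checksum" sentences:

```python
import zipfile


def looks_like_nmea(zip_path: str, sample_lines: int = 20,
                    min_ratio: float = 0.8) -> bool:
    """Heuristic pre-upload check on a zip of track logs.

    For each file in the archive, read the first few lines and require
    that most of them start with "$" and contain a "*" checksum marker.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            with zf.open(name) as f:
                lines = [f.readline().decode("ascii", "replace").strip()
                         for _ in range(sample_lines)]
            lines = [ln for ln in lines if ln]
            if not lines:
                return False
            hits = sum(1 for ln in lines
                       if ln.startswith("$") and "*" in ln)
            if hits / len(lines) < min_ratio:
                return False
    return True
```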