  • What data to collect?

    From this thread, one of the listed requirements is:

    Passively, crowd source collected map data
    This is the most graspable item on the list [as a developer], so I wanted to start a thread to pin down exactly what the shopping list of data to collect passively should be.

    As a trivial starting point, I would say:
    1) The raw sentences collected from the GPS receiver. Actually parsing them on the server is probably unnecessarily expensive, so I am inclined to try to convert them to...
    2) position/heading/velocity. Individual clients could convert the data to something more useful [such as these values] before uploading it. That would reduce strain on the server, and most clients will presumably already be able to do this.
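
A minimal Python sketch of the client-side conversion in item 2, assuming the receiver emits standard NMEA 0183 RMC sentences (the `$GPRMC` form). It validates the XOR checksum, then turns one sentence into a UTC timestamp, a signed decimal-degree position, speed and heading. The `Fix` class and the function names are hypothetical; elevation is not part of RMC (it comes from GGA), so it is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from functools import reduce
from typing import Optional

KNOTS_TO_MPS = 1852 / 3600  # 1 knot = 1 nautical mile (1852 m) per hour

@dataclass
class Fix:
    time: datetime       # UTC
    lat: float           # decimal degrees, north positive
    lon: float           # decimal degrees, east positive
    speed_mps: float     # speed over ground
    heading_deg: float   # track angle, degrees true

def checksum_ok(sentence: str) -> bool:
    """XOR of every character between '$' and '*' must equal the hex digits after '*'."""
    body, _, given = sentence.strip().lstrip("$").partition("*")
    calc = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)
    try:
        return len(given) >= 2 and calc == int(given[:2], 16)
    except ValueError:
        return False

def dm_to_deg(value: str, hemisphere: str) -> float:
    """Convert NMEA ddmm.mmmm (or dddmm.mmmm) to signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])
    minutes = float(value[dot - 2:])
    result = degrees + minutes / 60.0
    return -result if hemisphere in ("S", "W") else result

def parse_rmc(sentence: str) -> Optional[Fix]:
    """Return a Fix for a valid RMC sentence, or None for anything else."""
    if not checksum_ok(sentence):
        return None
    fields = sentence.strip().split("*")[0].split(",")
    # Field 2 is the status: 'A' = active/valid fix, 'V' = void
    if not fields[0].endswith("RMC") or len(fields) < 10 or fields[2] != "A":
        return None
    hhmmss = fields[1].split(".")[0]  # drop fractional seconds
    when = datetime.strptime(fields[9] + hhmmss, "%d%m%y%H%M%S").replace(tzinfo=timezone.utc)
    return Fix(
        time=when,
        lat=dm_to_deg(fields[3], fields[4]),
        lon=dm_to_deg(fields[5], fields[6]),
        speed_mps=float(fields[7] or 0) * KNOTS_TO_MPS,
        heading_deg=float(fields[8] or 0),
    )
```

Sentences with a bad checksum or a void fix come back as `None`, so a client could drop them before upload or keep them aside, depending on how the "keep everything" question is settled.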

    Suggestions?

    Gary (-;
    OBDGPSLogger, for logging OBDII and/or GPS data
    OBDSim, an OBDII/ELM327 software simulator
    mp3car forums: obdgpslogger, obdsim

  • #2
    I am not an expert on programming, algorithms, or the NMEA stream, but my concern was that the parser might strip out too much raw data, which would limit the pool that an algorithm has to work with.
    Would it really take that much server load to send all the streams up and parse them server side? They could be processed once using the currently accepted method.
    If the parsing theory changes, the source data would still be there.
    If this concept isn't penny wise and pound foolish, I would be willing to fund all the CPU power we need to give algorithm developers the best chance of success.

    At the same time we don’t need to pay money to store and process junk.

    • #3
      IMO, raw data would be best. Ad hoc info could be gathered by the same application, and the server(s) could be optimized to process data during idle time.

      Then again, I'm only about an hour into researching any of this...
      Play with it, 'til it's broke.

      • #4
        Are there formats in addition to NMEA data that we can look at to see what data is collected? Maybe we should make a list of possible data to collect, check the goals of the project (free, open, routable data?) and select from that list plus any other data.

        For example, Thunderstick has a thread open right now where he is thinking about trying to build OCR for speed limit signs in the US. If something like that were available, you would be able to estimate the posted speed limits on some roads.

        Given that we may not know all of the types of data that may come along, when formulating or picking a standard for the data we ought to give some thought to making it extensible, or able to integrate with other data like video or audio. (Example: your GPS reports your position and, as you drive, pulls up the text of historic markers along the roadside and reads them aloud. Or it takes photos or video, geotags them and uploads them to a server for real, real-time traffic info.)

        Sorry, I think we're talking about the actual data to collect right now. I think I've veered into the standards area.

        How about we start this way: What is the goal of the project, and of collecting the data?
        Originally posted by ghettocruzer
        I was gung ho on building a PC [until] just recently. However, between my new phone having internet and GPS and all...and this kit...I'm starting to have trouble justifying it, haha.
        Want to:
        -Find out about the new iBug iPad install?
        -Find out about carPC's in just 5 minutes? View the Car PC 101 video

        • #5
          Glad to see my brain wasn't the only one working in overdrive on this... Then again, it's what I get paid to do, and my brain seems to think in SQL.

          But basically, all the data I can foresee would be centered around a few key items:
          1.) Position -- XYZ, or Longitude, Latitude, and elevation
          2.) Direction
          3.) Speed/Velocity
          4.) DateTime
          5.) User

          From that, all other information I can think of off the top of my head can be added separately via either user input, or some form of data acquisition from other sources. Things like road names, speed limits, addresses, destinations, detours, images, and any other number of details like BugByte suggested.

          As I said, my brain works in SQL, and it's used to old-school database layouts using header/detail tables, so basically I see data coming in and either a new XYZ coordinate being inserted (table_1) or an existing one's info being updated. Each time that XYZ is "hit", the datetime/direction/speed/user is recorded (table_2). From there, the most basic information can be gathered from the data. Heck, at this point 3D maps could be generated!
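
The header/detail layout described above could look something like this in SQLite, via Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are hypothetical, and deciding when two fixes count as the same XYZ point is done here by rounding to 5 decimal places of a degree (roughly a metre of latitude) and to the nearest metre of elevation, which is an arbitrary choice.

```python
import sqlite3

SCHEMA = """
CREATE TABLE point (             -- header: one row per distinct location
    id   INTEGER PRIMARY KEY,
    lat  REAL NOT NULL,
    lon  REAL NOT NULL,
    elev REAL NOT NULL,
    UNIQUE (lat, lon, elev)
);
CREATE TABLE hit (               -- detail: one row each time a point is "hit"
    point_id    INTEGER NOT NULL REFERENCES point(id),
    ts          TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    heading_deg REAL,
    speed_mps   REAL,
    user_id     TEXT NOT NULL
);
"""

def record_hit(db, lat, lon, elev, ts, heading, speed, user):
    """Insert the header row if the point is new, then always add a detail row."""
    # Assumption: fixes that round to the same 1e-5 degree / 1 m cell are the same point
    key = (round(lat, 5), round(lon, 5), round(elev))
    db.execute("INSERT OR IGNORE INTO point (lat, lon, elev) VALUES (?, ?, ?)", key)
    (point_id,) = db.execute(
        "SELECT id FROM point WHERE lat = ? AND lon = ? AND elev = ?", key
    ).fetchone()
    db.execute(
        "INSERT INTO hit (point_id, ts, heading_deg, speed_mps, user_id) VALUES (?, ?, ?, ?, ?)",
        (point_id, ts, heading, speed, user),
    )
    return point_id
```

Calling `record_hit` twice for fixes that round to the same cell adds one header row and two detail rows, which is the "update the existing XYZ, record every hit" behaviour in the post.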

          Personally, I think that's a good starting point for discussion of the data itself... Now, onto data acquisition from the hardware... Something I know nothing of at the moment.

          • #6
            Originally posted by Bugbyte
            How about we start this way: What is the goal of the project, and of collecting the data?
            Lol, I like the way you think, Bugbyte.

            So the goal would be to collect data in one place, process it and dump it to the OSM database.

            A secondary goal would be for the community to have GPS data to play around with for their own map projects or other applications that depend on map data.

            I have been looking at NMEA streams and what kind of data they record (a good article here http://www.gpsinformation.org/dale/nmea.htm). From what I saw in the gps streams I have recorded (from a BU353 puck) and read online, hardware providers most of the time follow the NMEA 0183 standard.

            I have also been looking at OSM's data and what they need. From a pure data perspective, their GPX track files consist of:
            -GPS Tracks, which can contain multiple Track Segments, each of which contains track points
            -The Tracks have name tags
            -The Track points have lat and long coordinates and a time stamp

            I attached a sample download from their database (file called gpxTrax_sampleData.zip).

            I have been playing around with the whole process just to get a feel for what you need to do to actually get some data on a map, label the data, render it and upload it to OSM for other people to see... It's actually kinda complicated so far, and I'm stuck on labeling data (using JOSM).

            Here's what I've done so far:
            -Record some GPS data (attached as "GPSSim(4).txt")
            -Convert it to GPX
            -Import it into JOSM
            -Play around with it and produce a GPX file with more information
            -Render it and use it in OSM
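
The "convert it to GPX" step could be sketched with Python's standard `xml.etree.ElementTree`, producing the track, track segment and track point structure of GPX 1.1 listed above. The function name, the default track name and the `creator` string are placeholders, and timestamps are assumed to already be in UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

GPX_NS = "http://www.topografix.com/GPX/1/1"
ET.register_namespace("", GPX_NS)  # serialize GPX as the default namespace

def _tag(name):
    return f"{{{GPX_NS}}}{name}"

def to_gpx(points, track_name="track"):
    """points: iterable of (lat, lon, ele_or_None, utc_datetime) tuples, in order."""
    gpx = ET.Element(_tag("gpx"), version="1.1", creator="placeholder-logger")
    trk = ET.SubElement(gpx, _tag("trk"))
    ET.SubElement(trk, _tag("name")).text = track_name  # the track's name tag
    seg = ET.SubElement(trk, _tag("trkseg"))
    for lat, lon, ele, when in points:
        pt = ET.SubElement(seg, _tag("trkpt"), lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        if ele is not None:
            ET.SubElement(pt, _tag("ele")).text = f"{ele:.1f}"  # <ele> comes before <time>
        ET.SubElement(pt, _tag("time")).text = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return ET.tostring(gpx, encoding="unicode")

if __name__ == "__main__":
    fix_time = datetime(1994, 3, 23, 12, 35, 19, tzinfo=timezone.utc)
    print(to_gpx([(48.1173, 11.5166667, 545.4, fix_time)]))
```

A gap in the recording (receiver off, tunnel) would normally start a new `trkseg`; this sketch writes everything into one segment.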

            The cool thing I noticed is that the NMEA stream has a ton of cool information that could be of use beyond OSM map generation (like maybe nav routing down the road?).

            So, anyway here's what I've been thinking:

            -Dump a whole bunch of raw NMEA data to a server. We can then use it to extract the important stuff (Lat, Long, Timestamp and maybe Elevation and Speed).
            -Figure out how to automatically process this data to be useful to the OSM project (in the process weed out bad data, label it and organize it)
            -Build a database with this data that in turn can be used for...I'm sure somebody's going to figure something useful to do

            Since the initial goal of the project is to help out the OSM guys, collecting the data is the first step and making it useful to the OSM project is the second step. The "making it useful" might get a little tricky as it seems it needs a human to interact with the data, but there has to be a way to automate it.

            As to the data collection, we have an FTP server up and running. Potentially down the road we'll have a database, but we don't really know what kind of data we need right now, and it's a lot easier to record and dump a file at this stage of the game.

            What do you guys think? Let me know. I have a whole bunch of ideas brewing, but I thought I'd start with a little rant and some ideas.

            • #7
              How about accuracy? Knowing the number of satellites or the accuracy helps when you process the data. Say two tracks went down the same 4-lane highway: one might show up in the median, while the other is accurate enough to tell which lane it was in.
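
To keep accuracy information, a client (or the server) could read the fix quality, satellite count and HDOP from the NMEA GGA sentence. A minimal sketch: the names and the thresholds in `good_enough` (at least 5 satellites, HDOP no worse than 2.0) are illustrative assumptions, and checksum validation is skipped for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quality:
    fix_quality: int             # 0 = invalid, 1 = GPS fix, 2 = DGPS fix, ...
    satellites: int              # satellites used in the fix
    hdop: float                  # horizontal dilution of precision; lower is better
    altitude_m: Optional[float]  # antenna altitude above mean sea level

def parse_gga_quality(sentence: str) -> Optional[Quality]:
    """Pull the accuracy-related fields out of a GGA sentence (no checksum check)."""
    fields = sentence.strip().split("*")[0].split(",")
    if not fields[0].endswith("GGA") or len(fields) < 10:
        return None
    return Quality(
        fix_quality=int(fields[6] or 0),
        satellites=int(fields[7] or 0),
        hdop=float(fields[8] or "inf"),
        altitude_m=float(fields[9]) if fields[9] else None,
    )

def good_enough(q: Quality, min_sats: int = 5, max_hdop: float = 2.0) -> bool:
    # Thresholds are illustrative assumptions, not project requirements
    return q.fix_quality > 0 and q.satellites >= min_sats and q.hdop <= max_hdop
```

Rather than discarding weak fixes, the server could store the `Quality` values alongside each point and weight them when several tracks cover the same road.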

              • #8
                Originally posted by Bugbyte
                How about accuracy? Knowing the number of satellites or the accuracy helps when you process the data. Say two tracks went down the same 4-lane highway: one might show up in the median, while the other is accurate enough to tell which lane it was in.
                That's why I was thinking collect the raw NMEA sentences and figure out what we need later on.

                The OSM guys don't need that info, but it would help if we were to do our own routing engine based on the OSM data.

                • #9
                  Originally posted by Bugbyte
                  How about accuracy? Knowing the number of satellites or the accuracy helps when you process the data. Say two tracks went down the same 4-lane highway: one might show up in the median, while the other is accurate enough to tell which lane it was in.
                  I completely agree. I would think your readings would get more accurate with every pass.

                  You might even be able to determine where urban or physical signal "canyons" are to present a map of GPS dark areas. I am not sure what you would do with that, but it might be interesting.

                  Collecting all the raw data might also give you insight into the health of the GPS satellite network.

                  • #10
                    I think we should collect the same data that is contained in the NMEA sentences (i.e. all of it), but compress it to a binary format for upload to the server to save server space. It doesn't need to be collected and uploaded AS sentences, but I think we should save all the data. There is nothing worse than collecting a bunch of data, only to realize you left out something that could've been very helpful. Has there been any discussion of a data format yet? Fixed-length fields, TLV or something else? Since GPS data is very consistent, I think fixed-length field values would save space. Just my two cents.
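
One way to read the fixed-length-field idea: a hypothetical 24-byte little-endian record packed with Python's `struct` module, with coordinates stored as integers in units of 1e-7 degree. The layout, scaling factors and field choice are assumptions for illustration. It keeps only a subset of what the NMEA sentences carry (no per-satellite data, for example), so a real "all of it" format would need more fields. For comparison, a single NMEA sentence may be up to 82 characters long.

```python
import struct
from datetime import datetime, timezone

# Hypothetical fixed-length layout (little-endian, no padding), 24 bytes per fix:
#   I  uint32  whole seconds since the Unix epoch (UTC)
#   H  uint16  milliseconds
#   i  int32   latitude in 1e-7 degree units
#   i  int32   longitude in 1e-7 degree units
#   i  int32   altitude in centimetres
#   H  uint16  speed in cm/s
#   H  uint16  heading in 0.01 degree units
#   B  uint8   satellites used
#   B  uint8   HDOP in 0.1 units
RECORD = struct.Struct("<IHiiiHHBB")

def pack_fix(t, lat, lon, alt_m, speed_mps, heading_deg, sats, hdop):
    """Pack one fix; t must be a timezone-aware UTC datetime."""
    return RECORD.pack(
        int(t.replace(microsecond=0).timestamp()),
        t.microsecond // 1000,
        round(lat * 1e7),
        round(lon * 1e7),
        round(alt_m * 100),
        min(round(speed_mps * 100), 0xFFFF),
        round(heading_deg * 100) % 36000,
        min(sats, 255),
        min(round(hdop * 10), 255),
    )

def unpack_fix(blob):
    """Inverse of pack_fix, returning the scaled values as floats."""
    s, ms, lat, lon, alt, spd, hdg, sats, hdop = RECORD.unpack(blob)
    t = datetime.fromtimestamp(s, tz=timezone.utc).replace(microsecond=ms * 1000)
    return t, lat / 1e7, lon / 1e7, alt / 100, spd / 100, hdg / 100, sats, hdop / 10
```

Fixed-length records can be read at any offset without parsing text, which speaks to the processing-cost argument; the price is that adding a field later means a new record version.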
                    "stop with the REINSTALLS, what do you think we got some lame-o installer!!!" - mitchjs
                    RevFE
                    My Shop

                    • #11
                      I think you should be using the standard NMEA sentences, as they are multiplatform, so there will be no more talk of the Windows vs. Unix warring factions.

                      The other plus is that this is a documented industry standard. Also as discussed, there is a risk that if this data is filtered by the user, you will find that you want to use some data that you have not got a year from now.

                      The other advantage of using NMEA is that users could start contributing right now, before the server is in place, simply by enabling logging in Xport, because, as you say, almost everybody on the Windows platform uses C's software...

                      Once the server is in place, they could start uploading all the data they have.

                      I already have Xport logging turned on so I have an evidence trail to refute a speed camera fine if one ever turns up in the mail, so I could give you a good deal of data straight away if you ever extend your reach to Australia!

                      You might also look into how the US government manages place names. In Australia, there is a central, government-controlled database of place names that appear on topographic maps (towns, buildings, mountains, creeks, rivers, etc.) and some road names. This database includes the lat and lon of all these places, and I have the file sitting on my Car PC where it is accessible from my topographic mapping application (and the 20 GB of topo maps I have installed).

                      So guys, stop bickering about this, turn on logging and start collecting data straight away so there is some real data available for upload as soon as your server comes on line.

                      Oh and there is no reason why your server can't support multiple input file formats just like we do here http://www.4x4earth.com.au/
                      RodW
                      2007 Toyota Hilux with a CarPC..

                      Worklog: http://www.mp3car.com/vbulletin/work...ota-hilux.html
                      OziExplorer GPS Embedded in RR: http://www.mp3car.com/vbulletin/sb-s...iexplorer.html

                      • #12
                        Originally posted by malcom2073
                        I think we should collect the same data that is contained in the NMEA sentences (i.e. all of it), but compress it to a binary format for upload to the server to save server space.
                        Hmm...compress...binary format...save server space. How about we just zip the files? NMEA is just straight up text and should be highly compressible, plus you don't lose any data.
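
Zipping the raw logs is easy to try with Python's standard `zipfile` module, and the round trip is lossless, which is the point being made here. A sketch with hypothetical function and file names; how much smaller the archive gets depends on the log, so it is worth measuring on real recordings.

```python
import io
import zipfile

def zip_log(name: str, nmea_text: str) -> bytes:
    """Store one NMEA text log in an in-memory, DEFLATE-compressed zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, nmea_text)
    return buf.getvalue()

def unzip_log(blob: bytes, name: str) -> str:
    """Recover the original NMEA text, byte for byte."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name).decode("ascii")
```

Since consecutive sentences repeat most of their characters, DEFLATE's back-references remove much of that redundancy without the client having to parse anything.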

                        • #13
                          Sicarius: we could compress, or we could offload onto the data collector some of the work that the people processing the data would otherwise have to do; you'd probably see about the same space savings. And you could always zip up the binary data too and see if it gets any smaller.

                          NMEA is expensive to process in large amounts, cycle-wise.

                          • #14
                            Another question to answer is how many points per unit time you want. Most people are going to be producing 1 point per second, but mine is going to be putting out 10 per second.

                            If you're offloading to the client to pre-process before submitting you could also do some smart stuff like decreasing the report rate the slower you're moving, culling entirely when stationary.
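
A sketch of the speed-dependent reporting rate, under assumed thresholds: below 0.5 m/s a fix is treated as stationary and dropped; at 25 m/s or faster every fix is kept, up to 10 Hz (one per 100 ms); in between, the minimum spacing between kept fixes shrinks linearly from 5 s towards 100 ms. All parameter names and values are hypothetical, and times are integer milliseconds to avoid floating-point drift. As the next reply points out, dropping stationary fixes loses stop information; passing `stop_speed_mps=0` turns that part off.

```python
def thin_track(fixes, min_interval_ms=100, max_interval_ms=5000,
               stop_speed_mps=0.5, full_rate_speed_mps=25.0):
    """fixes: list of (t_ms, speed_mps, payload) in time order. Returns the kept subset."""
    kept = []
    last_t = None
    for t_ms, speed, payload in fixes:
        if speed < stop_speed_mps:
            continue  # treated as stationary: culled entirely
        # Faster -> shorter required gap between kept fixes (linear in speed, capped)
        frac = min(speed / full_rate_speed_mps, 1.0)
        interval = max_interval_ms - frac * (max_interval_ms - min_interval_ms)
        if last_t is None or t_ms - last_t >= interval:
            kept.append((t_ms, speed, payload))
            last_t = t_ms
    return kept
```

At highway speed a 10 Hz receiver's output passes through untouched, while at walking pace only about one fix every 4.5 s survives.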

                            • #15
                              We do want to know when someone is stationary; that could be used to figure out (for example) where stop signs are, stoplights vs. stop signs, which turn lanes have lights, etc. Like I said, I don't think removing any data at all is a good idea. Space and bandwidth are relatively cheap, but going back and re-collecting a couple of months' worth of data from a couple hundred people because we forgot something... not so much.


                              I also think that getting as many samples per second as possible should be key. As long as the samples are timestamped, they can be of value. Until we start actually processing this data, we really have no idea what is useful and what isn't, so I think what data to cull is a discussion for much later.