Data Privacy HELP!


  • Data Privacy HELP!

    As people submit their tracks, data privacy will be a huge issue.

    Some people don't care if everybody on the planet knows where they go and when (this would be you, Bugbyte), and some people like to keep their whereabouts private, for example super secret international spies who would like to contribute to the project.

    All jokes aside, if we don't address privacy issues many people will not be comfortable contributing.

    I'm not sure how to approach this. On one hand it is just a collection of tracks on a server that have no identifiable information about anybody (beyond lat/long location).

    Most of us wouldn't go through the effort to analyze data to use it for anything other than map generation. But then again I'd rather not be responsible for people's safety if somebody does find a way to figure out/track a specific contributor.

    So the question is, once the data is uploaded, how do we make it public and usable to anybody in the world without compromising the identity of the contributors?

    The only solution so far that comes to mind is stripping the beginning and trailing waypoints of each track. For example, if I have a track from my house to my friend's house and that track consists of 100 waypoints, I would strip the first and last 10 waypoints so anybody else would see that the track starts in the middle of a public street and ends in a public street.
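A minimal sketch of the trimming idea above, assuming a track is simply an ordered list of (latitude, longitude) pairs. The function name `strip_ends` and the default of 10 waypoints are illustrative, not part of any existing tool.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def strip_ends(track: List[Waypoint], n: int = 10) -> List[Waypoint]:
    """Return the track without its first and last n waypoints.

    A track too short to survive trimming is dropped entirely, so that a
    short trip never exposes its start or end point.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(track) <= 2 * n:
        return []
    # Slice with an explicit end index so that n == 0 returns the full track.
    return track[n:len(track) - n]
```

With a 100-waypoint track and `n=10`, 80 waypoints remain. A fixed count trims very different distances depending on the logging interval, which is one reason to consider a distance or time based cutoff instead.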

    Let me know what you think.

  • #2
    While I think the entire argument is absolutely silly (along the lines of "oh my god they're out to get me" kind of silly), I agree that stripping a user-configurable number of waypoints, or perhaps a user-configurable distance from start and end, might be the way to do it. E.g., if the points are within, say, a mile of a person's "secret" locations, then don't log or include them.
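One way the distance-based variant might look, as a sketch only. It assumes waypoints are (latitude, longitude) pairs and that the user supplies a list of private locations; `drop_near_private` and the one-mile default are hypothetical names and values. Distances use the standard haversine formula.

```python
import math
from typing import Iterable, List, Tuple

Waypoint = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_MILES = 3958.8     # mean Earth radius


def haversine_miles(a: Waypoint, b: Waypoint) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def drop_near_private(track: Iterable[Waypoint],
                      private_places: List[Waypoint],
                      radius_miles: float = 1.0) -> List[Waypoint]:
    """Keep only waypoints farther than radius_miles from every private place."""
    return [p for p in track
            if all(haversine_miles(p, s) > radius_miles for s in private_places)]
```

Running this on the logging machine before upload means the private locations themselves never have to leave the user's computer.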
    "stop with the REINSTALLS, what do you think we got some lame-o installer!!!" - mitchjs
    RevFE
    My Shop


    • #3
      You mean this isn't going to be a stalker's paradise?

      X-amount of seconds could automatically be taken off of each track at/by the server.

      Once data starts being loaded, processed, and placed onto the server/database, I would think that it should be a fairly easy process to edit out a block of points. For instance, if user-X doesn't want everyone to know when he goes to the strip club, he should be able to set up parameters which disregard any of his/her uploaded waypoints within those boundaries.
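The "X seconds off each track" idea could be sketched like this, assuming each fix carries a timestamp in seconds and the track is already sorted by time. The name `trim_by_time` and the 120-second default are illustrative assumptions.

```python
from typing import List, Tuple

TimedFix = Tuple[float, float, float]  # (unix_time_seconds, latitude, longitude)


def trim_by_time(track: List[TimedFix], seconds: float = 120.0) -> List[TimedFix]:
    """Drop every fix recorded in the first or last `seconds` of the track.

    Assumes the track is sorted by timestamp. A track shorter than
    2 * seconds ends up empty.
    """
    if not track:
        return []
    earliest_kept = track[0][0] + seconds
    latest_kept = track[-1][0] - seconds
    return [fix for fix in track if earliest_kept <= fix[0] <= latest_kept]
```

Because the cutoff depends only on timestamps, the server can apply it uniformly without knowing anything about the contributor.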
      Play with it, 'til it's broke.



      • #4
        i agree that there needs to be some sort of protection for those that would contribute to the maps. the biggest problem i see is the valuable data that is lost by taking out a specific range (stripping out even a quarter mile from my house would leave a lot of roads unmarked). i think the best way to accomplish this would be to have the users themselves decide what data to include/remove. what if the part of the software that uploads the gps info gives the person the option to add/remove certain areas?

        this way, anyone who doesn't want to be tracked can determine at what point they don't want to contribute-- this would also come in handy for those that have very long driveways, which a gps could possibly interpret as being a cross road...

        i could also see an issue (at first at least, with minimal people being tracked) where having a circular gap in the map (for not tracking a certain distance from your house) would make it just as easy for someone who wanted to find that person...
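To illustrate why a circular gap gives the location away: if tracks are cut at a fixed radius around a house, the cut points lie on a circle, and three points on a circle are enough to recover its centre. The sketch below assumes positions have already been projected to flat x/y coordinates (for example, kilometres east and north of some reference); `circumcenter` is just an illustrative helper.

```python
from typing import Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. km east/north of a reference


def circumcenter(a: Point, b: Point, c: Point) -> Point:
    """Centre of the circle passing through three non-collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no unique circle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)
```

Three track endpoints at (2.5, 3), (2, 3.5) and (1.5, 3) yield a centre of (2, 3), the house. This suggests that any exclusion zone would need an irregular shape or a randomly shifted centre to be of much use.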
        My OLD 2001 Mitsubishi Eclipse GT:
        "The Project That Never Ended, until it did"


        next project? subaru brz
        carpc undecided



        • #5
          How about simply the option to set or unset that user-configurable distance or time? Leave it up to the user as to how much of it to expose. I'm not real worried about someone figuring out where I live. That's pretty dead-simple to do.

          And if I upload in non-real time, they can't be sure when I'll be there or not.

          I do think an option to divorce the track data from user identification should be allowed. That way, no agency of the gov't could use it to issue you a speeding ticket, for example. Or, more likely, to use it to track a suspect in an investigation.

          I'm not a conspiracy theorist, but I'd hate to see it used against you simply because your data showed you in the vicinity of a crime at the time that it occurred.
          Originally posted by ghettocruzer
          I was gung ho on building a PC [until] just recently. However, between my new phone having internet and GPS and all...and this kit...Im starting to have trouble justfiying it haha.
          Want to:
          -Find out about the new iBug iPad install?
          -Find out about carPC's in just 5 minutes? View the Car PC 101 video



          • #6
            Conspiracy theorist, no... That's not what I'd consider your opinion on the topic. I'd consider it more being smart about the liability of the project as a whole!
            Play with it, 'til it's broke.



            • #7
              http://www.hhs.gov/ohrp/humansubject...ce/45cfr46.htm

              This is in the realm of protection of human subjects [or HSPC as they call it where I work]. I don't think this question is paranoid, or conspiracy-ish, or anything like that - at my place of work, this is of pivotal importance... And one thing specifically on our list of things to be de-identified is GPS co-ordinates.

              "I do think an option to divorce the track data from user identification should be allowed. That way, no agency of the gov't could use it to issue you a speeding ticket, for example. Or, more likely, to use it to track a suspect in an investigation."
              It shouldn't just be "allowed", it should be *forced*. Data should *always* get de-identified, no matter what.

              Obviously searching on slashdot leads to a series of paranoid conspiracy theories, but I do find it a decent clearinghouse of legitimately useful links on this very topic. site:slashdot.org gps tax

              This topic is so much more important than the attention it's currently given. I realise that I'm treading on deadly ground with obdgpslogger in this regard, but I made a pre-meditated design decision a long time ago to *not* attach any identifying information at all to the database. I normalise a lot of the data exported to Google Earth, and I think I will, in future, also provide an option to normalise CSV data, or even normalise data going into the database.

              Gary (-;
              OBDGPSLogger, for logging OBDII and/or GPS data
              OBDSim, an OBDII/ELM327 software simulator
              mp3car forums: obdgpslogger, obdsim



              • #8
                Originally posted by chunkyks
                http://www.hhs.gov/ohrp/humansubject...ce/45cfr46.htm

                This is in the realm of protection of human subjects [or HSPC as they call it where I work]. I don't think this question is paranoid, or conspiracy-ish, or anything like that - at my place of work, this is of pivotal importance... And one thing specifically on our list of things to be de-identified is GPS co-ordinates.



                It shouldn't just be "allowed", it should be *forced*. Data should *always* get de-identified, no matter what.

                Obviously searching on slashdot leads to a series of paranoid conspiracy theories, but I do find it a decent clearinghouse of legitimately useful links on this very topic. site:slashdot.org gps tax

                This topic is so much more important than the attention it's currently given. I realise that I'm treading on deadly ground with obdgpslogger in this regard, but I made a pre-meditated design decision a long time ago to *not* attach any identifying information at all to the database. I normalise a lot of the data exported to Google Earth, and I think I will, in future, also provide an option to normalise CSV data, or even normalise data going into the database.

                Gary (-;
                I believe that in order to properly weight, ignore or qualify the tracks being uploaded, it would be essential to be able to track the user that submitted the trail.

                At the same time I completely agree with the need to protect privacy. We never should be in a position where we even have the ability to provide data for a subpoena.

                Is there technology that we could borrow from the medical or cryptography world to allow us to weight the inbound gps streams and still maintain privacy?

                I talk more about the need for weighting the quality of the upload here. Here is an excerpt. This link has more details.
                Originally posted by Fiberoptic
                The algorithms would also be smart enough to possibly throw out anomalies. Let's just say, for example, that I am a probe. I report with my iPhone. I regularly bike the wrong direction on one-way streets and speed 20 miles over the speed limit. The algorithm would eventually throw out certain parts of my data that are way outside the norm and negatively weight all of my other reports.
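The excerpt does not say how "way outside the norm" would be measured. As one possible reading, the sketch below flags values that sit more than `k` median absolute deviations from the median, then lowers a probe's weight by the fraction of its reports that were flagged. The function names, the MAD rule, and `k=3.0` are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence


def flag_outliers(values: Sequence[float], k: float = 3.0) -> List[bool]:
    """Mark values more than k median-absolute-deviations from the median."""
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to compare against, so flag nothing.
        return [False] * len(values)
    return [abs(v - med) > k * mad for v in values]


def probe_weight(flags: Sequence[bool]) -> float:
    """Weight in [0, 1]: the share of a probe's reports that were not flagged."""
    if not flags:
        return 1.0
    return 1.0 - sum(flags) / len(flags)
```

Note that weighting per probe requires knowing which reports came from the same contributor, which is exactly the tension with anonymity raised in this post.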



                • #9
                  "Is there technology that we could borrow from the medical or cryptography world to allow us to weight the inbound gps streams and still maintain privacy?"
                  What we do here is have "cold rooms". PCs airgapped from the outside world, where linking tables are created. Data is split into two tables, one mapping the identifying data to an opaque id [usually generated with some kind of function from the other data in the row - eg, it might be SSN+DoB mangled in a specific way]. This table is stored where no-one can get it. The other table maps that opaque row ID to the actual data, and is the one that's actually copied out of the cold room and operated on.
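A rough sketch of the two-table split described above. The post only says the opaque ID is derived from identifying fields "mangled in a specific way"; the salted SHA-256 used here is an assumption standing in for that, and `split_records`, the field names, and the salt are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence, Tuple


def split_records(records: Iterable[dict],
                  id_fields: Sequence[str],
                  salt: bytes) -> Tuple[Dict[str, dict], List[dict]]:
    """Split rows into a linking table and a de-identified data table.

    linking: opaque_id -> identifying fields (stays in the "cold room")
    data:    remaining fields plus opaque_id (safe to copy out)
    """
    linking: Dict[str, dict] = {}
    data: List[dict] = []
    for rec in records:
        identity = {field: rec[field] for field in id_fields}
        joined = "|".join(str(rec[field]) for field in id_fields)
        opaque_id = hashlib.sha256(salt + joined.encode("utf-8")).hexdigest()
        linking[opaque_id] = identity
        payload = {k: v for k, v in rec.items() if k not in id_fields}
        payload["opaque_id"] = opaque_id
        data.append(payload)
    return linking, data
```

Only `data` would leave the air-gapped machine. As the post goes on to note, the linking table still exists somewhere, so this reduces exposure rather than removing it.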

                  Of course, this is technically subpoenable I believe. It's also not necessarily feasible for this scenario. I will ask around at work for some suggestions - there's people here who've been dealing with HSPC for literally decades.

                  Gary (-;


                  • #10
                    Gary, would you happen to have an update for us on the best way to approach this?
                    (i have started logging, but am hesitant to upload until the privacy issues are worked out)


                    • #11
                      Same boat here, as far as starting logging...

                      Also, before I finalize a release of even the simplest of user apps, I'd need to know what can/can't be used.

                      And on the topic of that, I'm working on that app, in hopes that I can specify a singular lat/lon point, and then filter out a square chunk of data within X-distance of it. (and have multiple points like this, stored locally on the user's computer)
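A sketch of what such a square filter might look like, assuming (latitude, longitude) waypoints and a list of user-chosen centre points. The conversion from miles to degrees uses the rough figure of 69 miles per degree of latitude, scaled by the cosine of the latitude for longitude; all names here are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Waypoint = Tuple[float, float]  # (latitude, longitude) in decimal degrees
MILES_PER_DEG_LAT = 69.0        # rough approximation


def in_square(point: Waypoint, center: Waypoint, half_side_miles: float) -> bool:
    """True if point lies inside the square box centred on `center`.

    Not meaningful at the poles, where cos(latitude) reaches zero.
    """
    dlat = half_side_miles / MILES_PER_DEG_LAT
    dlon = half_side_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(center[0])))
    return (abs(point[0] - center[0]) <= dlat
            and abs(point[1] - center[1]) <= dlon)


def filter_squares(track: Iterable[Waypoint],
                   centers: List[Waypoint],
                   half_side_miles: float = 0.5) -> List[Waypoint]:
    """Remove every waypoint that falls inside any of the private squares."""
    return [p for p in track
            if not any(in_square(p, c, half_side_miles) for c in centers)]
```

A box test needs only comparisons after the one-time degree conversion, which keeps it cheap to run over long logs on the user's own machine.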
                      Play with it, 'til it's broke.



                      • #12
                        Originally posted by chunkyks
                        I will ask around at work for some suggestions - there's people here who've been dealing with HSPC for literally decades.
                        Chunkyks, did anyone from work give you any ideas? If not, who would be the expert to ask about this? We might be able to get them to solve the problem for free as part of a blog post.



                        • #13
                          Sorry, this completely slipped my mind. <walks off for twenty minutes and talks to some coworkers>


                          After some thought and discussion, the root problem is that we have two mutually incompatible requirements:

                          1) The question we explicitly want to be unable to answer is:
                          "Given a user, X, which traces belong to that user?"
                          2) The question we explicitly *do* want to be able to answer is:
                          "We're trying to audit user X, to find out how many miles they've uploaded to the database"

                          One guy I work with said that in the past, he's used trapdoor hashes on the uniquely identifiable IDs. E.g., it was a large database of driver's license numbers, with sexual habits and STDs for those individuals. What he did was push all the driver's license numbers through a trapdoor function. This left him with a dataset that he couldn't use to uniquely identify the people involved, but anyone with the original database ["what are the sexual habits and STDs of the person with driver's license number XXX"] could answer.
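The post does not say which function the coworker used. As a stand-in, the sketch below uses a keyed SHA-256 (HMAC) from Python's standard library; `pseudonymize` and the key are assumptions for illustration. Whoever holds both the key and a license number can recompute the pseudonym and look the row up, while the stored table alone does not reveal the numbers.

```python
import hashlib
import hmac


def pseudonymize(license_number: str, key: bytes) -> str:
    """Replace an identifier with a keyed one-way hash (hex string)."""
    return hmac.new(key, license_number.encode("utf-8"), hashlib.sha256).hexdigest()


def lookup(table: dict, license_number: str, key: bytes):
    """Find the row for a known identifier in a pseudonymized table."""
    return table.get(pseudonymize(license_number, key))
```

The key matters because identifier spaces such as license numbers are small enough to enumerate; an unkeyed hash could be reversed by hashing every possible number.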

                          Of course, this is backward from us [that model explicitly was able to answer the question "Given a user, X, what traces belong to that user?"], but I think there's some potential there, for ideas of hashes and stuff.

                          Perhaps it would be useful to hash each actual trace, and attach the hash to a user. That way you'd be able to answer the question, for each trace in the system, which user uploaded it. That would make auditing all the users in the system doable on occasion [depending on number of traces in the system].
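A sketch of the hash-per-trace idea, assuming a trace is a list of (latitude, longitude) pairs serialised to JSON before hashing with SHA-256. `trace_fingerprint` and `UploadLedger` are hypothetical names. The ledger stores only hashes against users, so going from a user to their traces means hashing every trace in the system and comparing.

```python
import hashlib
import json
from typing import Dict, Optional, Sequence, Tuple

Waypoint = Tuple[float, float]


def trace_fingerprint(trace: Sequence[Waypoint]) -> str:
    """SHA-256 over a canonical JSON encoding of the trace."""
    canonical = json.dumps([list(p) for p in trace], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UploadLedger:
    """Maps trace hashes to the uploading user, without storing the traces."""

    def __init__(self) -> None:
        self._owner_by_hash: Dict[str, str] = {}

    def record(self, user: str, trace: Sequence[Waypoint]) -> None:
        self._owner_by_hash[trace_fingerprint(trace)] = user

    def owner_of(self, trace: Sequence[Waypoint]) -> Optional[str]:
        return self._owner_by_hash.get(trace_fingerprint(trace))

    def upload_count(self, user: str) -> int:
        return sum(1 for owner in self._owner_by_hash.values() if owner == user)
```

Any later edit to a stored trace, such as trimming its ends, changes its hash, so fingerprints would have to be taken over whatever form of the trace is actually kept.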

                          Another option that I had been considering was converting the data upon upload. Convert traces to just lat/lon/alt traces, and make that identifiable to a user. We'd need some trusted way to do this, though, and trust in computers is hard to come by. Showing that someone has, at some point, been at a location is a lot less useful than showing when they were there. But still, that's quite a thing for someone to be able to use as subpoenable evidence. Alternatively, just a miles conversion... the problem is a lack of auditability.
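Both conversions mentioned above can be sketched briefly, assuming each fix is a (timestamp, latitude, longitude, altitude) tuple. `strip_timestamps` keeps only the positions; `total_miles` reduces a trace to a single distance using the haversine formula. The names and tuple layout are illustrative.

```python
import math
from typing import List, Sequence, Tuple

TimedFix = Tuple[float, float, float, float]  # (unix_time, lat, lon, alt_m)
Position = Tuple[float, float, float]         # (lat, lon, alt_m)
EARTH_RADIUS_MILES = 3958.8                   # mean Earth radius


def strip_timestamps(trace: Sequence[TimedFix]) -> List[Position]:
    """Keep where the trace went, discard when."""
    return [(lat, lon, alt) for _time, lat, lon, alt in trace]


def total_miles(positions: Sequence[Position]) -> float:
    """Sum of great-circle distances between consecutive positions."""
    miles = 0.0
    for (lat1, lon1, _a1), (lat2, lon2, _a2) in zip(positions, positions[1:]):
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        h = (math.sin((p2 - p1) / 2) ** 2
             + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
        miles += 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))
    return miles
```

Storing only the mileage figure against a user is the least revealing option, but as the post says, nobody can then check the number against the underlying trace.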

                          So, that's a couple ideas. Discussion?

                          Gary (-;


                          • #14
                            alright, i got a little lost in some of it (i don't understand hashes), but here is my new idea (i think this is similar to what you are implying)

                            instead of using/tracking specific users, what if we were to allow the computer to track user numbers-- ie. everyone is assigned a user id #, but no personal info is required to get a number/ or is saved on the upload, just the user id. in theory, this would separate the critical data from the gps data, and would make it harder for anyone to find anyone. (unless you know who is assigned a specific user id)

                            using this idea, anyone that would want to put an image in their sig for how many tracks they uploaded would just need to link the " 'user id #' = 'tracks uploaded' ".

                            i realize that nothing is that simple, but that's the idea...right?
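The user-number idea might be sketched like this, assuming IDs are random UUIDs handed out with no personal details attached and the server keeps only a count per ID. `new_user_id` and `TrackCounter` are hypothetical names.

```python
import uuid
from collections import Counter


def new_user_id() -> str:
    """Random ID that carries no personal information."""
    return uuid.uuid4().hex


class TrackCounter:
    """Server-side tally: user id -> number of tracks uploaded."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record_upload(self, user_id: str) -> None:
        self._counts[user_id] += 1

    def tracks_uploaded(self, user_id: str) -> int:
        # Unknown IDs report zero uploads.
        return self._counts[user_id]
```

A signature image would then only need the ID to fetch its count. The scheme is only as private as the ID itself: once an ID is posted next to a forum name, every track stored under it is linked to that person.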


                            • #15
                              Simple is good, yes. There are two problems. One, there's still a linking table in a database somewhere, which pretty much voids any data privacy stuff.

                              The second problem is that if you don't have that linking table, how do you know that users aren't gaming the system? You need to be able to verify how much gps track [eg, in miles] users have uploaded.

                              Gary (-;