Same boat here, as far as starting logging...
Also, before I finalize a release of even the simplest of user apps, I'd need to know what can/can't be used.
And on the topic of that, I'm working on that app, in hopes that I can specify a singular lat/lon point, and then filter out a square chunk of data within X-distance of it. (and have multiple points like this, stored locally on the user's computer)
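Here's a rough sketch of the square-chunk idea, assuming points are plain (lat, lon) tuples in decimal degrees and X is given in kilometres. The names `bounding_box` and `split_by_box` are made up for illustration, and the math is a flat approximation that ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def bounding_box(lat, lon, half_side_km):
    """Return (min_lat, max_lat, min_lon, max_lon) for a square centred on (lat, lon)."""
    dlat = math.degrees(half_side_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with cos(latitude); this breaks down near the poles.
    dlon = math.degrees(half_side_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)


def split_by_box(points, center, half_side_km):
    """Split (lat, lon) points into (inside, outside) the square around center."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(center[0], center[1], half_side_km)
    inside, outside = [], []
    for p in points:
        if min_lat <= p[0] <= max_lat and min_lon <= p[1] <= max_lon:
            inside.append(p)
        else:
            outside.append(p)
    return inside, outside


# Hypothetical usage: a 1 km half-side square around one stored point.
home = (40.0, -105.0)
inside, outside = split_by_box([(40.005, -105.0), (40.02, -105.0)], home, 1.0)
```

Whether the app keeps `inside` or `outside` depends on whether the square is meant as a selection or as a privacy zone. Multiple stored points would just mean running the split once per point.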
Chunkyks, did anyone from work give you any ideas? If not, who would be the expert to ask about this? We might be able to get them to solve the problem for free as part of a blog post.
Originally Posted by chunkyks
Sorry, this completely slipped my mind. <walks off for twenty minutes and talks to some coworkers>
After some thought and discussion, the root problem is that we have two mutually incompatible requirements:
1) The question we explicitly want to be unable to answer is:
"Given a user, X, which traces belong to that user?"
2) The question we explicitly *do* want to be able to answer is:
"We're trying to audit user X, to find out how many miles they've uploaded to the database"
One guy I work with said that in the past, he's used trapdoor hashes on the uniquely identifiable IDs. E.g., it was a large database of driver's license numbers, with sexual habits and STDs for those individuals. What he did was push all the driver's license numbers through a trapdoor function. This left him with a dataset that he couldn't use to uniquely identify the people involved, but that anyone holding an original license number could still query ["what are the sexual habits and STDs of the person with driver's license number XXX"].
Of course, this is backward from us [that model explicitly was able to answer the question "Given a user, X, what traces belong to that user?"], but I think there's some potential there, for ideas of hashes and stuff.
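A minimal sketch of that trapdoor idea, assuming a keyed one-way hash (HMAC-SHA256 from the Python standard library) stands in for the trapdoor function. The key, the `pseudonymize` name, and the sample records are all hypothetical. A keyed hash is used here because license numbers are short enough that a plain unkeyed hash could be brute-forced.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """One-way mapping from a real ID to an opaque token (hex string)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would have to be stored away from the data.
SECRET_KEY = b"example-key-not-for-real-use"

# The stored dataset is keyed by token, never by the real ID.
records = {
    pseudonymize("D1234567", SECRET_KEY): {"notes": "record for first person"},
    pseudonymize("D7654321", SECRET_KEY): {"notes": "record for second person"},
}

# Someone who already holds a real ID (and the key) can recompute the token
# and look the record up; the tokens alone do not reveal the IDs.
found = records.get(pseudonymize("D1234567", SECRET_KEY))
```

As noted above, this answers exactly the question we don't want answerable, so it only illustrates the mechanism.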
Perhaps it would be useful to hash each actual trace, and attach the hash to a user. That way you'd be able to answer the question, for each trace in the system, which user uploaded it. That would make auditing all the users in the system doable on occasion [depending on number of traces in the system].
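Roughly, hashing each trace could look like the sketch below. It assumes traces are stored as raw bytes with a mileage figure already computed, and that each user record holds only a set of trace digests. `trace_digest` and `audit_miles` are illustrative names, not anything that exists yet.

```python
import hashlib


def trace_digest(trace_bytes: bytes) -> str:
    """SHA-256 digest of the raw trace file contents."""
    return hashlib.sha256(trace_bytes).hexdigest()


def audit_miles(user_digests, stored_traces):
    """Sum miles over stored traces whose digest is registered to the user.

    stored_traces is an iterable of (trace_bytes, miles) pairs. Every trace
    in the system gets hashed, which is why an audit is only occasional.
    """
    return sum(
        miles
        for data, miles in stored_traces
        if trace_digest(data) in user_digests
    )


# Hypothetical data: three stored traces, two of them registered to one user.
stored = [(b"trace-a", 12.5), (b"trace-b", 3.0), (b"trace-c", 40.0)]
user_digests = {trace_digest(b"trace-a"), trace_digest(b"trace-c")}
total = audit_miles(user_digests, stored)
```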
Another option that I had been considering was converting the data upon upload. Convert traces to just lat/lon/alt traces, and make that identifiable to a user. We'd need some trusted way to do this, though, and trust in computers is hard to come by. Showing that someone has, at some point, been at a location is a lot less useful than showing when they were there. But still, that's quite a thing for someone to be able to use as subpoenable evidence. Alternatively, just a miles conversion... the problem is a lack of auditability.
So, that's a couple ideas. Discussion?
alright, i got a little lost in some of it (i don't understand hashes), but here is my new idea (i think this is similar to what you are implying)
instead of using/tracking specific users, what if we were to allow the computer to track user numbers, i.e. everyone is assigned a user id #, but no personal info is required to get a number or is saved on the upload, just the user id. in theory, this would separate the critical data from the gps data, and would make it harder for anyone to find anyone (unless you know who is assigned a specific user id).
using this idea, anyone that would want to put an image in their sig for how many tracks they uploaded would just need to link the " 'user id #' = 'tracks uploaded' ".
i realize that nothing is that simple, but that's the idea...right?
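A small sketch of the user-number idea, assuming the id is just a random token handed out with no personal info attached, and that the server only keeps a count per id. `new_user_id` and `record_upload` are made-up names for illustration.

```python
import secrets
from collections import Counter


def new_user_id() -> str:
    """Random 16-hex-character id; nothing personal goes into it."""
    return secrets.token_hex(8)


def record_upload(counts: Counter, user_id: str, n_tracks: int = 1) -> None:
    """Bump the 'user id # = tracks uploaded' tally for one id."""
    counts[user_id] += n_tracks


# Hypothetical usage: one anonymous user uploads three tracks in two batches.
tracks_uploaded = Counter()
uid = new_user_id()
record_upload(tracks_uploaded, uid)
record_upload(tracks_uploaded, uid, n_tracks=2)
```

The sig image would then only need to read `tracks_uploaded[uid]`.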
Simple is good, yes. There are two problems. One, there's still a linking table in a database somewhere, and that pretty much voids any data privacy stuff.
The second problem is that if you don't have that linking table, how do you know that users aren't gaming the system? You need to be able to verify how much gps track [e.g., in miles] users have uploaded.
This is way beyond our area of expertise. If anyone from the forum has any ideas, please throw them out here. If not I can work on getting some guest writers to come by and throw in their two cents.
This is definitely nothing we have to deal with at work... In fact we're forced the opposite way: All data has to be directly linked to a user (though not always visible to the users).
I talked with Bugbyte about this on the phone. He has been on vacation the last two weeks, which has limited his forum posting. Hopefully I am paraphrasing this correctly. He suggested the idea of encrypting the GPS trails where the user holds the decryption key and the user decides what to do with the data and if and when to delete it.
After further thought, I don't know that this addresses the case where a user has given the server permission to use the trails for map creation or some other harmless purpose. I don't see how that gets us out of the privacy or subpoena trap, since we would still have access to the data after the initial permission was given.
We could ask people to daily opt in, but that eliminates the passive nature of the data collection. Maybe Bugbyte can elaborate when he gets back from vacation later in the week.
I just talked with Peter Wayner, author of Translucent Databases.
He seemed to think we were off to a good start here with the ideas we have. Things we talked about were:
- creating an anonymous user name
- having the client randomly delete the first and last few minutes of every trail. He made the point to keep the amount deleted random to prevent revealing a range of potential starting points.
- continuing to add privacy features until the majority of the community is happy
If vacation plans allow, Peter is going to try to swing by our meet up on August 22nd.
So maybe we could start with a server or client side application that strips off a random amount of the beginning and the end of each trail. Each trail would also be associated with a random userid. We should also allow users to upload their full data if they have fewer privacy concerns. Maybe there could be a slider bar depending on how much privacy you want. Does that make sense?
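To make the trimming idea concrete, here is a rough sketch. It assumes a trail is a time-sorted list of (timestamp_seconds, lat, lon) tuples, and the 1 to 5 minute trim range is a placeholder that a privacy slider could adjust. `trim_trail` is a hypothetical name.

```python
import random


def trim_trail(points, min_trim_s=60, max_trim_s=300, rng=None):
    """Drop a random amount of time from both ends of a trail.

    points: list of (timestamp_seconds, lat, lon), sorted by timestamp.
    The two cut lengths are drawn independently, so the amount removed at
    the start says nothing about the amount removed at the end.
    """
    if not points:
        return []
    # SystemRandom draws from the OS entropy source, so cuts are not reproducible.
    rng = rng or random.SystemRandom()
    start_cut = points[0][0] + rng.uniform(min_trim_s, max_trim_s)
    end_cut = points[-1][0] - rng.uniform(min_trim_s, max_trim_s)
    # A trail shorter than the two cuts combined comes back empty.
    return [p for p in points if start_cut <= p[0] <= end_cut]


# Hypothetical usage: a 1000-second trail with one fix per second.
trail = [(t, 40.0 + t * 1e-5, -105.0) for t in range(1000)]
trimmed = trim_trail(trail)
```

Setting both trim values to 0 would upload the full trail, which covers the users who don't want any trimming.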