DLP and honeytokens
Four years have passed since I coined the term "honeytoken". I talked a lot about it at the time, as did Lance Spitzner and others from the honeypot field. The subject, however, hasn't been discussed much over the last few years.
Well, not until the DLP (Data Leakage Prevention) fever started. I used to run Google queries for "honeytoken" to see how the concept was being used, but I hadn't done that for some months. It was a great surprise to see the results when I ran the same query today. It is obvious that honeytokens are a good way to provide some DLP functionality. I'm thinking about trying to build some kind of dynamic system to deploy and monitor them. Here is how it would work:
Imagine that you want a bunch of sensitive Office files to be monitored by the system. You point the system at the files and it starts to monitor them by integrating itself with the operating system of the server where the files are hosted. When a user requests one of those files, the system dynamically generates a honeytoken and includes it in the copy that user receives. The system links this honeytoken to that specific user and adds it to a list of strings monitored by the main enforcement points, like proxy servers, firewalls, IDSes and other UTM devices. It can also use some kind of distributed agent on the workstations to check what users are doing with those files. I know that this sounds like a description of a DRM system, but the aim here is not to control what the user can do, only to monitor the information flow.
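To make the idea a bit more concrete, here is a minimal Python sketch of the core bookkeeping, not a real product design. Everything in it is an assumption for illustration: the `HoneytokenRegistry` class, the `HT-` token prefix, and the fact that the token is simply appended to plain text. A real system would have to hide the token inside the Office file format (for example, in document metadata) and push the watchlist to the actual proxies and IDSes.

```python
import secrets


class HoneytokenRegistry:
    """Hypothetical registry that issues one honeytoken per (user, file) request."""

    def __init__(self):
        # Maps each issued token to the user and file it was generated for.
        self._owners = {}

    def issue(self, user, file_name):
        # A random, unguessable string; the "HT-" prefix is an arbitrary choice.
        token = "HT-" + secrets.token_hex(12)
        self._owners[token] = (user, file_name)
        return token

    def tag_copy(self, content, user, file_name):
        # Simplification: append the token to plain text. Embedding it in a
        # real Office document would need format-specific handling.
        token = self.issue(user, file_name)
        return content + "\n" + token + "\n", token

    def watchlist(self):
        # Strings that the enforcement points would be told to look for.
        return sorted(self._owners)

    def scan(self, payload):
        # Return (token, user, file_name) for every known token in the payload.
        return [
            (token, user, file_name)
            for token, (user, file_name) in self._owners.items()
            if token in payload
        ]


if __name__ == "__main__":
    registry = HoneytokenRegistry()
    copy_for_alice, _ = registry.tag_copy("Q3 forecast", "alice", "forecast.docx")
    outbound = "POST /upload HTTP/1.1\r\n\r\n" + copy_for_alice
    for token, user, file_name in registry.scan(outbound):
        print(f"{file_name} issued to {user} seen leaving the network ({token})")
```

Since every copy carries a different token, a hit on the wire identifies not only the file but also the user who originally requested it, and the enforcement point only has to do plain string matching.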
I know that there are vulnerabilities in this design; all of them were already discussed when DLP started to gain attention. However, I'd really like to see a DLP product using this approach, as it wouldn't have to analyze the information, only look for honeytokens. Such a system would probably be easier to deploy and faster. Is there anybody trying to do something like this?

1 Comment:
Vontu (and perhaps one other DLP vendor) uses algorithms that can essentially treat each record in a database as a "honeytoken". Instead of watching for any one specific honeytoken record leaving the perimeter, you can watch for any individual row of the database. The accuracy is high and the risk reduction results are tangible.
These techniques were developed back in 2001 before the term "honeytoken" was even in common use.
Similar techniques exist for unstructured/document data as well. In this case, the honeytokens are fragments of the text of the documents.
Both of these approaches are hugely more accurate than current approaches using pattern matching and/or regular expressions.