Hello,
==Summary==
We are experiencing different DLP behavior for complex RegEx between two installations.
==System==
Version: ciphermail-virtual-appliance-2.10.0-3.
1. Ubuntu pre-made virtual appliance (on my laptop)
2. Red Hat & CentOS gateway package (on a test server)
==Configuration==
DLP: several triggers with "Must Encrypt"
Settings: Encrypt Mode "No Encryption"
Settings: DLP Patterns added
==Example==
We want to search a message for [any text][four numbers][any text]
So we try this RegEx: *.\d{4}.*
This works perfectly on the Ubuntu VA, but it encrypts EVERY message on CentOS.
Everything is back to normal when we disable the complex RegEx on CentOS.
We also tried searching for something a little simpler, like: [0-9][0-9][0-9][0-9]
The Ubuntu version is fine; the CentOS version encrypts every message.
==DLP Trigger Comparison==
Ubuntu version:
- Single words work as expected
- Mail header works as expected
- Complex *.\d{4}.* works as expected
CentOS version:
- Single words work as expected
- Mail header works as expected
- Complex *.\d{4}.* works DIFFERENTLY
Does anyone have experience with this situation?
Is our installation perhaps incorrect?
It's quite likely that a message contains 4 digits. Could it be that the
mail sent via the CentOS gateway is sent with some other mail app than
the mail sent via the virtual appliance?
We will look at this tomorrow, but I'm quite sure it is a default
installation as described in the CipherMail guide.
The DLP text extractor also extracts header values. So for example a
date header will also be extracted. Since almost all mails contain a
date header, almost any mail will contain 4 digits.
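To illustrate the point about headers, here is a minimal Python sketch. It assumes, purely for illustration, that the scanned text is the header values followed by the body; the real CipherMail text extractor is not modeled here, and the addresses, date and body are made up.

```python
import re
from email.message import EmailMessage

# Hypothetical test message; addresses, date and body are made up.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Date"] = "Tue, 05 Mar 2019 10:15:00 +0100"
msg["Subject"] = "Lunch"
msg.set_content("See you at noon.")

# Assumption: the scanned text is the header values plus the body text.
header_text = "\n".join(str(value) for value in msg.values())
body_text = msg.get_content()
scanned_text = header_text + "\n" + body_text

four_digits = re.compile(r"\d{4}")

# The body alone contains no digits, so nothing is found there.
body_hit = four_digits.search(body_text)

# With the headers included, the year in the Date header is found.
full_hit = four_digits.search(scanned_text)

print("body only:", body_hit)
print("headers + body:", full_hit.group(0))
```

In this sketch the body contains no digits at all, yet the pattern still hits on the year in the Date header, which is the effect described above.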
That's true. The original is 8 digits (simulate Dutch Personal Id)
but I get the point. What I don't understand (yet) is that my testing
method & messages are the same on Ubuntu and CentOS and that it works
on the Ubuntu version.
If you have the "raw" MIME content, you can see what text the DLP
engine sees during scanning by uploading the MIME message to the "extract
text" tool (Admin -> other -> extract text). The "extract text" tool
will return the normalized text.
So we try this RegEx: *.\d{4}.*
If you want to trigger on 4 digits, you should use \d{4}, i.e., skip
the .* part. The .* is not needed and will make scanning slower. The
regular expression is not required to match the complete text, i.e.,
.* is in effect implicitly added to any regular expression.
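The difference between matching anywhere in the text and matching the complete text can be sketched with Python's `re` module. It is used here only as a stand-in; that the DLP engine uses the same search semantics is an assumption based on the explanation above, and the sample text is made up.

```python
import re

# Made-up sample text containing one run of four digits.
text = "Order reference 1234 was shipped today."

# Search semantics: the pattern may match anywhere in the text,
# so no leading or trailing .* is needed.
bare = re.search(r"\d{4}", text)

# The wrapped pattern flags the same text, but the match now spans
# the whole line instead of just the digits.
wrapped = re.search(r".*\d{4}.*", text)

# Full-match semantics (NOT what the reply describes) is the only
# case where the wildcards would be required.
strict = re.fullmatch(r"\d{4}", text)

print(bare.group(0))
print(wrapped.group(0) == text)
print(strict)
```

Both the bare and the wrapped pattern flag the same text; the wildcards only add work for the engine.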
BTW I made a typo in the mail, it was of course .* but I think you
saw that.
Okay, we will test without the wildcards. But when
I used 9 times [0-9] without wildcards, all messages were encrypted.
But again, that could be because 9 digits are in the headers...
Hmm, anyway, thanks for your support, we will try some more tests.
Kind regards,
CipherMail support
Cheers,
Raymond Bakker | Integration Consultant
T +31 (0)10 288 1600
M +31 (0)6 2222 5515
E raymond.bakker(a)vanadgroup.com
VANAD Enovation
Rivium Westlaan 1
2909 LD Capelle aan den IJssel
The Netherlands
Website | Facebook | LinkedIn | Twitter
This e-mail is personal. For our disclaimer, please visit www.vanadgroup.com/disclaimer