I want to do a DLP scan when, for example:
the header contains keyword1 AND the subject (or body) contains keyword2.
Can you give an estimate of the performance drop and extra memory needed when scanning through all the parts?
The memory requirements are not the biggest problem. However, if the
extracted text is very large (several MBs), the regular expression
engine might choke when scanning with an expression that requires the
engine to keep track of all scanned characters (for example
FOO*.BAR).
The text extraction is split up into overlapping parts to allow scanning
of very large attachments. In future releases we will add support for
scanning through Word, PDF, etc. and compressed files. In those cases
the size of the extracted text can be extremely large, especially when
scanning through compressed files.
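
To illustrate the idea of scanning in overlapping parts, here is a minimal sketch. It is not the CipherMail implementation; the class name, method name, window size and overlap are all assumptions made for this example. The point it shows is that a match crossing a part boundary is still found as long as the match is not longer than the overlap.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the CipherMail source.
class OverlappingScan {

    /**
     * Scans the text in windows of windowSize characters, where each window
     * overlaps the previous one by 'overlap' characters. A match that is at
     * most 'overlap' characters long is always fully inside some window.
     */
    static boolean containsMatch(String text, Pattern pattern,
                                 int windowSize, int overlap) {
        if (overlap < 0 || windowSize <= overlap) {
            throw new IllegalArgumentException(
                    "windowSize must be larger than overlap");
        }
        int step = windowSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + windowSize);
            if (pattern.matcher(text.substring(start, end)).find()) {
                return true;
            }
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern secret = Pattern.compile("SECRET");
        String text = "aaaaSECRETbbbb";
        // Without overlap the keyword is cut in two by the window boundary.
        System.out.println(containsMatch(text, secret, 8, 0));
        // With an overlap of 6 characters the keyword fits inside one window.
        System.out.println(containsMatch(text, secret, 8, 6));
    }
}
```

The trade-off is that only a bounded amount of text is held per scan, but an expression that has to span more than the overlap (or more than one message part) cannot match.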
And what would it take to change the code to do so?
The text extraction is done in the following Java class:
mitm.application.djigzo.james.mailets.AbstractRegExpPolicyChecker
See method #serviceMail
Instead of scanning piece by piece, this might be changed to
collect all text into the textNormalizer and then do the scanning
(policyChecker.update(context)). The changes should be relatively easy.
I can see whether I can make this optional if I have the time.
A better alternative would be to add a post-DLP check. All DLP matches
that were found are stored in the Mail object. You might add a matcher
which checks whether there are multiple DLP matches.
Kind regards,
Martijn Brinkers
On 29-01-16 13:18, Maarten Bout wrote:
-----Original message-----
From: users-bounces(a)lists.djigzo.com [mailto:users-bounces(a)lists.djigzo.com] On behalf of Martijn Brinkers
Sent: Friday 29 January 2016 3:15
To: users(a)lists.djigzo.com
Subject: Re: Combining DLP in subject and body
On 28-01-16 16:09, Maarten Bout wrote:
Hello,
I'm wondering if there's a possibility to combine a DLP in the subject, and a DLP in the body.
Encryption should trigger on the words: trigger1, trigger2
For example
Subject contains: trigger1
Body contains: trigger2
I'm using the following regexp: trigger1.*(\n|.)*trigger2
When the subject or the body contains both trigger1 and trigger2, the email gets encrypted.
But when the subject contains trigger1 and the body contains
trigger2, the email doesn't get encrypted.
Does anyone have any experience with this situation?
Unfortunately, matching on multiple message parts is not supported. For performance and memory reasons, a message is scanned part by part, and the headers of the message are considered to be a separate part. So if you have a multipart message, every part of the message is scanned on its own. In principle it should be possible to modify the code to combine all parts into one large part and scan the complete text. This, however, makes scanning slower and requires more memory.
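
The difference between the two approaches can be shown with a small stand-alone sketch. This is not the CipherMail code: the class and method names are made up, the message parts are plain strings, and the pattern is a simplified version of the one from the question (trigger1.*trigger2 with DOTALL, so that "." also matches line breaks).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; message parts are modelled as strings.
class PartScanDemo {

    // Simplified stand-in for the pattern from the question.
    static final Pattern RULE =
            Pattern.compile("trigger1.*trigger2", Pattern.DOTALL);

    /** Part-by-part scanning: the pattern must match inside a single part. */
    static boolean matchesAnyPart(List<String> parts) {
        for (String part : parts) {
            if (RULE.matcher(part).find()) {
                return true;
            }
        }
        return false;
    }

    /** Combined scanning: all parts are joined first, then scanned once. */
    static boolean matchesCombined(List<String> parts) {
        return RULE.matcher(String.join("\n", parts)).find();
    }

    public static void main(String[] args) {
        // Headers are one part, the body is another part.
        List<String> parts = Arrays.asList(
                "Subject: trigger1", "please handle trigger2");
        System.out.println(matchesAnyPart(parts));  // no single part has both
        System.out.println(matchesCombined(parts)); // joined text has both
    }
}
```

The combined variant has to hold the full text of all parts in memory at once, which is the cost mentioned above.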
What is the kind of DLP scanning that you want to accomplish? Only DLP scan if the subject contains some string?
Kind regards,
Martijn Brinkers
--
CipherMail email encryption
Email encryption with support for S/MIME, OpenPGP, PDF encryption and secure webmail pull.
https://www.ciphermail.com
Twitter: http://twitter.com/CipherMail
_______________________________________________
Users mailing list
Users(a)lists.djigzo.com
https://lists.djigzo.com/lists/listinfo/users