I’m trying to extract PDF files using the Bro 2.0 beta, so I added the following line to share/bro/site/local.bro:
redef HTTP::extract_file_types = /application\/pdf/;
However, no files are being extracted. And if I open up BroControl and print out that variable, I get this:
[BroControl] > print HTTP::extract_file_types
bro HTTP::extract_file_types = /^?(NO_DEFAULT)$?/
Is there another variable I need to set?
After you added the redef, did you do the check, install, restart dance in broctl? BroControl uses cached copies of the scripts, so the running scripts are only updated when you run the "install" command.
Variables that you redef can also be modified at runtime with the "update" command, so you could do check, install, update instead. If you use the print command before and after, you should see the change reflected. There is also a bug in the HTTP file extraction in the beta where it only extracts an initial chunk of the file; it's already fixed in the git repository, though.
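As a rough sketch of the sequence described above, assuming the same interactive BroControl prompt used for the print command earlier in this thread (output is omitted because it depends on your setup):

```
[BroControl] > print HTTP::extract_file_types
[BroControl] > check
[BroControl] > install
[BroControl] > restart
[BroControl] > print HTTP::extract_file_types
```

If you only changed a redef'd variable, "update" can take the place of "restart" in that sequence.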
Files will also be extracted to the spool/bro directory (assuming you haven't changed your node.cfg), and I don't know how they will be handled upon file rotation. We haven't had time to put a lot of thought into live traffic file extraction on clusters or with BroControl, so the behavior is a little unknown currently.
Ah, apparently I have two left feet, since I didn't do the check and install part of the dance.
We don't have it documented very clearly yet (unless I'm mistaken?) so don't feel bad.
And thanks for the tip on the HTTP extraction bug, that explains why every PDF is only 1500 bytes. :o) I'll grab the update from the Git repo.
It's in master, so after you clone the repository it should already be in place.