File extraction filters

Mike_Kolkebeck · July 28, 2014, 9:56pm

I have two questions on the file extraction framework:

1) If I only want to capture files from a specific worker or IP ranges, what is the best/simplest way to ensure that this happens?
-I've tried using f$info$tx_hosts with event file_new, but it seems to be inconsistently populated. Using f$conns with event file_new seems consistent, but I don't know if it's the best/simplest way.

2) If missing_bytes > 0, what is the best/simplest way to remove the file (and possibly keep it from being logged as a successful extract in files.log)?
-I've tested using event file_state_remove, and I can use system to rm the file, but again I'm not sure this is the best/simplest way, and files.log continues to show the file as extracted.

Siwek_Jon · July 29, 2014, 2:48pm

> I have two questions on the file extraction framework:
>
> 1) If I only want to capture files from a specific worker or IP ranges, what is the best/simplest way to ensure that this happens?
> -I've tried using f$info$tx_hosts with event file_new, but this seems inconsistently populated, and using f$conns with event file_new seems consistent, but I don't know if it's the best/simplest way.

In either case, I’d probably try using “file_over_new_connection” instead of “file_new”. It might end up not mattering for your use, but the fields you’re inspecting are more closely associated with the former event. A given file can technically be transferred over many different connections, depending on the protocol involved, so “file_new” may not always give the full story, since it’s only ever raised once for a given file.

Using f$info$tx_hosts and f$info$rx_hosts may be better if transfer direction is important; otherwise f$conns should be fine.
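
For illustration, the filtering described above might look roughly like the following. This is a minimal sketch rather than a tested recipe: the subnets, the node name "worker-1", the option names watched_nets and extract_node, and the extract filename pattern are all hypothetical, and it assumes the cluster framework is loaded (as it is with the default base scripts).

```zeek
# Hypothetical values: adjust the subnets and the node name to the deployment.
const watched_nets: set[subnet] = { 10.0.0.0/8, 192.168.1.0/24 } &redef;
const extract_node = "worker-1" &redef;

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
	{
	# In a cluster, only act on the chosen worker.
	if ( Cluster::is_enabled() && Cluster::node != extract_node )
		return;

	# Skip files whose connection has neither endpoint in a watched range.
	if ( c$id$orig_h !in watched_nets && c$id$resp_h !in watched_nets )
		return;

	# The name depends only on the file, so repeated events for the
	# same file pass the same arguments.
	local fname = fmt("extract-%s-%s", f$source, f$id);
	Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
	}
```

If direction matters, the endpoint test could be replaced by a check against f$info$tx_hosts or f$info$rx_hosts, as suggested above.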

> 2) If missing_bytes > 0, what is the best/simplest way to remove the file (and possibly clear it from logging a successful extract in the files.log file)?
> -I've tested using event file_state_remove, and I can use system to rm the file, but again I'm not sure this is the best/simplest way, and the files.log continues to show this as extracted.

You might want to handle the “file_gap” event: call “Files::remove_analyzer” to stop the extraction, use a system call to rm the file, and finally “delete f$info$extracted;” to unset the field and prevent it from being logged in files.log.
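
A rough sketch of those steps, under a few assumptions that should be checked against the installed version: that the extract analyzer was registered with the path built from FileExtract::prefix and the logged name (so the same argument is passed to remove it), and that the safe_shell_quote built-in is available (older releases used str_shell_escape instead).

```zeek
event file_gap(f: fa_file, offset: count, len: count)
	{
	# Nothing to do unless this file is currently marked as extracted.
	if ( ! f?$info || ! f$info?$extracted )
		return;

	# Assumption: the on-disk path is the extraction prefix plus the logged name.
	local path = build_path_compressed(FileExtract::prefix, f$info$extracted);

	# Stop extracting. Assumption: the arguments must match those the
	# analyzer was registered with.
	Files::remove_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=path]);

	# Remove the partial file from disk.
	system(fmt("rm -f %s", safe_shell_quote(path)));

	# Unset the field so files.log does not report an extraction.
	delete f$info$extracted;
	}
```

Because the field is deleted on the first gap, later gaps in the same file return early at the guard.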

- Jon

Mike_Kolkebeck · July 29, 2014, 3:38pm

Does "file_over_new_connection" fire at the same time as "file_new" when there is a new file? More specifically, will I ever lose any bytes by using this event instead of "file_new"?

Siwek_Jon · July 29, 2014, 4:37pm

“file_new” is immediately followed by at least one “file_over_new_connection” (if you’re dealing only with files extracted from the network), so there’s no difference in terms of which bytes have been seen yet. But you may have to account for that event being raised more than once per file, and possibly not at the start of the file after the first time, whereas “file_new” is guaranteed to be raised once, at the start of a file. Not sure which will end up better/simpler for the code you’re writing, but I hope that helps explain the differences.
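
One way to observe this ordering on a sample trace is to print from both events. The handlers below are only a diagnostic sketch and the output format is arbitrary.

```zeek
event file_new(f: fa_file)
	{
	# Raised once per file, at its start.
	print fmt("file_new %s seen_bytes=%d", f$id, f$seen_bytes);
	}

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
	{
	# May be raised again later if the same file shows up on another connection.
	print fmt("file_over_new_connection %s uid=%s seen_bytes=%d",
	          f$id, c$uid, f$seen_bytes);
	}
```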

- Jon
