File extraction filters

I have two questions on the file extraction framework:

1) If I only want to capture files from a specific worker or from specific IP ranges, what is the best/simplest way to ensure that this happens?
-I've tried using f$info$tx_hosts with event file_new, but this seems inconsistently populated, and using f$conns with event file_new seems consistent, but I don't know if it's the best/simplest way.

2) If missing_bytes > 0, what is the best/simplest way to remove the file (and possibly clear it from logging a successful extract in the files.log file)?
-I've tested using event file_state_remove, and I can use system to rm the file, but again I'm not sure this is the best/simplest way, and the files.log continues to show this as extracted.

1) If I only want to capture files from a specific worker or from specific IP ranges, what is the best/simplest way to ensure that this happens?
-I've tried using f$info$tx_hosts with event file_new, but this seems inconsistently populated, and using f$conns with event file_new seems consistent, but I don't know if it's the best/simplest way.

In either case, I’d probably try using “file_over_new_connection” instead of “file_new”. It might end up not mattering for your use, but the fields you’re inspecting are more closely associated with the former event. A given file can technically be transferred over many different connections, depending on the protocol involved, and “file_new” is only ever raised once for a given file, so it may not always give the full story.

Using f$info$tx_hosts and f$info$rx_hosts may be better if transfer direction is important; otherwise f$conns should be fine.
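
A minimal sketch of the connection-based filter, assuming the goal is to attach the extraction analyzer only when an endpoint falls in chosen subnets. The "watched_nets" set, its example subnets, and the "extract-<id>" filename are hypothetical and only there for illustration:

```zeek
@load base/frameworks/files
@load base/files/extract

# Hypothetical subnets of interest; adjust to your environment.
const watched_nets: set[subnet] = { 10.0.0.0/8, 192.168.1.0/24 } &redef;

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
    {
    # Attach the extractor only if either endpoint is in a watched subnet.
    if ( c$id$orig_h in watched_nets || c$id$resp_h in watched_nets )
        Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
                            [$extract_filename=fmt("extract-%s", f$id)]);
    }
```

This checks both endpoints of the connection the event was raised for. If direction matters, the same test could be applied to f$info$tx_hosts or f$info$rx_hosts instead.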

2) If missing_bytes > 0, what is the best/simplest way to remove the file (and possibly clear it from logging a successful extract in the files.log file)?
-I've tested using event file_state_remove, and I can use system to rm the file, but again I'm not sure this is the best/simplest way, and the files.log continues to show this as extracted.

You might want to handle the “file_gap” event: call “Files::remove_analyzer” to stop the extraction, use a system call to rm the file, and finally “delete f$info$extracted;” to unset the field and prevent it from being logged in files.log.
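
Roughly, those steps could look like the sketch below. It rests on a few assumptions: that the extracted file lives under FileExtract::prefix, that "safe_shell_quote" and "build_path_compressed" are available in your version, and that the arguments passed to “Files::remove_analyzer” match the ones the analyzer was originally attached with (that matching may differ between versions, so treat this part as untested):

```zeek
@load base/frameworks/files
@load base/files/extract

event file_gap(f: fa_file, offset: count, len: count)
    {
    # Nothing to clean up unless this file is being extracted.
    if ( ! f?$info || ! f$info?$extracted )
        return;

    local fname = f$info$extracted;

    # Stop extraction. Assumption: these args must match the ones used
    # when the analyzer was added; adjust if you pass other fields.
    Files::remove_analyzer(f, Files::ANALYZER_EXTRACT,
                           [$extract_filename=fname]);

    # Delete the partial file from the extraction directory.
    local path = build_path_compressed(FileExtract::prefix, fname);
    system(fmt("rm -f %s", safe_shell_quote(path)));

    # Unset the field so files.log does not report an extraction.
    delete f$info$extracted;
    }
```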

- Jon

Does "file_over_new_connection" fire at the same time as "file_new" when there is a new file? More specifically, will I ever lose any bytes by using this event over "file_new"?

“file_new” is immediately followed by at least one “file_over_new_connection” (if you’re dealing only with files extracted from the network), so there’s no difference in terms of which bytes have been seen so far. Keep in mind, though, that “file_over_new_connection” can be raised more than once per file, and after the first time not necessarily at the start of the file, whereas “file_new” is guaranteed to be raised once, at the start of a file. I’m not sure which will end up better/simpler for the code you’re writing, but I hope that helps explain the differences.
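
If the repeated events turn out to matter, one way to get once-per-file behavior out of “file_over_new_connection” is to remember which file IDs have already been handled. This is only a sketch; the "seen_files" set and its one-hour expiry are hypothetical choices:

```zeek
@load base/frameworks/files

# Hypothetical bookkeeping: IDs of files already handled once.
global seen_files: set[string] &create_expire=1hr;

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
    {
    # Skip later connections that carry the same file.
    if ( f$id in seen_files )
        return;

    add seen_files[f$id];
    print fmt("first connection for file %s: %s", f$id, c$id$orig_h);
    }

event file_state_remove(f: fa_file)
    {
    # Drop the entry once file analysis is finished.
    delete seen_files[f$id];
    }
```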

- Jon