We currently use file_sniff events plus some conditional logic to selectively invoke Files::ANALYZER_EXTRACT on a subset of files for further analysis. However, this approach leads to a lot of duplication, and we would like to narrow it down, specifically by excluding files whose hashes are already known. So I tried:
file_sniff event -> for some files, invoke Files::ANALYZER_MD5
file_hash event -> after confirming that this is the hash event triggered by the previous step, try to invoke the full Files::ANALYZER_EXTRACT
but this approach results in
"Reporter::WARNING","message":"Analyzer Files::ANALYZER_EXTRACT not added successfully to file
which, based on these threads I found...
https://lists.zeek.org/archives/list/zeek@lists.zeek.org/thread/UUIA4PN4D24PNG5FG6TFRVGCC3VJTDN3/#UUIA4PN4D24PNG5FG6TFRVGCC3VJTDN3
https://lists.zeek.org/archives/list/zeek@lists.zeek.org/thread/AVODJCKRGC34JJOMTYUPZR2C76FOSDHS/#AVODJCKRGC34JJOMTYUPZR2C76FOSDHS
...this sounds like the expected result: the ANALYZER_EXTRACT call comes "too late" in the event lifecycle for the file, presumably because at most one analyzer submission per file is supported, but I'm still not clear on exactly why.
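
For reference, here is a minimal sketch of the two-step attempt described above. It is not our actual script: the names md5_requested and known_hashes, the MIME type check, and the extract filename pattern are hypothetical placeholders standing in for our real selection logic.

```zeek
# Hypothetical: IDs of files for which this script requested an MD5 hash.
global md5_requested: set[string];

# Hypothetical: hashes of files we have already seen and want to skip.
const known_hashes: set[string] = {} &redef;

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    # Placeholder condition; the real conditional logic is not shown here.
    if ( meta?$mime_type && meta$mime_type == "application/x-dosexec" )
        {
        Files::add_analyzer(f, Files::ANALYZER_MD5);
        add md5_requested[f$id];
        }
    }

event file_hash(f: fa_file, kind: string, hash: string)
    {
    # Only react to the MD5 hash that the file_sniff handler asked for.
    if ( kind != "md5" || f$id !in md5_requested )
        return;

    delete md5_requested[f$id];

    # Skip files whose hash is already known.
    if ( hash in known_hashes )
        return;

    # In the setup described above, this is the call that produces
    # the Reporter warning.
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
                        [$extract_filename=fmt("extract-%s", f$id)]);
    }
```

The second Files::add_analyzer call is the one that is reported as not added successfully.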
Is there documentation on the file analysis / event lifecycle that explains when and how file analysis can be triggered, and the limitations that appear to be implicit?
It also seems like this is a common enough use case that someone must have solved it at some point in a more elegant way than the one proposed in the threads I found (extract all, then delete some).
I'm hoping someone has some insights here that they would be willing to share.