File extraction after checking hash.

Can’t you simply write a script that calls file extraction at a later point? I would think you could hook it into file intel, which runs after the file analysis (it's comparing hashes), and extract at that point, not before…

I've been thinking about some potential directions we could go that might open the door to doing this in some cases for the next release, but for now imagine that your file is 10 GB. We can't keep that much data in memory, but you don't know the file hash until you've seen every byte of that file. You can't choose to extract the file at the end because all of the content for that file is already gone. You'd have to extract it up front and make the decision to keep it or delete it after the fact.

  .Seth
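A minimal sketch of the "extract up front, decide afterwards" approach described above, written against the Bro 2.5-era file analysis framework. The names `keep_hashes` and `keep_extracted` are hypothetical and exist only for this illustration, and the hash in the set is just the SHA1 of empty input used as a placeholder. Note that the extraction analyzer's default `extract_limit` still applies, so a very large file would be truncated unless the limit is raised.

```bro
@load base/files/extract
@load base/files/hash
@load base/utils/paths

# Hypothetical list of SHA1 hashes whose files we want to keep.
const keep_hashes: set[string] = {
	"da39a3ee5e6b4b0d3255bfef95601890afd80709",
} &redef;

# Hypothetical per-file flag, set once the hash is known.
redef record fa_file += {
	keep_extracted: bool &default=F;
};

event file_new(f: fa_file)
	{
	# The hash is only known at end of file, so extraction has to
	# start now, together with the hash analyzer.
	Files::add_analyzer(f, Files::ANALYZER_SHA1);
	Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
	                    [$extract_filename=fmt("extract-%s", f$id)]);
	}

event file_hash(f: fa_file, kind: string, hash: string)
	{
	if ( kind == "sha1" && hash in keep_hashes )
		f$keep_extracted = T;
	}

event file_state_remove(f: fa_file)
	{
	if ( f$keep_extracted )
		return;

	# Not a file we care about: delete what was written to disk.
	if ( f?$info && f$info?$extracted )
		unlink(build_path_compressed(FileExtract::prefix, f$info$extracted));
	}
```

The cost is the one pointed out above: every file is written to disk first, and the disk I/O is paid even for files that end up being deleted.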

Hm, good point. Is there somewhere in the analysis framework where you can say, if a file is above x bytes, kill the analysis process? I ask because I see this as somewhat related to the GridFTP problem at LBL. If we have large tarballs or zip files or whatever crossing the wire, killing those off at, say, the 5 GB point or so seems reasonable. As you mentioned, that is quite a lot of memory being consumed by extraction. :confused:

> Hm, good point. Is there somewhere in the analysis framework where you can say, if a file is above x bytes, kill the analysis process? I ask because I see this as somewhat related to the GridFTP problem at LBL. If we have large tarballs or zip files or whatever crossing the wire,

Yeah, I've been thinking about this problem for a while and I might take a stab at addressing it in 2.6 (although there will be loads of caveats!).

> killing those off at, say, the 5 GB point or so seems reasonable. As you mentioned, that is quite a lot of memory being consumed by extraction. :confused:

Now what if you have 20 5gig transfers going on concurrently? :slight_smile:

  .Seth

I think the following could be used to some extent for crude analysis of the file on the wire (please correct me if I'm wrong):

file_extraction_limit
Type: event (f: fa_file, args: any, limit: count, len: count)
Desc: This event is generated when a file extraction analyzer is about to exceed the maximum permitted file size allowed by the extract_limit field of Files::AnalyzerArgs. The analyzer is automatically removed from file f.

Files::remove_analyzer
Type: function (f: fa_file, tag: Files::Tag, args: Files::AnalyzerArgs &default=[chunk_event=, stream_event=, extract_filename=, extract_limit=104857600] &optional) : bool

Files::stop
Type: function (f: fa_file) : bool
Desc: Stops/ignores any further analysis of a given file.

That event is only generated when the maximum file size that you set when you attached the extraction analyzer is about to be exceeded. You would still have to start extracting the file for this event to happen.

  .Seth
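To illustrate the point above, here is a rough sketch of how the limit and the event fit together: the extraction analyzer has to be attached first, with an `extract_limit`, and only then can `file_extraction_limit` fire. The constant `max_extract_bytes` and its 5 GB value are assumptions made for this example, and the event signature is the one quoted earlier in the thread (it may differ in other releases).

```bro
@load base/files/extract

# Hypothetical cap for illustration: stop extracting at 5 GB.
const max_extract_bytes: count = 5 * 1024 * 1024 * 1024 &redef;

event file_new(f: fa_file)
	{
	# Extraction starts here; the limit is attached to this analyzer
	# instance through the analyzer arguments.
	Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
	                    [$extract_filename=fmt("extract-%s", f$id),
	                     $extract_limit=max_extract_bytes]);
	}

event file_extraction_limit(f: fa_file, args: any, limit: count, len: count)
	{
	# The extraction analyzer is removed automatically after this event.
	# Files::stop additionally ends all other analysis of the file.
	# The partial file written so far remains on disk.
	Files::stop(f);
	}
```

This caps how much of each file is written out, but it does not avoid the up-front extraction itself, and the cap applies per file, so many concurrent large transfers still add up.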

Hmm, got it! :slight_smile: