Split path into directory and filename

Is there a way to use regex to extract portions of a string? I'm trying to
write a function that accepts a path and breaks it into a directory and
filename (/tmp/file.txt => [ /tmp, file.txt ]). I would like to do
something as easy as /(\/.+)/([^\/]+)$/, but am not sure it's possible
with Bro (I wrote the expr quickly so there are probably typos).

Right now I have the following, but wondering if there is a better way:

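function path_split(path: string): string_array {
        # Split on "/"; the resulting array is indexed from 1.
        local cpath = split(path, /\//);
        local ret_val: string_array;

        # The last element is the filename.
        ret_val[2] = cpath[length(cpath)];
        # Drop it and rejoin the remaining elements as the directory.
        delete cpath[length(cpath)];
        ret_val[1] = join_string_array("/", cpath);

        return ret_val;
}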

The reason I ask is I'm looking to modify the http/file-extract.bro
script so that the http responses are saved into a directory structure
based on the src and dst ip addresses (e.g. http-items/src_ip/dst_ip).
I plan to modify the generate_extraction_filename to create this path
and then send the filename to a function to create the directory
structure. (I know that modifying generate_extraction_filename will have
adverse effects on other scripts, but I plan to update those as well.)

If anyone cares, here is the function I wrote to recursively create the
directory structure.

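function mkdirs(dir: string): bool {
        # Split off the last path component; element 1 is the parent directory.
        local path_split = split1(dir, /\/[^\/]*$/);
        local parent = path_split[1];

        # No parent left to create (top-level or single-component path).
        if ( parent == "" || length(path_split) == 1 )
                return mkdir(dir);
        else {
                # Create the parent chain first, then this directory.
                if ( ! mkdirs(parent) )
                        return F;
                return mkdir(dir);
        }

        return T;
}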

Thanks in advance.

> Is there a way to use regex to extract portions of a string? I'm trying to
> write a function that accepts a path and breaks it into a directory and
> filename (/tmp/file.txt => [ /tmp, file.txt ]). I would like to do
> something as easy as /(\/.+)/([^\/]+)$/, but am not sure it's possible
> with Bro (I wrote the expr quickly so there are probably typos).

Nope, Bro's regular expressions don't support captures. You did it exactly the same way that I would have, by splitting on /\// and taking the last value as the file name and the rest as the path.
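
For what it's worth, a pattern-based variant is also possible without captures, by letting two separate calls play the role of the two capture groups. The following is only a sketch: the name path_split_re is hypothetical, it reuses the pattern from the mkdirs function in the original post for the directory part, and it assumes that sub() replaces the longest match at the first matching position.

```bro
# Hypothetical alternative to path_split(); the name is illustrative only.
function path_split_re(path: string): string_array {
        local ret_val: string_array;

        # Directory: everything before the last "/" (same pattern as mkdirs).
        # If the path contains no "/", split1 returns a single element and
        # the directory is left empty.
        local parts = split1(path, /\/[^\/]*$/);
        ret_val[1] = ( length(parts) == 1 ) ? "" : parts[1];

        # Filename: strip the longest leading run that ends in "/".
        ret_val[2] = sub(path, /.*\//, "");

        return ret_val;
}
```

Like the split-and-join version, this returns an empty directory string for a file directly under the root (e.g. /file.txt), so callers that care about that case would need to handle it themselves.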

> The reason I ask is I'm looking to modify the http/file-extract.bro
> script so that the http responses are saved into a directory structure
> based on the src and dst ip addresses (e.g. http-items/src_ip/dst_ip).

Ah, that's interesting. We need to rework the way that works to put more control of the file naming in users' hands; it's a definite shortcoming in the current iteration. I'll refactor it a little bit soon so that you can accomplish what you want without having to rewrite bits of functionality. :)

> I plan to modify the generate_extraction_filename to create this path
> and then send the filename to a function to create the directory
> structure. (I know that modifying generate_extraction_filename will have
> adverse effects on other scripts, but I plan to update those as well.)

Yeah, I generally don't like the way I wrote that.

> function mkdirs(dir: string): bool {

Thanks for this function. I'll integrate it in some form soon.

Since I see that you're using the code from the repository, I'd be happy to hear how your experience with it has been, if you are interested in sharing.

  .Seth

** Seth Hall <seth@icir.org> [2011-08-15 09:20:59 -0400] **

> The reason I ask is I'm looking to modify the http/file-extract.bro
> script so that the http responses are saved into a directory structure
> based on the src and dst ip addresses (e.g. http-items/src_ip/dst_ip).

> Ah, that's interesting. We need to rework the way that works to put
> more control of the file naming in users' hands; it's a definite
> shortcoming in the current iteration. I'll refactor it a little bit
> soon so that you can accomplish what you want without having to
> rewrite bits of functionality. :)

No need to spend your time doing it. I got it working over the weekend.
I updated the generate_extraction_filename to include a directory path
as the first argument... and then left everything else the same. After
obtaining the filename to use, I call the mkdirs function to create the
directory structure. I also updated the file-extract.bro script to
extract the client request payload as well. I'll try to attach my
updated scripts to this email, but if they are stripped let me know and
I'll send them to you directly.
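
As a rough sketch of the approach described above (this is not the content of the attached scripts): a helper builds the base/src_ip/dst_ip directory for a connection and creates it with mkdirs before the extraction filename is used. The helper name extraction_dir and its parameters are hypothetical; mkdirs is repeated from the earlier message only so the sketch stands alone.

```bro
# mkdirs() as posted earlier in this thread, repeated so the sketch
# stands alone.
function mkdirs(dir: string): bool {
        local path_split = split1(dir, /\/[^\/]*$/);
        local parent = path_split[1];

        if ( parent == "" || length(path_split) == 1 )
                return mkdir(dir);
        else {
                if ( ! mkdirs(parent) )
                        return F;
                return mkdir(dir);
        }

        return T;
}

# Hypothetical helper (not part of file-extract.bro): builds
# <base>/<src_ip>/<dst_ip> for a connection and creates it on disk.
function extraction_dir(base: string, c: connection): string {
        local dir = fmt("%s/%s/%s", base, c$id$orig_h, c$id$resp_h);

        # Report a failure but still return the path so the caller can decide.
        if ( ! mkdirs(dir) )
                print fmt("could not create %s", dir);

        return dir;
}
```

A call such as extraction_dir("http-items", c) would then supply the directory path passed as the first argument to the modified generate_extraction_filename.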

One thing I did notice over the weekend was a potential problem in
file-extract (I'm using current as opposed to 1.5) with respect to HTTP
POST requests. The file-extract script waits for first_chunk = T
before it starts capturing data. With POST requests, however,
first_chunk is set to T, and subsequently to F, within the client
request. By the time the response gets processed, first_chunk is F and
the payload is never saved (hopefully that makes sense). I fixed this by
creating the following event, which resets first_chunk and mime_type
in preparation for the response. The -15 priority makes sure that it
executes AFTER the message has been logged to the logfile.

event http_message_done(c: connection, is_orig: bool,
                        stat: http_message_stat) &priority=-15 {
        c$http$first_chunk = T;
        delete c$http$mime_type;
}

> Since I see that you're using the code from the repository, I'd be
> happy to hear how your experience with it has been, if you are
> interested in sharing.

I'm not sure why I started playing with the current version in the repo
as opposed to 1.5, but I like it. The way the scripts are loaded and the
directory structure makes much more sense to me as opposed to having
them all in one directory. I also like the addition of the __load__.bro
scripts. As seen above in my fix for the http POST problems, the new
overloaded delete operator was a nice addition which made solving the
problem almost trivial. So far, I haven't seen any problems with
current, but I have been running it on pcap files as opposed to live
traffic.

ext-files.bro (484 Bytes)

ext-paths.bro (777 Bytes)

file-extract.bro (2.22 KB)