Zeek installation failing to mime_type an ISO file

This malicious ISO (VirusTotal) is not being MIME-typed by my Zeek installation. I initially thought it was due to the magic string being outside of the bof_buffer; however, when I extended that buffer (and I can see the string myself in fa_file), Zeek still doesn’t mime_type it. It’s this magic string:

# ISO 9660 disk image
signature file-iso9660 {
    file-mime "application/x-iso9660-image", 99
    file-magic /CD001/
}

found in base/frameworks/files/magic/general.sig

I’m not familiar with what event or function Zeek uses to determine a file’s mime-type, so I can’t really give more info other than that I see the magic string in the bof_buffer field of fa_file, but mime_type isn’t populated in either http.log or files.log.

I can provide a pcap if interested.

Hey Zach,

I think this is actually an issue with the signature. As you reported on Slack, default_file_bof_buffer_size needs to be increased to cover the magic CD001 bytes. Additionally, my understanding is that all signatures are anchored to the beginning of the file stream, so this should really be .*CD001.

--- a/scripts/base/frameworks/files/magic/general.sig
+++ b/scripts/base/frameworks/files/magic/general.sig
@@ -300,5 +300,5 @@ signature file-windows-minidump {
 # ISO 9660 disk image
 signature file-iso9660 {
         file-mime "application/x-iso9660-image", 99
-        file-magic /CD001/
+        file-magic /.*CD001/
 }

Looking at the git history this signature wasn’t enabled until 2020 with a comment like:

Doubt it’s going to be common to have this many bytes buffered.

Let me open a PR and see what people think.

Thanks,
Arne

I wouldn’t have expected /CD001/ to be anchored at the beginning of the file stream. If it were anchored at the beginning, I would have thought it would be /^CD001/. There are a lot of other signatures that use that anchor such as:

# Windows Minidump
signature file-windows-minidump {
    file-mime "application/x-windows-minidump", 50
    file-magic /^MDMP/
}

but you are indeed correct, if I expand the bof_buffer w/ that modified signature of /.*CD001/, zeek correctly mime-types the file.
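
The anchoring behavior is easy to mimic outside Zeek. Here is a minimal sketch in Python, assuming only what this thread concludes: that a file-magic pattern is matched as if pinned to the first byte of the buffer. Python's `re` is a stand-in, not the engine Zeek uses, and the buffer and the helper `matches_from_start` are made up for illustration.

```python
import re

# Synthetic buffer: 16 padding bytes, then the ISO 9660 magic.
# (A real image has far more padding; this is just for illustration.)
buf = b"\x00" * 16 + b"CD001" + b"\x01"

# DOTALL lets "." match any byte, including newlines.
unanchored_looking = re.compile(rb"CD001", re.DOTALL)
explicit_anchor = re.compile(rb"^CD001", re.DOTALL)
with_prefix = re.compile(rb".*CD001", re.DOTALL)

def matches_from_start(pattern, data):
    """Return True if the pattern matches starting at the first byte.

    re.match() only succeeds at offset 0, which mimics an implicitly
    anchored pattern.
    """
    return pattern.match(data) is not None

print(matches_from_start(unanchored_looking, buf))       # False
print(matches_from_start(explicit_anchor, buf))          # False
print(matches_from_start(with_prefix, buf))              # True
print(matches_from_start(unanchored_looking, b"CD001"))  # True
```

Under that assumption, /CD001/ and /^CD001/ behave the same, and only the `.*` prefix lets the match skip over the leading padding.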

I feel like the answer to this question is yes, but expanding the bof_buffer would likely increase Zeek’s memory usage exponentially, right? I’m going to find some more examples of this file type to see if it’s a common occurrence to have 32000 bytes worth of \x00 before getting to the magic string.

There are a lot of other signatures that use that anchor such as:

I believe they are all redundant. I’m a bit more worried about those that do not have anchors: should they have a .* instead, like CD001?

Not clear why you say exponentially. But yes, if you increase from 4096 to 32k + wiggle, then for each file that is being transferred there’s more data held in the bof buffer, so it depends a bit on your average file size and the number of concurrent file transfers how much more ram you’ll be using.
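
To put rough numbers on that: the growth is linear in the number of concurrent files, not exponential. This is a back-of-the-envelope sketch in Python, assuming each in-flight file holds at most one bof buffer, that a file shorter than the buffer only buffers its own length, and ignoring all per-file overhead. The function name, the concurrency figure and the 32774-byte size are hypothetical inputs, not measurements.

```python
def extra_bof_memory(concurrent_files, old_size, new_size, avg_file_size):
    """Estimate additional bytes buffered after raising the bof size.

    Assumption: a file shorter than the buffer size only buffers its
    own length, so each file contributes min(size, avg_file_size).
    """
    old_per_file = min(old_size, avg_file_size)
    new_per_file = min(new_size, avg_file_size)
    return concurrent_files * (new_per_file - old_per_file)

# Hypothetical sensor: 10,000 concurrent transfers, 1 MiB average file.
extra = extra_bof_memory(10_000, 4096, 32774, 1024 * 1024)
print(f"{extra / 2**20:.1f} MiB extra")  # 273.5 MiB extra
```

If most transfers are smaller than the old 4096-byte size, the estimate drops to zero, which is why the average file size matters as much as the concurrency.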

Understood on the redundancy. I stated exponentially because I made a general assumption that jumping the buffer up x8 would just immediately increase memory usage because I’m assuming I’m seeing a lot of files. I’ll set up a few different sensors w/ increased buffers and perform some memory analysis on them.

Also, in my testing I noticed another oddity. If I expand the buffer to 32768 bytes and modify the iso9660 signature, Zeek still doesn’t mime-type the file even though I can see the magic string in the bof_buffer field of fa_file. Then if I expand it further to 40000 bytes, Zeek mime-types it with that updated signature. I can’t explain it because in both instances I see the magic string in the bof buffer.

I can’t explain it because in both instances I see the magic string in the bof buffer.

The buffer that’s used for actual MIME matching is hard-capped to the bof limit. I suspect that if you print the size of the bof_buffer of the fa_file, it is slightly larger because it’s created from chunks and the last one isn’t capped in size to fit the limit.

Not sure how strongly I feel about this, but given you were confused, maybe we should cap the bof_buffer size to the limit, too?
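
The 32768 vs. 40000 result is consistent with the ISO 9660 layout. The first 16 sectors of 2048 bytes are the system area, so the first volume descriptor starts at offset 32768 and its CD001 identifier occupies offsets 32769 to 32773. Below is a rough sketch in Python of a matcher that truncates its input to the limit, assuming the hard cap described above. The header bytes are synthetic and `capped_match` is a hypothetical helper, not Zeek code.

```python
import re

SECTOR_SIZE = 2048
SYSTEM_AREA_SECTORS = 16
# The descriptor's type byte sits at 32768, the identifier right after it.
MAGIC_OFFSET = SECTOR_SIZE * SYSTEM_AREA_SECTORS + 1  # 32769

# Synthetic start of an ISO image: zeroed system area, then a volume
# descriptor (type byte, "CD001", version byte) followed by filler.
header = b"\x00" * (MAGIC_OFFSET - 1) + b"\x01CD001\x01" + b"\x00" * 4096

PATTERN = re.compile(rb".*CD001", re.DOTALL)

def capped_match(data, limit):
    """Match only against the first `limit` bytes of the data."""
    return PATTERN.match(data[:limit]) is not None

for limit in (4096, 32768, 32773, 32774, 40000):
    print(limit, capped_match(header, limit))
```

In this model a limit of 32768 stops one byte short of the descriptor, and 32774 is the smallest value that covers the whole identifier.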

Ah I get it now, I calculated the byte size of what is printed to bof_buffer in fa_file and it is larger than what I set in a zeek script that did:

redef default_file_bof_buffer_size = 32978;

I don’t think any changes are necessary for that except maybe some documentation here? base/init-bare.zeek — Book of Zeek (git/master)

Hmm, hmm. On the flip side: If one reduced the default_file_bof_buffer_size to 100 and the first payload fed into analysis was some 1000 bytes, the fa_file$bof_buffer would be 900 bytes too large. Seems capping would be the right thing to do, if only for consistency.

Hi @awelzel , I have another file typing question. I have this custom file signature loaded in Zeek to file type MS OneNote files:

/^xa1\x2f\xff\x43\xd9\xef\x76\x4c\x9e\xe2\x10\xea\x57\x22\x76\x5f/

The test file I have has this hex at the beginning of the file, verified through a hexdump as well as a custom java application I have that does file typing (it uses the same hex string for file typing). Zeek shows this in the bof_buffer instead of what I’d expect to see:

\xa1/\xffC\xd9\xefvL\x9e\xe2\x10\xeaW"v_\xcd6q

This isn’t valid hex but if I use this file signature in zeek, it correctly file types it because it’s what’s in the bof_buffer:

/\xa1\/\xffC\xd9\xefvL/

Do you know what I could do to debug Zeek to see why it’s not seeing the same hex as everything else or do you know what Zeek is printing here to the bof_buffer?

Thanks for all your assistance!

Hello @zrob12 ,

Zeek shows this in the bof_buffer instead of what I’d expect to see:
Do you know what I could do to debug Zeek to see why it’s not seeing the same hex as everything else or do you know what Zeek is printing here to the bof_buffer?

I think the confusion is just that, when printing the bof_buffer, Zeek shows bytes within the printable ASCII range unescaped.

For example, \x43 in the signature is C and \x76 is v.

You could send the bof_buffer through bytestring_to_hexstr. That shows the individual hex bytes, though it’s not easier to read :slight_smile:

zeek -e 'print bytestring_to_hexstr("\xa1/\xffC\xd9\xefvL\x9e\xe2\x10\xeaW\"v_\xcd6q")'
a12fff43d9ef764c9ee210ea5722765fcd3671

Does that answer your question?

Edit: separately, if you print the signature bytes, it should look familiar, too:

$ zeek -e 'print "xa1\x2f\xff\x43\xd9\xef\x76\x4c\x9e\xe2\x10\xea\x57\x22\x76\x5f"'
xa1/\xffC\xd9\xefvL\x9e\xe2\x10\xeaW"v_
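
Python's `bytes` repr follows the same convention of leaving printable ASCII unescaped, which makes it a handy cross-check for what is printed. This small sketch uses the hex output above as input. It only illustrates the display convention and says nothing about Zeek's internals.

```python
# Hex string taken from the bytestring_to_hexstr output above.
hex_str = "a12fff43d9ef764c9ee210ea5722765fcd3671"
raw = bytes.fromhex(hex_str)

# The repr leaves printable ASCII bytes (/, C, v, L, W, ", _) unescaped.
print(raw)

# Fully escaped and partly escaped literals describe the same bytes.
fully_escaped = b"\xa1\x2f\xff\x43\xd9\xef\x76\x4c\x9e\xe2\x10\xea\x57\x22\x76\x5f"
partly_escaped = b"\xa1/\xffC\xd9\xefvL\x9e\xe2\x10\xeaW\"v_"
print(fully_escaped == partly_escaped)  # True
print(raw.startswith(fully_escaped))    # True
print(raw.hex() == hex_str)             # True
```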

That does explain some things; I wasn’t sure exactly what I was looking at when I printed the bof_buffer. Does Zeek perform its signature detection on the bytestring and not the hexstring?

If I use this signature, Zeek does not mime-type the file:

signature ms-onenote-3 {
        file-mime "application/x-ms-onenote", 100
        file-magic /xa1\x2f\xff\x43\xd9\xef\x76\x4c/
}

However, if I do this, a regex using escaped ASCII, it mime-types the file correctly:

signature ms-onenote-3 {
        file-mime "application/x-ms-onenote", 100
        file-magic /\xa1\/\xffC\xd9\xefvL/
}

Never mind, I see my mistake: I forgot to include the leading \ in the hexstring. This all makes sense now. Thanks for your help!
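
The effect of the missing backslash is easy to reproduce. Here is a sketch in Python, with `re` standing in for Zeek's matcher (an assumption, since the engines differ). Without the leading backslash, the pattern begins with the three literal characters x, a, 1 rather than the single byte 0xa1.

```python
import re

# First 16 bytes of the OneNote test file, per the hex shown in this thread.
data = bytes.fromhex("a12fff43d9ef764c9ee210ea5722765f")

# Missing backslash: "xa1" is three literal characters (0x78 0x61 0x31).
broken = re.compile(rb"xa1\x2f\xff\x43\xd9\xef\x76\x4c", re.DOTALL)
# Leading backslash restored: "\xa1" is the single byte 0xa1.
fixed = re.compile(rb"\xa1\x2f\xff\x43\xd9\xef\x76\x4c", re.DOTALL)
# Printable bytes written as plain ASCII: same bytes as `fixed`.
ascii_form = re.compile(rb"\xa1/\xffC\xd9\xefvL", re.DOTALL)

print(broken.match(data) is not None)      # False
print(fixed.match(data) is not None)       # True
print(ascii_form.match(data) is not None)  # True
```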

If these ms-onenote signatures work well for you, you may consider creating a zkg package so others can re-use them :slight_smile:

I never expected to see a .one file traverse the network when I had a request from my org to file type it; then of course I missed this one due to my bug. From my understanding, you’d only see a .one file if someone exports their OneNote notebook into an offline file and then uploads it across a network connection. Normal OneNote files are simply XML files (I think; I could be very wrong). Once I get a full understanding of it, I’ll see if I can make a zkg package.