str_split

Lorenzo_Cavallaro · May 24, 2008, 12:10am

Hi,

   I'd like to convert a string into an array of characters (or a vector)
   so that I can iterate over it with a for statement. Any idea how to
   do that?

   I'm not sure whether str_split is the right function, and if it is,
   I'm not sure what to pass as the index_vec argument. Iterating over a
   set would be enough if I could generate the range of indexes belonging
   to the string...

TIA, bye
Lorenzo

Christian_Kreibich3 · May 28, 2008, 11:29pm

Lorenzo and I have been emailing off-list prior to his posting. I
believe what Lorenzo wants to do is match a regular expression against
flow content and obtain the matching part (or parts?) of the flow. For
example, if the regex is [0-9]{5}, he'd like to obtain the 5-digit
numerical string(s) that is/are present in the flow.

My understanding is that the signature_match() event does not guarantee
that all match-relevant data are actually passed to the event, so what
is the best option? Manual buffer management and regex matching via
{udp,tcp}_contents?
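
The "manual buffer management" option can be sketched as follows. This is only a Python model of the idea, not Bro script: the function name matches_in_flow and the sample chunks are hypothetical, and the chunks stand in for the payload pieces that {udp,tcp}_contents would deliver. It shows why matching each chunk on its own can miss a match that straddles a chunk boundary, while matching the reassembled buffer finds it.

```python
import re

# The example regex from the post: any run of exactly five digits.
PATTERN = re.compile(r"[0-9]{5}")

def matches_in_flow(chunks):
    """Reassemble the payload chunks, then return every non-overlapping match.

    Hypothetical helper: `chunks` models the pieces of flow content
    delivered one at a time by a contents event.
    """
    buffer = "".join(chunks)
    return PATTERN.findall(buffer)

# Both 5-digit strings are split across chunk boundaries.
chunks = ["id=123", "45 and 678", "90!"]

per_chunk = [m for chunk in chunks for m in PATTERN.findall(chunk)]
print(per_chunk)                # []
print(matches_in_flow(chunks))  # ['12345', '67890']
```

Buffering the whole flow is the simplest correct choice here; a real script would presumably cap the buffer size.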

Vern · May 29, 2008, 5:23am

Given Christian's follow-on note, this might not be apt. But from
your original description, it sounds like split_all() will do the
trick. For example,

split_all("foobar", /./)

yields the table[count] of string:

  {
  [10] = a,
  [6] = o,
  [11] = ,
  [7] = ,
  [4] = o,
  [1] = ,
  [9] = ,
  [3] = ,
  [5] = ,
  [8] = b,
  [12] = r,
  [2] = f,
  [13] =
  }

which is enough to then iterate over each character.

Vern

Lorenzo_Cavallaro · May 29, 2008, 6:25am

Vern,

   thanks for the reply. I tried that before getting stuck on str_split.
   Unfortunately, I need to perform a different operation depending on
   the character position, which is why split_all apparently didn't give
   me any useful result.

   The string does get split but, if I got it right, the order of the
   split characters is not guaranteed. Moreover, what are those empty
   slots?

TIA, bye
Lorenzo

Vern · May 29, 2008, 2:58pm

> The string does get split but, if I got it right, the order of the
> split characters is not guaranteed.

Not quite. Since it's a table[count], it's implemented as a hash table,
and prints out in an arbitrary order. However, if you iterate from
1 .. its length, you can pull out its elements in order. (I thought there
was a built-in that makes doing this easy, but I'm not seeing it.)
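
As an illustration of walking the indexes from 1 to the table's length, here is a small Python model (not Bro script). The dict parts copies the printed table from earlier in the thread, deliberately in the same scrambled order, and the helper name in_order is hypothetical.

```python
# Model of the table[count] of string printed above, in the same
# arbitrary order. Keys are the 1-based indexes.
parts = {
    10: "a", 6: "o", 11: "", 7: "", 4: "o", 1: "", 9: "",
    3: "", 5: "", 8: "b", 12: "r", 2: "f", 13: "",
}

def in_order(table):
    """Return the values by walking the indexes 1 .. len(table)."""
    return [table[i] for i in range(1, len(table) + 1)]

ordered = in_order(parts)
print(ordered)
# ['', 'f', '', 'o', '', 'o', '', 'b', '', 'a', '', 'r', '']

# The even indexes (2, 4, ...) hold the characters of the string.
print("".join(ordered[1::2]))  # foobar
```

The storage order does not matter; only the index used for the lookup does.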

> Moreover, what are those empty
> slots?

They are the (empty) strings between the characters of the original string!

The semantics of split_all is that it splits on anything that matches
the regular expression, returning the strings between the split points
intermingled with the separators themselves. For split_all("foobar", /./),
*every* character is a separator, so the strings between split points are
empty, and the original characters show up in the separator (even) slots.

Vern
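
The semantics described above can be modeled in a few lines of Python. This is a sketch, not Bro's implementation: the Python functions split_all and chars_with_positions are hypothetical stand-ins, and the model assumes the pattern contains no capturing groups of its own. It relies on re.split returning the separators as well when the pattern is wrapped in a capturing group.

```python
import re

def split_all(text, pattern):
    """Model of split_all: a 1-indexed table of the strings between
    split points (odd slots) and the separators themselves (even slots).

    Assumes `pattern` has no capturing groups of its own.
    Note: Python's '.' does not match a newline by default.
    """
    pieces = re.split("(" + pattern + ")", text)
    return {i: piece for i, piece in enumerate(pieces, start=1)}

def chars_with_positions(text):
    """Pair each character with its 1-based position in the string.

    With the pattern '.', character number k sits in slot 2 * k.
    """
    table = split_all(text, ".")
    return [(i // 2, table[i]) for i in range(2, len(table) + 1, 2)]

print(split_all("a,b;c", "[,;]"))
# {1: 'a', 2: ',', 3: 'b', 4: ';', 5: 'c'}

print(chars_with_positions("foobar"))
# [(1, 'f'), (2, 'o'), (3, 'o'), (4, 'b'), (5, 'a'), (6, 'r')]
```

Pairing each character with its position is what allows a different operation per character position, as asked earlier in the thread.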
