I wanted to make my disk-bound queries faster, so I wanted the fewest
files for tm to search through, because it appears that every separate
file makes the interval searches in pcapnav slower if you're
requesting many packets. I found that when setting filesize > 279g,
tm creates a file per connection and trashes its working directory.
So two questions: am I right in thinking it is faster to search
through as few files as possible when using pcapnav? And second,
does anyone know why tm breaks when trying to create files larger than
279g?
I don't think that pcapnav speed is significantly influenced by
filesize. AFAIK pcapnav jumps to a random file offset, then reads
sequentially until it finds something that looks like a pcap record
header. Then it checks the timestamp and either reads sequentially or
jumps somewhere else, repeating until it finds the requested timestamp.
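In effect that's a binary search over byte offsets with header
resynchronization. Here is a minimal C sketch of the idea -- made-up
helper names and a naive sanity check, NOT libpcapnav's actual API:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define REC_HDR 16          /* size of a pcap record header on disk */
#define SNAPLEN 65535       /* sanity bound for the length fields */

struct rec { uint32_t ts_sec, ts_usec, caplen, len; };

/* Scan forward from 'off' until the four header fields look sane. */
static long next_plausible_rec(FILE *f, long off, struct rec *r)
{
    unsigned char buf[REC_HDR];
    for (;; off++) {
        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(buf, 1, REC_HDR, f) != REC_HDR)
            return -1;                      /* ran off the end */
        memcpy(r, buf, REC_HDR);
        if (r->caplen > 0 && r->caplen <= SNAPLEN &&
            r->len >= r->caplen && r->len <= SNAPLEN)
            return off;                     /* looks like a record header */
    }
}

/* Narrow [lo, hi) down to the first record with ts_sec >= want_sec. */
long seek_timestamp(FILE *f, long lo, long hi, uint32_t want_sec)
{
    struct rec r;
    while (hi - lo > REC_HDR) {
        long mid = lo + (hi - lo) / 2;      /* jump somewhere */
        long off = next_plausible_rec(f, mid, &r);
        if (off < 0 || off >= hi) { hi = mid; continue; }
        if (r.ts_sec < want_sec)
            lo = off + REC_HDR + r.caplen;  /* too early: go right */
        else
            hi = off;                       /* at/after target: go left */
    }
    return lo;  /* read sequentially from here to the exact packet */
}

Note that each probe is O(1) seeks, so the cost per file is roughly
logarithmic in the file size -- which is why a single big file should
not be much slower to search than a small one.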
If you have multiple files, then this is repeated for each file.
However, the TM knows which files cover which time periods, so it will
only access the files that it knows are candidates. So I would assume
that the lookup speed should be similar. I think the specifics of
the query result influence speed much more (e.g., is it only a single,
narrow time interval to search, or multiple small ones, or a few large
ones that cover almost the whole dataset).
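For what it's worth, the per-file pruning is just an interval-overlap
test over the time range each file covers. A toy sketch in C (made-up
structures, not the TM's actual index):

#include <stdio.h>
#include <stddef.h>

struct data_file { const char *name; double t_start, t_end; };

/* A file is a candidate iff its time range overlaps the query. */
static int overlaps(const struct data_file *f, double q0, double q1)
{
    return f->t_start <= q1 && f->t_end >= q0;
}

int main(void)
{
    struct data_file files[] = {            /* per-file time coverage */
        { "class_all_0", 100.0, 200.0 },
        { "class_all_1", 200.0, 300.0 },
        { "class_all_2", 300.0, 400.0 },
    };
    double q0 = 250.0, q1 = 260.0;          /* one narrow interval */

    for (size_t i = 0; i < sizeof files / sizeof *files; i++)
        if (overlaps(&files[i], q0, q1))
            printf("search %s\n", files[i].name);  /* only class_all_1 */
    return 0;
}

So with a narrow query only one file gets opened at all, regardless of
how many files exist.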
Long story short: the number of files to search should not influence the
speed much.
If the number of files is huge, then the only thing I could imagine is
weird filesystem behavior when there are thousands of files in one
directory...
OTOH, if the filesize is too large relative to the configured disk
space, the TM will run into trouble. It will delete old files when
writing more data (or when creating a new data file; I can't recall
which of the two). So if the data files are huge, this will introduce
quite some variance in disk-space usage.
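A toy model of that fifo behavior (assumed from the description above,
not the TM's actual code) shows the variance. With filesize 280g
against a 1000g budget, each eviction frees 28% of the budget in one
step:

#include <stdio.h>

int main(void)
{
    /* Numbers taken from this thread: disk 1000g, filesize 280g. */
    long long disk = 1000, filesize = 280;   /* in GB */
    long long used = 0;

    for (int i = 1; i <= 6; i++) {
        while (used + filesize > disk)       /* evict oldest file first */
            used -= filesize;
        used += filesize;                    /* write the new data file */
        printf("after file %d: %lld GB of %lld GB used\n", i, used, disk);
    }
    /* Usage oscillates between 560 and 840 GB: 160 GB of the budget
     * is never usable, and each eviction frees 280 GB at once. */
    return 0;
}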
That said: the TM definitely should not trash its working directory...
Do I understand you correctly that you get a myriad of files in the
working directory? Do the files contain only a single packet (or a
handful), possibly from different connections? How many packets per file?
Also, how does your filesize relate to the configured disk space?
My performance issues were noticed when making a query over a large
time range with many packets involved. Since there is no way to
specify a limit on the number of packets returned, the query takes
forever. I was looking to improve that performance. I will continue
to play around with this to see if there is any improvement worth the
large hit for file rollover.
With filesize set at exactly 280g (279g does not produce the problem)
and a disk setting of 1000g, tm will create one disk fifo file in the
workdir for each evicted packet. I am only using one default class
for "all."
That sounds like something is wrapping and going negative at the 2^38 barrier.
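For instance (purely speculative, hypothetical variables, not the TM's
actual code): 2^38 bytes is exactly 2^31 units of 128 bytes, so a
signed 32-bit count of 128-byte units would wrap negative right around
there, and a negative limit makes the should-we-rotate test true for
every single packet:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Assumption: the file-size limit ends up as a signed 32-bit
     * count of 128-byte units somewhere; 2^31 * 128 = 2^38 bytes,
     * so anything configured above that wraps negative (on typical
     * two's-complement platforms). */
    uint64_t configured = 280ULL * 1024 * 1024 * 1024;  /* "280g" */
    int32_t limit = (int32_t)(configured / 128);        /* now < 0 */

    int64_t cur = 0, pkt = 1500 / 128 + 1;              /* one packet */

    /* With a negative limit the rotation test fires on every packet,
     * yielding one fifo file per evicted packet. */
    if (cur + pkt > limit)
        printf("rotate: new data file (limit = %d units)\n", (int)limit);
    return 0;
}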