Connection/flow not detected in new_connection but in connection_state_remove

I was trying to print new connection detected from the below pcap file

The connection/flow from src port 20000 to dst port 20000 is not detected in new_connection (total 10 connections detected) but connection_state_remove showing that connection/flow (total 11 connections). Why so?

Regards,
Subhajit

Hmmm when I use the following script it prints 11, 11. This is with Zeek 5.0.5.

global new_c = 0;
global remove_c = 0;

event new_connection(c: connection)
	{
	++new_c;
	}

event connection_state_remove(c: connection)
	{
	++remove_c;
	}

event zeek_done()
	{
	print new_c, remove_c;
	}

Yes, sorry the question is incomplete . The issue is coming when I published the new_connection data to a python module using broker framework. Locally inside zeek it is printing 11 new conn and 11 in remove conn, but when in the same script I connect to a peer and sending the new_connection data, the first connection information is somehow lost and not received in the python script. I have started the python listener first then starting zeek with the pcap and zeek scripts . I am suspecting that first event is getting fired before peer is connected (peer connect code inside zeek_init()) and event new connection func is also in the same script from where I am publishing that event.
Thanks

Hello @biswa ,

I am suspecting that first event is getting fired before peer is connected (peer connect code inside zeek_init()) and event new connection func is also in the same script from where I am publishing that event.

You’re most likely on the right track here. What you can do is to suspend_processing() within zeek_init() and then call continue_processing() within Broker::peer_added when the process reading the pcap observes the Python side showing up as peer.

There’s a glitch with this approach: If you call suspend_processing(), the network_time() will end-up being the current time instead of pcap time. I’ve very recently proposed to change this. If you are confused/surprised by the observed timestamps, this might be why. Feel free to provide details of your use-case or chime in on the PR.

This is fairly advanced Zeek/Python/broker integration use-case, nice! :slight_smile:

Another peculiarity that might be relevant here is the fact that Zeek will always process the first packet to initialize network time. Events generated by this packet might be handled before broker is peered.

Oh! Thanks for that! This seems a bit random and “peculiarity” seems closer to “issue”. The mentioned PR actually changes this and at least Zeek’s test suite appears unaffected while it makes @biswa’s use-cases rather difficult. Maybe we get let go of that with 6.0.

Hi,

I have tried with suspend_processing() inside zeek_init() and continue_processing() inside peer_added(), but it seems the first new_connection() event is getting fired before that and because it is not yet established, peer is not receiving that message.

**NEW**, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
**PEER ADDED**, [id=212f24f5-6aa3-567b-aa22-23d19479859e, network=[address=127.0.0.1, bound_port=60000/tcp]]
NEW, [orig_h=10.10.20.5, orig_p=55355/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55356/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55357/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55358/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55359/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55361/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55362/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55363/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55366/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=55370/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55355/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55356/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55357/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55358/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55359/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55361/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55362/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55363/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55366/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=55370/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
event zeek_init() {
    suspend_processing();
    Broker::peer(addr_to_uri(127.0.0.1), 60000/tcp);
}
 
event Broker::peer_added(ep: Broker::EndpointInfo, msg: string)
{
        print "PEER ADDED", ep;
        continue_processing();
}
event new_connection(c: connection)
{
        print "NEW", c$id;
        Broker::publish(my_topic, new_conn_added, c);
}
event connection_state_remove(c: connection)
{
        print "REMOVE", c$id;
        Broker::publish(my_topic, conn_removed, c);
}

Please check if I am doing any mistake.

Thanks

Conceptually you’re not doing anything wrong. This is the unfortunate behavior of Zeek 5.2.0 in zeek -r mode. If it’s possible for you to build Zeek from master (or use the docker pull zeek/zeek-dev), the most recent version does not exhibit this behavior anymore. The previously mentioned network_time() issue still persists, so you might run into that next.

Hi,
I have built zeek from source , branch taken master only.
Version showing 6.0.0-dev.64

Thanks

Hi,
I have compiled latest zeek and installed.
I am getting two issues …

  1. Although the first packet handling is fixed (coming new_connection after peer_added) but the code is going into an infinite loop.

PEER ADDED, [id=3c0a8567-cbd5-5f26-b7b8-972ce4c0cde1, network=[address=127.0.0.1, bound_port=60000/tcp]]
NEW, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.8, orig_p=20000/tcp, resp_h=10.10.20.5, resp_p=20000/tcp]
NEW, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]
REMOVE, [orig_h=10.10.20.5, orig_p=20000/tcp, resp_h=10.10.20.8, resp_p=20000/tcp]

  1. Version :
    /usr/local/zeek/bin/zeek --v
    /usr/local/zeek/bin/zeek version 6.0.0-dev.250

  2. Using same zeek scripts, pcap etc.

Running using

/usr/local/zeek/bin/zeek -Cr scripts command

  1. Another issue : if running python broker script before zeek, then peer is not getting connected. It is only working if python broker script is running beforehand.
    Thanks

Although the first packet handling is fixed (coming new_connection after peer_added) but the code is going into an infinite loop.

I’m not able to reproduce this. I’m putting two scripts for testing at the end. If you post your full Zeek script / Python script for reproduction that could help, but maybe below gives enough to start with.

Your output looks like maybe there’s another subscription or publish happening and you have a message loop. Any chance you’re using auto_publish or re-publish messages back on the Python side? What topic name are you using?

Another issue : if running python broker script before zeek, then peer is not getting connected. It is only working if python broker script is running beforehand.

I suspect that this is just the overly long default retry connect interval. Below script sets it to 1sec and either way around works.


These are the scripts for testing. Except for the network_time() being set to current time, things seem reasonable well with the version you’re using here.

# biswa.py 
import sys
import broker

# Setup endpoint and connect to Zeek.
with ( broker.Endpoint() as ep,
     ep.make_subscriber("/topic/connections") as sub,
     ep.make_status_subscriber(True) as ss):

    ep.listen("127.0.0.1", 60000)

    # Wait until connection is established.
    print("waiting for peer")
    st = ss.get()
    print(st, st.code())

    print("waiting for messages")
    while True:
        (t, d) = sub.get()
        # Assume we only get events here that have a connection record
        # as their first parameter. We don't have field names...
        # https://github.com/zeek/zeek/discussions/2879
        conn_id = d[2][1][0][0]
        print("received", t, conn_id)
# biswa.zeek 

# More timely reconnects (default: 30 seconds)
redef Broker::default_connect_retry = 1sec;

global my_topic = "/topic/connections";

global new_conn_added: event(c: connection);
global conn_removed: event(c: connection);

event zeek_init() {
    suspend_processing();
    Broker::peer(addr_to_uri(127.0.0.1), 60000/tcp);
}

event Broker::peer_added(ep: Broker::EndpointInfo, msg: string) {
        print network_time(), "PEER ADDED", ep;
        continue_processing();
}

event new_connection(c: connection) {
        print network_time(), "NEW", c$id, c$start_time;
        Broker::publish(my_topic, new_conn_added, c);
}
event connection_state_remove(c: connection) {
        print network_time(), "REMOVE", c$id, c$start_time;
        Broker::publish(my_topic, conn_removed, c);
}

event Pcap::file_done(path: string) {
        print network_time(), "DONE", path;
        terminate();
}

Hi,
Please check the scripts to generate the issue .

Zeek script

biswa@biswa:~/zeek$ cat /usr/local/zeek/share/zeek/site/my-test/main.zeek

global my_topic = “/topic/test”;

global new_conn_added: event(c: connection);
global conn_removed: event(c: connection);
global conn_reused: event(c: connection);

event zeek_init() &priority=5 {

suspend_processing();

Broker::peer(addr_to_uri(127.0.0.1), 60000/tcp);
Broker::subscribe(my_topic);

}

event my_attack_events(c: connection)
{
print(cat("Got events from Py script: ", c));

}

event Broker::peer_added(ep: Broker::EndpointInfo, msg: string)
{
print “PEER ADDED”, ep;
continue_processing();
}

event new_connection(c: connection)
{
print “NEW”, c$id;
Broker::publish(my_topic, new_conn_added, c);
}

event connection_state_remove(c: connection)
{
print “REMOVE”, c$id;
Broker::publish(my_topic, conn_removed, c);
}

event connection_reused(c: connection)
{
Broker::publish(my_topic, conn_reused, c);
}

••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
Python script:-

(broker) biswa@biswa:~/broker_demo$ cat broker_test.py
import sys
import broker

with broker.Endpoint() as ep,
ep.make_subscriber(“/topic/test”) as sub:

 ep.listen("127.0.0.1", 60000)

 while True:
    (t, d) = sub.get()
    my_event = broker.zeek.Event(d)
    print("received {}  --   {}".format(my_event.name(), my_event.args()))

    my_new_events = broker.zeek.Event("my_attack_events", my_event.args()[0]);
    ep.publish("/topic/test", my_new_events);

Please check the scripts to generate the issue .

Hmm… It’s working here without triggering an endless loop. Could you provide the full command you’re running? Please also ensure that the Python program is using the broker bindings from your most recent Zeek version (I’m setting PYTHONPATH explicitly).

I doubt it’s caused by your pcap, but if you can, sharing it won’t hurt.

Hi,

  1. Regarding python binding : if you check from the last dump shared that I am running the python script from my zeek/auxil/broker path itself. I have created virtual environment inside the auxil directory of zeek master branch. I have pulled all submodules (broker) as well so that both zeek and broker become latest. Broker compiled successfully and python binding also. The script is connecting fine with the zeek (no incompatibility issue, also if version mismatch python binding also fails generally). Only issue beside the infinite loop, I am getting is if I run python script after zeek, that is not connected anyway.

  2. Pcap : icsnpp-dnp3/dnp3_example.pcap at main · cisagov/icsnpp-dnp3 · GitHub

Thanks

Zeek commands

/usr/local/zeek/bin/zeek -Cr <package name eg. my_package, I copied the zeek scripts eg. /use/share/zeek/site/my_package>

Running Python script inside the venv of broker-python binding

python3 broker_test.py

What do you mean by explicitly setting PYTHONPATH?

@biswa - sorry for the delay. I think I’m now understanding what you’re seeing. I’ve used your pcap and observe a lot of new_connection / connection_state_remove events (This is not an endless loop, but it’s many more events than expected).

This is Zeek still (grrr) forwarding network time to realtime in zeek -r mode when Broker and/or suspend_processing() is involved.

With the latest Zeek master version since yesterday (zeek version 6.0.0-dev.271), you can put the following into your zeek_init() as a workaround:

event zeek_init() &priority=5 {
  # Workaround for Broker setting network time to realtime
  # alternatively, put a "redef allow_network_time_forward=F"
  # at the beginning of the file. 
  set_network_time(double_to_time(0.001));
  suspend_processing();
  ...
}

I’ll propose a fix for this soon so that should not be needed anymore :crossed_fingers:

Now, when you do zeek -r and use broker, Zeek doesn’t terminate at the end of the pcap by default and network time stops moving forward. This means connection related timers don’t expire. Usually that happens during shutdown. So you likely need to force timer expiry by moving network time forward by a few hours once the pcap has been fully read.

event Pcap::file_done(p: string)
    {
    # Done reading pcap, forward network_time() by
    # 2 hours to expire connection related timers now.
    # Alternative would be to call terminate()
    # here as that would trigger the timer draining, but
    # then we would likely miss some broker messages as we
    # exited.
    set_network_time(network_time() + double_to_interval(2 * 3600));
    }

I hope this works for you and reach out if you observe more things weird. We’ve been changing a few things in that are very recently.

Hi @awelzel ,
Can you please elaborate the issue ? I am not able to understand why there are lots of new or remove conn coming for the same connection !
Can you please post the commit or PR id where you made the fixes for this.
Thanks

Can you please elaborate the issue ? I am not able to understand why there are lots of new or remove conn coming for the same connection !

Currently, when Zeek processes the first Broker messages/commands, there’s a condition that checks whether network time is initialized already. If it is not, broker manager decides to set network time to the current time (wallclock). Even if reading traces via zeek -r and suspend_processing() where this isn’t really applicable.

What then happens is that each individual packet from the pcap (which might be weeks, months or even years old) creates a new connection. The first packet creates a connection which is immediately considered timed-out because of the packet timestamp vs current network time discrepancy. So, connection_state_remove is invoked. The next packet then creates the connection again, triggering new_connection, but it’s immediately timed-out again, and so on.

The set_network_time(double_to_time(0.001)); essentially prevents that broker manager behavior. Alternatively setting redef allow_network_time_forward=F with current master will prevent it too and may look less like a workaround.

I hope that all makes sense.

Can you please post the commit or PR id where you made the fixes for this.

It’s not yet in master, but this is the PR: Broker: Remove network time initialization by awelzel · Pull Request #2893 · zeek/zeek · GitHub

There’s also some information in the following issue, detailing some of these observations:

Thanks for the reply!
So, you are saying pcap’s total 834 packets will generate 834 new and 834 remove events each. I think it’s generating more than 834 connection, I will check once and confirm.
Apart from this , the fix you suggested, disabling the network time forward, will it lead to take only current time ? Is this PR giving permanent or temporary fix to this problem?

Yes, I’m seeing 834 conn.log entries and also 834 NEW (and REMOVED) messages for the dnp3_example.pcap. Again, this isn’t “wanted” behavior, but an artifact of network time being set to the current time by broker manager.

Apart from this , the fix you suggested, disabling the network time forward, will it lead to take only current time ?

The opposite, it’ll make sure that only timestamps from the packets in the pcap are used for network time.

Is this PR giving permanent or temporary fix to this problem?

For your use-case of zeek -r in combination with broker connectivity the PR (if accepted) provides a permanent fix. For now, using redef allow_network_time_forward=F will also work for you.