libcurl and libev integration

Hi,

This is sort of a long story so grab a cup of coffee :slight_smile:

I've been using ActiveHTTP::Request to issue HTTP requests from Bro
scripts so far, but because it shells out to curl it is a) slow, since
every request spawns an external process, b) unable to make use of
HTTP keep-alive and c) hence unable to multiplex requests to the same
server. This adds a whole lot of overhead and makes the current
ActiveHTTP::Request unusable for various interesting use cases, e.g.
querying data from ElasticSearch :wink:

I eventually started tweaking the code to incorporate libcurl
functionality into the Bro runtime, and the blocking version came along
just fine. Now that I am trying to make the libcurl calls
asynchronous, it's becoming really complex. There is
curl_multi_fdset(multi_handle, &fdread, &fdwrite, &fdexcep, &maxfd);
[1] which can be used to fetch the socket descriptors associated with
libcurl's connections. It is designed to populate existing fd_set
structs with the right descriptors so they can be spoon-fed to
select() [2]. Note that the internal layout of fd_set is not well
defined (it differs between Windows and POSIX); the macros FD_SET,
FD_CLR, FD_ISSET and FD_ZERO are the only supported way to manipulate
an fd_set, and there is no macro to extract the socket descriptors
contained in a particular fd_set.
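
To make the fd_set limitation concrete, here is a minimal sketch of the
only portable workaround I know of: probing every candidate descriptor
up to maxfd with FD_ISSET. The helper name fdset_to_array is made up for
illustration and is not part of libcurl or Bro; it assumes a POSIX
system where descriptors are small non-negative integers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: copy every descriptor that is set in `set` into
 * `out` (at most `cap` entries) and return how many were stored.
 * fd_set has no iterator, so each candidate in 0..maxfd is probed with
 * FD_ISSET. A maxfd of -1 (which curl_multi_fdset reports when it has
 * no descriptors) yields 0. */
size_t fdset_to_array(fd_set *set, int maxfd, int *out, size_t cap)
{
    size_t n = 0;

    /* Probing beyond FD_SETSIZE would be undefined behaviour. */
    assert(maxfd < FD_SETSIZE);

    for (int fd = 0; fd <= maxfd && n < cap; fd++) {
        if (FD_ISSET(fd, set))
            out[n++] = fd;
    }
    return n;
}
```

The idea would be to call this once per set (read, write, except) after
curl_multi_fdset() and hand the resulting descriptors to whatever the
host event loop expects. It costs O(maxfd) on every iteration, which is
part of why this route feels dirty.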

Trying to work with and around Bro's FD_Set and IOSource classes makes
it nearly impossible to integrate libcurl as it is without a lot of
dirty code. I also noticed that you've implemented an asynchronous DNS
client on top of your own event loop mechanism. I assume this was
built long before any event handling library was available, so
it's sort of a legacy. Nowadays there are a lot of event loop libraries
out there which make more efficient use of kernel facilities such as
epoll() and kqueue() in contrast to select(), and which already have
built-in methods for asynchronous file input/output. Note that libuv [3]
also has a built-in async DNS client.
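
For readers who haven't used such a library, the following is a rough
sketch of the watcher-plus-callback style of interface they expose. It
is built on plain poll() so that it stays self-contained; all names
(io_loop, io_loop_add, io_loop_run_once, count_ready, demo_pipe_ready)
are hypothetical and are neither the libev nor the libuv API. A real
library would sit on epoll()/kqueue() and also handle timers, watcher
removal and so on.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WATCHERS 64

/* Hypothetical callback type: invoked when `fd` becomes ready. */
typedef void (*io_cb)(int fd, short revents, void *user);

struct io_loop {
    struct pollfd fds[MAX_WATCHERS];
    io_cb         cbs[MAX_WATCHERS];
    void         *users[MAX_WATCHERS];
    nfds_t        count;
};

void io_loop_init(struct io_loop *loop)
{
    loop->count = 0;
}

/* Register interest in `events` (POLLIN and/or POLLOUT) on `fd`.
 * Returns 0 on success, -1 if the watcher table is full. */
int io_loop_add(struct io_loop *loop, int fd, short events,
                io_cb cb, void *user)
{
    if (loop->count >= MAX_WATCHERS)
        return -1;
    loop->fds[loop->count].fd = fd;
    loop->fds[loop->count].events = events;
    loop->fds[loop->count].revents = 0;
    loop->cbs[loop->count] = cb;
    loop->users[loop->count] = user;
    loop->count++;
    return 0;
}

/* Wait up to `timeout_ms` and invoke the callback of every ready
 * descriptor. Returns the number of callbacks run, 0 on timeout, or
 * -1 on poll() failure. Callbacks must not modify the loop. */
int io_loop_run_once(struct io_loop *loop, int timeout_ms)
{
    int ready = poll(loop->fds, loop->count, timeout_ms);
    int dispatched = 0;

    if (ready <= 0)
        return ready;
    for (nfds_t i = 0; i < loop->count; i++) {
        if (loop->fds[i].revents != 0) {
            loop->cbs[i](loop->fds[i].fd, loop->fds[i].revents,
                         loop->users[i]);
            dispatched++;
        }
    }
    return dispatched;
}

/* Sample callback: counts readiness notifications in *user. */
void count_ready(int fd, short revents, void *user)
{
    (void)fd;
    (void)revents;
    (*(int *)user)++;
}

/* Demo: a pipe with one pending byte should trigger exactly one
 * callback. Returns 1 on success, -1 otherwise. */
int demo_pipe_ready(void)
{
    int p[2];
    int hits = 0;
    int dispatched = -1;
    struct io_loop loop;

    if (pipe(p) != 0)
        return -1;
    io_loop_init(&loop);
    if (io_loop_add(&loop, p[0], POLLIN, count_ready, &hits) == 0 &&
        write(p[1], "x", 1) == 1)
        dispatched = io_loop_run_once(&loop, 1000);
    close(p[0]);
    close(p[1]);
    return (dispatched == 1 && hits == 1) ? 1 : -1;
}
```

The point is that a library like libcurl only has to say "tell me when
this socket is readable" and gets a callback, instead of having its
descriptors squeezed through somebody else's fd_set.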

My conclusion at this very early stage is that it would make sense to
replace Bro's event loop and DNS client with libev. This should
make it significantly easier to integrate with other libraries such as
libcurl; take a look at the libuv example for libcurl [4]. I am not sure
how this would affect the Bro runtime logic. Comments, questions and
feedback on the ideas presented above are very much welcome :slight_smile:

1. http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
2. http://www.mkssoftware.com/docs/man3/select.3.asp
3. https://github.com/libuv/libuv
4. http://curl.haxx.se/libcurl/c/multi-uv.html

I'd also vote to investigate changing over to libev (or libuv since you mention it) and I also recently suggested that as part of [1].

- Jon

[1] https://bro-tracker.atlassian.net/browse/BIT-1388

Hi,

I'd also vote to investigate changing over to libev (or libuv since you mention it) and I also recently suggested that as part of [1].

Thanks for the thumbs up. I think libuv makes a better match due to its
built-in async DNS client and file input/output. Also it's used by
Node.js, which means that it's more tested than its counterparts.

Feel free to add a reference to the initial post to the JIRA bug tracker
comments section :slight_smile:

Lots of good thoughts. That all makes sense and would be worth
exploring I think. Much of the current code is indeed just legacy[1]
and should really be completely redone. I wasn't aware of libuv, but
that and libev both look like good candidates to get some abstraction
in there.

Redoing the I/O loop is a larger project though. The coding is one
part but we'd also need to test it pretty thoroughly in a range of
settings so that we don't break anything. If we had a volunteer to
take the lead on this, that would probably help a lot. :slight_smile:

Robin

[1] Including such things as working around old OS versions not
correctly handling select on fds coming out of pcap.