Detecting software components that make strange DNS queries

Hi all,

Is it possible to detect which software components make "strange"
queries? For example, on our network we detected queries to
"abnormal" domains like these:

1363608064.778525|VmUnpNRkiF5|192.168.65.160|2933|10.196.0.67|53|udp|54891|gqtpngnqt.com|1|C_INTERNET|1|A|-|-|F|F|T|F|0|-|-
1363608064.792823|JT4SuPtIQ3k|192.168.65.160|2940|10.196.0.67|53|udp|3431|wvxzfmyw.cc|1|C_INTERNET|1|A|-|-|F|F|T|F|0|-|-
1363608064.794325|tYWZyjP18fd|192.168.65.160|2941|10.196.0.67|53|udp|15204|shlghhw.org|1|C_INTERNET|1|A|-|-|F|F|T|F|0|-|-
1363608079.436835|TO6u5Zqbx1|192.168.65.160|2962|10.196.0.67|53|udp|50810|xqqkwjqdbhh.ws|1|C_INTERNET|1|A|0|NOERROR|F|F|T|T|0|149.20.56.32,149.20.56.33,149.20.56.34|6024.000000,6024.000000,6024.000000

... and a lot more.

Any ideas on how to accomplish this?

Hi

Maybe this could help you…
http://code.google.com/p/security-onion/wiki/DNSAnomalyDetection

/Lysemose

Are you asking from a host perspective (now that you've seen this
traffic on a network, what is causing it on the host) or from a
network perspective (how do I find suspicious queries like these in
network traffic)?

-=Mike

Character frequency analysis.

Do you mean http://arxiv.org/pdf/1004.4358?

I saw this the other day on Twitter, https://github.com/sethhall/bro-domain-generation, but that still doesn’t answer your original question.

/Lysemose

You can do character frequency analysis with a simple Bro script. Look at <http://www.bro.org/documentation-git/scripts/base/strings.bif.html> to see the functions you can use for strings.
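
For readers who want to try the idea outside Bro, here is a minimal Python sketch of character frequency counting on a domain name. The function names `char_frequencies` and `vowel_ratio` are hypothetical, and the vowel ratio is only one possible summary of the frequencies, not something taken from the paper or from Bro.

```python
from collections import Counter


def char_frequencies(domain):
    """Relative frequency of each character in the leftmost label."""
    label = domain.lower().split(".")[0]
    total = len(label)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(label).items()}


def vowel_ratio(domain):
    """Fraction of the leftmost label made up of vowels (illustrative metric)."""
    freqs = char_frequencies(domain)
    return sum(freqs.get(v, 0.0) for v in "aeiou")


# Two names from the original question and two common names for comparison.
for name in ("gqtpngnqt.com", "wvxzfmyw.cc", "google.com", "twitter.com"):
    print(name, round(vowel_ratio(name), 2))
```

Any threshold applied to such a ratio would need tuning against your own traffic, and short names give very few characters to count.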

I think that this is asking the wrong question, however. I'd be amazed if you could reliably distinguish "good" domains from "bad" domains based simply on character frequency analysis. Bro can calculate entropy for you: <http://www.bro.org/documentation/scripts/base/bro.bif.html#id-find_entropy>. That being said, I don't think entropy is the right answer either.

Here are the entropy results (in no particular order) for the 4 domains you listed and for 4 very common domains (google.com, twitter.com, fbcdn.net and amazon.co.uk):

[entropy=2.646439, chi_square=450.8, mean=100.2, monte_carlo_pi=4.0, serial_correlation=0.096875]
[entropy=3.085055, chi_square=400.538462, mean=104.692308, monte_carlo_pi=4.0, serial_correlation=-0.005991]
[entropy=3.095795, chi_square=338.090909, mean=106.727273, monte_carlo_pi=4.0, serial_correlation=0.062381]
[entropy=3.027169, chi_square=384.636364, mean=104.727273, monte_carlo_pi=4.0, serial_correlation=0.011643]
[entropy=3.182006, chi_square=424.857143, mean=105.5, monte_carlo_pi=4.0, serial_correlation=-0.050923]
[entropy=2.947703, chi_square=303.888889, mean=98.0, monte_carlo_pi=4.0, serial_correlation=-0.316796]
[entropy=3.084963, chi_square=372.0, mean=97.666667, monte_carlo_pi=4.0, serial_correlation=-0.248104]
[entropy=2.845351, chi_square=431.181818, mean=102.818182, monte_carlo_pi=4.0, serial_correlation=-0.322755]

I don't know about you, but I can't tell which are good and which are bad. I suspect that DNS names are too short a sample to provide any meaningful data.
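
To experiment with entropy values outside Bro, a rough Python equivalent is sketched below. It computes Shannon entropy in bits per character over the whole name, which is assumed to correspond to the `entropy` field in the results above; the other fields (chi_square, mean, monte_carlo_pi, serial_correlation) are not reproduced. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(text).values()
    )


for name in ("gqtpngnqt.com", "xqqkwjqdbhh.ws", "google.com", "amazon.co.uk"):
    print(name, round(shannon_entropy(name), 6))
```

With names this short, a single repeated letter moves the value noticeably, which fits the point about sample size.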

I think you should focus instead on the behavior that you're trying to detect. Looking at your example below, some alerts that'd be more useful might be:

- Too many NXDOMAIN queries.
- A query that resolves to an ISC sinkhole.
- Queries for a domain that no one else queried.
- Repetitive queries every X seconds with little to no deviation.
- Queries for a domain that you haven't seen before.
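
As a starting point for some of these checks, here is a minimal offline sketch in Python that reads pipe-separated log lines like the ones in the original question. The field positions are inferred from those sample lines, the sinkhole set is simply the three addresses the last sample query resolved to, and the names `parse`, `summarize` and `nxdomain_threshold` are hypothetical. It covers the NXDOMAIN count, the sinkhole match and the "no one else queried" check; timing and first-seen checks are left out.

```python
from collections import Counter, defaultdict

# Field positions assumed from the sample log lines (0-based, split on "|").
ORIG_H, QUERY, RCODE_NAME, ANSWERS = 2, 8, 14, 20

# Assumption: the addresses seen in the sample answer are the sinkhole.
SINKHOLE_ADDRS = {"149.20.56.32", "149.20.56.33", "149.20.56.34"}


def parse(line):
    """Split one log line into the few fields used below, or return None."""
    f = line.rstrip("\n").split("|")
    if len(f) <= ANSWERS:
        return None
    answers = set() if f[ANSWERS] == "-" else set(f[ANSWERS].split(","))
    return {"client": f[ORIG_H], "query": f[QUERY],
            "rcode": f[RCODE_NAME], "answers": answers}


def summarize(lines, nxdomain_threshold=50):
    """Collect per-client NXDOMAIN counts, sinkhole hits and one-client domains."""
    nxdomain = Counter()
    sinkholed = []
    clients_per_domain = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        rec = parse(line)
        if rec is None:
            continue
        clients_per_domain[rec["query"]].add(rec["client"])
        if rec["rcode"] == "NXDOMAIN":
            nxdomain[rec["client"]] += 1
        if rec["answers"] & SINKHOLE_ADDRS:
            sinkholed.append((rec["client"], rec["query"]))
    return {
        "nxdomain_heavy_clients": {c: n for c, n in nxdomain.items()
                                   if n >= nxdomain_threshold},
        "sinkholed": sinkholed,
        "single_client_domains": sorted(
            d for d, cs in clients_per_domain.items() if len(cs) == 1),
    }
```

Calling `summarize(open("dns.log"))` (the file name is an assumption) returns a dictionary with the three result sets. The threshold of 50 is arbitrary and would need adjusting to the size of the network.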

Hope this helps,

  --Vlad

Many, many thanks, Vlad, for your explanation. I'll think about it this weekend.

Yes, thanks for the example and detail.

CFA (character frequency analysis) was the first thing that crossed my mind, so I googled for it and found the arXiv paper. It sounds promising to me, but I can see your point about the length.

While searching for supporting information I found old Google Code and GitHub projects with some code inspired by the paper. It appears someone forked the original project but abandoned it after updating the README file.

Readme: https://code.google.com/p/dnapy/

Code: https://github.com/gourryinverse/dnapy