Searching the Searches

In this brief Splunk search review we want to cover how to leverage web proxy logs to break down what users are searching for. Why would someone want to do this? It helps round out context as you investigate an alert, machine, or user. We even caught an unannounced pentest because of a term the tester searched while on our network before trying to visit one of the result links. It can also help you understand themes within your environment: most searched, least searched, and so on. Finally, we wanted this as a base for trying our hand at sentiment analysis to look for early indicators of insider risk.

 

Enough with why, let's get into the search:

Gather the search-specific URLs, in this case Google's:

index=cisco_wsa wsa_dest_url="*/search?q=*" wsa_direct_url="*.google.com"

Pull the search string out into a field, then make it more human readable. The search_engine field will come into play later:

| rex field=wsa_dest_url "q=(?P<search_terms>.*)"

| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")

| eval search_engine="Google"
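The steps above only turn "+" back into spaces, so any percent-encoded characters (for example %22 for a quote) stay as-is in search_terms. A minimal sketch of one way to handle that, assuming your proxy logs the URL still percent-encoded (we have not verified this for every WSA setup). It swaps the split/mvindex step for a tighter capture that stops at the first "&", then uses the standard urldecode eval function:

```
| rex field=wsa_dest_url "q=(?P<search_terms>[^&]*)"
| eval search_terms=urldecode(replace(search_terms,"\+"," "))
| eval search_engine="Google"
```

The replace runs before urldecode on purpose, so an encoded plus sign (%2B) that the user actually typed is not turned into a space.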

Format the results in a nice table by host IP:

| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
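The earliest and latest columns come out of stats as epoch seconds, which are hard to read at a glance. As an optional extra step (not part of the original search), the convert command can render them as timestamps. The timeformat string here is only our preference:

```
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(earliest) ctime(latest)
```

Add this after the stats line. If you plan to sort or do math on the times later, do that first, since convert replaces the numeric values with strings.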

 

[Screenshot: search term search results]

Now you can do things like add | search search_terms="*hacking*" to the end and find anyone who has searched for something containing the term hacking. You can also see the effect of Google's search-as-you-type suggestions, where one request is submitted for each letter pressed.
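Because values() packs every term for a host into one multivalue field, a match on "*hacking*" returns the whole row, including all the unrelated terms. If you only want the matching terms themselves, one option is to expand the field first. This is a sketch of that variation, appended after the stats line:

```
| mvexpand search_terms
| search search_terms="*hacking*"
| table wsa_host_ip search_engine earliest latest search_terms
```

Keep in mind that earliest and latest still describe the host's overall search activity in the time range, not the moment that particular term was searched.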

 

Once we had this built, we thought: why not add more search engines, so that one search covers all results?

| append [ search index=cisco_wsa wsa_dest_url="*/search?q=*" wsa_direct_url="*.bing.com"
| rex field=wsa_dest_url "q=(?P<search_terms>.*)"
| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")
| eval search_engine="Bing"
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
]

Add the code above to the end of the previous search and you get results from both search engines. Not bad, but we always want more.

 

Below is the final search that covers Google, Bing, Yahoo, and DuckDuckGo:

index=cisco_wsa wsa_dest_url="*/search?q=*" wsa_direct_url="*.google.com"
| rex field=wsa_dest_url "q=(?P<search_terms>.*)"
| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")
| eval search_engine="Google"
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine

| append [ search index=cisco_wsa wsa_dest_url="*/search?q=*" wsa_direct_url="*.bing.com"
| rex field=wsa_dest_url "q=(?P<search_terms>.*)"
| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")
| eval search_engine="Bing"
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
]

| append [ search index=cisco_wsa wsa_dest_url="*/search?p=*" wsa_direct_url="*.yahoo.com"
| rex field=wsa_dest_url "p=(?P<search_terms>.*)"
| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")
| eval search_engine="Yahoo"
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
]

| append [ search index=cisco_wsa wsa_dest_url="*/?q=*" wsa_direct_url="duckduckgo.com"
| rex field=wsa_dest_url "q=(?P<search_terms>.*)"
| eval search_terms=replace(mvindex(split(search_terms,"&"),0),"\+"," ")
| eval search_engine="DuckDuckGo"
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
]

| sort wsa_host_ip search_engine
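The append approach is easy to read and extend, but each append runs as a subsearch, and subsearches are subject to their own result and runtime limits. If that becomes a problem, the same idea can be written as a single pass over the index. The following is an untested sketch, not the search we actually run. It assumes the same field names as above and uses case() to label the engine instead of one subsearch per engine:

```
index=cisco_wsa
    ((wsa_dest_url="*/search?q=*" (wsa_direct_url="*.google.com" OR wsa_direct_url="*.bing.com"))
    OR (wsa_dest_url="*/search?p=*" wsa_direct_url="*.yahoo.com")
    OR (wsa_dest_url="*/?q=*" wsa_direct_url="duckduckgo.com"))
| eval host_lc=lower(wsa_direct_url)
| eval search_engine=case(
    like(host_lc,"%.google.com"),"Google",
    like(host_lc,"%.bing.com"),"Bing",
    like(host_lc,"%.yahoo.com"),"Yahoo",
    host_lc=="duckduckgo.com","DuckDuckGo")
| rex field=wsa_dest_url "\?[qp]=(?P<search_terms>[^&]*)"
| eval search_terms=replace(search_terms,"\+"," ")
| stats earliest(_time) as earliest, latest(_time) as latest, values(search_terms) as search_terms by wsa_host_ip search_engine
| sort wsa_host_ip search_engine
```

The lower() call is there because like() is case-sensitive while the wildcard match in the base search is not. The rex stops at the first "&", which stands in for the split/mvindex step in the original.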

 

We know this may not be as directly applicable as some of our other searches, but we hope it sparks some ideas on how to use the data and helps you get more comfortable with searching the logs.