Thursday, December 20, 2012

Google Hacking -- Blast from the Past

Google Hacking is not new, but surprisingly few people outside the security community understand what it is, its risks, and its rewards.  Whether you're sharpening your security skills or improving your ability to find information on the Internet, this article is for you.  Once your SearchFu is strong, how you choose to use your new super hero powers is up to you.

What exactly is Google Hacking?  Google hacking is not breaking into Google computers as the name might suggest.  Google hacking is a multipurpose term; it's both a noun and a verb.  As a noun, Google Hacking[1] is a groundbreaking book written by security super hero Johnny Long (Twitter, @ihackstuff).  As a verb, Google hacking is the activity of using Google's advanced search commands and techniques to find the proverbial needle in the Internet haystack.  You may wonder, how could searching with Google's advanced search commands possibly become a security concern?

Information Persistence
Content you place on the Internet may live for a very long time.  In fact, content predating the Internet sometimes finds its way back onto the Internet.  Case in point, old Bulletin Board System (BBS) message threads (e.g., textfiles.com) are available online for anyone.  If you mistakenly publish content to your web servers, it may be downloaded by archiving bots like the Wayback Machine or remain available in Google's caches.  It's difficult to know in which deep dark corners of the Internet your content may live and for how long.

Safety in Numbers
People often feel a misguided sense of anonymity when they consider the large number of people on the Internet.  Many feel their personal information will get lost among the billions of results.  Or better yet, they assume their personal information is not interesting to anyone.  These are myths.  You will learn some simple and practical techniques to improve your search skills while raising your awareness.

Silent Reconnaissance
Google provides powerful search commands to locate information of interest.  The security concern is that there's no active defense against this kind of reconnaissance, since the searcher queries Google's index and never touches corporate servers directly.  For instance, Apache HTTPD web logs will not contain any entries for the search, since the server is not accessed at the time of the search.

Some Advanced Google Search Techniques (Google's Full Reference, Adv Page)
I'm not going to compress the Google Hacking book into a few paragraphs; to do so would be an injustice.  Instead, I'll provide practical examples you can apply to your business and personal life.  Following are some practical uses for Google's search commands.

Limit search scope to a single website
The following is one of the most useful commands ever.  With this command you can limit all search results to a single web site of interest.  Use the site: command, where host.something.com is the web site of interest.  Alternatively, you can drop the host and include only the target domain, like something.com.  The word foo stands in for your search term(s).

foo site:host.something.com
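
If you run the same kind of search often, the query above can also be assembled in a few lines of Python.  The following is only a sketch; the function names site_query and search_url are mine, not part of any Google API, and it simply builds the text you would otherwise type into the search box.

```python
from urllib.parse import urlencode


def site_query(terms, site):
    """Combine free-text search terms with a site: restriction."""
    return f"{terms} site:{site}"


def search_url(query):
    """URL-encode a query so it can be pasted into a browser's address bar."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = site_query("foo", "host.something.com")
print(query)              # foo site:host.something.com
print(search_url(query))  # the same search as a browser-ready URL
```

Swap in something.com for host.something.com to widen the search to the whole domain.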

When I first cracked the cover on Johnny's book, for laughs, I tried to find all confidential information on my current employer's web site.  I was not expecting the search to produce anything of interest.  After all, who would publish confidential information to a public company web site, right?  Bingo!  I found a lot of fluff, but there were some interesting results I shared with horrified executives.  A good rule of thumb: don't rule out the obvious.  Don't assume people think the same way you do.  What is obvious to you may not be so obvious to someone else.  Always confirm your suspicions.

Reduce noisy search results

The next useful search is a slight alteration of the preceding command: add the minus operator to the search term.  Adding a minus operator to the search term(s) excludes matching results from the search result set.  In the following example, I use the minus operator with the site command, but you can use it with other commands as well.  Consider the following.

-www site:company.com

The preceding query produces a result set excluding any references to content served from www.  It may not be immediately apparent why such a search is useful, but the query helps identify content on servers other than the primary one (e.g., www).
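
To make the idea concrete, here is a small Python sketch that applies the same filter locally to a list of result URLs.  It does not query Google, and the dev and mail hosts are invented for illustration; the point is only to show what "everything except www" leaves behind.

```python
from urllib.parse import urlparse


def non_primary_hosts(result_urls, primary="www"):
    """Return the distinct hosts whose first label is not the primary one."""
    hosts = set()
    for url in result_urls:
        host = urlparse(url).hostname
        if host and host.split(".")[0] != primary:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical search results for company.com
results = [
    "http://www.company.com/index.html",
    "http://dev.company.com/login",
    "http://mail.company.com/",
    "http://dev.company.com/status",
]
print(non_primary_hosts(results))  # ['dev.company.com', 'mail.company.com']
```

The hosts that survive the filter are often the forgotten ones, which is exactly why this search is worth running against your own domain.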

Cached results
When Google's robots scan sites, they cache the results.  You can use the info command to view cached page results.  Combining search terms with the info command has no effect.  Consider the following example.

info:www.eff.org

When you type the preceding command, you will see information like that shown in Figure 1 in your browser.

Figure 1:  Google info command to fetch cached page
In the past, there was a command to retrieve cached pages directly, cache:.  The command is no longer supported.  While the caching feature is still available, it's not as prominent as it once was.  The purpose of these changes is not entirely clear, since the feature is still supported.

If you're an IT administrator, you can remove your web site or areas of your site from the gaze of Google's bots with a properly crafted robots.txt[3] file, but there is a tradeoff.  Attackers can see any entries you include, so it's somewhat self-defeating.  Still, it's likely a better alternative than an archive of your site stuffed into Google's caches, if that bothers you.
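
The tradeoff is easy to see with Python's standard urllib.robotparser module.  The robots.txt content below is a made-up example, and the helper names are my own; treat it as a sketch of how a crawler reads the file and how anyone else can read it too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""


def disallowed_paths(robots_txt):
    """List every Disallow path, exactly as any visitor could read it."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Ask the standard-library parser whether a well-behaved bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(ROBOTS_TXT, "http://www.example.com/private/report.html"))  # False
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/index.html"))           # True
print(disallowed_paths(ROBOTS_TXT))  # ['/private/', '/admin/']
```

The first two calls show the file doing its job for polite bots.  The last call shows the downside: the same file hands a curious reader a list of the places you would rather they not look.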

Limit results to specific file types

How many times have you wanted to find only PDF, XLSX, or TXT files in your searches?  Well, thanks to Google's filetype command, you can.  Consider the following.

higgs boson filetype:pdf

The preceding search will produce search results containing only PDF documents.  The salient point of filetype is that Google knows how to index file content for popular file types, not only HTML pages.
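
If you want to repeat the same search across several file types, a short Python sketch like the following can generate one query per type.  The function name filetype_queries is hypothetical; it only produces the query text and leaves the searching to you.

```python
def filetype_queries(terms, extensions):
    """Build one filetype: query per extension, normalizing case and leading dots."""
    return [f"{terms} filetype:{ext.lower().lstrip('.')}" for ext in extensions]


for query in filetype_queries("higgs boson", ["PDF", "xlsx", ".txt"]):
    print(query)
# higgs boson filetype:pdf
# higgs boson filetype:xlsx
# higgs boson filetype:txt
```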

Dark Uses of the Google Search Commands 
Attackers are creative, often combining information from Google hacking sessions with other Internet resources like password databases.  Internet web cams are a popular target.  Attackers use search techniques to find specific web cam models of interest.  With the detailed make, model, and version information, attackers find default administrative credentials in password databases available on the Internet.  Once the administrative interfaces are known and account credentials are compromised, the web cam is hijacked.  A hijacked web cam may be used to check whether you're home or not, monitor discussions, and do far creepier things.  Some higher end gimbaled models can be moved or repositioned remotely via web controls, which is downright creepy.  The following is a partial list of darker uses for Google search.
  • Social security numbers
  • Credit card numbers
  • Personal passwords
  • Service or application passwords
  • Vulnerable software
  • Insecure web cams & embedded devices
  • Sensitive corporate information
If you want to learn more about Google hacking, you can grab a copy of Johnny's book[1].  He also has a web site, Hackers for Charity, and maintains an up-to-date database of advanced Google searches[2].  Simply cut and paste search template commands into your web browser to see the latest results.

The reason I decided to write this article is that a search query of mine from years ago recently produced many more results than it did at the time of my original presentation.  I was perplexed.  I assumed that since Google hacking has been around for years, people must have made improvements.  I was wrong.  Time to start talking more about Google hacking.   ;o)

Tatica. "Clipart - Kung Fu." Clipart - Kung Fu. 19 July 2011. Clipart.org. 16 Dec. 2012 <http://openclipart.org/detail/150409/kung-fu-by-tatica>.

[1] Long, Johnny. "Google Hacking for Penetration Testers [Paperback]." Google Hacking for Penetration Testers: Johnny Long: 9781597491761: Amazon.com: Books. 2 Nov. 2007. Syngress. 14 Dec. 2012 <http://www.amazon.com/Google-Hacking-Penetration-Testers-Johnny/dp/1597491764>.

[2] Long, Johnny. "GHDB « Hackers For Charity." GHDB « Hackers For Charity. Hackers for Charity. 16 Dec. 2012 <http://www.hackersforcharity.org/ghdb/>.  (Note: there are also other Google hacking DBs on the Internet)

[3] "Block or remove pages using a robots.txt file." Google.com. 16 Dec. 2012. Google. 20 Dec. 2012 <http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449>.

