Friday, February 24, 2012

Plugging Google Data Leaks

Recently, I was doing some online research on security conditions in Dadaab, Kenya. I was more than a little surprised to see not-meant-for-public-distribution security reports pop up in Google (the organization they belonged to shall remain nameless). My guess was someone must have inadvertently placed the files on an open part of the organization’s Web server and Google found them. Ouch!

One of the reports had a link that directed readers to additional information. Out of curiosity I clicked it. I shook my head in disbelief as an IBM Lotus Notes page for the organization’s Nairobi office appeared. There was staff information, internal documents, and even a social calendar listing events, locations, dates, and times. None of this data was password protected. All of it was readily available to anyone with a Web browser and Internet connection who stumbled on the page by accident or perhaps had a less than innocent motive.

You don't need to be a rocket scientist to understand the potential security implications here, especially considering Al-Shabaab’s recent threats to escalate its terror campaign in Kenya. (Non-state actors are growing increasingly sophisticated when it comes to using the Internet to identify the vulnerabilities of potential targets, by the way.)

Unfortunately, data leaks like this are a fairly common problem across the Internet. Sometimes it’s not a big deal. But in the case of international humanitarian organizations, IT boo-boos like the one above could put staff members at serious, increased levels of risk.

Most humanitarian security practitioners don’t have the background to perform thorough information security audits. And that's OK. But I want to share with you a simple and quick way of finding common sources of Web site data leaks. It doesn’t require any real technical skills. You can even try it right after reading this post and see if your organization might have virtual vulnerabilities that could produce real-world risk.

First some background is required (I promise to keep the geeky stuff to a minimum). Search engines like Google have automated programs that locate and index Web pages. These programs are known as crawlers or bots (short for robots). They’re constantly connecting to publicly accessible Web sites all over the world and reporting back what they find.

In addition to Web pages, these bots also index other files they encounter, such as Word, Excel, PowerPoint, and Adobe Acrobat documents. This is how data leaks often occur. A server is misconfigured or a sensitive document is accidentally put in a directory that allows it to be publicly viewed. A bot crawling the Web finds the file and reports its contents and link location back to the search engine company. After the file is indexed, it may then show up in someone’s search results.

Because of the sheer volume of indexed Web sites, you may think locating documents with leaky data is like finding a needle in a haystack. But guess again. Thanks to a set of advanced search parameters you can use with Google (and other search engines), it’s easy to narrow your hunt.

Here’s how. Instead of searching for Web sites that contain a certain word, use the site: option. It allows you to confine a search to a specified domain. Next, use the filetype: parameter. It searches for a specific file type (.pdf or .doc are two possibilities).

Here’s an example. If you type the following in Google, it will show the Adobe Acrobat files it has indexed on the Electronic Frontier Foundation's Web site: site:eff.org filetype:pdf

You can further refine your search by adding a keyword. For example, here's how to list the PDF files on EFF's Web site that contain the word police: site:eff.org filetype:pdf police
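
If you'd rather not type these queries by hand, here is a minimal Python sketch of how the pieces fit together. The function name build_query is made up for illustration, and eff.org is assumed to be the domain used in the EFF example above. It only assembles the text you would paste into the search box; it does not contact Google.

```python
def build_query(domain, filetype=None, keyword=None):
    """Combine the site: and filetype: operators with an optional keyword."""
    # Hypothetical helper: the name and arguments are illustrative only.
    parts = [f"site:{domain}"]
    if filetype:
        # Accept "pdf" or ".pdf"; the operator takes the bare extension.
        parts.append(f"filetype:{filetype.lstrip('.')}")
    if keyword:
        parts.append(keyword)
    return " ".join(parts)


print(build_query("eff.org", "pdf"))            # site:eff.org filetype:pdf
print(build_query("eff.org", "pdf", "police"))  # site:eff.org filetype:pdf police
```

The order of the operators should not matter to the search engine; they are joined with spaces just to mirror the way the examples above are written.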

See where I’m going with all of this? You can quickly troll through a site looking for documents that contain data an organization might not have wanted to share (common file types to look for include: doc, docx, odt, pdf, ppt, pptx, txt, xls, and xlsx).
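
To check every one of those file types without retyping the query nine times, a short script can print one search link per type. This is only a sketch under a few assumptions: example.org is a placeholder for your own organization's domain, search_urls is a made-up name, and the link format (google.com/search with a q parameter) is simply what you see in the browser's address bar after a normal search. The script builds links for you to open yourself; it does not send any automated requests.

```python
from urllib.parse import urlencode

# File types listed in the post as common places for leaky data.
FILE_TYPES = ["doc", "docx", "odt", "pdf", "ppt", "pptx", "txt", "xls", "xlsx"]


def search_urls(domain, file_types=FILE_TYPES):
    """Map each file type to a Google search link restricted to one domain."""
    urls = {}
    for ext in file_types:
        query = f"site:{domain} filetype:{ext}"
        # urlencode escapes the colons and spaces so the link is valid.
        urls[ext] = "https://www.google.com/search?" + urlencode({"q": query})
    return urls


# "example.org" is a placeholder; substitute the domain you want to check.
for ext, url in search_urls("example.org").items():
    print(f"{ext:5} {url}")
```

Open each link in a browser and skim the results for anything that looks like it was never meant to be public.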

Hackers do this all the time, hunting for passwords, user accounts, social security numbers, credit card numbers, and other types of data that can easily be exploited. Investigative journalists do the same thing, but look for newsworthy tidbits on government Web sites. (Keep in mind not all documents you discover this way will be leaky. Lots of files are knowingly made public.)

None of this is really breaking news. Most information security professionals have been aware of Google Hacking for a long time. There’s a lot written about it on the Web and a couple of books have been published. And in most countries it’s even legal.

Give it a try on your organization’s Web site. Have your focal points search their country office Web sites. Did you find any leaky documents that could put staff at risk? If you did, let your IT staff know about it so they can start plugging the holes. You never know when someone might be interested in your organization for the wrong reasons.
