What is metadata?

The common description is that metadata is “data about data”. That doesn’t really tell us very much, so let’s look at an example instead.

Wikipedia’s metadata page illustrates the idea with the card catalog, an older way of indexing a library’s holdings.

Picture drawers packed with index cards. These cards held information about all the books in the library. They didn’t contain the actual content of the books; they contained information about the books: the author, publication year, publisher, and so forth. Not the book itself, but information about the book. Metadata.

These days metadata is mostly generated and stored digitally. And we all generate it. Lots of it. In the context of digital communications, metadata is the digital equivalent of an envelope—it’s information about the communications you send and receive. The subject line of your emails, the length of your conversations, and your location when communicating (as well as with whom) are all types of metadata. Metadata is often described as everything except the content of your communications.

Another way to explain metadata:

Metadata are a shorthand representation of the data to which they refer. By analogy, we can think of metadata as references to data. Think about the last time you searched Google. That search started with the metadata you had in your mind about something you wanted to find. You may have begun with a word, phrase, meme, place name, piece of slang, or something else. The possibilities for describing things seem endless. Metadata schemas can be simple or complex, but they all have some things in common.

Metadata collected through devices

Those who collect or demand access to metadata, such as governments or telecommunications companies, argue that the disclosure (and collection) of metadata is no big deal. Unfortunately, these claims are just not true. Even a tiny sample of metadata can provide an intimate lens into a person’s life.

“Only” metadata

Stewart Baker, former General Counsel at the NSA (source: Rusbridger, nybooks.com):

“Metadata absolutely tells you everything about somebody’s life. If you have enough metadata you don’t really need content.”

If you want to read more about (meta)data and privacy, I can highly recommend Bruce Schneier’s book Data and Goliath. One of the things Schneier covers in the book is some interesting research in which the researchers were given access to a chunk of metadata (with permission from the people involved in the study). Based on that data alone, they were able to identify, among other things, that one participant had medical problems, one had recently had an abortion, and one had a (no-longer-quite-so-secret) cannabis-growing set-up. For more details, you can find the study on Web Policy. I’ll conclude with the researchers’ final remark: phone metadata is highly sensitive.

So the next time you hear someone defend the gathering of your information by saying that they are “only gathering metadata”, remember that metadata is far from “only” metadata.

Let’s take a look at how revealing metadata can actually be to the governments and companies that collect it:

Metadata associated with emails:

  • Sender's name, email, and IP address
  • Recipient's name and email address
  • Date, time, and time zone
  • Unique identifier of email and related emails
  • Mail client login records with IP address
  • Mail client header formats
  • Subject of email
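
To make the list above concrete, here is a minimal sketch that pulls several of these fields out of a message using Python's standard `email` module. The message itself is invented: every name, address, IP, and ID is a placeholder, and `extract_metadata` is a hypothetical helper written for this illustration. Note that the body of the message is never looked at.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A made-up message; every address, IP, and ID below is a placeholder.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
    by mx.example.com; Tue, 05 Mar 2024 09:15:02 +0100
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Date: Tue, 05 Mar 2024 09:15:00 +0100
Message-ID: <abc123@example.org>
In-Reply-To: <xyz789@example.com>
User-Agent: ExampleMail/1.0
Subject: Test results

(the body is never read below)
"""


def extract_metadata(raw: str) -> dict:
    """Collect header-level metadata without touching the message body."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg["From"])
    recipient_name, recipient_addr = parseaddr(msg["To"])
    sent_at = parsedate_to_datetime(msg["Date"])
    return {
        "sender_name": sender_name,
        "sender_addr": sender_addr,
        "recipient_name": recipient_name,
        "recipient_addr": recipient_addr,
        "sent_at": sent_at.isoformat(),            # date, time, and UTC offset
        "message_id": msg["Message-ID"],           # unique identifier
        "in_reply_to": msg["In-Reply-To"],         # link to a related email
        "mail_client": msg["User-Agent"],          # client header format
        "subject": msg["Subject"],
        "relay_hops": msg.get_all("Received", []),  # servers (and IPs) on the path
    }


if __name__ == "__main__":
    for field, value in extract_metadata(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

Even in this toy case, the headers alone say who wrote to whom, when, from which time zone, with which mail program, about what, and in reply to which earlier message.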

Metadata associated with mobile phones:

  • Phone number of every caller
  • Serial numbers of phones involved
  • Time of call
  • Duration of call
  • Location of each participant
  • Telephone calling card numbers

Metadata associated with Facebook:

  • Name and profile bio information including birthday, hometown, work history, and interests
  • Username and unique identifier
  • User subscriptions
  • User location
  • User device
  • Activity date, time, and time zone

Metadata associated with web browsers:

  • Activity including pages the user visits and when visited
  • User data and possibly user login details with auto-fill features
  • User IP address, internet service provider, device hardware details, operating system, and browser version
  • Cookies and cached data from websites

And they gather this information around the clock, all day long, year after year. They store all of it, and they have software to analyze it. As you might expect, national security organizations receive billions of dollars in funding, and they have the manpower as well.

From your phone records alone, they would be able to discern a surprising amount about you. Who your family members are. Who you’re friends with. Who you work with. Who you spend time with, both privately and professionally. How often you call your wife (and if you remembered her last birthday). What your hobbies and interests are. If you make (or receive) any booty calls. And more.
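
As a rough illustration of how little analysis this takes, the sketch below tallies a handful of invented call records. The numbers, times, and the `summarize` helper are all hypothetical, and the "late night" window of 23:00 to 05:00 is an arbitrary assumption chosen for the example. No call content is involved, only who, when, and how long.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: (other party, start time, duration in seconds).
CALLS = [
    ("+1-555-0100", "2024-03-04 08:05", 120),
    ("+1-555-0100", "2024-03-04 18:30", 600),
    ("+1-555-0100", "2024-03-05 18:40", 540),
    ("+1-555-0123", "2024-03-05 10:15", 900),
    ("+1-555-0123", "2024-03-06 10:20", 840),
    ("+1-555-0199", "2024-03-06 23:50", 1500),
    ("+1-555-0199", "2024-03-08 00:40", 1800),
]


def summarize(calls):
    """Count calls, total talk time, and late-night calls per number."""
    counts = Counter()
    total_seconds = Counter()
    late_night = Counter()
    for number, started, seconds in calls:
        hour = datetime.strptime(started, "%Y-%m-%d %H:%M").hour
        counts[number] += 1
        total_seconds[number] += seconds
        # Assumed window for "late night": 23:00 up to (not including) 05:00.
        if hour >= 23 or hour < 5:
            late_night[number] += 1
    return counts, total_seconds, late_night


if __name__ == "__main__":
    counts, total_seconds, late_night = summarize(CALLS)
    for number, n in counts.most_common():
        minutes = total_seconds[number] // 60
        print(f"{number}: {n} calls, {minutes} min, {late_night[number]} late at night")
```

A number called every day around dinner time, another called only during office hours, and a third called only after midnight each suggest a different kind of relationship, and a few lines of counting are enough to separate them.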

And those were just your calls. The collection wouldn't stop there: it extends to your e-mails, surfing habits, online purchases, and even which apps you use and what metadata those apps gather.

The W7 Ontological Model of Metadata

Under this model, metadata answers the following questions about the data it models or represents:

  • What
  • When
  • Where
  • Who
  • How
  • Which
  • Why
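
One way to picture the seven questions is as the fields of a single record. The sketch below is only an illustration: the `W7Record` class, the `unanswered` helper, and the way a phone call is mapped onto each question are assumptions made for this example, and every value is a placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class W7Record:
    """One field per W7 question; the field meanings here are illustrative."""
    what: str
    when: str
    where: str
    who: str
    how: str
    which: str
    why: str


# A hypothetical phone call described only through its metadata.
call = W7Record(
    what="voice call, 9 minutes",
    when="2024-03-05T18:40:00+01:00",
    where="cell tower 4711 (placeholder ID)",
    who="+1-555-0100 -> +1-555-0123",
    how="mobile network",
    which="handset serial number (placeholder)",
    why="unknown",  # assumed not recorded in this example
)


def unanswered(record: W7Record) -> list:
    """Return the W7 questions this record leaves open."""
    return [name for name, value in asdict(record).items() if value == "unknown"]


if __name__ == "__main__":
    print(unanswered(call))
```

In this made-up record, six of the seven questions are answered without any access to what was actually said during the call.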

Protecting metadata from external collection is a difficult technical problem because third parties often need access to metadata to successfully connect your communications. Just as the outside of an envelope needs to be readable by a postal worker, digital communications often need to be marked with source and destination. Mobile phone companies need to know roughly where your telephone is in order to route calls to it.

Until laws are updated to better deal with metadata, and the tools that minimize it become more widespread, the best you can do is be aware of what metadata you transmit when you communicate, who can access that information, and how it might be used.

Find out more

In Metadata Equals Surveillance, Schneier compiles a list of just some of the many articles arguing against trivializing metadata. For more, start with these pieces by Wired, The Guardian, Techdirt, The New Yorker, Ars Technica, and the Cato Institute.