Social Data Glossary

Basic Social Data Terminology

Familiarize yourself with these social data glossary terms to build your social data expertise. The glossary covers terms from API to XML, so if you are unsure what a word or phrase means, you can find a quick answer here. Understanding these common industry terms will take your social data knowledge to the next level.




API

(Application Programming Interface) An API defines how two pieces of software communicate with each other. In the case of social data, most information is shared through a streaming API.


Backfill

Backfill is Gnip's product that lets you briefly disconnect from your realtime stream and still receive all of the data you missed when you reconnect.

Big Data

Big Data is a term for data sets too large or complex for traditional tools to handle, and for the value companies see from turning that data into actionable insights.

Choice of Protocols

Choice of protocols means you can receive data through the delivery method you prefer: GET, POST, or streaming.


Complete Data

Complete data means customers have access to the entire set of data on a platform, so they never miss a conversation.

Data Collector

Data Collector is Gnip's product that collects and normalizes data from public APIs, including Instagram, Flickr, YouTube and more.

Data Mining

A field of computer science that sifts through data to find patterns using machine learning, statistics, database systems and more.

Data Scientist

Data science is considered a relatively new field. The profession of data scientist means different things to different companies and often combines statistics, machine learning, business intelligence and related skills.

Data Scraping

Data scraping is when a company doesn't get data directly from a social media publisher but instead scrapes content wherever it can find it. Scraped data is never complete, reliable or sustainable.


Decahose

Gnip's decahose provides a random 10 percent sample of the full firehose. We'd also like to openly admit it should be called a decihose: deci- means a tenth, while deca- means ten.
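A random 10 percent sample can be illustrated with a short Python sketch. This is only an illustration of random sampling in general, not a description of how Gnip produces the decahose; the function name `sample_stream` and the use of a range of integers as a stand-in for a firehose are assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_stream(items: Iterable, rate: float = 0.1,
                  seed: Optional[int] = None) -> Iterator:
    """Yield each item independently with probability `rate`."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    for item in items:
        if rng.random() < rate:
            yield item


# Stand-in for a full firehose: 100,000 numbered activities (hypothetical).
firehose = range(100_000)
sampled = list(sample_stream(firehose, rate=0.1, seed=42))
fraction = len(sampled) / 100_000  # close to 0.1, but not exactly 0.1
```

Because each item is kept or dropped independently, the sample size hovers around a tenth of the input rather than matching it exactly.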


Enrichments

Enrichments are additional metadata that Gnip adds to its data streams, making the data easier for our customers to digest. Examples include Klout scores, geolocation, expanded short URLs and more.


Firehose

Firehose is a term first coined by Twitter to describe its complete set of data. In social media generally, firehose now means that you have access to the full set of a social media publisher's data.


Geotagged Data

Geotagged data is content to which the user has chosen to attach an exact location, where the social media publisher offers that option. Geotagged content most often comes from a smartphone.


JSON

(JavaScript Object Notation) JSON is a text-based open standard for data interchange that is readable by humans and easy for computers to parse. JSON is the format Gnip delivers its social data in.
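As a small illustration of why JSON is easy for computers to parse, here is a Python sketch using the standard `json` module. The record and its field names (`id`, `body`, `actor`, `location`) are hypothetical and are not Gnip's actual schema.

```python
import json

# Hypothetical activity record; field names are illustrative only.
raw = '{"id": "1", "body": "Hello world", "actor": {"name": "alice"}, "location": null}'

activity = json.loads(raw)                       # JSON text -> Python dict
author = activity["actor"]["name"]               # nested objects become nested dicts
has_location = activity["location"] is not None  # JSON null becomes None

round_trip = json.dumps(activity, sort_keys=True)  # Python dict -> JSON text
```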

Machine Learning

Machine learning is the concept that you can teach a machine to make better predictions and decisions based on data.

Natural Language Processing

Natural language processing is the discipline of teaching computers to understand human language.


Node.js

A JavaScript runtime that makes it easy to build network applications. It's another way to connect to Gnip and consume data.

Predictive Analytics

The ability to predict future behavior and actions based on past data using machine learning, statistics, data mining and other techniques.

Public API

Many social media publishers offer a public API that provides access to their data, but it is often rate limited.
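To make "rate limited" concrete, the sketch below shows one common way a limit can be expressed: at most a fixed number of calls per time window. The class name `RateLimiter` and the limit of 2 calls per 60 seconds are hypothetical; each publisher sets and enforces its own limits on the server side.

```python
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` within any `window` seconds (sliding window)."""

    def __init__(self, max_calls: int, window: float) -> None:
        self.max_calls = max_calls
        self.window = window
        self.calls = deque()  # timestamps of calls that were allowed

    def allow(self, now: float) -> bool:
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False  # over the limit: the caller has to wait


limiter = RateLimiter(max_calls=2, window=60.0)  # hypothetical limit
results = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 60.0, 60.5)]
```

The timestamp is passed in explicitly so the behavior is easy to follow; a real client would pass something like `time.monotonic()`.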


PowerTrack

PowerTrack is Gnip's powerful filtering language that gives you the ability to get complete coverage of the data you need.


REST API

With a REST API, you make a request to the server and get data back only in response to that request. To keep receiving new data, you have to repeat the request, typically at a set interval.
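The request-and-response pattern can be sketched in Python without touching the network. Here `fake_fetch` is a hypothetical stand-in for an HTTP request to a publisher's endpoint, and the `since_id` parameter is an assumed convention for asking only for newer items.

```python
from typing import Callable, List

# Stand-in for data held by a server (hypothetical).
_POSTS = [{"id": i, "body": f"post {i}"} for i in range(1, 6)]


def fake_fetch(since_id: int, page_size: int = 2) -> List[dict]:
    """Pretend HTTP call: return up to `page_size` posts newer than `since_id`."""
    newer = [p for p in _POSTS if p["id"] > since_id]
    return newer[:page_size]


def poll(fetch: Callable[[int], List[dict]], rounds: int) -> List[dict]:
    """Make one request per round; data arrives only when we ask for it."""
    seen: List[dict] = []
    since_id = 0
    for _ in range(rounds):
        batch = fetch(since_id)  # one request, one response
        seen.extend(batch)
        if batch:
            since_id = max(item["id"] for item in batch)
    return seen


collected = poll(fake_fetch, rounds=3)
```

A real client would pause between rounds; nothing new arrives in between unless another request is made.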

Sentiment Analysis

Sentiment analysis is a technique for determining the feelings expressed in text, that is, whether the sentiment of the text is angry, sad, happy, etc.
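A deliberately simple way to see the idea is a word-list scorer in Python. This is a toy sketch: the word lists are made up for the example, and production systems generally rely on trained models rather than fixed lists.

```python
import re

# Tiny, made-up word lists for illustration only.
POSITIVE = {"love", "great", "happy", "awesome"}
NEGATIVE = {"hate", "awful", "sad", "angry"}


def sentiment(text: str) -> str:
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = sentiment("I love this great phone")
```

A scorer like this misses sarcasm, negation ("not happy") and context, which is part of why sentiment analysis is hard in practice.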

Social Data

Social data expresses social media in a computer-readable format (e.g. JSON) and includes metadata about the content, providing not only content but also context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data is focused strictly on publicly shared experiences.

Social Media

User-generated content through which a user communicates and expresses themselves, and which is delivered to other users. Examples are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is designed around the user experience and is focused on sharing and content discovery. Social media also offers both public and private experiences, with the ability to share messages privately.

Streaming API

With a Streaming API, you make a request once and keep the connection open, and data keeps coming your way for as long as you stay connected.
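Consuming a stream can be sketched in Python as reading lines from one long-lived connection. The sketch assumes newline-delimited JSON with blank keep-alive lines, which is a common convention for streaming APIs but is an assumption here; `fake_connection` is a hypothetical stand-in for the open connection.

```python
import json
from typing import Iterable, Iterator


def read_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed activity per non-empty line, as lines arrive."""
    for line in lines:
        line = line.strip()
        if not line:  # assumed keep-alive: blank line, nothing to parse
            continue
        yield json.loads(line)


# Stand-in for an open connection that delivers lines over time.
fake_connection = iter(['{"id": 1}', '', '{"id": 2}', '\r\n', '{"id": 3}'])
ids = [activity["id"] for activity in read_stream(fake_connection)]
```

Because `read_stream` is a generator, each activity can be handled as soon as it arrives instead of waiting for a complete response.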


XML

(Extensible Markup Language) XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
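For comparison with JSON, here is a minimal Python sketch that reads a small XML document with the standard `xml.etree.ElementTree` module. The element names (`entry`, `author`, `body`) are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
raw = "<entry><id>1</id><author>alice</author><body>Hello world</body></entry>"

entry = ET.fromstring(raw)          # XML text -> element tree
author = entry.findtext("author")   # text of the first <author> child
body = entry.findtext("body")
missing = entry.findtext("location")  # None when the element is absent
```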