The amount of data that’s being created and stored on a global level is almost inconceivable, and it just keeps growing. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s. Data is everywhere: documents, smart appliances, social networks, devices, the Internet, sensors and more. That means there’s even more potential to glean key insights from business information. But how? What actually is Big Data? What does it mean for businesses? Let’s explore…
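To put that doubling rate in perspective, here is the arithmetic worked out. The symbols $C_0$ (capacity at some starting point) and $C(t)$ (capacity $t$ months later) are introduced here only for illustration, and the calculation assumes the doubling period stays a constant 40 months:

```latex
\begin{aligned}
C(t) &= C_0 \cdot 2^{t/40} && (t \text{ in months}) \\
\frac{C(t+12)}{C(t)} &= 2^{12/40} = 2^{0.3} \approx 1.23 && \text{(about 23\% growth per year)} \\
\frac{C(t+120)}{C(t)} &= 2^{120/40} = 2^{3} = 8 && \text{(eightfold growth per decade)}
\end{aligned}
```

In other words, a steady doubling every 40 months compounds to roughly an order of magnitude of growth every ten years.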
What is Big Data?
“Big data is for machines; small data is for people.”
Let’s assume you have a leak in a water pipe in your garden.
You take a bucket and some sealing material to fix the problem. Simple, right? But after a while, you see that the leak is much bigger and that you need a specialist (a plumber) with bigger tools. Meanwhile, you are still using the bucket to drain the water. Then you notice that a massive underground stream has opened and you need to handle millions of liters of water every second.
You don’t just need new buckets, but a completely new approach to the problem, simply because the volume and velocity of the water have grown. To prevent the town from flooding, maybe you need the government to build a massive dam that requires enormous civil engineering expertise and an elaborate control system. To make things worse, water is gushing out from unexpected places, and everyone is alarmed by the variety.
Does that strike a chord?
The same has been happening with “Data”. Data sets have grown so large or complex that traditional data processing software is inadequate to deal with their capture, storage, analysis, curation, search, sharing, transfer, visualization, querying, updating and information privacy. What was required was “Big Data”!
“Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, techniques, skills and infrastructure to address efficiently.”
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
While the term “Big Data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The term has been in use since the 1990s, with some crediting John Mashey with coining it or at least popularizing it. The concept gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs.
3Vs model of Big Data
Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.
- Volume: The quantity of generated and stored data. By widely cited estimates, Facebook ingests 500 terabytes of new data every day, and a Boeing 737 generates 240 terabytes of flight data during a single flight across the US. Smartphones create and consume ever more data, and sensors embedded in everyday objects will soon produce billions of new, constantly updated data feeds containing environmental, location and other information, including video. In the past, storing all of this would have been a problem, but new technologies (such as Hadoop) have eased the burden.
- Velocity: The speed at which data is generated and must be processed to meet demand. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-to-machine processes exchange data between billions of devices; infrastructure and sensors generate massive log data in real time; and online gaming systems support millions of concurrent users, each producing multiple inputs per second.
- Variety: The type and nature of the data. Data comes in all kinds of formats, from structured, numeric data in traditional databases (stock ticker data, financial transactions) to unstructured text documents, email, video and audio. Big Data isn’t just numbers, dates and strings; it is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
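To make “near-real time” a little more concrete, here is a minimal sketch of one common building block of stream processing: a sliding-window counter that answers “how many events arrived in the last N seconds?” while events keep streaming in. The class and method names are hypothetical, and the sketch assumes events arrive in timestamp order; production stream processors also have to handle late or out-of-order events, partitioning and failures.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds.

    Illustrative sketch only: assumes timestamps arrive in increasing order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # oldest event on the left

    def add(self, timestamp: float) -> None:
        # Record the new event, then drop anything that has left the window.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        # Number of events within (now - window_seconds, now].
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=1.0)
for t in [0.1, 0.2, 0.9, 1.5]:
    counter.add(t)
print(counter.count(now=1.5))  # prints 2: only the events at 0.9 and 1.5 remain
```

Each event is appended and evicted at most once, so the work per event stays constant no matter how fast the stream runs, which is the property that matters at high velocity.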
Big Data, then, describes information assets characterized by such high volume, velocity and variety that they require specific technology and analytical methods to transform them into value. Some organizations have since added further Vs to the description:
- Variability: Inconsistency in a data set can hamper the processes that handle and manage it. Beyond increasing velocity and variety, data flows can be highly inconsistent, with periodic peaks.
- Veracity: The quality of captured data can vary greatly, which affects the accuracy of analysis.
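Veracity problems are usually tackled by validating records before they reach analysis. The sketch below assumes a hypothetical stream of sensor readings; the `Reading` type, its fields and the plausible temperature range are all invented for illustration. It flags records that are incomplete or implausible and keeps only the clean ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """A hypothetical sensor reading; field names are illustrative."""
    sensor_id: str
    temperature_c: Optional[float]


def validate(reading: Reading) -> list[str]:
    """Return the data-quality problems found in one reading (empty if clean)."""
    problems = []
    if not reading.sensor_id:
        problems.append("missing sensor_id")
    if reading.temperature_c is None:
        problems.append("missing temperature")
    elif not -50.0 <= reading.temperature_c <= 60.0:
        # Assumed plausible range; values outside it are treated as suspect.
        problems.append("temperature out of range")
    return problems


readings = [
    Reading("s1", 21.5),
    Reading("", 19.0),     # no sensor id
    Reading("s3", None),   # missing value
    Reading("s4", 999.0),  # implausible value
]
clean = [r for r in readings if not validate(r)]
print(f"{len(clean)} of {len(readings)} readings passed validation")
```

Returning a list of problems, rather than a simple pass/fail flag, makes it easy to report why records were rejected, which helps trace quality issues back to their source.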
How Big is Big Data, Actually?
What counts as “Big Data” varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.
Big Data & Traditional systems’ limitations
The need for big data velocity imposes unique demands on the underlying compute infrastructure. Relational database management systems and desktop statistics and visualization packages often have difficulty handling Big Data. Traditional database systems were designed for smaller volumes of structured data, fewer updates and a predictable, consistent data structure. They are also designed to run on a single server, which makes increasing capacity expensive and ultimately finite. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster; it may require massively parallel software running on tens, hundreds or even thousands of servers.
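The idea of spreading work across many servers can be sketched with the split-apply-combine (map/reduce) pattern: partition the data, process each partition independently, then merge the partial results. In this minimal Python sketch, threads from `concurrent.futures` stand in for separate servers, so it shows the structure of the computation rather than a real speed-up; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(lines: list[str]) -> Counter:
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: combine partial counts from two partitions."""
    return a + b


def parallel_word_count(lines: list[str], workers: int = 4) -> Counter:
    # Split the input into roughly equal partitions, one per worker.
    size = max(1, -(-len(lines) // workers))  # ceiling division
    partitions = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for separate servers; each handles its own partition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    # Merge the partial results into one final answer.
    return reduce(merge_counts, partials, Counter())


docs = [
    "big data needs new tools",
    "new tools for big data",
    "data data everywhere",
]
print(parallel_word_count(docs, workers=2).most_common(2))
```

Because each partition is processed without looking at any other, adding more workers (or, in a real cluster, more servers) only changes how the data is split, not the logic of the map and reduce steps.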
Applications & Impact of Big Data and Analytics
An example of Big Data might be petabytes or exabytes of data consisting of billions to trillions of records of millions of people—all from different sources (e.g. Web, sales, customer contact center, social media, mobile data and so on). At multiple terabytes in size, the text and images of Wikipedia are another example of Big Data.
- Retailers can track users’ web clicks to identify behavioral trends that improve campaigns, pricing and inventory.
- Utilities can capture household energy usage levels to predict outages and to encourage more efficient energy consumption.
- Governments and even Google can detect and track the emergence of disease outbreaks via social media signals.
- Oil and gas companies can take the output of sensors in their drilling equipment to make more efficient and safer drilling decisions.
Combined with high-powered analytics, big data can help companies improve operations and make faster, more intelligent decisions. Once captured, formatted, manipulated, stored and analyzed, this data can give a company useful insight for increasing revenue, winning or retaining customers, and improving operations. You can take data from any source and analyze it to find answers that enable cost and time reductions, new product development and optimized offerings, and smarter decision making.
The impact: Big data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics.
Big Data Analysis | Industry usage
Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills. Big data affects organizations across practically every industry.
- Banking: Banks need to understand customers and boost their satisfaction, but it’s equally important to minimize risk and fraud while maintaining regulatory compliance. Big data analysis supports both goals.
- Education: By analyzing big data, educators can identify at-risk students, make sure students are making adequate progress, and implement a better system for evaluating and supporting teachers and principals.
- Government: When government agencies are able to harness and apply analytics to their big data, they gain significant ground when it comes to managing utilities, running agencies, dealing with traffic congestion or preventing crime.
- Health Care: Patient records. Treatment plans. Prescription information. When it comes to health care, everything needs to be done quickly, accurately – and, in some cases, with enough transparency to satisfy stringent industry regulations.
- Manufacturing: With big data, manufacturers can boost quality and output while minimizing waste, solve problems faster and make more agile business decisions.
- Retail: Retailers need to know the best way to market to customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business. Big data remains at the heart of all those things.
Big Data & Hadoop
Hadoop is an open-source framework for the distributed storage and processing of large data sets. Its file system, HDFS, allows the storage of any type of data, most of which would have been discarded in the past because making it usable would have been too difficult and expensive. The value of Big Data and Hadoop comes through on-the-fly modeling of data that might actually be useful and which, when integrated with existing analytics environments, can enrich business insights.
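The “on-the-fly modeling” described above is often called schema-on-read: raw data is stored as-is, and a structure is imposed only when someone queries it. The sketch below illustrates the idea in plain Python with hypothetical JSON click and purchase records; it is not Hadoop’s API (on Hadoop, this role is typically played by tools such as Hive reading files stored in HDFS), and the helper `read_with_schema` is invented for illustration.

```python
import json

# Raw records are kept exactly as they arrived, with no upfront schema.
# The records and field names are hypothetical.
raw_store = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": 42.5}',
    '{"user": "alice", "action": "purchase", "amount": 10.0, "coupon": "X1"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time by projecting each raw record onto `fields`.

    Fields a record lacks become None, so records with new or missing
    fields can be stored now and modeled later.
    """
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# One analysis models the raw data as purchase totals per user.
total_by_user = {}
for r in read_with_schema(raw_store, ["user", "action", "amount"]):
    if r["action"] == "purchase":
        total_by_user[r["user"]] = total_by_user.get(r["user"], 0.0) + r["amount"]
print(total_by_user)
```

Because nothing is thrown away at write time, the same raw store can later be read with a different list of fields, for example `["user", "page"]`, to answer a question nobody anticipated when the data was collected.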
It’s important to remember that the primary value of big data comes not from the data in its raw form, but from processing and analyzing it, and from the insights, products and services that emerge from that analysis. Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data sets that are diverse, complex and massive in scale. These insights can enable enterprises to make better decisions: deepening customer engagement, optimizing operations, preventing threats and fraud, and capitalizing on new sources of revenue.
New skills are needed to fully harness the power of big data. Though courses are being offered to prepare a new generation of big data experts, it will take some time to get them into the workforce. Meanwhile, leading organizations are developing new roles, focusing on key challenges and creating new business models to gain the most from big data.
There is likely to be a shortage of the talent organizations need to take advantage of big data. A 2011 McKinsey Global Institute report projected that by 2018 the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use big data analysis to make effective decisions. So gear up: the transformation begins here and now!
Big Data, Cloud and IoT may be sexy marketing buzzwords, but they describe existing technologies that are ready for the mainstream. Big data is changing the way the world uses business information. It will become a key basis of competition, underpinning new waves of productivity growth, innovation and consumer surplus, as long as the right policies and enablers are in place.