Apache Hadoop has been slowly fading out over the last five years—and the market will largely disappear in 2021.
According to Google Trends, Hadoop peaked as a search term in 2015 and has been on a slow decline ever since, reflecting the ongoing technology shift away from on-premises infrastructure and into the cloud, as well as the evolving role of business intelligence in modern organizations. Hadoop was designed for batch processing, and its slow response times are a poor fit for today’s interactive analytics, so teams need complex workarounds and fixes to keep everything running smoothly.
In addition, rough analysis has no place in today’s fast-paced, consumer-centric world where data is now key to powering the best experiences. While Hadoop can process and transform data, it doesn’t naturally provide the visual and reporting outputs needed for successful business intelligence. Cloud technologies like Google Kubernetes Engine (GKE) and Looker are much better at supporting the speed, flexibility, and scalability that cloud-native apps need while also providing easy data exploration in real time.
It’s time to think beyond the dashboard to data experiences
Embedded analytics and data insights will transform business intelligence (BI).
As the amount of data increases rapidly and organizations become more data-driven, employees need immersive data experiences that embed data directly into their business workflows. Streaming data from multiple sources and the demand for real-time insights are putting traditional data dashboards to the test, and ultimately those dashboards are falling short. In the coming year, we’ll see more AI-augmented analytics experiences that will place data right where business users need it to be.
For instance, the Looker team uses embedded dashboards to allow people to interact with data in their natural context. The sales organization embeds Looker dashboards directly into Salesforce. By integrating business data into their daily tools and systems, they benefit from custom data visualizations right where they are most useful, helping them make better decisions.
Data lakes will look more like databases
One of the most significant changes we expect to see next year is that data lakes will start looking more like normal databases. Unstructured data is growing: projections from IDC show 80% of worldwide data will be unstructured by 2025. As it grows, so does the pressure to store and manage it in a way that allows companies to access it, analyze it, and put it to good use. As on-premises data lakes start migrating into the cloud, we’ll also see them starting to take on key data warehouse functionality that makes it easier to query datasets. For example, migrating your data lake to Google Cloud enables you to use BigQuery to issue queries directly against the data. BigQuery can handle data of all shapes and sizes, ranging from an Excel spreadsheet to petabytes of data. It supports standard SQL queries, making it as easy to use as any other basic database. In 2021, we’ll continue to see a shift where data lakes take on more key data warehousing capabilities, such as transactional consistency, rollbacks, and time travel.
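To make those three capabilities concrete, here is a minimal Python sketch of a table that keeps every committed snapshot. It is a toy model only, not how BigQuery or any particular data lake implements these features, and the `VersionedTable` class and its method names are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy in-memory table that retains every committed snapshot (hypothetical)."""

    # Each entry is an immutable tuple of rows; the list index is the version number.
    # Version 0 is the empty table.
    _snapshots: list = field(default_factory=lambda: [()])

    def commit(self, new_rows):
        """Publish a new version: the previous rows plus new_rows, in one step."""
        self._snapshots.append(self._snapshots[-1] + tuple(new_rows))
        return len(self._snapshots) - 1  # version number just created

    def read(self, version=None):
        """Read the latest version, or an older one ("time travel")."""
        if version is None:
            version = len(self._snapshots) - 1
        return list(self._snapshots[version])

    def rollback(self, version):
        """Make an earlier version current again by re-committing its snapshot."""
        self._snapshots.append(self._snapshots[version])
        return len(self._snapshots) - 1


table = VersionedTable()
v1 = table.commit([{"id": 1, "city": "Austin"}])
v2 = table.commit([{"id": 2, "city": "Denver"}])

rows_at_v1 = table.read(version=v1)  # time travel: the table as of the first commit
v3 = table.rollback(v1)              # the current state now matches v1 again
```

Because readers only ever see whole snapshots, a commit is either fully visible or not visible at all, which is the essence of transactional consistency. A rollback is recorded as a new version rather than by deleting history, so the state it replaced stays readable.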
Data products will be for employees—not just customers
In addition to building data products for customers, companies are starting to build products specifically for internal data. Consider data products like Amazon or Zillow, which are designed for you to type in a query and get back a data set. You enter search criteria and get a list of products or listings that match. But data products for companies haven’t been built the same way. Instead, teams often end up wading through static spreadsheets and pivot tables to get answers to their questions.
In 2021, expect to see more internal data products for modern workers, tailored to the needs of specific job roles and functions. Easy-to-use data and AI/ML technology have finally made it possible to bring analytics to everyone, not just teams with specialized knowledge and training. For instance, one of the best examples we’ve seen this year is a touch-enabled interface designed specifically for employees to view metrics about streaming service titles. Using a product experience to democratize data internally will only become more common as companies continue on the path towards automated, intelligent decisions.
Data lock-in remains a big risk
The average company now maintains around a thousand SaaS tools. When you combine this with the rise in distributed applications and modern architectures, the new flavor of lock-in isn’t just at the technical and infrastructural level anymore: data is also in danger of becoming locked in. The more services, tools, and apps we implement, the higher the chance of falling victim to lock-in, especially in cases where moving data between providers becomes too time-consuming or costly to be worth attempting.
About the Author

Colin Zima is Chief Analytics Officer and VP of Product for Looker at Google Cloud. Looker helps companies create value with data.
Featured image: ©Graphiqa