For the context and purpose behind this series, see “The 17 Companies Reshaping the Landscape of Enterprise AI”.
Who They Are
Databricks bills itself as “the Data + AI company”. Its goal is to unify “your data, analytics and AI workloads”.
The premise is that your data is scattered across multiple clouds, applications, websites, and more. Databricks steps in and brings all of that data into a single location where it can be organized, modeled, and analyzed. This open data storage allows companies to capitalize on their data investments.
Currently, more than 5,000 companies are using the Databricks platform to help them scale their data efforts.
What They Do With Data and AI
The Databricks Lakehouse Platform is the backbone of the Databricks solution. The key factor worth noting is that the solution provides data governance, management, and security.
The suite of services is built on open data sources, which allows Databricks to collaborate with more than 450 partners worldwide.
Databricks Products:
The platform contains multiple products/solutions for various needs.
- Delta Lake: This is an “open format storage layer that delivers reliability, security and performance on your data lake”. In other words, the data held in silos across your environments can be stored centrally in Delta Lake.
- Delta Live Tables: Extracting, transforming, and loading (ETL) data is a huge undertaking for any organization. This tool allows data engineers to manage data pipelines from multiple sources in an easy-to-use interface.
- Databricks Machine Learning: For many, machine learning (ML) is still out of reach for practical use, or is difficult to maintain, manage, and scale securely. Databricks provides MLOps, AutoML, and a Data Science Workspace.
- Managed MLflow: These capabilities can streamline and manage the complete machine learning lifecycle.
- Data Science: This empowers data science teams to model the data, collaborate, and share insights.
- Databricks SQL: Visualization tools that tell a story with your data can surface insights in astounding ways. This solution integrates with existing BI tools such as Tableau and Power BI to query data directly from the data lake.
- Platform Security and Administration: Today, security should not be an afterthought but woven into the fabric of the entire infrastructure. All of the capabilities within Databricks run on a secure platform with elastic scalability.
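To make the ETL idea in the list above concrete, here is a minimal sketch in plain Python. It is not the Delta Live Tables API. The source names, field names, and the `is_valid` quality rule are hypothetical, chosen only for illustration. It shows records from two siloed sources being extracted, normalized into one schema, checked, and loaded into a single table.

```python
# Minimal ETL sketch in plain Python (not the Delta Live Tables API).
# All source names, fields, and rules below are hypothetical.

# Extract: two "silos" that describe customers in different shapes.
crm_rows = [
    {"CustomerId": "1", "Email": "Ann@Example.com"},
    {"CustomerId": "2", "Email": ""},  # missing email, should be rejected
]
web_rows = [
    {"id": 3, "email_address": "bob@example.com"},
]


def from_crm(row):
    """Transform a CRM record into the shared schema."""
    return {
        "id": int(row["CustomerId"]),
        "email": row["Email"].strip().lower(),
        "source": "crm",
    }


def from_web(row):
    """Transform a web-signup record into the shared schema."""
    return {
        "id": int(row["id"]),
        "email": row["email_address"].strip().lower(),
        "source": "web",
    }


def is_valid(record):
    """A simple data quality rule: the email must contain '@'."""
    return "@" in record["email"]


def run_pipeline():
    """Transform both sources, validate, and load into one central table."""
    transformed = [from_crm(r) for r in crm_rows] + [from_web(r) for r in web_rows]
    central_table = [r for r in transformed if is_valid(r)]
    rejected = [r for r in transformed if not is_valid(r)]
    return central_table, rejected


central_table, rejected = run_pipeline()
print(len(central_table), "loaded,", len(rejected), "rejected")
```

Keeping the rejected records, rather than silently dropping them, is one simple way to make data quality problems visible to the people who own the source systems.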
Most Unique / Impactful Data Application
Today, many organizations rely on data from other service providers or partners. This makes them dependent on the timing of that data, because when things happen is a very important data point for many industries.
Traditional data exchanges have not provided stellar performance and have caused costly downtime. Additionally, it’s important that the resiliency of the data is founded on ACID transactions (atomicity, consistency, isolation, durability).
With this in mind, Delta Lake provides a way to manage data end to end. This is all built upon the following:
- Delta Sharing: Securely share data across multiple organizations.
- Unity Catalog: Provide data and AI governance down to the minutiae.
- Delta Live Tables: Transform data into usable assets with automatic testing and easy recovery.
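To illustrate the ACID principle mentioned above, the following is a minimal sketch that uses Python's built-in `sqlite3` module rather than Delta Lake itself. The table name `trips` and the sample rows are hypothetical. It shows atomicity: a batch that fails partway through is rolled back, so readers never see a half-written update.

```python
import sqlite3

# An in-memory database stands in for a shared data store
# (an assumption made for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (vehicle_id TEXT PRIMARY KEY, miles REAL NOT NULL)"
)
conn.execute("INSERT INTO trips VALUES ('car-1', 12.5)")
conn.commit()


def load_batch(connection, rows):
    """Insert all rows or none. Returns True if the batch committed."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with connection:
            connection.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(connection):
    """Number of rows currently visible in the trips table."""
    return connection.execute("SELECT COUNT(*) FROM trips").fetchone()[0]


# The second row violates the NOT NULL constraint, so the whole batch fails.
bad_batch = [("car-2", 40.0), ("car-3", None)]
good_batch = [("car-2", 40.0), ("car-3", 7.25)]

bad_ok = load_batch(conn, bad_batch)
count_after_bad = row_count(conn)

good_ok = load_batch(conn, good_batch)
count_after_good = row_count(conn)

print(bad_ok, count_after_bad)
print(good_ok, count_after_good)
```

The failed batch leaves the table exactly as it was, even though its first row was valid on its own. That all-or-nothing behavior is what keeps shared data trustworthy when a load is interrupted.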
Who They Have Impacted
Wejo, a connected vehicle data company, faced a massive challenge not only with the volume of its data but also with time to market. It was able to reduce that time from months and weeks to hours.
The impressive factor here is these numbers:
- 50 MILLION+ vehicles connected
- 350 BILLION+ miles curated
- 10 TRILLION+ data points
The major concerns that Wejo faced were not only massive data volumes but also scalability challenges and slow performance.
As we all know, and have quickly adjusted to, our cars are now computers on the road. Sensors, cameras, alerts, and other safety features are standard on cars now. Understanding these data points, and more, is critical to car manufacturers, vendors, and other organizations.
The growth of autonomous cars is still surging, and this increases the need for connected data sources. Further, the widespread fear that our cars can be hacked and remotely controlled raises many safety and privacy concerns.
These are all valid points that could impact Wejo in positive and negative ways. Being nimble and proactive in response to changes in the transportation industry is paramount.
So, how could Wejo stay competitive in our accelerated economy?
The Databricks solutions allowed Wejo to realize huge gains:
- 50x FASTER time-to-insight
- 20x FASTER data processing
- 90% DECREASE in time to market
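To put those multipliers in concrete terms, here is a short worked calculation. The baseline figures (a 50-hour analysis, a 20-hour processing job, a 100-day release cycle) are hypothetical and are not Wejo's actual numbers. Only the 50x, 20x, and 90% factors come from the results above.

```python
def after_speedup(baseline, factor):
    """Time remaining when a task becomes `factor` times faster."""
    return baseline / factor


def after_percent_decrease(baseline, percent):
    """Time remaining after a `percent` percent reduction."""
    return baseline * (100 - percent) / 100


# Hypothetical baselines, chosen only to make the arithmetic easy to follow.
insight_hours = after_speedup(50, 50)           # 50x faster time-to-insight
processing_hours = after_speedup(20, 20)        # 20x faster data processing
market_days = after_percent_decrease(100, 90)   # 90% decrease in time to market

print(insight_hours, processing_hours, market_days)
```

Note that a 90% decrease is equivalent to a 10x speedup, since only one tenth of the original time remains.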
Closing Data Thoughts
The complexity of data will continue to grow as companies layer in AI, IoT, multi-cloud, analytics, and much more. Data will also become more dispersed and unstructured as data sources evolve.
Our reliance on data as a necessary asset has become the engine that powers innovation, greed, creativity, social impact, environmental change, and governmental policies. On top of this, artificial intelligence has surfaced further concerns about ethics and bias.
Since AI has not yet become sentient, data will still be governed by humans. Choosing the right solutions, and people, to manage this governance will shape how companies collaborate, operate, and compete.
Additional Resources
- https://www.statista.com/statistics/871513/worldwide-data-created/
- https://www.cnbc.com/2021/05/21/databricks-on-track-for-1-billion-in-2022-revenue-pete-sonsini-nea.html
- https://www.cnbc.com/2021/02/01/amazon-alphabet-salesforce-back-databricks-at-28-billion-valuation.html