Acceleration Economy
Data Modernization

How to Implement a Data Lakehouse to Maximize ROI

By Wayne Sadin | April 20, 2023 | Updated: April 20, 2023 | 5 Mins Read

So, here you are, faced with the fundamental question for a data engineer/data scientist: How do I provide the secure, available, scalable, flexible, accessible, reliable, and cost-effective data ingestion, storage, transformation, reporting, and analytic environment my organization needs to compete in today’s acceleration economy?

Wait . . . go back and read that again, slowly. Look at all those requirements! How the heck do you deliver on all those — sometimes conflicting — demands?

Way back when I was a data engineer, the answer was to license and implement (and patch and upgrade and support and train people to use) a plethora of specialized tools that collectively provided the needed features (except for “flexible” and “cost-effective,” in most cases). And, of course, most of those tools ran on-premises, requiring lots of additional work and cost.

Today, your solution might be to implement a software-as-a-service (SaaS) data lakehouse that combines most, if not all, of the above features. Data lakehouse products can be licensed as a stand-alone toolset, as is the case with Acceleration Economy’s Cloud Wars Top 10 vendor Snowflake’s product, or as part of an analytics toolset, like the products offered by Data Modernization Top 10 Shortlist vendor Qlik. Whether or not you license the data lakehouse separately from the analytics product mostly depends on the scale and complexity of your needs.

OK, full disclosure: This class of product has a number of stock-keeping units (SKUs) that can be licensed separately, depending on your needs, so you’ll spend some time working through your configuration. And many of the products have “application stores” that allow customers to license additional capabilities from affiliated vendors. Pay close attention to your needs versus wants, or “cost-effective” can go out the window . . . but a SaaS data lakehouse suite is still far cleaner than any multi-vendor tool amalgam can be.

Which companies are the most important vendors in data? Check out the Acceleration Economy Data Modernization Top 10 Shortlist.

A Proper Data Lakehouse Implementation

Since a data lakehouse combines the features of a data lake with those of a data warehouse — with an analytics and reporting capability, perhaps — its use can dramatically speed decision-making by improving data access and analytics. A proper data lakehouse implementation depends on a set of thoughtful decisions (including, but not limited to):

  • Data Governance. What data are you collecting? How long do you need to store it? Who should have access, and what kind of access should they have (this incorporates both role-based controls and “data classification” into categories like “internal use only”)? Who can grant access to each data element, and what kind of audit trails are needed? How should data be described so people can find what they need (which gets into taxonomy and metadata)?
    One important part of data governance is data lineage (or data provenance), which means demonstrating (to auditors and perhaps regulators) where data originated and how it was copied and transformed into its end products (reports, dashboards, etc.). Some data must be pristine: It’s in-scope for Sarbanes-Oxley Act (SOX) audits and for external financial reporting. But data quality comes at a cost — especially for “big data” — so not all data needs to be perfect (see “Data Engineering” below).
  • Data Security/Availability. This is your next decision. Start with encryption (in today’s world the answer is “Yes, encrypt”; don’t overthink this). Then layer in data access controls (to implement the governance decisions made above). If you do it right, you’ll find that this is where data security intersects with zero-trust principles. For availability, decide what level of redundancy is needed.
  • Data Engineering. Here you’ll face another set of decisions that are related to the earlier decisions. Data engineering is largely about cost, and the trade-offs that are needed to balance cost against every other objective. FYI, users always desire three things from an information technology (IT) system: that it be free, instant, and all-encompassing. What kind of performance is needed from, for example, real-time data ingestion for Internet of Things (IoT) applications, analytics performance for trading-floor applications and industrial controls, or archival storage for historical comparisons? How much are you willing to pay for redundancy in license fees, bandwidth, latency (for dual-commit transactions), and FTEs (full-time equivalents)?
  • Tool Access. This used to be easy; IT got to access IT tools, and end-users consumed the output of the tools. Then the pendulum swung — too far, in my opinion — and shadow IT flourished as users got access to powerful tools and huge datasets without necessarily being subject to, or even being aware of, data governance and security controls. This tool/control mismatch created many problems for organizations as multiple sources of truth were created and maintained — with needless cost and inadequate security. As a CIO, I’ve spent years stamping out most shadow IT, but data lakehouse products finally allow IT to embed security and governance controls right in the lakehouse, thereby making it easier to enforce important organizational standards and harder for users to inadvertently cause problems. Data lakehouse tools can also bridge the gap between “citizen developers” (users with cool tools) and “pro developers” (IT specialists with cooler tools) and thus facilitate collaboration among groups that heretofore used different tools and had different controls. Effective tool deployment, governance, and training aren’t automatic. Data lakehouse tools should operate within the organization’s overarching data security and governance frameworks and be deployed following best practices that make it easier for all users to “do the right thing” with data and analytics (which means getting rid of spreadsheets almost everywhere).
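The governance and security questions above (who gets what kind of access, under which data classification, with what audit trail) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how any particular lakehouse product works: the classification levels, the role names, the `ROLE_CLEARANCE` mapping, and the `AccessGate` class are all hypothetical. Real products typically expose equivalent controls through their own policy features.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical classification levels, from least to most restricted."""
    PUBLIC = 1
    INTERNAL_USE_ONLY = 2
    CONFIDENTIAL = 3


# Hypothetical mapping: role -> highest classification that role may read.
ROLE_CLEARANCE = {
    "analyst": Classification.INTERNAL_USE_ONLY,
    "finance_controller": Classification.CONFIDENTIAL,
}


@dataclass
class Dataset:
    name: str
    classification: Classification
    owner: str  # the party who can grant access to this data element


@dataclass
class AccessGate:
    """Checks role-based access and records every decision for auditors."""
    audit_log: list = field(default_factory=list)

    def can_read(self, user: str, role: str, dataset: Dataset) -> bool:
        clearance = ROLE_CLEARANCE.get(role)  # unknown roles get no access
        allowed = clearance is not None and clearance >= dataset.classification
        # Log denials as well as grants so the audit trail is complete.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset.name,
            "allowed": allowed,
        })
        return allowed


gate = AccessGate()
ledger = Dataset("general_ledger", Classification.CONFIDENTIAL, owner="cfo_office")
handbook = Dataset("employee_handbook", Classification.INTERNAL_USE_ONLY, owner="hr")

print(gate.can_read("ana", "analyst", ledger))      # analyst lacks clearance
print(gate.can_read("ana", "analyst", handbook))    # within analyst clearance
print(gate.can_read("sam", "contractor", handbook)) # role not in the mapping
print(len(gate.audit_log))                          # one entry per decision
```

The design choice worth noting is that the decision and the audit record are produced in the same place, so a user cannot be granted (or denied) access without leaving a trace. Denying unknown roles by default is in the spirit of the zero-trust principles mentioned above.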

Conclusion

Data lakehouse technology combines powerful tools with access to treasure troves of data. Proper implementation and use of a data lakehouse and its associated analytics tools empower everyone from top executives to customer-facing employees to make decisions faster and more accurately than ever before. Making smart decisions when designing and implementing the data lakehouse is critical to maximizing return on the organization’s big investment in technology, and its even bigger investment in generating and acquiring data.


Want more insights into all things data? Visit the Data Modernization channel:

CIO CXO data data cloud data tools data warehouse featured governance Internet of Things leadership Qlik SaaS scalability Snowflake software

Wayne Sadin

CIO PriceSmart
Acceleration Economy Advisory Board Member

Areas of Expertise
  • Board Strategy
  • Cybersecurity
  • Digital Business

Wayne Sadin, an Acceleration Economy Analyst focused on Board Strategy, has had a 30-year IT career spanning Logistics, Financial Services, Energy, Healthcare, Manufacturing, Direct-Response Marketing, Construction, Consulting, and Technology. He’s been CIO, CTO, CDO, advisor to CEOs/Boards, Angel Investor, and Independent Director at firms ranging from start-ups to multinationals.

