November 25, 2024

Starburst Unveils AI-Powered Data Analytics Solutions at AWS re:Invent 2024: A Deep Dive into Hybrid Lakehouse Innovation

Written by David Marshall

In an exclusive pre-event interview with VMblog, Manveer Sahota, Sr. Director of Product Marketing at Starburst, offers a compelling preview of the company's cutting-edge data platform strategies for AWS re:Invent 2024.

Starburst is set to showcase its Open Hybrid Lakehouse platform, Starburst Galaxy, which promises to revolutionize data analytics by addressing critical challenges facing enterprises in 2024, including rising cloud costs, data silos, and the accelerating demand for AI-driven insights.

VMblog:  Before we dive into AWS re:Invent specifics, can you give our readers a brief overview of your company and what sets you apart from others in the market?

Manveer Sahota:  Starburst, the Open Hybrid Lakehouse, is the leading end-to-end data platform to securely access, analyze, and share data for analytics and AI across hybrid, on-premises, and multi-cloud environments. As the leaders in Trino, a modern open-source SQL engine, Starburst empowers the most data-intensive and security-conscious organizations like Comcast, Halliburton, Vectra, EMIS Health, and 7 of the top 10 global banks to democratize data access, enhance analytics performance, add intelligence to their analytics stack with AI Agents, and improve architecture optionality. With the Open Hybrid Lakehouse from Starburst, enterprises globally can easily discover and use all their data to power business-critical applications like anti-money laundering and fraud analytics, next-best products, customer 360, log analytics, and ESG reporting.   

VMblog:  How can attendees of the event find you?  What do you have planned at your booth this year?  What type of things will attendees be able to do at your booth? 

Sahota:  Attendees can find us at booth 1175 and we are bringing back our show-stopping swag! Guests can also meet some of Starburst's executives and lead solution architects to learn more about our open hybrid lakehouse, Icehouse, how to effectively create an AI strategy with Data Products, and leverage AI Agents to build, understand, and power analytics with Data Products.

VMblog:  Have you sponsored AWS re:Invent in the past?  If so, what is it about this show that keeps you coming back as a sponsor?

Sahota:  In our seven-year history, Starburst has been a sponsor for four years. Each year, we continue to be amazed by the energy and the technologists that the event brings from around the world, who share a passion for data, technology, analytics, and AI.

VMblog:  Do you have any speaking sessions during the event?  If so, can you give us the details?

Sahota:  Attendees can join Adrian Estala, Starburst Field Chief Data Officer, on Tuesday at 2:30pm PT in Theater 4 (Data & AI/ML Pavilion) for "Feed your AI Strategy with Data Products" to learn how organizations can leverage concepts like Data Products to accelerate their AI initiatives from concept to production.

VMblog:  What are the most significant cloud-related challenges your customers are facing in 2024, and how does your solution address these pain points?

Sahota:  

  1. Rising costs - Customers seek greater efficiencies in their data, analytics, and AI stacks. This means they are consolidating onto a single platform and relying on novel point solutions only for very specific business problems. Starburst is helping its customers by augmenting or completely replacing their data analytics stacks with our lakehouse, which provides industry-leading price performance for streaming and interactive workloads.
  2. Concerns over lack of tech stack flexibility - Customers also seek greater flexibility and ownership of their data and analytics. This means they want to be able to use the right tools for the job. In the case of analytics, that means using open table formats like Iceberg and Trino-based engines in their lakehouse architecture so they can take ownership of their data and SQL.
  3. Data silos - With the surge of AI adoption and a greater need for improved analytics, data silos are stifling innovation and limiting intelligence within an organization. Customers are looking for secure and effective means to discover and access relevant data for analytics and AI without resorting to data movement or duplication, which raise costs and security risks, especially for non-critical internal workloads.
  4. Accelerating analytics - With dynamic market conditions driven by customer demand, competition, and changing geopolitical environments, businesses are looking to improve their responsiveness with faster time to insights, so they can make informed decisions without paying the premium of traditional real-time analytics. Therefore, customers are seeking solutions like Starburst to enable near real-time analytics, where large volumes of data can be ingested, transformed, governed, and ready to be queried within a minute, with optimized efficiency for data management and compute costs.

VMblog:  Which of your products or solutions will you be highlighting at AWS re:Invent 2024, and why are they particularly relevant for today's AWS users?

Sahota:  We'll be showcasing Starburst Galaxy, our fully managed Lakehouse platform that brings together the power of Trino with the flexibility of Iceberg. Galaxy is relevant for today's AWS users because it simplifies the data and analytics experience for data practitioners (data engineers, data scientists, and data analysts) by automating complex tasks and infusing AI Agents to understand and analyze data, while helping business teams realize value from their distributed data within minutes without incurring massive cost run-ups.

VMblog:  How does your technology specifically complement or enhance AWS services, and what unique value proposition do you offer to AWS customers?

Sahota:  Starburst helps AWS customers build and maintain an Iceberg-based data lakehouse while still providing access to their distributed data across warehouses and other SaaS applications. Starburst easily fits into an AWS-centric architecture, as highlighted by Gilead Sciences at re:Invent in 2022, and with the latest enhancements, we've drastically improved the experience. Now, customers of Starburst and AWS can easily ingest streaming data from Amazon MSK or other Kafka topics into Iceberg tables in S3 at verified ingestion rates of 100 GB/second, use automated data transformation and compaction to make the data usable, apply necessary governance to secure it, and then analyze the data with a highly optimized Trino-based SQL engine. Users can also leverage Starburst to discover, access, and analyze their data in Amazon Redshift and 20+ other sources and continue to use AWS Glue as their data catalog of choice. Lastly, Starburst brings AI Agents front and center with Amazon Bedrock to make it easier than ever to understand, build, and analyze data products. Ultimately, Starburst helps to accelerate and simplify data management, analytics, and data sharing within AWS.

VMblog:  With AI and machine learning being hot topics, how is your company incorporating these technologies to improve cloud operations and management?

Sahota:  Starburst includes a few GenAI capabilities to improve user experiences and help automate some data engineering tasks. These features are currently in preview.

  • AI Agents for Data Products to power intelligent analytics: This transformational Data Product capability allows organizations to harness the full potential of ALL of their data assets, regardless of source and location, for AI-driven/AI-assisted solutions, insights, and decision-making. An AI Agent within Starburst simplifies the discovery of the business context of enterprise data, from schemas down to columns. It documents or enriches data, creates Data Products, and makes the Data Products available to Amazon Bedrock for AI assistance, allowing users to analyze the data in natural language.
  • SQL statement generation from business questions (text-to-SQL): Starburst allows users to ask natural language questions about their data within a schema or table, resulting in an SQL statement answering the question. As part of the prompting process, we provide additional metadata about the source to ground the LLM and help provide more pertinent results.
  • Query explanations (SQL-to-text): Starburst not only explains the query that was generated and executed but also lets you provide additional context and dig deeper into the questions you may have.

As a chatbot, SQL-to-text can be leveraged to generate any type of output you desire, from a simple technical or domain-specific answer to a question to a summary or comprehensive documentation that can be used with your data assets or products.

This also means business continuity can be maintained: critical but undocumented transformations can quickly be explained. With added context provided by the user during the Q&A process, businesses can also more effectively de-risk staff turnover.

  • Data classification: Starburst attribute-based access control (ABAC) allows administrators to associate access policies with tags, and to apply policies to data by assigning tags to the data. The tags and policies are typically created and assigned based on the data's business context (e.g. a "PII" tag with a masking policy applied to an email or username column).

While ABAC more effectively allows administrators to adhere to the principle of least privilege, it scales less easily than role-based access control (RBAC), particularly in larger organizations, because every piece of data has to be tagged. Starburst's data classification feature makes ABAC scalable by using AI to assess data and recommend tags that an administrator can apply, modify, or ignore. This reduces the last-mile burden of tagging and allows organizations to scale their ABAC implementation.
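
As a rough illustration of the tag-based model described above, the Python sketch below attaches a masking policy to a "PII" tag and applies it to any column that carries the tag. It is not Starburst's implementation or API: `Column`, `TAG_POLICIES`, `mask_value`, `read_value`, and `review_recommendations` are hypothetical names, and the recommended tags are supplied by hand here rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def mask_value(value: str) -> str:
    """Hypothetical masking policy: keep the first character, hide the rest."""
    return value[:1] + "*" * (len(value) - 1) if value else value


# Policies are attached to tags, not to individual columns (assumed layout).
TAG_POLICIES: Dict[str, Callable[[str], str]] = {"PII": mask_value}


@dataclass
class Column:
    name: str
    tags: Set[str] = field(default_factory=set)


def review_recommendations(
    column: Column, recommended: List[str], decisions: Dict[str, str]
) -> None:
    """Add only the recommended tags the administrator chose to apply."""
    for tag in recommended:
        if decisions.get(tag) == "apply":
            column.tags.add(tag)


def read_value(column: Column, value: str, cleared_tags: Set[str]) -> str:
    """Run every tag policy on the value unless the user is cleared for that tag."""
    for tag in sorted(column.tags):
        policy = TAG_POLICIES.get(tag)
        if policy is not None and tag not in cleared_tags:
            value = policy(value)
    return value


email = Column("email")
# The administrator accepts the recommended "PII" tag for this column.
review_recommendations(email, ["PII"], {"PII": "apply"})

masked = read_value(email, "ana@example.com", cleared_tags=set())
clear = read_value(email, "ana@example.com", cleared_tags={"PII"})
print(masked)
print(clear)
```

Because the policy hangs off the tag, tagging a second column as "PII" would mask it with no further policy work, which is why reducing the tagging effort matters for scale.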

VMblog:  What new product announcements or demonstrations can attendees expect to see at your booth during AWS re:Invent 2024?

Sahota:  Attendees can visit us at Booth 1175 to see Starburst Galaxy, our fully managed Open Lakehouse built with Trino, in action. Core demos on display will include:

  • AI Agents for Data Products to power intelligent analytics: This transformational Data Product capability allows organizations to harness the full potential of ALL of their data assets, regardless of source and location, for AI-driven/AI-assisted solutions, insights, and decision-making. Attendees can see how to use an AI Agent to discover the business context of enterprise data, from schemas down to columns, document or enrich data, create Data Products, and make the Data Products available to Amazon Bedrock for AI assistance, allowing users to query the data with natural language.
  • Near real-time analytics: See how Starburst can ingest up to 100 GB/second from Amazon MSK or other Kafka topics, land the data in an Iceberg table in S3, transform and govern it, and make it ready to be analyzed within a minute by an optimized Trino engine.
  • Data federation: Discover and securely access distributed cloud data across Amazon S3, Redshift, Snowflake, BigQuery, and 15+ other sources for interactive or ad hoc analytics.

VMblog:  Cost optimization in the cloud remains a crucial concern - how does your solution help organizations maximize their AWS investment?

Sahota:  Starburst helps organizations maximize their AWS investment by offering industry-leading price performance for their analytics, which lowers their compute costs. Starburst easily integrates into the existing AWS stack, requiring minimal configuration, and customers can begin realizing value immediately. Starburst can deliver up to 9.85x cost savings for streaming and interactive workloads and up to 11.5x faster SQL. 

VMblog:  Security and compliance are top priorities for AWS users. How does your solution strengthen an organization's cloud security posture?

Sahota:  Starburst becomes a single access point for organizations' distributed data, whether in S3, Redshift, or other data stores. At this single point of access, customers can apply necessary access control policies using RBAC and ABAC to ensure that users can access only the data they are authorized to see. Furthermore, customers can use AWS PrivateLink with Starburst Galaxy for added security.

VMblog:  What hands-on experiences or interactive demonstrations will you be offering at your booth this year?

Sahota:  This year, attendees can visit booth 1175 for interactive demos from expert solution architects on AI Agents for analytics, data products, Icehouse, federated queries, price-performant SQL, and more.

One of the sexiest demos at the show will be using AI Agents to build, understand, and analyze Data Products with AWS Bedrock and Starburst.

VMblog:  Many organizations are adopting multi-cloud strategies. How does your solution support customers who use AWS alongside other cloud providers?

Sahota:  Starburst can be easily used across AWS, Azure, and GCP, as it's available as a SaaS or self-managed solution on all three clouds. Furthermore, customers can deploy on AWS and use federated queries to analyze data in other non-object store cloud sources like Snowflake for ad-hoc analytics or data discovery use cases.

VMblog:  What specific roles or job functions within an organization would benefit most from visiting your booth at re:Invent?

Sahota:  Practitioners and leaders in data engineering, data science, AI/ML engineering, and analytics can all benefit from seeing how Starburst's Open Hybrid Lakehouse makes it extremely easy to discover, access, govern, analyze, and share data.

VMblog:  For attendees who want to learn more, what special offers, resources, or follow-up opportunities will be available at your booth during AWS re:Invent 2024?

Sahota:  We encourage attendees to visit us at booth 1175 to see Starburst Galaxy in action and sign up for a free trial to experience the power of Galaxy firsthand. We'll also have additional resources related to Data Products, Near Real-time Analytics, Hadoop Modernization, and more for attendees to pick up or download.

VMblog:  Are you giving away any prizes at your booth or participating in any prize giveaways?

Sahota:  No, but we are bringing back our show-stopping swag that is free for all re:Invent attendees.

David Marshall

David Marshall has been involved in the technology industry for over 19 years, and he's been working with virtualization software since 1999. He was able to become an industry expert in virtualization by becoming a pioneer in that field - one of the few people in the industry allowed to work with Alpha stage server virtualization software from industry leaders: VMware (ESX Server), Connectix and Microsoft (Virtual Server).

Through the years, he has invented, marketed and helped launch a number of successful virtualization software companies and products. David holds a BS degree in Finance, an Information Technology Certification, and a number of vendor certifications from Microsoft, CompTIA and others. He has also co-authored two published books, "VMware ESX Essentials in the Virtual Data Center" and "Advanced Server Virtualization: VMware and Microsoft Platforms in the Virtual Data Center", and acted as technical editor for two popular Virtualization "For Dummies" books. With his remaining spare time, David founded and operates one of the oldest independent virtualization news blogs, VMblog.com, and co-founded CloudCow.com, a publication dedicated to Cloud Computing. From 2009 through 2016, David was honored with the vExpert distinction by VMware for his virtualization evangelism.
