November 08, 2024

Komprise at SC24: How Enterprises Can Cut Storage Costs by 70% While Optimizing Data for AI Workloads

Written by David Marshall

Ahead of SC24 (Supercomputing 2024), VMblog spoke with Krishna Subramanian, COO and Co-founder of Komprise, about how organizations are tackling petabyte-scale unstructured data management challenges.

As enterprises increasingly adopt high-performance computing (HPC) and AI initiatives, they face mounting storage costs and data management complexity. Komprise, making its booth debut at SC24 (#414), offers solutions that not only promise 70% storage cost reductions but also help organizations optimize their data for AI workflows through intelligent tiering, tagging, and lifecycle management.

In this exclusive Q&A, Subramanian discusses how Komprise's platform helps enterprises understand and manage their data across storage silos while preparing for the AI-driven future.

VMblog: If you were giving an SC24 attendee a quick overview of the company, what would you say? How would you describe the company?

Krishna Subramanian:  Komprise is a platform for independent unstructured data management. We help enterprises with petabyte-scale data volumes understand their data across storage silos. That insight lets IT users create policies to manage data in the most appropriate way, such as tiering cold data to cheaper storage, moving PII to secure storage or finding just the right data sets for AI. Komprise delivers continuous analysis so you can understand your organization's data: how much you have, how fast it's growing and which data is most important based on access patterns. You can always ensure that your data is in the right place at the right time. As a result, we frequently save our customers 70% or more on annual data storage, backup and ransomware data protection costs.

VMblog: Your company is sponsoring this year's SC24 event. Can you talk about what that sponsorship looks like?

Subramanian:  While we have attended the event in the past with partners, this is the first year we'll have a booth (#414). We are lining up meetings with customers, partners and prospects at the show and we'll be demonstrating our new Directory Explorer and Smart Data Workflow Manager at the booth as well as giving away some fun prizes.

VMblog: What kind of message will an attendee hear from you this year? What will they take back to help sell their management team and decision makers? Explain your technology.

Subramanian:  Expanding the horizons of high-performance computing (HPC) is the theme of SC24. We love this because it speaks to the fact that HPC creates many opportunities for innovation, but it also creates tons of unstructured data. The data deluge is so big that you cannot look at a single storage or backup or cloud vendor to solve it. Also, as HPC is maturing and becoming core to many enterprise organizations, IT organizations are moving from DIY, open-source tools to commercial data management software.

Our message to attendees who oversee data storage is simple: if you have a lot of unstructured data, you need a solution like Komprise. The reason is that storage vendors cannot solve the higher-level problems IT organizations face: how to reduce the high cost of unstructured data, and how to ensure that data is optimized across its lifecycle using the best storage for the use case at hand. Komprise is best positioned to help enterprises manage all their unstructured data across storage, whether it's in your data center or in the cloud.

It starts with visibility and analytics. Komprise rapidly analyzes and reports on all your file and object data no matter where it lives. Our Global File Index, which provides deep analytics across billions of files, allows users to search, tag and create curated data sets.

Komprise Elastic Data Migration gives customers the fastest, most reliable migrations for both SMB and NFS data; it migrates large data volumes 25x to 27x faster than common tools, with built-in safeguards against failures and data loss such as automatic retries and checksums.
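
Komprise doesn't publish its migration internals here, but the general safeguard pattern is straightforward to sketch: copy, verify with a checksum, and retry with backoff on any failure. A minimal Python illustration of that pattern (the paths and retry counts are illustrative, and this is not Komprise code):

```python
import hashlib
import shutil
import time
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_with_verify(src: Path, dst: Path, retries: int = 3) -> None:
    """Copy src to dst, verify the checksum, and retry on mismatch or I/O error."""
    for attempt in range(1, retries + 1):
        try:
            shutil.copy2(src, dst)  # copy2 preserves timestamps and permission bits
            if sha256_of(src) == sha256_of(dst):
                return
            raise IOError(f"checksum mismatch on attempt {attempt}")
        except (IOError, OSError):
            if attempt == retries:
                raise  # out of retries; surface the failure instead of losing data
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying

copy_with_verify(Path("/mnt/nas/project/data.bin"),      # illustrative paths
                 Path("/mnt/cloud/project/data.bin"))
```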

Smart Data Workflow Manager is another key component. It provides a point-and-click UI wizard that helps IT users set up an AI data workflow: searching for the right data set, configuring and tuning the AI service, defining the tags and how frequently the workflow should run, and monitoring projects from a single view.
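
As a mental model, the workflow such a wizard assembles can be pictured as a small declarative spec that the system executes on a schedule. A hypothetical sketch; every field name, share and endpoint below is invented for illustration:

```python
# Hypothetical AI data workflow spec; all field names and values are illustrative.
workflow = {
    "name": "flag-pii-in-research-share",
    "source": {
        "share": "smb://nas01/research",                # where to search
        "query": "extension:docx OR extension:pdf",     # which files qualify
    },
    "ai_service": {
        "endpoint": "https://example.com/pii-detector", # stand-in for any AI service
        "confidence_threshold": 0.8,
    },
    "tags_to_apply": ["pii:detected"],                  # written back as metadata
    "schedule": "weekly",                               # how often the workflow runs
}
```

Each run would search the source for matching files, send them to the configured service, and write the results back as tags so later searches can use them.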

VMblog: Thinking about your company's technologies, what are the types of problems you solve for an SC24 attendee?

Subramanian:  We're lucky in that our solution solves multiple pressing issues:

  1. Reducing 70-80% of storage costs across vendors: We accomplish this through analysis in our Global File Index, which shows data growth rates, the amount of data in storage, and time of last access so you can model savings plans (see the first sketch after this list). For instance, you can see the potential savings of moving "cold" data that is a year or older and rarely accessed to secondary storage. You could also model the cost savings of moving on-premises data to a cloud object storage tier. Komprise has patented Transparent Move Technology (TMT™) that tiers data across hybrid storage while maintaining native access to the tiered data both from the original location and from the cloud.
  2. Tagging data for improved classification and segmentation. Metadata enrichment is increasingly valuable as unstructured data volumes grow into multiple petabytes in organizations. By adding tags to data, indicating file contents, location or project, data becomes more searchable. This is useful for quickly identifying sensitive data types, such as those containing PII, or tracking down specific data sets for use in AI and ML projects, among other use cases.
  3. Twenty-five times faster, lower-risk migrations. Increasingly, especially in hybrid cloud environments, data is on the move. Organizations are looking to adopt new storage that is more efficient or better suited to certain use cases such as AI. Yet large-scale data migrations are often painful, complex and may not deliver the expected ROI. Komprise has a proven process to analyze your environment and data prior to migration to ensure that you are moving just the right data to the right storage. The assessment also prevents breakdowns that often happen during migration related to networking, security or other infrastructure configurations. Komprise Elastic Data Migration is significantly faster than many common tools and has built-in features for reliability and ease of use, such as retaining all file permissions after a migration.
  4. Data lifecycle management. One-size-fits-all storage is no longer viable given the size of today's data. Komprise helps organizations analyze their environments and move data automatically as it ages. You can set policies to tier data from hot to warm to cold tiers according to parameters that you define. Because of our patented TMT, you can access your data at any tier later, without expensive rehydration to the original storage. And you always have native access to your data wherever Komprise moves it, which is especially beneficial for cloud-based AI and ML programs.
  5. Added ransomware protection. By reducing the data stored on your expensive NAS through cold data tiering to immutable object storage in the cloud, you shrink your attack surface while also delivering enormous cost savings (see the Object Lock sketch after this list). Unlike storage tiering solutions that lock the data into their file format and are incompatible with ransomware protection solutions like tamperproof snapshots, Komprise technology is transparent and fully compatible with ransomware protection and backup solutions.
  6. Preparing data for AI. Getting unstructured data ready to use safely in AI tools is one of the biggest challenges, if not the biggest, in executing AI. Komprise offers a Google-like search across disparate data silos. You can tag your data via UI and API to enrich the metadata, making it more usable in AI. As covered earlier, Smart Data Workflow Manager is the foundation for creating automated AI data workflows that enrich data and curate the right data sets for the right tools.
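
To make item 1 concrete, here is a toy version of the cold-data analysis: walk a file tree, classify files by last-access age, and total the bytes a one-year tiering policy would move. This is a from-scratch sketch, not the Global File Index, and it assumes the mount records access times:

```python
import os
import time

ONE_YEAR = 365 * 24 * 3600

def cold_data_report(root: str, age_seconds: int = ONE_YEAR):
    """Total up bytes that have not been accessed within the cutoff window."""
    now = time.time()
    cold_bytes = total_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += st.st_size
            if now - st.st_atime > age_seconds:  # st_atime: last access time
                cold_bytes += st.st_size
    return cold_bytes, total_bytes

cold, total = cold_data_report("/mnt/nas/projects")  # illustrative path
pct = cold / total if total else 0.0
print(f"{pct:.0%} of {total / 1e12:.2f} TB is cold (untouched for a year)")
```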
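And for item 5, what actually stops ransomware from touching tiered copies is immutability on the object tier. With AWS S3, for example, Object Lock in compliance mode blocks overwrites and deletes until a retention date passes. A minimal sketch, assuming a bucket created with Object Lock enabled (the bucket name and paths are illustrative):

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Upload a tiered copy that cannot be modified or deleted, even by an admin,
# until the retention date passes. The bucket must have Object Lock enabled.
with open("/mnt/nas/archive/results.tar", "rb") as body:
    s3.put_object(
        Bucket="example-cold-tier",   # illustrative bucket name
        Key="archive/results.tar",
        Body=body,
        ObjectLockMode="COMPLIANCE",  # compliance mode: retention cannot be shortened
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
    )
```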

VMblog: While thinking about your company's technology and solution offerings, can you give readers a few examples of how your offerings are unique? What are your differentiators? What sets you apart from the competition?

Subramanian:  Komprise is the only standards-based, storage-agnostic data management software used by the who's who of the Fortune 5000 to manage petabytes of data at scale. This is for three key reasons:

  • Scale-out with no central bottlenecks or agents: Komprise has been designed from the ground up to handle the massive scale of unstructured data. It has a lightweight, distributed, scale-out architecture with optimized protocol handling and no agents or stubs. Komprise has developed adaptive algorithms that maximize parallelism and performance without impeding active data access (see the first sketch after this list). Benchmarks show Komprise is 27x faster for NFS and 25x faster for SMB data movement. Many vendors claim to be distributed, but they are legacy client-server solutions that use agents or proprietary controllers, which limit scale.
  • Patented Transparent Move Technology (TMT) extends any vendor namespace to the cloud: Storage vendors offer some ways to move and tier data within their file system, but these are limited to the technologies they support. Data migration tools can move data, but they do not extend the original namespace, meaning users must look for data in multiple places. Komprise is the only solution with patented Transparent Move Technology to tier data across vendors and architectures transparently and extend the original file namespace without locking data into Komprise or the storage vendor (the second sketch after this list illustrates the general idea). Customers get non-disruptive mobility with maximum flexibility and savings.
  • AI-ready data and Smart Data Workflows: Komprise lets you search and pick just the right data to feed to any AI or processing engine and then systematically record the results as tags. You can create and execute iterative AI workflows without the penalty of waiting for months to move petabytes of data from one system to another. Komprise also creates data governance by tracking data movement into AI, which is critical as AI moves from use by a few data scientists for model training to use by anyone in the enterprise for inferencing.
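
For the first bullet, the core idea behind agentless, scale-out analysis is that each directory is an independent unit of work that can be scanned in parallel over standard protocols. A single-machine toy in Python; a product like Komprise distributes this across many nodes, and this sketch only shows the parallelism idea:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scan_dir(path: str) -> tuple[int, int]:
    """Count files and bytes in one directory (the non-recursive unit of work)."""
    files = size = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                files += 1
                size += entry.stat(follow_symlinks=False).st_size
    return files, size

def parallel_scan(root: str, workers: int = 16) -> tuple[int, int]:
    """Fan directory scans out across a thread pool; every directory is an
    independent task, so the scan itself has no central bottleneck."""
    dirs = [dirpath for dirpath, _, _ in os.walk(root)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_dir, dirs))
    return sum(f for f, _ in results), sum(b for _, b in results)

files, size_bytes = parallel_scan("/mnt/nas/projects")  # illustrative path
print(f"{files} files, {size_bytes / 1e12:.2f} TB")
```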
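Komprise doesn't document TMT's mechanism in this interview, but the general idea of namespace-preserving tiering can be illustrated with ordinary symbolic links: the bytes move, while the original path keeps resolving. A deliberately simplified sketch of that general idea, not TMT itself; a real system must also handle permissions, locking and recall:

```python
import shutil
from pathlib import Path

def tier_transparently(hot_path: Path, cold_root: Path) -> None:
    """Move a file to a cold tier, leaving a standard symlink at the original
    path so applications reading the old location still get the data."""
    cold_root.mkdir(parents=True, exist_ok=True)
    cold_path = cold_root / hot_path.name
    shutil.move(str(hot_path), str(cold_path))  # relocate the actual bytes
    hot_path.symlink_to(cold_path)              # original namespace still resolves

tier_transparently(Path("/mnt/nas/archive/run-042.h5"),  # illustrative paths
                   Path("/mnt/cold-tier/archive"))
```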

VMblog: What major industry challenges or technological bottlenecks do you see your solutions addressing in the high-performance computing (HPC) landscape, particularly in relation to emerging AI/ML workloads?

Subramanian:  There are three key challenges we address in HPC, especially as it relates to AI/ML workloads:

  • Systematic data sharing for RAG and inferencing: AI/ML relies on your organization's data, and the better the data you give it, the better your results will be. While it is relatively simple for a data scientist or AI engineer to create a vectorized dataset for training a model, it is much harder to figure out what data to use for inferencing with RAG and how users will share data securely when using AI/ML models. For this, you need a systematic data workflow that helps users search for and pick the relevant data. Next, the system moves the data to the AI process, tracks what was sent, and stores the results so you don't run the AI over and over again (see the sketch after this list). This systematic data workflow execution for RAG and inferencing is what Komprise provides.
  • Data governance for AI workflows: As users start sharing organizational data with AI, IT needs a way to audit what was shared, ensure that sensitive information is restricted from AI and create data governance mechanisms. Komprise provides the framework for data orchestration and data governance with AI, especially for augmentation, inferencing and use of AI models.
  • Managing the data lifecycle to optimize AI resource consumption: AI compute and storage are expensive. Once data has been processed, an unstructured data management system should move it back out quickly to reduce waste and cost overruns. Komprise manages data movement to and from AI processing engines to reduce the cost of AI.
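
The "search, send, track" loop in the first bullet reduces to a few steps that are easy to sketch end to end. Everything below is hypothetical: the metadata index, the embed() callable and the audit log are stand-ins for whatever your stack provides:

```python
import hashlib
import json
import time

def files_with_tag(index: list[dict], tag: str) -> list[str]:
    """Pick candidate files for RAG out of a metadata index by tag."""
    return [entry["path"] for entry in index if tag in entry["tags"]]

def run_rag_ingest(index: list[dict], tag: str, embed, audit_path: str) -> None:
    """Send each matching file to an embedding step once, recording what was shared."""
    seen = set()
    with open(audit_path, "a") as audit:
        for path in files_with_tag(index, tag):
            with open(path, "rb") as f:
                content = f.read()
            digest = hashlib.sha256(content).hexdigest()
            if digest in seen:
                continue  # don't run the AI over identical content twice
            seen.add(digest)
            embed(path, content)  # hypothetical call into your vector pipeline
            audit.write(json.dumps(
                {"path": path, "sha256": digest, "ts": time.time()}) + "\n")
```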

VMblog: Data movement and storage continue to be critical challenges in HPC. What innovations is your company bringing to market to address these bottlenecks, and what performance improvements can customers expect?

Subramanian:  Komprise is a comprehensive unstructured data management solution that gives IT stakeholders insight into their file and object data, the ability to see across silos, and the ability to create automated policies to tier, copy or migrate data. Moving petabytes of data from one storage technology to another presents many issues, with speed and performance being high priorities. Komprise Hypertransfer is a migration technology that solves the challenge of moving many small SMB files; metrics show a 25x improvement in speed over a WAN. Komprise also allows users to create custom data workflows across systems using an AI processor, saving time (see the Duquesne case study).
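
Per-file round trips are what make small-file transfers slow over a WAN, so the generic mitigation, whatever the product, is to pack many small files into one large sequential stream per connection. A minimal sketch of that batching idea using tar as the container; this illustrates the general technique, not Hypertransfer's actual protocol:

```python
import tarfile
from pathlib import Path

def batch_small_files(paths: list[Path], archive: Path,
                      max_bytes: int = 256 * 1024 * 1024) -> int:
    """Pack small files into one tar stream so the WAN sees a single large
    transfer instead of thousands of per-file round trips; returns bytes packed."""
    packed = 0
    with tarfile.open(archive, "w") as tar:
        for path in paths:
            size = path.stat().st_size
            if packed + size > max_bytes:
                break  # cap the batch; remaining files go into the next archive
            tar.add(path, arcname=path.name)
            packed += size
    return packed
```

The receiving side unpacks the archive and restores metadata, trading a little local CPU for far fewer network round trips.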

VMblog: As security concerns continue to grow in the HPC space, especially with the integration of AI workloads, what approaches and technologies is your company developing to ensure both performance and protection?

Subramanian:  While Komprise does not store customer data, we help our customers manage data risk. First, we know that unstructured data presents a huge ransomware risk because of its size and because it is spread across silos and sometimes hidden. Komprise offers an additional ransomware defense by identifying and tiering cold data, which can be 80 percent or more of all data, to object-locked storage where hackers can't access or modify it. This dramatically reduces your ransomware attack surface. Second, sensitive data leakage to AI is a valid and growing concern. Komprise provides the tools to find, tag and segregate sensitive data so that users can't send it to AI tools and prompts. Komprise can also track what data was moved into an AI system.
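
In practice, the "find, tag and segregate" control amounts to a guard in front of the AI pipeline: refuse anything carrying a sensitive tag and log every decision for audit. A hypothetical sketch; the tag names and log format are invented for illustration:

```python
import json
import time

BLOCKED_TAGS = {"pii:detected", "classification:confidential"}  # illustrative tags

def allow_for_ai(path: str, tags: set[str], audit_log) -> bool:
    """Gate a file before it reaches an AI tool, recording the decision."""
    matched = tags & BLOCKED_TAGS
    audit_log.write(json.dumps({
        "path": path,
        "allowed": not matched,
        "matched": sorted(matched),
        "ts": time.time(),
    }) + "\n")
    return not matched
```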

David Marshall

David Marshall has been involved in the technology industry for over 19 years, and he's been working with virtualization software since 1999. He became an industry expert in virtualization as a pioneer in the field, one of the few people in the industry allowed to work with alpha-stage server virtualization software from industry leaders VMware (ESX Server), Connectix and Microsoft (Virtual Server).

Through the years, he has invented, marketed and helped launch a number of successful virtualization software companies and products. David holds a BS in Finance, an Information Technology certification, and a number of vendor certifications from Microsoft, CompTIA and others. He has also co-authored two published books, "VMware ESX Essentials in the Virtual Data Center" and "Advanced Server Virtualization: VMware and Microsoft Platforms in the Virtual Data Center," and acted as technical editor for two popular Virtualization "For Dummies" books. In his remaining spare time, David founded and operates one of the oldest independent virtualization news blogs, VMblog.com, and co-founded CloudCow.com, a publication dedicated to cloud computing. From 2009 through 2016, David was honored with the vExpert distinction by VMware for his virtualization evangelism.
