August 01, 2022

VMblog 2022 Mega Series Q&A: Komprise Explores and Educates on Unstructured Data Management

Written by David Marshall

Welcome to the VMblog 2022 Mega Series, where we'll be covering a number of important topics throughout the coming months. In this series, you'll hear from industry leaders and experts to help you make important decisions within your own organization. Follow along to better understand these topics and find out more about some of the best technologies available in the industry.

In today's Q&A, we're speaking with industry expert Krishna Subramanian, COO, President and Co-founder of Komprise, and diving into the topic of Data Management and Protection.

 

VMblog:  Provide a little backgrounder information on the company. What does your company look like in 2022 and beyond?

Krishna Subramanian:  Komprise was founded in 2014 to solve the massive unstructured data growth problem that has exploded in the last few years. Today we serve data-heavy industries, including the public sector and some of the largest companies in life sciences, financial services and media & entertainment, which are sitting on petabytes of data that is difficult to manage and mine for new insights. Komprise is a SaaS platform for unstructured data management and mobility with an analytics-first approach. Komprise Intelligent Data Management transparently tiers and migrates data to lower-cost storage while also delivering the means to easily search and deliver the right data to cloud-based AI and ML services. Komprise customers typically save 60-70% on storage, backup and cloud costs with smart data migration and tiering, which automates data movement so that data always lives in the right place at the right time according to its age, usage or other parameters. Komprise doubled sales in 2021 and we expect to do so again in 2022, because enterprise IT organizations are feeling the pain of outsized data storage costs and the need to become more data-driven to compete.

VMblog:  We are here to talk about Unstructured Data Management. How does your company see it and define it?

Subramanian:  Unstructured data is any data that doesn't fit into rows and columns in a database. It accounts for at least 80% of all data generated and stored globally today, including documents, video and audio files, and application, research and sensor data. While there are many tools on the market for data migration, in our view, unstructured data management and mobility must be independent of storage, backup and cloud infrastructure platforms. The reason is that enterprises today often have multiple, hybrid storage technologies in place and want the flexibility to move to new storage offerings, in the cloud or otherwise, whenever they wish. The requirements for unstructured data management include: solutions must be data-centric rather than storage-centric, must not disrupt users and workflows, must not sit in the hot data path, and must deliver data to the target storage in native format to avoid proprietary lock-in and maximize value.

VMblog:  What's the most important thing happening in your field at the moment?

Subramanian:  Over the past decade, storage IT has been transforming from being an infrastructure-centric function to a service provider for the business. This shift is accelerating with the growing importance of AI/ML and the cloud. Progressive IT organizations are no longer viewing their storage team as tactical managers of storage infrastructure. Instead, they are delivering intelligent data services across the edge and cloud. Storage IT teams are looking for ways to empower business users to find and feed the right data into their AI/ML workflows, and for ways to manage and mobilize data at scale across silos.  Basically, the shift is from storage infrastructure to intelligent data services. This requires a shift from a storage-centric to a data-centric view of data management.

VMblog:  What are the biggest challenges which your customers face today?

Subramanian:  Enterprise customers are grappling with spiraling data storage costs and complexity as unstructured data volumes have not only exploded but data is also heavily distributed across many different silos, from on-premises to edge to cloud. This reality means that it's increasingly difficult to control data for cost management or regulatory and data protection needs and to meet the ever-changing requirements of users and departments. The long game is the ability to extract value by feeding the right data to AI/ML platforms and analytics services, another challenge related to distributed data and the lack of visibility into it across silos.

VMblog:  What are the most common use cases of your product and what is the value add you offer to your customers?

Subramanian:  Our most common use cases today are helping customers in their hybrid cloud journey with smart data migration from NAS to the cloud or to other NAS devices. Customers realize that migration is more than just an isolated lift-and-shift; by analyzing and understanding data usage, you can focus expensive flash and backup resources on hot data while keeping 80% or more of the data in an object tier without restricting its access. Komprise delivers this intelligent approach to data migrations, and can also handle cloud-to-cloud migrations. Our Elastic Data Migration technology is built to scale for petabyte-size migrations, with many resiliency features that speed up migrations and minimize errors and data loss. Second, we give customers a powerful tool for cloud tiering: by analyzing their data across all storage, they can see how to save money by tiering "cold" or rarely used data to cheaper cloud storage. In many cases, customers find they can cut 70% or more of storage and backup costs by continually offloading cold data from expensive, on-premises storage. The growing use case is to shift from managing data volumes to delivering greater data value. For Komprise this means Smart Data Workflows, which let you define and execute automated processes to find, tag and move unstructured data to cloud data lakes and AI/ML tools where users can generate new insights from it.
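To make the age-based tiering idea concrete, here is a minimal, hypothetical sketch of a policy that copies files untouched for a year from a NAS share into lower-cost Amazon S3 storage, using the standard boto3 library. It is not the Komprise implementation (which layers analytics, transparent access and resiliency on top); the share path, bucket name and one-year threshold are assumptions for illustration.

```python
# Conceptual sketch of age-based tiering: scan a file share and copy files
# that have not been accessed in N days to an S3 bucket. This is NOT the
# Komprise implementation; paths and bucket names are hypothetical.
import os
import time
import boto3

COLD_AFTER_DAYS = 365                  # policy: "cold" = untouched for a year
SHARE_ROOT = "/mnt/nas/projects"       # hypothetical NAS mount
BUCKET = "example-cold-tier"           # hypothetical S3 bucket

s3 = boto3.client("s3")
cutoff = time.time() - COLD_AFTER_DAYS * 86400

for dirpath, _dirs, files in os.walk(SHARE_ROOT):
    for name in files:
        path = os.path.join(dirpath, name)
        try:
            last_access = os.stat(path).st_atime
        except OSError:
            continue                   # file vanished or unreadable; skip it
        if last_access < cutoff:
            # Key mirrors the share layout so the data stays browsable natively.
            key = os.path.relpath(path, SHARE_ROOT)
            s3.upload_file(path, BUCKET, key,
                           ExtraArgs={"StorageClass": "STANDARD_IA"})
            print(f"tiered {path} -> s3://{BUCKET}/{key}")
```

A production policy would also leave a transparent link or stub behind so users can still open the file from its original location, which is what Transparent Move Technology (described below) addresses.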

VMblog:  Can you give a few examples of how your offerings are unique? What are your differentiators?

Subramanian:  Komprise delivers competitive differentiation with an analytics-first approach to data management, supported by a few key technologies. Our Global File Index (GFI) provides an easy way to visualize all data and search for specific files across all storage, on-premises or in the cloud. Secondly, our patented Transparent Move Technology (TMT) moves data without user disruption, which, according to our latest unstructured data management survey, is a top goal for enterprise IT. TMT means that data remains fully accessible from the source as files, exactly as before, using standards-based Dynamic Links. TMT also allows native access to data once it is moved to object storage, so users can leverage cloud-native tools for ML and analytics. Thirdly, we have Elastic Data Migration, which moves data to the cloud 27x faster than standard migration tools and with higher reliability. Finally, in May 2022, Komprise announced Smart Data Workflows, which gives IT users the ability to create automated workflows for all the steps required to find the right data across storage assets, tag and enrich the data, and send it to external tools for analysis or other cognitive functions.

VMblog:  How are the roles of storage manager/storage architects going to evolve in the coming years?

Subramanian:  Storage admins and architects will evolve into cloud and data management architects. Because storage technology from vendors and cloud providers is today much more automated and software-based, data storage professionals will spend less time managing and configuring hardware and more time on analytics and data strategy. Required knowledge sets will include compute, automation, DevOps and containers, along with the cross-functional collaboration and communication skills to work with line-of-business managers and teammates from IT, security, legal and HR. Storage architects will focus on making sure that diverse storage, backup and disaster recovery systems on-premises and in the cloud work well together. Reducing the footprint of expensive storage and generating revenue-driving activities around data are already top priorities for progressive organizations.

VMblog:  What is the role of data management/storage in ransomware prevention?

Subramanian:  Ransomware detection and prevention is a big industry, but it is still not foolproof against these sophisticated hacker groups. Storage teams can play a role by creating immutable copies of data that ransomware actors cannot modify. A great way to get started is by identifying cold data, such as data that has not been accessed in over a year, and tiering and archiving it from expensive storage and backups into a resilient object-locked destination such as Amazon S3 IA with Object Lock. This creates an isolated and immutable recovery copy, one that cannot be modified by a ransomware actor's encryption, while drastically cutting storage and backup costs. Organizations may also choose to move their "hot" data to object-lock storage as an extra copy for ransomware protection. Komprise analytics can identify and continuously move data by policy as it ages. Komprise is different from other vendors because when the solution moves a file to its destination, the entire file is isolated; storage vendors use block-level tiering, which only isolates parts of the file. Therefore, if a newer version of a file gets infected with ransomware, you can still recover older versions. Komprise detects file modifications and copies the modified files as new versions.
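As an illustration of the object-locked destination described above, the following hedged sketch uses boto3 to create an S3 bucket with Object Lock enabled, set a default compliance-mode retention of one year, and copy a cold file into it. The bucket and file names are hypothetical, and this is a conceptual example rather than the Komprise workflow itself.

```python
# Illustrative sketch of an immutable recovery copy using S3 Object Lock.
# Bucket, key and file paths are hypothetical; this is not the Komprise workflow.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")
BUCKET = "example-ransomware-vault"    # hypothetical bucket name

# Object Lock must be enabled at bucket creation time.
s3.create_bucket(Bucket=BUCKET, ObjectLockEnabledForBucket=True)

# Default retention: objects cannot be modified or deleted for one year.
s3.put_object_lock_configuration(
    Bucket=BUCKET,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
    },
)

# Copy a cold file into the locked bucket with an explicit retain-until date.
retain_until = datetime.now(timezone.utc) + timedelta(days=365)
with open("/mnt/nas/projects/results-2019.csv", "rb") as f:   # hypothetical path
    s3.put_object(
        Bucket=BUCKET,
        Key="projects/results-2019.csv",
        Body=f,
        StorageClass="STANDARD_IA",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=retain_until,
    )
```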

VMblog:  What are the challenges that enterprises face in using cloud-based data lakes, AI and ML tools?

Subramanian:  Gee, where should we start? On a positive note, we've seen notable progress in the past few years in terms of the ease of use, affordability and sophistication of these tools. You've got companies like Databricks and Snowflake creating powerful platforms for cloud-based data lakes that can incorporate unstructured data. And the major cloud vendors themselves are continually developing and releasing new services for AI and machine learning to tackle thorny issues such as PII detection, fraud detection, real-time text analysis and image analysis, as well as platforms that simplify the creation of machine learning models. What's important here is that we first need to get the right data into these data services. Machine learning models require large amounts of data to increase accuracy, so ignoring unstructured data is not viable. Getting the right data into these data lakes and tools to avoid data swamps is a significant challenge because of the data sprawl issue we discussed earlier.

VMblog:  How can you mitigate these challenges to leverage affordable analytics platforms and generate new value from data?

Subramanian:  Organizations need easier ways to categorize, segment, and search all their data so that data sets can be quickly discovered and sent off for analysis into these new platforms. Our Smart Data Workflows technology is a step in the right direction. With systematic, policy-based automation, it becomes possible to continually feed analytics and data lakes with the right data rather than doing massive data dumps and trying to sort through the mess later. Storage teams have an important role to play so that data scientists can spend less time filtering and searching for the data they need and more time on the analysis.

VMblog:  Which emerging technology do you think holds the most promise once it matures?

Subramanian:  Adaptive automation is a combination of machine learning, predictive analytics and policy-based automation; it's a relatively new way of thinking holistically about any domain and is applicable to data management. Instead of creating separate solutions for data analytics, data migration, data tiering and data workflows, an adaptive-automation based approach uses analytics to drive all these processes and gather feedback to continuously improve. This holds tremendous promise because it makes systems intelligent while providing sufficient human guidance and controls.

VMblog:  What will forward-thinking companies be doing this year in unstructured data management and storage?

Subramanian:  Companies are realizing that flexibility and simplicity are key to managing data at scale. This means they can no longer tie their data to a single storage architecture or vendor. They know the value lies in the data itself rather than in where it is stored, so native access to cloud services at every tier becomes important. Companies are shifting away from storage-centric management to data-centric management, and they are investing in ways to understand and mobilize data regardless of where it sits.

VMblog:  Can you share a customer success story that you're proud of from the last year?

Subramanian:  Pfizer has been a wonderful customer of ours and also exemplifies our strong partnership with AWS. Like other life sciences companies, Pfizer needs to keep historical data for future R&D: recall that SARS data was useful to researchers in 2020 when developing vaccines and treatments for COVID-19. Yet keeping petabytes of data on top-grade, on-premises storage isn't a sound financial decision if the data is not accessed regularly. Komprise migrates data automatically to Amazon S3 once it reaches two years of age. This is saving Pfizer 75% annually on storage and backup costs and laying the foundation for Pfizer researchers to use Amazon's AI and ML tools on the data that has been moved. We love that Pfizer is not only realizing significant cost advantages with Komprise but also sees the solution as a way to leverage archived data for future product development.

VMblog:  What big changes do you see taking shape in the industry?

Subramanian:  Data storage teams are increasingly required to justify spend and investment. You need to know what you have and be able to make projections as you modernize your IT infrastructure. Holistic visibility across all data is crucial. Secondly, flexibility and mobility are paramount. Customers don't want to be locked into any storage platform. Data must be securely available in the right format for the right people and applications at the right time. Finally, we are now seeing a push to define and deliver value. Storing data securely, protecting data and ensuring you're meeting all compliance and regulatory requirements is expected. Being able to derive value from your data drives differentiation and competitive advantage. Data-driven leaders understand this, so the race is on to unlock the potential of data, most of which is unstructured.

VMblog:  Can you talk about any new features or product updates you might have coming out in the near future?

Subramanian:  We recently ran a survey in which customers prioritized enabling the use of their data in the cloud for the business. We are investing heavily in this area with our Smart Data Workflows capabilities. Komprise launched Smart Data Workflows earlier this year to automate the steps needed to search, cull and move data into data lakes and AI/ML applications. We are enhancing this with greater flexibility and control, including capabilities such as role-based delegation and broader support for external data processing services (e.g., AWS Lambda) both in the data center and in the cloud.
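As a rough illustration of the kind of external data processing step mentioned above, the sketch below shows a hypothetical AWS Lambda handler that a workflow could invoke per file: it samples a text object, asks Amazon Comprehend whether it contains PII, and writes the result back as an S3 object tag for downstream routing. The event shape, bucket and tag names are assumptions; this is not the Komprise Smart Data Workflows API.

```python
# Hedged sketch of an "external data processing" step: an AWS Lambda handler
# that inspects a text object for PII with Amazon Comprehend and records the
# result as an S3 object tag. Event shape, bucket and tag names are assumptions.
import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")

def handler(event, context):
    # Assumed event payload: {"bucket": "...", "key": "..."}
    bucket, key = event["bucket"], event["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    sample = body[:4000].decode("utf-8", errors="ignore")  # stay within API size limits

    entities = comprehend.detect_pii_entities(Text=sample, LanguageCode="en")["Entities"]
    has_pii = "true" if entities else "false"

    # Tag the object so downstream workflow steps can route it accordingly.
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": "contains-pii", "Value": has_pii}]},
    )
    return {"bucket": bucket, "key": key, "contains_pii": has_pii}
```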

David Marshall

David Marshall has been involved in the technology industry for over 19 years, and he's been working with virtualization software since 1999. He became an industry expert in virtualization as a pioneer in the field, one of the few people in the industry allowed to work with alpha-stage server virtualization software from industry leaders: VMware (ESX Server), Connectix and Microsoft (Virtual Server).

Through the years, he has invented, marketed and helped launch a number of successful virtualization software companies and products. David holds a BS degree in Finance, an Information Technology Certification, and a number of vendor certifications from Microsoft, CompTIA and others. He has also co-authored two published books, "VMware ESX Essentials in the Virtual Data Center" and "Advanced Server Virtualization: VMware and Microsoft Platforms in the Virtual Data Center," and acted as technical editor for two popular virtualization "For Dummies" books. In his spare time, David founded and operates one of the oldest independent virtualization news blogs, VMblog.com, and co-founded CloudCow.com, a publication dedicated to cloud computing. From 2009 through 2016, David was honored with the vExpert distinction by VMware for his virtualization evangelism.