October 22, 2020

What is Data Orchestration, and Why You Should Care

Background

Kubernetes understood its limitations and, therefore, left storage out. Provisioning compute is a distinctly different discipline from provisioning storage. Kubernetes introduced the notion of Persistent Volumes (PV) and Persistent Volume Claims (PVC) to manage the challenge of providing persistent storage to non-persistent workloads, such as Kubernetes Pods. They are APIs that decouple the what from the how, or, in other words, consumption from implementation details. It is a highly useful disaggregation that turns infrastructure into reusable storage pools for Kubernetes Pods.
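
To make the what-versus-how split concrete, here is a minimal sketch of requesting storage through a PVC with the official Kubernetes Python client. The namespace, claim name, size, and storage class name are illustrative assumptions, not values from this article.

```python
# A minimal sketch of the PV/PVC decoupling described above, using the
# official Kubernetes Python client (pip install kubernetes). The namespace,
# claim name, and storage class "standard" are illustrative assumptions.
from kubernetes import client, config


def request_storage(namespace: str = "demo") -> None:
    config.load_kube_config()  # use the local kubeconfig for cluster access

    # The claim states only the "what": access mode and capacity.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="app-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(
                requests={"storage": "10Gi"}
            ),
            storage_class_name="standard",
        ),
    )

    # The "how" -- which backend actually provides the volume -- is resolved
    # by the cluster when it binds this claim to a Persistent Volume.
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace=namespace, body=pvc
    )


if __name__ == "__main__":
    request_storage()
```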

Data Antigravity

Hammerspace has taken the same approach as Kubernetes by decoupling data from the underlying infrastructure. This approach frees data from the limitations of storage infrastructure. The explosive growth of data has amplified challenges such as silos, sprawl, data placement, and the management of capacity and performance. Finding data across disparate silos has become increasingly difficult, so assets are created over and over again, producing more and more copies. Building silos inevitably leads to stranded or insufficient performance and capacity, driving costs higher. Moving data between silos or onto new components not only drives costs higher but also leads to dreaded maintenance windows and downtime. Data, in short, has gravity! By decoupling data from those limitations, Hammerspace removes silo boundaries, reduces costs, and increases both collaboration and the usefulness of data by making it ubiquitous. Hammerspace is anti-gravity for data!

Data Orchestration

The ability to abstract data from underlying storage infrastructure is unique to Hammerspace. It separates the 'what' (data management) from the 'how' (storage management). In other words, it is not necessary to become a storage expert just to provision storage resources to an application. This is exactly what data orchestration is. It is also what makes Hammerspace distinctly different from legacy storage or data management solutions. What Kubernetes has done for applications, Hammerspace does for data management. So how do we go from managing storage silos to orchestrating data? The data orchestration journey involves four interdependent steps: decoupling, assimilation, objectives, and portability.

Decoupling

The first step in the data orchestration journey is decoupling data from the limitations of the underlying infrastructure. Decoupling removes legacy storage ills such as silos, sprawl, out-of-control copies, the lack of business-level controls, and downtime due to forklift and software upgrades. To orchestrate data, we must first liberate it from those infrastructure limitations.


Assimilation

Data gravity is another major obstacle in data orchestration. The topic often comes up and is generally assumed to be an intractable problem constrained by the laws of physics, but what if there were a more elegant solution? Hammerspace has the ability to assimilate the metadata of unstructured storage silos. This effectively tears down the walls between disparate storage solutions. By first decoupling the silos and then assimilating their metadata, the walls between, for example, a NetApp and an Isilon cluster come down, and we can treat them as a single pool of resources. Hammerspace metadata assimilation unites siloed data and makes it possible to perform data management without data gravity.
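
As a purely conceptual illustration of the idea behind assimilation (not Hammerspace's actual mechanism), the sketch below walks two hypothetical silo mount points and merges only their file metadata into a single index; no file contents are copied or moved.

```python
# Hypothetical sketch: build one metadata index across two separately
# mounted silos. Mount points and silo names are invented for illustration.
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    silo: str      # which silo the file lives on
    path: str      # path relative to the silo's mount point
    size: int      # size in bytes, taken from the silo's own metadata
    mtime: float   # last-modified time


def assimilate(silos: dict[str, str]) -> list[FileRecord]:
    """Merge file metadata from several mounted silos into one index."""
    index: list[FileRecord] = []
    for silo_name, mount_point in silos.items():
        for root, _dirs, files in os.walk(mount_point):
            for name in files:
                full = os.path.join(root, name)
                stat = os.stat(full)
                index.append(FileRecord(
                    silo=silo_name,
                    path=os.path.relpath(full, mount_point),
                    size=stat.st_size,
                    mtime=stat.st_mtime,
                ))
    return index


if __name__ == "__main__":
    # Illustrative mount points for the two silos mentioned in the article.
    unified = assimilate({
        "netapp": "/mnt/netapp_export",
        "isilon": "/mnt/isilon_share",
    })
    print(f"{len(unified)} files visible through one metadata index")
```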


Objectives

Epistemology, the science of knowledge, outlines two distinct areas: the what and the how, that is, knowing what something is versus how to accomplish it. This important distinction is not restricted to a classroom discussion on symbolic logic; it pertains directly to how we leverage technology to accomplish business objectives. Hammerspace Objectives are declarative statements that define the desired end-state through metadata, without having to make infrastructure changes.


Declaring the intent of data (i.e., its desired end-state) is vastly simpler than having to define every single step to be taken (imperative policies) to accomplish that end-state. We have already removed silo boundaries and data gravity in the previous steps. We can now leverage declarative policies, called Objectives in Hammerspace nomenclature, to replace error-prone imperative policies and accelerate business outcomes.
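
The sketch below illustrates the declarative pattern in general terms: a single objective states only the desired end-state, and a reconciliation routine derives the imperative steps. The objective name, tier names, and evaluation logic are invented for illustration and are not Hammerspace's actual objective syntax or engine.

```python
# Hypothetical illustration of declarative vs. imperative data placement.
import time
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str
    atime: float       # last access time (epoch seconds)
    tier: str          # where the file currently resides


# Declarative: state only the desired end-state, not the steps to reach it.
OBJECTIVE = {
    "name": "archive-cold-data",
    "condition": lambda f: time.time() - f.atime > 90 * 24 * 3600,
    "desired_tier": "object-archive",
}


def reconcile(files: list[FileMeta], objective: dict) -> list[str]:
    """Derive the imperative steps needed to satisfy the declared end-state.

    The system, not the administrator, works out what has to move where.
    """
    moves = []
    for f in files:
        if objective["condition"](f) and f.tier != objective["desired_tier"]:
            moves.append(
                f"move {f.path} from {f.tier} to {objective['desired_tier']}"
            )
    return moves


if __name__ == "__main__":
    files = [
        FileMeta("/project/raw/scan-001.tif", atime=time.time() - 200 * 86400, tier="nvme"),
        FileMeta("/project/active/model.ckpt", atime=time.time() - 3600, tier="nvme"),
    ]
    for step in reconcile(files, OBJECTIVE):
        print(step)
```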


Portability

This is the crown jewel of data orchestration. The preceding steps of removing silo boundaries, removing data gravity, and declaring desired business outcomes have paved the way for the final challenge in data orchestration: how to make data portable. The Hammerspace Global Namespace allows the underlying infrastructure to be addressed as a single resource pool. NFS workloads can access and consume exports with universal naming consistency regardless of location. SMB workloads can access globally consistent UNC paths.


The Hammerspace Global File System serves NFS v3 and v4.2 as well as SMB v2 and v3, with full multi-protocol access and preservation of SMB ACLs. Hammerspace also provides a fully featured Container Storage Interface (CSI) driver for any and all Kubernetes workloads. The Hammerspace CSI plug-in provides Kubernetes Persistent Volumes that deliver block volumes, local file storage volumes, and shared storage volumes. Hammerspace makes data portable and ubiquitous.
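
As a sketch of what consuming such a volume can look like from Kubernetes, the example below requests a shared, ReadWriteMany Persistent Volume through a CSI-backed StorageClass using the official Kubernetes Python client. The StorageClass name "hammerspace-file", the claim name, and the size are assumptions for illustration; the actual names depend on how the CSI driver is deployed in a given cluster.

```python
# Minimal sketch of claiming a shared file volume via a CSI-backed storage
# class with the official Kubernetes Python client. The storage class name
# "hammerspace-file" is an assumed, illustrative value.
from kubernetes import client, config


def request_shared_volume(namespace: str = "default") -> None:
    config.load_kube_config()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="shared-project-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            # ReadWriteMany lets many Pods mount the same file data at once,
            # which is the "shared storage volume" case described above.
            access_modes=["ReadWriteMany"],
            resources=client.V1ResourceRequirements(requests={"storage": "1Ti"}),
            storage_class_name="hammerspace-file",  # assumed class name
        ),
    )

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace=namespace, body=pvc
    )


if __name__ == "__main__":
    request_shared_volume()
```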

Conclusion

Data orchestration relies on the sequential steps of decoupling data from infrastructure, assimilating metadata, leveraging declarative statements to accelerate business outcomes, and mobilizing data to make it ubiquitously available to all applications, end-users, and workloads. Decoupling data from infrastructure frees it from legacy storage limitations and unites disparate storage silos into cohesive resource pools. Metadata assimilation removes data gravity, which ultimately makes data easier to manage and distribute according to business intent. Objectives, in turn, allow you to declare the desired end-state without having to figure out every single step to be taken. Hammerspace-powered machine learning knows how to perform the steps necessary to produce the end-state you declare with an Objective. It is important to note that Hammerspace Objectives can be implemented through the GUI, CLI, or API.


Hammerspace has rich custom metadata options, such as tagging, descriptors, and classification. A Hammerspace Objective can be as simple as a GUI checkbox that delivers non-disruptive data migration from one vendor to another, from one data center to another, from a data center to a public cloud, or even between clouds. Objectives can also take the form of custom scripts that power highly sophisticated workflows or data pipelines. Hammerspace delivers the power to free data from infrastructure silos, remove data gravity, accelerate business outcomes, and make file data available at planet scale.

Get Started with Hammerspace Data Orchestration Today

Take a look at how you can modernize workloads and eliminate silos with Hammerspace data orchestration by going to the URL below.

https://hammerspace.com/10-tb/


About the Author

Johan Ballin 

Johan Ballin is Director of Technical Marketing at Hammerspace where he is responsible for technical publications. He has over two decades of industry experience in diverse technologies and roles. Johan has a deep passion for evangelizing products and solutions that address complex challenges. He has previously held positions at NetApp, Qumulo, MapR, Riverbed, and F5 Networks. His current focus includes parallel file systems, orchestration, virtualization, and all things cloud.
