Neeharika Palaka and David Schulman co-wrote this post.
Scaling data science is hard. From natural language processing for model-driven policy approvals, to image classification in biotechnology, to supply chain risk and anomaly detection in advanced manufacturing, data science holds tremendous promise, but it also brings challenges. According to a recent survey, only 21 percent of businesses are gaining a major competitive advantage through the use of data and analytics tools.
The primary challenge is finding the right tech stack to scale adoption of data and analytics tools. IT needs to manage DevOps efficiently, while data science teams need to focus on delivering business impact quickly through frequent iteration and speedy model deployment, not on infrastructure challenges.
Data science requires specialized environments with access to data, tools, packages, and infrastructure to handle bursty workloads. Without Kubernetes to containerize and virtualize workspaces, data scientists would frequently have to:
Work with IT to install and manage new packages, or provision new software or infrastructure
Debug code issues caused by data scientists using different environments
Wait unnecessarily long to onboard new team members due to environment, infrastructure, or data access issues
Spend time updating old projects that have been rendered unusable by environment changes or by a lack of collaboration and knowledge sharing
Take manual steps across the data science lifecycle to operationalize and monitor models in production
While Kubernetes is a great foundation for tool agility, faster iterations, and reproducibility within a data science platform, your data scientists shouldn’t have to become Kubernetes experts to realize its promise. Domino Data Lab partners with VMware Tanzu to tailor Kubernetes for our data science customers. The Machine Learning Operations (MLOps) software from Domino works together with VMware Tanzu portfolio products, such as VMware Tanzu for Kubernetes Operations, to provide many benefits by virtualizing data science workspaces:
Usable interface for data scientists – Data scientists often come from different backgrounds than engineers and DevOps staff, and may need an intuitive interface and APIs that abstract away some Kubernetes concepts
Containerized environment management – Achieve reproducibility by allowing data scientists to create, update, and manage environment images that will be used for data science workspaces
Integrated user management and permissions – Kubernetes provides a set of authorization primitives, but unique considerations arise when providing true user isolation for colocated workloads and managing sensitive information (e.g., access credentials for data connections)
Data science-specific scheduling – Data science often involves complex, multi-pod workloads, which can be difficult to schedule with Kubernetes alone
Resource controls – Kubernetes needs to be augmented so that administrators can balance user access to scalable compute resources and exercise enough control to manage costs and prevent any one user from monopolizing capacity
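To make the resource-control idea concrete, here is a minimal Python sketch of a per-user quota check. It is a hypothetical illustration, not Domino's or Kubernetes' actual mechanism: the class names, the CPU and GPU limits, and the reject-if-over-quota rule are all assumptions. (In Kubernetes itself, administrators typically cap usage with objects such as ResourceQuota, which are scoped to a namespace rather than to an individual user.)

```python
from dataclasses import dataclass, field


@dataclass
class UserQuota:
    """Hypothetical per-user cap on concurrently used compute (illustrative only)."""
    max_cpu_cores: float
    max_gpus: int


@dataclass
class ClusterAdmission:
    """Hypothetical admission check: tracks each user's running usage and
    rejects any request that would push that user past their quota."""
    quotas: dict[str, UserQuota]
    usage: dict[str, tuple[float, int]] = field(default_factory=dict)

    def request(self, user: str, cpu_cores: float, gpus: int) -> bool:
        quota = self.quotas.get(user)
        if quota is None:
            return False  # users without an assigned quota get no capacity
        used_cpu, used_gpus = self.usage.get(user, (0.0, 0))
        if used_cpu + cpu_cores > quota.max_cpu_cores or used_gpus + gpus > quota.max_gpus:
            return False  # admitting this would let one user exceed their share
        self.usage[user] = (used_cpu + cpu_cores, used_gpus + gpus)
        return True

    def release(self, user: str, cpu_cores: float, gpus: int) -> None:
        # Return capacity when a workspace or job shuts down (never below zero).
        used_cpu, used_gpus = self.usage.get(user, (0.0, 0))
        self.usage[user] = (max(0.0, used_cpu - cpu_cores), max(0, used_gpus - gpus))
```

Rejecting an over-quota request outright is only one possible policy; a real platform might instead queue the request until capacity frees up.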
Read on to learn more about our partnership, and then be sure to watch our webinar (available for replay) to get the full story and see the products in action together.
Domino Data Lab enterprise MLOps platform
Domino Data Lab’s enterprise MLOps platform, trusted by over 20 percent of Fortune 100 companies, helps customers overcome the biggest challenges to data science at scale: infrastructure friction, productionization challenges, and a lack of collaboration.
Figure: Domino Data Lab integrations
Domino’s secure, scalable enterprise MLOps platform gives data science teams a system of record that increases productivity through compounding knowledge and makes work reproducible and reusable. It’s an integrated model factory that lets you develop, deploy, and monitor models in one place using your preferred tools and languages, as well as a self-service infrastructure portal that provides one-click, governed access to the data, tools, and compute you need.
About VMware Tanzu
With the VMware Tanzu portfolio of products, VMware enables customers to make the most of modern applications on any cloud. The Tanzu portfolio provides the foundation of a modern application platform that drives scale through virtualization across multi-cloud and Kubernetes operations—with the right levels of connectivity, governance, observability, and automation.
VMware Tanzu helps customers make their software supply chain more secure, from app development all the way to running apps in production. The portfolio also offers a cohesive developer experience across any Kubernetes distribution to speed up application development and delivery cycles. It’s all about modern apps, powered by VMware Tanzu.
How we work together
With Domino Data Lab and VMware Tanzu, code-first data science teams can accelerate research, increase collaboration, and deploy models across an optimized multi-cloud infrastructure, all aimed at building intelligent applications that truly drive enterprise value. The Domino Data Science Platform perfectly pairs with Tanzu products such as Tanzu for Kubernetes Operations, providing our joint customers with a robust MLOps platform with virtualized data science workspaces for all of their data science needs.
VMware Tanzu customers today run their Tanzu Kubernetes Grid clusters across a variety of environments, including public, private, and hybrid clouds, and at the edge. With the Domino data science platform, our customers’ data science teams can use MLOps capabilities to build, deploy, and monitor models wherever those Tanzu Kubernetes Grid clusters run. That’s the power of these two software portfolios working together. This is especially relevant in industries such as manufacturing or oil and gas, where complex models may be deployed at edge locations but managed and updated from a mix of other locations.
For VMware customers running NVIDIA GPU-powered workloads on VMware Tanzu, Domino provides a whole host of advantages. Large enterprises with sizable data science teams typically use GPU acceleration to run complex models, such as deep learning models that optimize supply chains, detect anomalies in aerospace, or deliver precision medicine in healthcare. Without an enterprise MLOps platform like Domino’s, these high-performance workloads can quickly become unwieldy for operators to manage, leading to issues such as underutilization or high costs.
The Domino Data Lab and VMware Tanzu portfolios serve to unify data science teams toward common business goals. Whether enabling operators to manage centralized infrastructure or empowering data scientists to build and deploy complex models, the Domino MLOps layer on top of VMware Tanzu software accelerates research through collaboration and streamlines deployment to production. In regulated and specialized industries, Domino’s enterprise MLOps platform also provides the governance, tracking, and reproducibility that regulatory agencies require.
These tenets apply across many industries, and Domino Data Lab is trusted by customers in a wide range of them, including:
Financial services – Moody’s Analytics began moving models into production 6 times faster, improving competitiveness and customer value
Healthcare – Janssen delivers precision medicine for cancer research with 10x faster development of deep learning models
Energy – AES built out a global team and its underlying infrastructure, and it has built and deployed more than 50 models
With Domino running on VMware Tanzu, our customers can deliver business impact from data science, like the examples above, faster than ever before, without worrying about silos between data science teams and IT.
If you’d like to learn more about how VMware Tanzu and Domino Data Lab make it easier for researchers and data scientists to access the data, tools, and compute resources they need to deliver business impact, be sure to watch our joint webinar (available for replay). In this expert-led session, you’ll learn about bringing powerful MLOps capabilities to VMware Tanzu Kubernetes Grid clusters and empowering your data scientists. With the collaboration, reproducibility, flexible tooling, and powerful compute resources that Domino and Tanzu provide, data scientists can spend more time uncovering new insights and delivering business impact, and less time wrestling with infrastructure.
If you’d like to hear from innovators in GPU-accelerated, containerized data science, check out Domino, VMware Tanzu, and NVIDIA in action together during our recent NVIDIA GTC 2022 thought leadership panel featuring Craig McLuckie, Kubernetes co-founder and vice president of R&D at VMware Tanzu; Domino Data Lab’s Chief Technology Officer; and NVIDIA’s vice president of GPU computing.
If you’d like to get in touch, contact your VMware Tanzu sales representative or your Domino Data Lab sales representative.