SageMaker AI reimagines data management and AIOps

Since 2017, Amazon SageMaker has empowered organizations to harness machine learning for diverse applications. Initially a tool for data scientists, its utility has expanded to include MLOps engineers, data engineers and business stakeholders.

The SageMaker AI rebrand underscores its evolution into a comprehensive platform integrating data management and AI development.

“A few years ago, machine learning was mostly a data scientist’s pursuit, and data scientists were taking data within organizations and building machine learning models,” said Ankur Mehrotra (pictured), director and general manager of Amazon SageMaker at Amazon Web Services Inc. “Over the years, we saw more personas getting involved. We saw MLOps engineers getting involved to put those models in production. We then saw data engineers get involved to help data scientists prepare data to build these models. Then we saw business stakeholders involved in the decision-making process, etc.”

Mehrotra spoke with theCUBE Research’s Dave Vellante and John Furrier for theCUBE’s “Cloud AWS re:Invent Coverage,” during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how SageMaker AI equips organizations to innovate faster and at scale by addressing infrastructure, governance and ease of use.

SageMaker AI redefines AIOps with Unified Studio and HyperPod

At the heart of the transformation is SageMaker Unified Studio, a single interface that combines data preparation, machine learning model development and governance. This integration lets teams collaborate more efficiently by sharing context across workflows. With the AI lifecycle under one umbrella, businesses no longer need to juggle disparate tools, according to Mehrotra.

“SageMaker manages those tasks on your behalf, and that’s why it’s a managed service,” he said. “For example, if you were to build a model or deploy a model, then SageMaker AI would now provide the infrastructure, set up the tools, take your data and run the job to do that task.”

HyperPod, a purpose-built feature for gen AI, addresses the challenges of scaling GPU and Trainium clusters. With capabilities such as automatic fault tolerance and self-healing environments, it ensures that infrastructure issues do not derail projects. The introduction of flexible training plans, leveraging EC2 capacity blocks, enables customers to secure and manage compute resources efficiently, minimizing downtime and maximizing productivity, Mehrotra added.

“Last re:Invent, we announced SageMaker HyperPod, which is a purpose-built capability for generative AI model development,” he said. “In HyperPod, you can basically easily set up a GPU or a Trainium cluster and you can easily scale up your cluster and manage the cluster with familiar tools. Also, SageMaker takes care of automatically resolving any health issues within the cluster and provides a self-healing cluster environment and also improves the performance of your training, fine-tuning jobs within that environment.”

To reduce experimentation times, SageMaker AI also has HyperPod recipes, which are pre-optimized configurations for popular model architectures, such as Llama and Mistral. These recipes handle parameter optimization, checkpointing and fine-tuning, enabling users to initiate generative AI projects within minutes instead of weeks, according to Mehrotra.
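Checkpointing is what lets a training job survive an infrastructure failure: progress is saved periodically, so a restarted job resumes from the last saved step rather than from zero. The sketch below illustrates that general idea in plain Python. It is not SageMaker or HyperPod code; the function names `train` and `run_with_recovery`, the JSON checkpoint file and the simulated failure are all assumptions made for illustration.

```python
import json
import os


def train(total_steps, checkpoint_path, fail_at=None, checkpoint_every=10):
    """Toy training loop that resumes from the last checkpoint if one exists."""
    # Hypothetical state layout: a step counter and a stand-in loss value.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss": 1.0}

    while state["step"] < total_steps:
        # Simulate a hardware fault at a chosen step (illustration only).
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")

        state["step"] += 1
        state["loss"] *= 0.99  # stand-in for a real optimizer update

        if state["step"] % checkpoint_every == 0:
            # Write to a temporary file, then rename, so a crash in the
            # middle of a write never leaves a corrupt checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)

    return state


def run_with_recovery(total_steps, checkpoint_path, fail_at):
    """Run the job once and, if it fails, restart it from the checkpoint."""
    restarts = 0
    try:
        state = train(total_steps, checkpoint_path, fail_at=fail_at)
    except RuntimeError:
        restarts += 1
        # The restarted job picks up at the last saved step.
        state = train(total_steps, checkpoint_path)
    return state, restarts
```

In this sketch, a job that fails at step 25 with checkpoints every 10 steps loses only the five steps since step 20. A managed service would handle the failure detection and restart on the user's behalf, which is the role the article attributes to HyperPod.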

Photo: SiliconANGLE
