Shifting 200,000 servers from CentOS to RHEL, the SaaS giant is enlisting gen AI to help handle the health and telemetry of its operations infrastructure in real time.

Credit: Tyson Lutz / Salesforce

When you’re tasked with migrating 200,000 servers to a new operating system, a helping hand is very welcome indeed. That’s why SaaS giant Salesforce, in migrating its entire data center from CentOS to Red Hat Enterprise Linux, has turned to generative AI, not only to help with the migration but also to drive the real-time automation of this new infrastructure.

Central to the success of the migration, which will take place over the next 18 months, is Salesforce’s deployment of proprietary large language models (LLMs) from its own generative AI platform, says Tyson Lutz, senior vice president of software engineering.

“This is a massive shift of infrastructure with super low risk,” says Lutz, who is overseeing the project. “This is going to change how infrastructure is managed.”

The migration is still in its early stages. The company’s outgoing platform, CentOS, has been retired commercially. Given its aggressive growth (the 200,000 servers do not even represent all the CentOS-based equipment in Salesforce data centers), Salesforce wanted to re-platform on a more secure, flexible, compliant, and commercially supported OS for its expanding global, public-facing compute infrastructure, Lutz says.

With CentOS now part of the Red Hat Enterprise Linux beta stream chain, RHEL was a logical choice for Salesforce, which conducted a rigorous selection process and decided on RHEL last December.

But the most compelling aspect of the migration is that the process will be aided by the company’s own generative AI platform.
Salesforce recently rearchitected its Data Cloud and Einstein AI framework to introduce Einstein 1, a platform aimed at enabling Salesforce users (and its own IT shop) to connect any data to create a unified profile of their customers and then infuse AI, automation, and analytics into every workload or customer experience, according to the company.

Lutz says Salesforce IT will leverage gen AI for basic automation and scripting as part of the migration, but it will also deploy higher-level LLM-based generative AI to handle the health and telemetry of the infrastructure in real time. As part of that effort, the IT team has trained LLMs on event logs so that the system can more accurately predict and analyze real-time data logs.

Before, it took humans a significant amount of time to pore through logs to figure out what was going on before moving forward. Now, these tasks are handled by AI.

Salesforce’s novel approach to using gen AI in its infrastructure operations will enable Salesforce IT to “bring human intent into machines managing machines,” Lutz says.

“We are on the bleeding edge in our operations,” he adds. “We’re using generative AI to look at logs, understand content, and provide [data] in a human readable format.”

Upgrading the fleet

Salesforce’s RHEL migration is part of “Hyperforce,” a rearchitecting of the company’s infrastructure that began in 2020, Lutz says.

The goal of Hyperforce has been to ensure that all aspects of the service, including Salesforce Customer 360, Sales Cloud, Service Cloud, Marketing Cloud, Commerce Cloud, and Data Cloud, would be delivered on a platform with the highest levels of security, reliability, and commercial support.

The past four years of Hyperforce have seen Salesforce transition to developing its products in a containerized format, making them “third-party ready,” which offers greater flexibility for moving workloads not just within various AWS regions but to other cloud providers as well, Lutz says.
The shift to RHEL will also help the company offer maximum flexibility, security, and reliability, Lutz adds. Salesforce CIO Juan Perez reiterated this point via email, adding that the migration’s goal is to ensure Salesforce is ready for any workload.

In the gold rush to the cloud, many SaaS vendors built their offerings on widely available open-source platforms such as CentOS, but not all of those platforms are commercially supported anymore. Lutz calls this a “wake up call” that SaaS vendors and cloud providers must heed to ensure their offerings are built on highly secure, highly reliable, highly flexible, and highly supported platforms.

“Companies are beginning to realize that they’ve got to get on this right away and move to a well-supported production-ready operating system,” Lutz says.

“There are misconceptions about this,” he adds. “There are varying levels of operating systems that run in the cloud, and they are used at varying levels and layers. Underneath the cloud, your application and workloads still need an operating system on which to run.” One that should also be secure and robust, he says.

The enterprise-grade edge

Analysts say Salesforce has checked off two important boxes: first, ensuring that the underlying platform is secure, flexible, scalable, and supported; and second, enhancing it with sophisticated generative AI that will hopefully produce the best platform for customer workloads.

“As cloud environments scale, enterprises are increasingly looking to AI as a way of managing and securing applications and infrastructure. AI in cloud operations can improve efficiencies, identify configuration issues, and automate remediation,” says Dave McCarthy, research vice president of cloud and edge infrastructure services at IDC.

“In order to convince CIOs to move to the cloud, SaaS providers must build on top of enterprise-grade infrastructure with SLAs for data protection and resiliency,” McCarthy adds.
“While community-supported open source is popular with developers for prototyping, these applications must move to vendor-supported solutions when in production.”

Merging the two, migrating hundreds of thousands of servers to a new operating system with AI, is going to be a huge game changer for OpEx and Salesforce’s business, Lutz says.

“Generative AI is going to offer us engineers, humans an understandable picture of what’s going on much quicker,” he says. “There’s a huge potential here to have a more accurate understanding and tighter control and management on infrastructure we just have never seen before.”