Salesforce Introduces World's First LLM Benchmark for CRM


Salesforce has unveiled the world’s first large language model (LLM) benchmark for customer relationship management (CRM) systems, offering businesses a comprehensive tool to evaluate the rapidly expanding array of generative AI models. This new benchmark provides an evaluation framework that measures LLM performance based on accuracy, cost, speed, and trust and safety, specifically tailored to common sales and service use cases such as prospecting, lead nurturing, and service case summaries.

Salesforce's benchmark includes a public leaderboard to help professionals select the best LLM for their CRM needs. The company plans to update the benchmark continually with new use case scenarios, enhance its evaluation criteria, and soon incorporate fine-tuned LLMs.

“As AI continues to evolve, enterprise leaders are saying it’s important to find the right mix of performance, accuracy, responsibility, and cost to unlock the full potential of generative AI to drive business growth,” said Silvio Savarese, EVP & Chief Scientist at Salesforce AI Research. “Salesforce’s new LLM Benchmark for CRM is a significant step forward in the way businesses assess their AI strategy within the industry. It not only provides clarity on next-generation AI deployment but also can accelerate time to value for CRM-specific use cases. Our commitment is to continuously evolve this benchmark to keep pace with technological advancements, ensuring it remains relevant and valuable.”

Importance of the Benchmark

Existing LLM benchmarks have been primarily academic and consumer-focused, lacking relevance to business applications. They often fail to address crucial aspects like accuracy, speed, cost, and trust, leaving CRM customers without a reliable way to evaluate generative AI-powered CRM solutions. Salesforce's benchmark addresses these gaps by using real-world CRM data and expert human evaluations, offering businesses strategic insights for incorporating generative AI into their CRM systems.

Key Evaluation Metrics

Accuracy: Assessed through factuality, completeness, conciseness, and instruction-following. Accurate models provide valuable results, improving customer experience.
Cost: Categorized as high, medium, or low based on percentiles, allowing businesses to evaluate the cost-effectiveness of different LLMs.
Speed: Measures the responsiveness and efficiency of LLMs in processing and delivering information, enhancing user experience and reducing customer wait times.
Trust and Safety: Evaluates the LLM’s ability to protect sensitive customer data, comply with privacy regulations, and avoid bias and toxicity.
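
To make the cost metric concrete, here is a minimal Python sketch of percentile-based tiering. The article does not say which percentile cut points or cost units the benchmark uses, so the one-third and two-thirds cuts, the function name cost_tiers, and the model names and prices below are all hypothetical and chosen only for illustration.

```python
def cost_tiers(costs, low_cut=1/3, high_cut=2/3):
    """Label each model 'low', 'medium', or 'high' by the percentile rank of its cost.

    costs: dict mapping a model name to a numeric cost (any consistent unit).
    low_cut, high_cut: assumed percentile boundaries, not taken from the benchmark.
    """
    values = list(costs.values())
    n = len(values)
    tiers = {}
    for name, cost in costs.items():
        # Mid-rank percentile: models with identical cost share the same rank.
        below = sum(v < cost for v in values)
        equal = sum(v == cost for v in values)
        pct = (below + 0.5 * equal) / n
        if pct < low_cut:
            tiers[name] = "low"
        elif pct < high_cut:
            tiers[name] = "medium"
        else:
            tiers[name] = "high"
    return tiers


# Hypothetical models and prices, for illustration only.
example_costs = {
    "model_a": 0.5,
    "model_b": 2.0,
    "model_c": 10.0,
    "model_d": 30.0,
    "model_e": 0.5,
    "model_f": 15.0,
}

for model, tier in cost_tiers(example_costs).items():
    print(f"{model}: {tier}")
```

Because the tiers depend on rank rather than on absolute price, a model's label can change whenever the set of compared models changes, which is one reason a leaderboard built this way would need periodic updates.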

Businesses can use this benchmark to compare LLMs, identify the best solutions, and make informed decisions to enhance customer success and drive business growth. With Salesforce’s Einstein 1 Platform, customers can choose from existing LLMs or bring their own models to meet their unique needs and deploy more effective generative AI solutions.

Clara Shih, CEO of Salesforce AI, emphasized the practical focus of this benchmark: “Business organizations are looking to utilize AI to drive growth, cut costs, and deliver personalized customer experiences, not to plan a kid’s birthday party or summarize Othello. Our customers have been asking for a purpose-built way to evaluate and select from among the proliferation of new AI models, and we are thrilled to introduce the world’s first LLM benchmark for CRM to help them navigate the complex landscape of models. This benchmark is not just a measure; it’s a comprehensive, dynamically evolving framework that empowers companies to make informed decisions, balancing accuracy, cost, speed, and trust.”

For more information, visit Salesforce's website and view the LLM Leaderboard for CRM on Hugging Face.

Disclaimer: The information provided in this article does not constitute an endorsement of any particular LLM; it is for general informational purposes only. Readers should make their own determinations based on their needs. Opinions of the referenced presenters and/or authors are their own and do not necessarily reflect the official position of Salesforce.