Galileo’s Luna redefines GenAI evaluation, boasting 97% lower costs and 11x faster speeds



Galileo, a trailblazer in enterprise generative AI, has unveiled Galileo Luna, a groundbreaking suite of Evaluation Foundation Models (EFMs) that promises to transform how enterprises evaluate their GenAI systems. With Luna, Galileo aims to address the critical challenges of speed, cost, and accuracy that have hindered the widespread adoption of generative AI in production environments.

“Galileo created Luna to address the limitations of current GenAI evaluation methods, which were slow, expensive, and often inaccurate,” said Vikram Chatterji, Co-Founder and CEO of Galileo, in an interview with VentureBeat. “The motivation stemmed from the need for ultra-low-latency, cost-effective, and high-accuracy evaluations in production environments.”

The development of Luna marks a significant milestone for Galileo, which has been at the forefront of enterprise GenAI since its inception in early 2021. The company’s dedication to pushing the boundaries of AI evaluation is evident in the nearly year-long intensive R&D process that led to Luna’s creation.

Luna, Galileo’s groundbreaking suite of Evaluation Foundation Models, outperforms leading AI evaluation methodologies in a benchmark comparison of area under the receiver operating characteristic curve (AUROC) scores. The higher AUROC values, reaching 0.78, demonstrate Luna’s superior accuracy in assessing enterprise generative AI systems, surpassing competitors like GPT-3.5, Trulens Groundedness, and RAGAS Faithfulness. (Image Credit: Galileo)
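For readers unfamiliar with the metric, AUROC can be read as the probability that an evaluator scores a randomly chosen positive example (for instance, a hallucinated response) higher than a randomly chosen negative one. The Python sketch below computes it directly from that definition. The labels and scores are invented toy values, not Galileo's benchmark data, and the function name `auroc` is only illustrative.

```python
def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive example (e.g. a hallucination), 0 otherwise.
    scores: the evaluator's confidence that each example is positive.
    Ties count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(f"AUROC on toy data: {auroc(toy_labels, toy_scores):.3f}")
```

A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is the scale on which the reported 0.78 should be read.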

Purpose-built models redefine speed, cost, and accuracy

At the heart of Luna’s innovation lie its purpose-built small language models, meticulously tailored for specific evaluation tasks such as hallucination detection, context quality assessment, data leakage prevention, and malicious prompt identification. This specialized design allows Luna to deliver unparalleled performance across three key metrics: speed, cost, and accuracy.

“Luna surpasses GPT-3.5 in speed, cost, and accuracy through several innovations,” Chatterji explained. “Luna utilizes purpose-built small language models that are tailored for specific evaluation tasks, significantly reducing computational overhead and cost. This design choice allows for evaluations that are 97% cheaper and 11x faster than those performed with GPT-3.5.”

But it’s not just about speed and cost. Luna also boasts industry-leading accuracy, outperforming previous methods by up to 20% in detecting hallucinations, prompt injections, personally identifiable information (PII), and more. “Multi-headed small language models and advanced techniques like intelligent chunking ensure that Luna models maintain context better and provide more accurate evaluations,” Chatterji added.

In a comparison of monthly costs for evaluating 1 million queries, Galileo’s Luna significantly undercuts other methodologies, at $175 per month. That is roughly 97% cheaper than GPT-3.5 at $6,248 per month, and the gap is wider still against RAGAS Faithfulness at $7,994 per month and Trulens Groundedness at $16,641 per month. Galileo attributes the savings to Luna’s purpose-built small language models. (Image Credit: Galileo)
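The percentage figures follow directly from the monthly costs quoted above. As a quick check, the short Python sketch below recomputes the savings and the implied per-query cost. The dollar amounts come from the comparison; the helper name `percent_cheaper` and the assumption that cost scales evenly across 1 million queries are ours.

```python
# Monthly cost in USD to evaluate 1 million queries, as quoted in the comparison.
MONTHLY_COST_USD = {
    "Luna": 175,
    "GPT-3.5": 6248,
    "RAGAS Faithfulness": 7994,
    "Trulens Groundedness": 16641,
}
QUERIES_PER_MONTH = 1_000_000


def percent_cheaper(baseline_cost, candidate_cost):
    """Percentage by which candidate_cost undercuts baseline_cost."""
    return (baseline_cost - candidate_cost) / baseline_cost * 100


luna_cost = MONTHLY_COST_USD["Luna"]
for name, cost in MONTHLY_COST_USD.items():
    if name == "Luna":
        continue
    print(f"Luna vs {name}: {percent_cheaper(cost, luna_cost):.1f}% cheaper")

# Assumes cost is spread evenly over the month's queries.
print(f"Luna cost per query: ${luna_cost / QUERIES_PER_MONTH:.6f}")
```

On these numbers the saving is about 97.2% against GPT-3.5, 97.8% against RAGAS Faithfulness, and 98.9% against Trulens Groundedness.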

Revolutionizing evaluation without ground truth datasets

One of the most remarkable aspects of Luna is its ability to operate without the need for traditional ground truth datasets. By leveraging pre-trained evaluation models fine-tuned on diverse, domain-specific datasets, Luna eliminates the time-consuming and costly process of creating custom test sets. This innovation streamlines the evaluation process and reduces dependence on extensive human-generated data.

The potential applications of Luna are vast, with Chatterji highlighting its relevance in industries that demand high reliability and speed in AI evaluations. “Luna is especially powerful in large-scale enterprise applications where volume and throughput are necessary (i.e. millions of queries per month). We’re seeing Fortune 100 enterprises in healthcare, finance, and telecom find Luna particularly useful,” he said.

In Galileo’s latency comparison, Luna processes a single query in 0.232 seconds, compared with 2.5 seconds for GPT-3.5, 3.0 seconds for Galileo Chainpoll, 3.4 seconds for Trulens Groundedness, and 5.4 seconds for RAGAS Faithfulness. That makes Luna about 11 times faster than GPT-3.5 and more than 23 times faster than RAGAS Faithfulness, a gap Galileo attributes to its purpose-built small language models. (Image Credit: Galileo)
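The speedup multiples can be recomputed from the per-query latencies in the comparison above. The Python sketch below does that arithmetic. The latency values are those quoted; the `speedup` helper is an illustrative name, and the ratios say nothing about throughput under concurrent load.

```python
# Per-query latency in seconds, as quoted in the comparison.
LATENCY_SECONDS = {
    "Luna": 0.232,
    "GPT-3.5": 2.5,
    "Galileo Chainpoll": 3.0,
    "Trulens Groundedness": 3.4,
    "RAGAS Faithfulness": 5.4,
}


def speedup(baseline_latency, candidate_latency):
    """How many times faster the candidate is than the baseline."""
    return baseline_latency / candidate_latency


luna_latency = LATENCY_SECONDS["Luna"]
for name, latency in LATENCY_SECONDS.items():
    if name == "Luna":
        continue
    print(f"Luna vs {name}: {speedup(latency, luna_latency):.1f}x faster")
```

The ratio against GPT-3.5 works out to roughly 10.8, which is where the rounded 11x headline figure comes from.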

Customization and continuous evolution in the face of rapid GenAI advancements

Use cases range from real-time monitoring of AI outputs and detecting hallucinations in AI-generated content to ensuring the safety and quality of chatbot interactions. And with Galileo’s Fine Tune product, Luna can be customized to meet specific customer requirements, achieving accuracy levels of 95% or higher for critical tasks in industries such as pharmaceuticals and financial services.

As the generative AI landscape continues to evolve rapidly, Galileo remains committed to staying at the forefront of innovation. Chatterji emphasized that Luna will scale in three key ways: expanding support for more evaluation task types, continually improving accuracy, and further reducing cost and latency.

“Galileo is committed to pushing the boundaries of what’s possible in AI evaluation and helping organizations bring trustworthy AI to production,” Chatterji said. “As the landscape of generative AI continues to evolve, Galileo remains dedicated to providing its clients with cutting-edge evaluation capabilities that make AI practical for businesses to deploy and inspire confidence and trust amongst consumers.”

With the launch of Luna, Galileo has solidified its position as a leader in enterprise GenAI evaluation. As more organizations seek to harness the power of generative AI, Luna’s ability to deliver fast, cost-effective, and accurate evaluations will be a critical factor in driving widespread adoption and unlocking the full potential of this transformative technology.


