Together AI promises faster inference and lower costs with enterprise AI platform for private cloud



Running AI in the public cloud can present enterprises with numerous concerns about data privacy and security.

That’s why some enterprises choose to deploy AI in a private cloud or on-premises environment. Together AI is among the vendors looking to help enterprises deploy AI in private clouds cost-effectively. The company today announced its Together Enterprise Platform, which enables AI deployment in virtual private cloud (VPC) and on-premises environments.

Together AI made its debut in 2023, aiming to simplify enterprise use of open-source LLMs. The company already offers a full-stack platform that lets enterprises use open-source LLMs on Together’s own cloud service. The new platform extends AI deployment to customer-controlled cloud and on-premises environments. The Together Enterprise Platform aims to address key concerns of businesses adopting AI technologies, including performance, cost-efficiency and data privacy.

“As you’re scaling up AI workloads, efficiency and cost matters to companies, they also really care about data privacy,” Vipul Prakash, CEO of Together AI, told VentureBeat. “Inside of enterprises there are also well-established privacy and compliance policies, which are already implemented in their own cloud setups and companies also care about model ownership.”

How to keep private cloud enterprise AI cost down with Together AI

The key promise of the Together Enterprise Platform is that organizations can manage and run AI models in their own private cloud deployment.

That matters for enterprises that have already invested heavily in their IT infrastructure. The platform runs in private clouds and also lets users scale out to Together’s cloud when they need more capacity.

A key benefit of the Together Enterprise Platform is its ability to dramatically improve the performance of AI inference workloads.

“We are often able to improve the performance of inference by two to three times and reduce the amount of hardware they’re using to do inference by 50%,” Prakash said. “This creates significant savings and more capacity for enterprises to build more products, build more models, and launch more features.” 

The performance gains are achieved through a combination of optimized software and hardware utilization.

“There’s a lot of algorithmic craft in how we schedule and organize the computation on GPUs to get the maximum utilization and lowest latency,” Prakash explained. “We do a lot of work on speculative decoding, which uses a small model to predict what the larger model would generate, reducing the workload on the more computationally intensive model.”
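
To make the idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Together AI's implementation. The names `speculative_decode`, `toy_target` and `toy_draft` are hypothetical, and the "models" are toy functions that map a token sequence to the next token. The sketch assumes a real system would verify all drafted tokens in a single batched pass of the large model, so it counts one verification pass per round.

```python
from typing import Callable, List, Tuple

# A "model" here is any function from a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Hypothetical stand-in for the large, expensive model."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Hypothetical small model: agrees with the target except after a 2."""
    return 0 if seq[-1] == 2 else (seq[-1] + 1) % 10


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Return (tokens, number of verification rounds by the target)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The small model cheaply guesses the next k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The large model checks the guesses. Assumption: a real system
        #    scores all k positions in one batched pass, so this is one round.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)       # guess confirmed
            else:
                accepted.append(expected)  # first mismatch: use target's token
                break
        else:
            # Every guess matched, so the target's next token comes for free.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


tokens, rounds = speculative_decode(toy_target, toy_draft, [0], max_new=8)
print(tokens, rounds)
```

In this greedy variant the output is identical to what the large model would have produced on its own. The saving comes from the large model running fewer sequential rounds than the number of tokens generated.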

Flexible model orchestration and the Mixture of Agents approach

Another key feature of the Together Enterprise Platform is its ability to orchestrate the use of multiple AI models within a single application or workflow.

“What we’re seeing in enterprises is that they’re typically using a combination of different models – open-source models, custom models, and models from different sources,” Prakash said. “The Together platform allows this orchestration of all this work, scaling the models up and down depending on the demand for a particular feature at a particular time.”

There are many different ways that an organization can orchestrate models to work together. Some organizations and vendors will use technologies like LangChain to combine models together. Another approach is to use a model router, like the one built by Martian, to route queries to the best model. SambaNova uses a Composition of Experts model, combining multiple models for optimal outcomes.

Together AI is using a different approach that it calls Mixture of Agents. Prakash said this approach combines multi-model agentic AI with a trainable system for ongoing improvement. It works by using “weaker” models as “proposers,” each of which provides a response to the prompt. An “aggregator” model then combines these responses to produce a better overall answer.
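
The proposer and aggregator flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Together AI's code. The names `mixture_of_agents` and `majority_aggregator` are hypothetical, the proposers are stub functions rather than real LLM calls, and a simple majority vote stands in for the aggregator, which in the approach described above would itself be a model that reads all the proposals.

```python
from collections import Counter
from typing import Callable, List

# A proposer maps a prompt to a response. Real proposers would call LLMs.
Proposer = Callable[[str], str]
# An aggregator sees the prompt plus every proposal and returns one answer.
Aggregator = Callable[[str, List[str]], str]


def mixture_of_agents(
    prompt: str, proposers: List[Proposer], aggregator: Aggregator
) -> str:
    """Collect one response per proposer, then let the aggregator combine them."""
    proposals = [propose(prompt) for propose in proposers]
    return aggregator(prompt, proposals)


def majority_aggregator(prompt: str, proposals: List[str]) -> str:
    """Hypothetical stand-in aggregator: return the most common proposal."""
    return Counter(proposals).most_common(1)[0][0]


# Three stub "weaker" models, one of which gives a wrong answer.
proposers: List[Proposer] = [
    lambda p: "Paris",
    lambda p: "Lyon",
    lambda p: "Paris",
]

answer = mixture_of_agents(
    "What is the capital of France?", proposers, majority_aggregator
)
print(answer)
```

Swapping `majority_aggregator` for a function that prompts a stronger model with the collected proposals would bring the sketch closer to the design Prakash describes.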

“We are a computational and inference platform and agentic AI workflows are very interesting to us,” he said. “You’ll be seeing more stuff from Together AI on what we’re doing around it in the months to come.”


