Nvidia’s coveted H100 GPUs will be available on-demand through Lambda
GPU cloud company Lambda will officially launch its 1-Click Clusters, which will give customers access to Nvidia H100 GPUs and Quantum 2 InfiniBand clusters on demand.
Lambda worked with Nvidia to offer clusters of two to 64 nodes on demand. The clusters let companies access only the computing power they need, which is especially useful if they don’t need the GPUs to run 24/7.
Robert Brooks, one of Lambda’s co-founders and its vice president for revenue, told VentureBeat that the service lets companies building their own models spin up GPU power only when they need it.
“Training is really computationally an engineering challenge from a hardware and software perspective,” Brooks said. “If you start an AI company right now, you need to get a lot of money to get a large cluster of GPUs and it takes time to even begin spinning that up.”
Lambda takes care of many of the problems involved in renting GPUs, such as negotiating with GPU providers for a set window of time on the servers. Brooks said Lambda figured out a way to make the hardware, in particular the InfiniBand networking hardware used for large-scale research, multi-tenant and ready for new model training projects.
Most companies opt to use off-the-shelf AI models that have already been pre-trained and fine-tune them with their own data and governance policies. One of the biggest reasons is the cost associated with building and training foundation models. Nvidia GPUs are very expensive, and some companies may not have access to a data center that can house their GPUs.
Smaller companies working with AI end up either renting GPU space or working with cloud providers in hopes of gaining access to the processors needed to train their models. While cloud providers like AWS and Microsoft Azure offer not just access to AI models but also Nvidia GPUs, Brooks said the focus in those instances is on inference — where the AI model is actually being deployed instead of learning — rather than training. Contracts to use GPU clusters often last a year or longer.
Brooks said Lambda’s 1-Click Clusters target companies that need to train models only for a short period and cannot afford to sign long-term contracts for GPUs they may not use beyond a few months.
Companies can reserve the number of nodes they need on the Lambda cluster for a minimum of two weeks. Pricing depends on the number of nodes and time period needed.
Lambda, founded in 2012, raised $320 million in February, bringing its valuation to $1.5 billion.