
Building AI Infra for non-NVIDIA GPUs

Felafax is building AI infra for non-NVIDIA GPUs. Drawing on our ML experience from Google and Meta, we built a new AI stack that is 2x more cost-efficient while matching performance, without needing NVIDIA's CUDA.
Active Founders
Nithin Sonti, Founder
Nikhil Sonti, Founder
Felafax
Founded: 2024
Batch: S24
Team Size: 2
Status: Active
Location: San Francisco
Primary Partner: David Lieb
Company Launches
Felafax: Expanding AI Infra beyond NVIDIA

TL;DR: We are building an open-source AI platform for non-NVIDIA GPUs. Today, we are launching one of its pieces: a seamless UI to spin up a TPU cluster of any size, plus an out-of-the-box notebook to fine-tune LLaMa 3.1 models. Try us at felafax.ai or check out our GitHub!

👋 Introduction

Hi everyone, we're Nikhil and Nithin, twin brothers behind Felafax AI. Before this, we spent half a decade at Google and Meta building AI infrastructure. Drawing on our experience, we are creating an ML stack from the ground up. Our goal is to deliver high performance and provide an easy workflow for training models on non-NVIDIA hardware like TPU, AWS Trainium, AMD GPU, and Intel GPU.

🧨 The Problem

  • The ML ecosystem for non-NVIDIA GPUs is underdeveloped, even though alternative chips like Google TPUs offer a much better price-to-performance ratio: TPUs are roughly 30% cheaper to use.
  • The cloud layer for spinning up AI workloads is painful. Training requires installing the right low-level dependencies (infamous CUDA errors), attaching persistent storage, waiting 20 minutes for the machine to boot up… the list goes on.
  • Models are getting bigger (like Llama 405B) and don't fit on a single GPU, requiring complex multi-GPU orchestration.

🥳 The Solution

Today, we're launching a cloud layer to make it easy to spin up AI training clusters of any size, from 8 TPU cores to 2048 cores. We provide:

  • Effortless Setup: Out-of-the-box templates for PyTorch XLA and JAX to get you up and running quickly.
  • LLaMa Fine-tuning, Simplified: Dive straight into fine-tuning LLaMa 3.1 models (8B, 70B, and 405B) with pre-built notebooks. We've handled the tricky multi-TPU orchestration for you (see the sketch after this list).
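
To give a sense of what that orchestration involves, here is a minimal, hypothetical JAX sketch (not code from our notebooks) of sharding a toy batch across all available TPU cores; XLA then inserts the cross-core communication for you:

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning every core visible to this host
# (8 on a v4-8 slice; falls back to CPU/GPU devices elsewhere).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Toy "model": one weight matrix, replicated on every core;
# the batch is split along its leading dimension (which must
# divide evenly across cores).
w = jax.device_put(jnp.ones((512, 512)), NamedSharding(mesh, P()))
x = jax.device_put(jnp.ones((64, 512)), NamedSharding(mesh, P("data")))

@jax.jit
def forward(w, x):
    # XLA adds any collectives needed for the sharded matmul.
    return x @ w

out = forward(w, x)
print(out.shape, out.sharding)
```

Real fine-tuning adds model-parallel sharding of the weights themselves, checkpointing, and data loading, which is exactly the plumbing the notebooks hide.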

In the coming weeks, we will also launch our open-source AI platform built on top of JAX and OpenXLA (an alternative to NVIDIA's CUDA stack). We will support AI training across a variety of non-NVIDIA hardware (Google TPU, AWS Trainium, AMD and Intel GPUs) and offer the same performance as NVIDIA at 30% lower cost. Follow us on Twitter, LinkedIn, and GitHub for updates!
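
The portability comes from compiling through XLA rather than CUDA. As an illustration (our own sketch, not the platform's API), the same jit-compiled training step below runs unchanged on whichever XLA backend is present:

```python
import jax
import jax.numpy as jnp

print("XLA backend:", jax.default_backend())  # "tpu", "gpu", or "cpu"

def loss_fn(w, x, y):
    # Simple mean-squared-error loss for a linear model.
    return jnp.mean((x @ w - y) ** 2)

@jax.jit
def train_step(w, x, y, lr=1e-2):
    grads = jax.grad(loss_fn)(w, x, y)
    return w - lr * grads

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (128, 1))
x = jax.random.normal(key, (32, 128))
y = jnp.zeros((32, 1))
w = train_step(w, x, y)  # compiled once per backend, no CUDA-specific code
```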

🙏 How You Can Help

  1. Try our seamless cloud layer for spinning up VMs for AI training; you get $200 in credits to start at app.felafax.ai.
  2. Try fine-tuning LLaMa 3.1 models for your use case.
  3. If you are an ML startup or an enterprise that would like a seamless platform for your in-house ML training, reach out to us (calendar).