
nCompass Technologies

Deploy hardware accelerated AI models with only one line of code

nCompass is a platform for accelerating and hosting open-source and custom AI models. We provide low-latency AI deployment without rate limits. All with just one line of code.
Active Founders
Aditya Rajagopal
Founder
Diederik Vink
Founder
nCompass Technologies
Founded: 2023
Batch: W24
Team Size: 2
Status: Active
Primary Partner: Dalton Caldwell
Company Launches
nCompass Technologies: Reliable LLM API with no rate-limits

Tl;dr:

We’ve built an AI model inference system that can serve requests at scale like no other, and now we’re releasing it to the public as a rate-limit-free API. We serve any open-source LLM and can also deploy optimized versions of your custom fine-tuned LLM with cost-effective autoscaling. Sign up here, create an API key, get $100 of credit on us, and run as many requests as you like!
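Getting started looks roughly like the sketch below. This is a hypothetical illustration: the endpoint URL, header names, payload fields, and model name are assumptions made up for this example, not the documented nCompass interface — consult the official docs for the real one.

```python
import json

# Hypothetical sketch of issuing one request to a rate-limit-free LLM API.
# The URL and payload shape below are assumptions for illustration only.
API_URL = "https://api.ncompass.tech/v1/chat/completions"  # assumed endpoint

def build_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for a single chat request."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Build (but don't send) a request; pass the parts to any HTTP client.
req = build_request("my-api-key", "llama-3-8b-instruct", "Hello!")
print(req["headers"]["Authorization"])
```

Because there is no per-key rate limit, the same request could be fired concurrently from many workers without backoff logic — the main practical difference from rate-limited providers.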

The Problem

Deploying AI models in production requires expensive infrastructure. Serving more than ~10 req/s with open-source inference engines like vLLM on a single GPU results in terrible quality of service: time-to-first-token skyrockets past 10s, and end-to-end latency degrades even more.

The common solution: horizontally scale up GPUs.

The problem: GPUs are expensive and hard to find.

Why should you care

  1. API user: These high infrastructure costs are the reason you suffer rate limits when using existing API providers.
  2. Deploying on-prem: Your infrastructure costs might be the reason a PoC doesn’t move to production.

Our Solution

We’ve built an AI inference serving system that can sustain hundreds of requests per second while maintaining a time-to-first-token of <1s, on ~30% fewer GPUs than NVIDIA’s NIM containers and up to 2x fewer GPUs than vLLM.

This enables us to provide a rate-limit-free API while maintaining a high quality of service. Alternatively, we can provide this as a cost-effective on-prem deployment solution, ensuring your infrastructure costs don’t blow up with requests served. We support any open source model and can host your custom fine-tuned model as an API with autoscaling enabled as well.
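To make the claimed savings concrete, here is a back-of-the-envelope calculation using the figures quoted above ("~30% fewer GPUs than NIM", "up to 2x fewer than vLLM"). The baseline fleet sizes and hourly GPU price are invented inputs for illustration, not nCompass numbers.

```python
# Rough fleet-size and cost comparison based on the reductions quoted above.
# Baseline fleet sizes and the $/GPU/hour figure are hypothetical.

def reduced_fleet(baseline_gpus: float, reduction: float) -> float:
    """GPUs needed after shrinking a baseline fleet by `reduction` (a fraction)."""
    return baseline_gpus * (1.0 - reduction)

VLLM_FLEET = 20           # hypothetical GPUs a vLLM deployment needs
NIM_FLEET = 14            # hypothetical GPUs a NIM deployment needs
GPU_COST_PER_HOUR = 2.50  # hypothetical $/GPU/hour

vs_vllm = reduced_fleet(VLLM_FLEET, 0.5)  # "up to 2x fewer" => 50% reduction
vs_nim = reduced_fleet(NIM_FLEET, 0.3)    # "~30% fewer"

print(f"vs vLLM: {vs_vllm:.0f} GPUs, "
      f"saving ${(VLLM_FLEET - vs_vllm) * GPU_COST_PER_HOUR:.2f}/hour")
print(f"vs NIM:  {vs_nim:.1f} GPUs, "
      f"saving ${(NIM_FLEET - vs_nim) * GPU_COST_PER_HOUR:.2f}/hour")
```

Since GPU cost scales roughly linearly with fleet size, halving the fleet halves the hourly bill, which is the core of the on-prem cost argument.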


Shout out

Building a system this scalable and available required a top-quality hardware provider, so we want to shout out Ori Global Cloud, a key partner in this journey. Ori Serverless Kubernetes is an infrastructure service for AI inference at scale that combines powerful scalability, simple management, and affordability to help AI-focused startups realize their wildest AI ambitions. Reach out to Ori for exclusive GPU cloud deals!

Asks

Our pricing is transparent and can be found here: https://console.ncompass.tech/public-pricing

Other Company Launches

nCompass Technologies: Realtime audio denoising

nCompass's newest model is a realtime audio denoiser that can remove voices from the background of audio streams.

nCompass Technologies - Low-latency deployment of AI models made easy

nCompass is an API that requires only one line of code to integrate low-latency versions of open-source or custom models into your AI pipeline.