Together AI

AI Acceleration Cloud designed for rapid inference, fine-tuning, and training of generative AI models.

About Together AI

Together AI provides a comprehensive AI Acceleration Cloud platform that supports the full lifecycle of generative AI development. It offers fast inference, flexible fine-tuning, and scalable training with user-friendly APIs and robust infrastructure. Users can run and customize open-source models, deploy large-scale AI solutions on GPU clusters, and optimize performance and costs. The platform supports over 200 models across modalities such as chat, images, code, and more, all compatible with OpenAI APIs for seamless integration.

How to Use

Users can access Together AI via simple APIs for serverless inference or deploy models on dedicated hardware endpoints. Fine-tuning is straightforward through command-line tools or API-level hyperparameter controls. GPU clusters can be requested for intensive training tasks. The platform includes a web UI, API, and CLI for managing endpoints and services, and code execution environments support AI development and testing workflows.
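Because the APIs follow OpenAI conventions, an existing OpenAI-style client can typically be repointed at the platform by swapping the base URL and key. A minimal sketch of the request shape for serverless chat inference — the endpoint URL and model name below are illustrative assumptions, not exact values:

```python
import json

# Assumed OpenAI-compatible chat-completions endpoint (illustrative).
BASE_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Model identifier is a placeholder; pick any of the 200+ supported models.
payload = build_chat_request("meta-llama/Llama-3-8b-chat-hf", "Hello!")
body = json.dumps(payload)  # POST this to BASE_URL with an Authorization header
```

The same payload works against a dedicated endpoint by changing only the URL, which is the practical upside of OpenAI compatibility.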

Features

NVIDIA GPU clusters available instantly or on reservation, including models like H100, A100, GB200, and B200
Advanced management tools with Slurm and Kubernetes support
Together Chat app for open-source AI interaction
Code Interpreter for executing AI-generated code
Flexible fine-tuning options, including LoRA and full fine-tuning
APIs compatible with OpenAI standards
Code Sandbox environment for AI development and experimentation
Extensive library of over 200 generative AI models
High-speed interconnects such as InfiniBand and NVLink
Optimized software stack featuring FlashAttention-3 and custom CUDA kernels
Dedicated endpoints for deploying models on custom hardware
Serverless inference API supporting open-source models

Use Cases

Reducing latency and costs for AI companies like Arcee AI
Performing multi-document analysis and personalized data processing
Enabling production-grade AI applications for businesses
Automating classification and data extraction tasks
Executing visual recognition, reasoning, and video understanding
Developing cybersecurity solutions such as Nexusflow
Training custom generative AI models from scratch
Creating scalable AI customer support chatbots for platforms like Zomato
Generating and debugging code with advanced language models
Managing complex tool integrations and API-driven workflows
Developing next-generation text-to-video models like Pika
Accelerating enterprise AI projects for companies like Salesforce and Zoom

Best For

AI developers
AI researchers
Open-source AI organizations
Machine learning engineers
Businesses requiring scalable GPU infrastructure
Startups utilizing generative AI
Enterprises building AI solutions
Data scientists

Pros

Provides scalable infrastructure with NVIDIA GPUs for demanding AI workloads
Full ownership over models ensures no vendor lock-in
Meets SOC 2 and HIPAA compliance standards for secure enterprise deployment
Supports a diverse library of over 200 open-source and specialized models
Easy-to-use, OpenAI-compatible APIs streamline integration
Fast inference, fine-tuning, and training capabilities empower AI development
Incorporates cutting-edge optimizations like FlashAttention-3 and custom kernels
Offers competitive pricing aimed at reducing overall AI deployment costs
Batch inference available with an introductory discount
High reliability with a 99.9% uptime SLA for GPU clusters

Cons

Advanced features like hyperparameter tuning and custom deployment require technical expertise
Pricing for high-end GPU hardware like GB200 and B200, and for large-scale setups, is quote-based rather than publicly listed

Pricing Plans

Choose the perfect plan for your needs. All plans include 24/7 support and regular updates.

Serverless Inference

Variable based on model and token volume

Pricing depends on token count, with costs per 1 million tokens for input and output, or images and multimodal inputs. Batch inference benefits from a 50% introductory discount. Model prices range from $0.06 to $7.00 per million tokens.
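A rough cost estimate follows directly from these per-million-token rates. The helper below is a sketch under the figures quoted above; actual billing may differ in rounding and in which tokens are counted:

```python
def serverless_cost_usd(input_tokens: int, output_tokens: int,
                        price_per_m_input: float, price_per_m_output: float,
                        batch: bool = False) -> float:
    """Estimate serverless inference cost from per-million-token prices.

    Prices are model-dependent ($0.06-$7.00 per million tokens per the
    ranges above); the 0.5 multiplier reflects the introductory 50%
    batch-inference discount.
    """
    cost = (input_tokens / 1_000_000) * price_per_m_input \
         + (output_tokens / 1_000_000) * price_per_m_output
    if batch:
        cost *= 0.5
    return cost

# e.g. 2M input + 1M output tokens at a hypothetical $0.60/M each way,
# roughly $1.80 online or half that via batch inference.
online = serverless_cost_usd(2_000_000, 1_000_000, 0.60, 0.60)
batched = serverless_cost_usd(2_000_000, 1_000_000, 0.60, 0.60, batch=True)
```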

Dedicated Endpoints

Variable by GPU type, billed per minute or hour

Deploy models on custom GPU endpoints with per-minute billing. Available NVIDIA GPUs include RTX-6000, L40, A100, H100, and H200, with prices starting at $0.025/minute ($1.49/hour) for RTX-6000 and L40, up to $0.083/minute ($4.99/hour) for H200.

Fine-tuning

Per 1 million tokens processed

Pricing varies with model size, dataset, and epochs. Supervised fine-tuning (LoRA) costs between $0.48 and $2.90 per million tokens; full fine-tuning ranges from $0.54 to $3.20. DPO and other methods are priced accordingly.

Together GPU Clusters

Starting at $1.30 per hour

High-performance clusters equipped with NVIDIA Blackwell and Hopper GPUs, including H200, H100, and A100, optimized for AI training and inference. H200 clusters start at $2.09/hr, H100 at $1.75/hr, and A100 at $1.30/hr. Contact us for pricing on GB200 and B200.

Code Execution

Per hour or session

Code Sandbox is billed per vCPU ($0.0446/hour) and per GiB RAM ($0.0149/hour). Code Interpreter sessions cost $0.03 for 60 minutes of execution.
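Since Code Sandbox bills vCPU and memory separately by the hour, total cost is a simple sum of the two line items. A sketch using the rates quoted above (actual invoices may round differently):

```python
def sandbox_cost_usd(vcpus: int, ram_gib: float, hours: float) -> float:
    """Estimate Code Sandbox cost from the per-resource hourly rates above."""
    VCPU_RATE = 0.0446  # USD per vCPU-hour
    RAM_RATE = 0.0149   # USD per GiB-hour
    return hours * (vcpus * VCPU_RATE + ram_gib * RAM_RATE)

# e.g. a 2 vCPU / 4 GiB sandbox running for 10 hours costs about $1.49.
estimate = sandbox_cost_usd(vcpus=2, ram_gib=4, hours=10)
```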

Frequently Asked Questions

Find answers to common questions about Together AI

What types of AI models does Together AI support?
Together AI supports over 200 models, including chat, multimodal, language, image, code, and embedding models, with a focus on open-source options.
What GPU hardware options are available on Together AI?
Together AI offers high-performance NVIDIA GPUs such as GB200, B200, H200, H100, A100, and L40 series for inference and training tasks.
How does Together AI optimize AI performance and costs?
The platform uses custom kernels like FlashAttention-3, FP8 inference, quantization techniques, and optimized decoding to enhance speed and reduce expenses.
Can I fine-tune my own models on Together AI?
Yes, the platform supports fine-tuning with LoRA and full training options, allowing you to customize models while maintaining full ownership.
Is Together AI suitable for enterprise AI deployments?
Absolutely. It offers secure, compliant infrastructure with enterprise-grade SLAs, dedicated endpoints, and expert support for large-scale AI projects.