RYAX applied to Retrieval-Augmented Generation

RAG on Unstructured Data: Optimizing Generative AI Deployment Costs

✨ Optimizing GenAI RAG Pipelines: Up to 53% Cost Reduction with Ryax

As organizations increasingly turn to Retrieval-Augmented Generation (RAG) workflows to extract value from unstructured documents (PDFs, emails, web pages…), a new challenge emerges: the cost of GPU resources required to run these pipelines at scale can be overwhelming.

At Ryax, we conducted an in-depth analysis of a real-world use case: a document-understanding pipeline processing over 1,000 documents per day across 32 daily runs. This pipeline uses open-source LLMs, ensuring data sovereignty and privacy, deployed on a client-controlled cloud infrastructure.

📊 Key Insight: 18% GPU Idle Time

The workflow consists of 5 sequential actions. Preprocessing and postprocessing take around 3 minutes on CPUs, while text embedding and LLM-based response generation take 14 minutes on GPUs. In a traditional deployment, where the GPU is reserved for the entire run, this imbalance is wasteful: the GPU sits idle for about 3 of the 17 minutes (roughly 18%), and the user still pays the expensive GPU rate for that idle time.
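
The idle share in the heading can be checked with a few lines of Python. This is only a sketch: it assumes the steps run strictly one after another and that a single GPU is reserved for the whole run, and the function name `gpu_idle_fraction` is illustrative rather than part of any Ryax API.

```python
# Durations reported in the text, in minutes.
CPU_MINUTES = 3   # preprocessing and postprocessing on CPUs
GPU_MINUTES = 14  # text embedding and LLM response generation on GPUs


def gpu_idle_fraction(cpu_minutes: float, gpu_minutes: float) -> float:
    """Share of the run during which a GPU reserved for the whole pipeline is idle.

    Assumes sequential steps, so the GPU is unused while the CPU steps run.
    """
    total_minutes = cpu_minutes + gpu_minutes
    return cpu_minutes / total_minutes


idle = gpu_idle_fraction(CPU_MINUTES, GPU_MINUTES)
print(f"GPU idle share: {idle:.0%}")  # prints "GPU idle share: 18%"
```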

[Figure: RAG workflow]

✅ Our Solution: Spatio-Temporal Resource Optimization

We applied a two-layer optimization strategy using the Ryax platform:

1. Containerized Workflow Decomposition

Each pipeline step runs in its own container with precise, on-demand resource provisioning. This alone delivers a 34% cost reduction.

2. GPU Partitioning with NVIDIA MIG

Leveraging Multi-Instance GPU (MIG) technology, we run multiple tasks in parallel on a single GPU, drastically increasing utilization. This adds another 26% savings.
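
To illustrate why partitioning raises utilization, here is a minimal first-fit packing sketch. It is not the Ryax scheduler. It assumes a GPU that exposes 7 MIG compute slices (as the NVIDIA A100 does), it ignores MIG's real placement and memory constraints, and the task names and slice requirements are hypothetical.

```python
from dataclasses import dataclass, field

GPU_SLICES = 7  # assumption: 7 MIG compute slices per GPU (e.g. NVIDIA A100)


@dataclass
class Gpu:
    free_slices: int = GPU_SLICES
    tasks: list = field(default_factory=list)


def pack_first_fit(tasks):
    """Place (name, slices) tasks on the first GPU with enough free slices.

    Opens a new GPU only when no existing one can host the task.
    """
    gpus = []
    for name, slices in tasks:
        if slices > GPU_SLICES:
            raise ValueError(f"{name} needs more slices than one GPU offers")
        for gpu in gpus:
            if gpu.free_slices >= slices:
                break
        else:
            gpu = Gpu()
            gpus.append(gpu)
        gpu.free_slices -= slices
        gpu.tasks.append(name)
    return gpus


# Hypothetical task mix: (task name, MIG slices requested).
tasks = [
    ("embedding-a", 2),
    ("llm-a", 4),
    ("embedding-b", 1),
    ("llm-b", 4),
    ("embedding-c", 3),
]
gpus = pack_first_fit(tasks)
for index, gpu in enumerate(gpus):
    print(f"GPU {index}: {gpu.tasks} (free slices: {gpu.free_slices})")
```

With this made-up mix, five tasks that would otherwise each hold a full GPU fit on two partitioned GPUs.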

👉 Combined, these strategies reduce costs by 53%, cutting execution costs from €0.983 to €0.462 per workflow, with no significant performance degradation.
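
The headline figure follows directly from the two per-workflow costs. The short sketch below recomputes it and, assuming each of the 32 daily runs is billed at the per-workflow price (an assumption, not a figure from the study), extrapolates a daily cost.

```python
BASELINE_EUR = 0.983   # cost per workflow before optimization (from the text)
OPTIMIZED_EUR = 0.462  # cost per workflow after both optimizations (from the text)
RUNS_PER_DAY = 32      # daily runs reported for this use case


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original cost that was saved."""
    return (before - after) / before


saving = relative_reduction(BASELINE_EUR, OPTIMIZED_EUR)

# Assumption: every daily run costs exactly one per-workflow price.
daily_before = BASELINE_EUR * RUNS_PER_DAY
daily_after = OPTIMIZED_EUR * RUNS_PER_DAY

print(f"Cost reduction per workflow: {saving:.0%}")  # prints 53%
print(f"Daily cost: EUR {daily_before:.2f} -> EUR {daily_after:.2f}")
```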

⚖️ Compared to Traditional Approaches

We benchmarked this approach against standard GenAI deployment models:

• Monolithic GPU VMs on AWS/GCP: over 98% more expensive

• Token-based inference services (e.g. Together.ai): up to 84% higher cost

• Time-based inference services (e.g. Inferless): 27% cheaper, but not compatible with on-prem or client-controlled environments

🚀 Why This Matters

This study shows that scaling GenAI does not have to come with soaring infrastructure costs. Thanks to Ryax’s layered optimization—temporal (when to allocate) and spatial (how to allocate)—RAG pipelines become both efficient and affordable at scale.

Why Adopt Ryax Today?

Companies leveraging artificial intelligence face increasing pressure to reduce operational costs while maintaining high performance. Ryax offers a pragmatic and proven solution to achieve this balance.

Concrete benefits for your organization:

• Multi-Level Optimization: A holistic approach combining containerization, GPU partitioning, and dynamic orchestration.

• Massive Cost Reduction: Up to 50% savings on your cloud expenses by eliminating resource wastage.

• Flexibility and Scalability: A scalable model capable of adapting to real-time needs and workload variations.

• Transparent Implementation: Rapid integration with your existing infrastructures without complex workflow overhauls.

With Ryax, you regain control over your AI infrastructure costs while ensuring optimal performance and smooth execution of your models.

Want to know more? Download the complete study.

Link to the study: RAG optimization cost UC

Read about other RYAX use cases

  • Mobility

    Ryax addresses mobility challenges through its data engineering platform by enabling seamless development, deployment and monitoring of workflows in hybrid edge-cloud computational environments.

  • Pharmaceutics

    Thanks to its ability to orchestrate complex data processing over distributed infrastructures, Ryax can support lab automation projects, AI-powered nanomolecule research and drug discovery efforts using machine learning.

  • Smart Agriculture

    Ryax addresses agricultural challenges with a platform that lets data scientists create, deploy and manage data analytics workflows simply, by abstracting away the complex data engineering plumbing.

RYAX tackles new use cases every day.

Tell us about your projects.