Automating RAG Deployment with OpenShift AI and Validated Patterns
Creating demos that are easy to reproduce is key for collaboration and knowledge sharing. This blog post covers the steps we followed to automate, in a GitOps fashion, the deployment of a Retrieval-Augmented Generation (RAG) demo on top of OpenShift, including the OpenShift AI infrastructure and its dependencies.
Why Validated Patterns?
We opted for the Validated Patterns framework, which represents an evolution of how applications are deployed in hybrid cloud environments. Validated Patterns automatically deploy a full application stack using a GitOps-based framework, including all necessary components and operators. This approach helps create business-centric solutions while maintaining Continuous Integration (CI) practices over your applications.
Deploying and configuring all the components for a RAG application, though not inherently difficult, can be tedious and error-prone. That’s why we created a validated pattern to automate the deployment of both the RAG components and the required infrastructure on top of OpenShift, particularly OpenShift AI and its dependencies.
OpenShift AI Validated Pattern
The RAG Demo Validated Pattern automates the installation and configuration of all components necessary for running OpenShift AI. This pattern significantly simplifies the deployment process, allowing you to deploy the RAG demo on OpenShift AI without manual intervention for each component.
The code for the validated pattern is located at https://github.com/luis5tb/rhoai-patterns-demo, and the pattern is named rhoai-pattern-demo.
Dependencies Installation
The rhoai-pattern-demo pattern handles the installation of several critical dependencies required to run AI workloads efficiently on OpenShift AI. These include:
- OpenShift Serverless: Provides an event-driven architecture, enabling applications to scale based on demand without constant resource allocation. This is required by the KServe component, which OpenShift AI uses to deploy models.
- OpenShift Service Mesh: Facilitates secure, reliable communication between microservices by providing routing, observability, and tracing. Again, this is used by the KServe component in OpenShift AI to deploy models.
- OpenShift Authorino: Manages authentication and authorization policies, ensuring secure access to APIs and microservices. OpenShift AI leverages this to provide secure access to deployed models.
- Node Feature Discovery (NFD) Operator: Detects hardware features on nodes, ensuring that resources such as GPUs are properly detected.
- NVIDIA GPU Operator: Manages the provisioning and lifecycle of NVIDIA GPUs in OpenShift, allowing the OpenShift AI models and workbenches to leverage GPU acceleration for intensive AI computations.
By automating the installation and configuration of these components, the pattern ensures a smooth deployment of OpenShift AI, setting up the foundation for the RAG demo.
The pattern also handles the installation of the OpenShift AI Operator, which manages the deployment and lifecycle of AI workloads on OpenShift AI, making it easier to deploy and manage AI models and applications.
All these operators are installed by adding their respective subscriptions to the subscription section in `values-hub.yaml`:
```yaml
subscriptions:
  serverless:
    name: serverless-operator
    namespace: openshift-serverless
    channel: stable
  servicemesh:
    name: servicemeshoperator
    namespace: openshift-operators
    channel: stable
  nfd:
    name: nfd
    namespace: openshift-nfd
    channel: stable
  nvidia:
    name: gpu-operator-certified
    namespace: nvidia-gpu-operator
    channel: v24.6
    source: certified-operators
  authorino:
    name: authorino-operator
    namespace: openshift-operators
    channel: tech-preview-v1
    source: redhat-operators
    sourceNamespace: openshift-marketplace
  rhoai:
    name: rhods-operator
    namespace: redhat-ods-operator
    channel: alpha
    source: redhat-operators
    sourceNamespace: openshift-marketplace
```
Additionally, the required namespaces with the proper configuration are specified in the same file:
```yaml
namespaces:
  - openshift-serverless:
      operatorGroup: true
      targetNamespaces: []
  - openshift-nfd
  - nvidia-gpu-operator
  - redhat-ods-operator:
      operatorGroup: true
      targetNamespaces: []
```
Finally, to complete the installation process, some Custom Resources (CRs) need to be created so that the respective operators install the required components. The creation of these CRs is managed by adding the following to the `values-hub.yaml` file as applications, all of them part of an ArgoCD project named `openshift-ai`:
```yaml
projects:
  - openshift-ai
applications:
  nfd-config:
    name: nfd-config
    namespace: openshift-nfd
    project: openshift-ai
    path: charts/all/nfd-config
  nvidia-config:
    name: nvidia-config
    namespace: nvidia-gpu-operator
    project: openshift-ai
    path: charts/all/nvidia-config
  rhoai-config:
    name: rhoai-config
    namespace: redhat-ods-operator
    project: openshift-ai
    path: charts/all/rhoai-config
```
The charts path contains the YAML templates used to create these CRs.
RAG Deployment with OpenShift AI
Retrieval-Augmented Generation (RAG) is a technique that combines traditional retrieval-based methods with generative models like large language models (LLMs). The key idea behind RAG is to retrieve relevant context (documents or data) from a database or external source, which is then used by the LLM to generate more accurate and relevant responses. This approach enhances the quality and relevance of generated outputs, making it highly useful for applications like intelligent search, customer support, and knowledge management.
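The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the demo's actual pipeline code: the bag-of-words "embedder" stands in for a real embedding model, and in the demo the document store is ChromaDB and the generation step is a remote LLM.

```python
import math
import re

def embed(text):
    """Toy embedder: bag-of-words term counts (a real RAG system
    would use a sentence-embedding model instead)."""
    vec = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Assemble the retrieved context and the question into an LLM prompt."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "OpenShift AI uses KServe to serve models.",
    "MinIO provides S3-compatible object storage.",
]
question = "How are models served on OpenShift AI?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)
```

The key point is that the LLM never sees the whole corpus: only the chunks ranked most relevant to the question are injected into the prompt as context.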
The base RAG demo assumes there is some private local data that needs to be used in conjunction with public data. Both sets of data need to be added (at different times) to the VectorDB so they can be retrieved and used as context for the LLM.
Once the OpenShift AI cluster setup is automated, we can proceed with the automation of the RAG components needed by our demo:
- VectorDB: In this case, ChromaDB.
- LLM: Either local or remotely deployed. In this demo, we use a remotely deployed LLM to save local resources.
- Pipeline: To initialize the VectorDB with base knowledge.
- Pipeline: To add extra data to the VectorDB depending on the question asked.
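Both pipelines need to split documents into pieces small enough to embed and retrieve individually before inserting them into the VectorDB. The sketch below shows one common chunking strategy, fixed-size character windows with overlap; the function name and the chunk/overlap sizes are illustrative, not taken from the demo's pipeline code.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks for embedding.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from at least one of the two neighboring chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded and stored in ChromaDB together with an ID and any metadata (source document, visibility) the retrieval step needs.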
RAG automation
The code for this automation is available in the same repository, under the rag-demo branch: https://github.com/luis5tb/rhoai-patterns-demo/tree/rag-demo.
The automation process involves the following steps:
- Utilize the default secrets management that comes from the template. To enable this, the `vault` and `golang-external-secrets` applications need to be added to the `values-hub.yaml` in their own project (`hub`):

```yaml
namespaces:
  - vault
  - golang-external-secrets
projects:
  - hub
applications:
  vault:
    name: vault
    namespace: vault
    project: hub
    chart: hashicorp-vault
    chartVersion: 0.1.*
  golang-external-secrets:
    name: golang-external-secrets
    namespace: golang-external-secrets
    project: hub
    chart: golang-external-secrets
    chartVersion: 0.1.*
```
Then add your keys to the `values-secret.yaml.template` file, for example:

```yaml
secrets:
  - name: config-demo
    vaultPrefixes:
      - global
    fields:
      - name: secret
        onMissingValue: generate
        vaultPolicy: validatedPatternDefaultPolicy
  - name: minio-secret
    fields:
      - name: minio_root_user
        value: MY_MINIO_USER
      - name: minio_root_password
        value: MY_MINIO_PASSWORD
  - name: openai-keys
    fields:
      - name: OPENAI_API_BASE
        value: MY_OPENAI_API_BASE
      - name: OPENAI_API_KEY
        value: MY_OPENAI_API_KEY
```
- Automate the setup of S3 storage for private documents. In this demo, for simplicity, it is also used for public documents. The S3 storage setup is done using MinIO, as it is simple to deploy. This automation includes deploying the MinIO application and creating an initial bucket for the documents:

```yaml
clusterGroup:
  namespaces:
    - minio-ns
  projects:
    - s3
  applications:
    minio:
      name: minio
      namespace: minio-ns
      project: s3
      path: charts/all/minio
      vars:
        minio:
          bucket_name: localdocs
```
- Deployment of the VectorDB, in this case ChromaDB:

```yaml
namespaces:
  - chromadb-ns
projects:
  - vectordb
applications:
  chromadb:
    name: chromadb
    namespace: chromadb-ns
    project: vectordb
    path: charts/all/chromadb
```
- Automate OpenShift AI resources, including the Data Science project (a namespace with extra annotations/labels), the Pipeline server inside it, and the secret with credentials to access the LLM endpoint:

```yaml
namespaces:
  - rag-demo:
      annotations:
        openshift.io/description: 'RAG Workspace'
        openshift.io/display-name: 'RAG Workspace'
      labels:
        kubernetes.io/metadata.name: "rag-demo"
        modelmesh-enabled: 'false'
        opendatahub.io/dashboard: 'true'
projects:
  - rag
applications:
  rag-demo:
    name: rag-demo
    namespace: rag-demo
    project: rag
    path: charts/all/rag
```
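The `OPENAI_API_BASE` and `OPENAI_API_KEY` fields from the `openai-keys` secret typically reach pipeline pods as environment variables. A minimal sketch of how a pipeline step might pick them up follows; the function name and the way the secret is exposed are assumptions for illustration, only the two variable names come from the secret template above.

```python
import os

def llm_config_from_env():
    """Read the remote LLM endpoint settings injected from the
    openai-keys secret (assumed to be exposed as env vars)."""
    base = os.environ.get("OPENAI_API_BASE")
    key = os.environ.get("OPENAI_API_KEY")
    if not base or not key:
        raise RuntimeError("OPENAI_API_BASE and OPENAI_API_KEY must be set")
    # Normalize the base URL so callers can append paths safely.
    return {"base_url": base.rstrip("/"), "api_key": key}

# Placeholder values for local experimentation; in the cluster these
# are injected from the secret managed by the pattern.
os.environ.setdefault("OPENAI_API_BASE", "https://llm.example.com/v1/")
os.environ.setdefault("OPENAI_API_KEY", "MY_OPENAI_API_KEY")
cfg = llm_config_from_env()
```

Failing fast when the variables are missing makes a misconfigured secret show up as a clear pipeline error rather than a confusing connection failure later on.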
Steps to Deploy RAG on OpenShift AI
With the dependencies in place, deploying the RAG demo on OpenShift AI can be done in a few automated steps:
- Clone the Validated Pattern Git repository: The pattern repository contains the automation scripts and GitOps configuration files.
- Apply the GitOps manifests: Using OpenShift GitOps, apply the manifests to trigger the deployment of all necessary components. This is done by configuring your secrets in `values-secret.yaml.template`, copying them to your config folder (e.g., `~/.config/hybrid-cloud-patterns/values-secret-rhoai-pattern-demo.yaml`), and then triggering the installation:

```shell
$ cp values-secret.yaml.template ~/.config/hybrid-cloud-patterns/values-secret-rhoai-pattern-demo.yaml
$ ./pattern.sh make install
```
- Configure the RAG pipelines: After the infrastructure is in place, the OpenShift AI project is created, and the pipeline server is configured. You can then import the required pipelines and upload the necessary documents to the MinIO buckets. This last step is not automated yet:
  - Add documents to the appropriate folder in the MinIO `localdocs` bucket. You can retrieve the MinIO UI endpoint from the OpenShift developer console.
  - Import the VectorDB initialization pipeline. A sample is available at https://github.com/luis5tb/rhoai-patterns-demo/tree/rag-demo/sample_pipelines/initialize_vectordb_pipeline.yaml.
  - Trigger the pipeline. You will need the VectorDB (internal) endpoint, which you can get from the OpenShift console.
  - Import the RAG pipeline. A sample is available at https://github.com/luis5tb/rhoai-patterns-demo/tree/rag-demo/sample_pipelines/rag_pipeline.yaml.
  - Trigger the RAG pipeline. Fill in the required values (including the ChromaDB endpoint from earlier) and the question you wish to answer.
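Once the RAG pipeline has retrieved the relevant chunks from ChromaDB, it ultimately has to package them with the question into a request for the remote, OpenAI-compatible LLM endpoint. A minimal sketch of that final assembly step is below; the function name, model name, and system-prompt wording are illustrative, not taken from the sample pipeline.

```python
def build_rag_request(question, context_chunks, model="my-remote-llm"):
    """Build an OpenAI-compatible chat-completion payload that grounds
    the model in the retrieved context chunks."""
    context = "\n---\n".join(context_chunks)
    return {
        "model": model,  # placeholder: whatever the remote endpoint serves
        "messages": [
            {
                "role": "system",
                "content": "Answer strictly from the provided context.\n" + context,
            },
            {"role": "user", "content": question},
        ],
    }

payload = build_rag_request(
    "How are models served?",
    ["KServe serves the models.", "MinIO stores the documents."],
)
```

The resulting dictionary is what would be POSTed to the endpoint configured via `OPENAI_API_BASE`, with the retrieved context carried in the system message and the user's question kept separate.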
Conclusions
The rhoai-pattern-demo Validated Pattern simplifies and automates the deployment of a RAG application, leveraging a GitOps-based approach to ensure consistency and repeatability. By using this pattern, you can focus on building and refining your AI applications without worrying about the underlying infrastructure and configuration, reducing errors and saving time.
With the increasing complexity of AI workloads and hybrid cloud environments, validated patterns offer a streamlined way to deploy business-critical applications efficiently. Whether you’re experimenting with RAG or building production-ready AI solutions, this automation accelerates your workflow on OpenShift.