05. Weaviate

Using Weaviate with LangChain

Introduction to Weaviate

Weaviate is an open-source, AI-native vector database designed for storing and retrieving high-dimensional embeddings efficiently. It is particularly suited for applications such as semantic search, recommendation systems, and retrieval-augmented generation (RAG). Weaviate supports hybrid search capabilities, including keyword-based and vector-based searches, and integrates well with various machine learning frameworks.

Setting Up Weaviate

1. Installing Weaviate

To use Weaviate, install the Weaviate client:

Copy

pip install weaviate-client

If you want to run a local instance of Weaviate, you can use Docker:

Copy

docker run -d -p 8080:8080 semitechnologies/weaviate

This will start a Weaviate instance locally, accessible on http://localhost:8080.

2. Creating a Weaviate Client

Once installed, initialize a Weaviate client in Python:

Copy

import weaviate

client = weaviate.Client("http://localhost:8080")

If you're using Weaviate Cloud, replace localhost with your Weaviate Cloud endpoint and provide authentication credentials.

Integrating Weaviate with LangChain

LangChain provides seamless integration with Weaviate for vector-based storage and retrieval. The Weaviate wrapper in LangChain simplifies adding and retrieving vector embeddings.

1. Creating a Weaviate Index

Before storing vectors, define a schema in Weaviate:

Copy

schema = {
    "classes": [
        {
            "class": "LangChainDocs",
            "vectorizer": "text2vec-openai",
        }
    ]
}

client.schema.create(schema)

This creates a collection named LangChainDocs with OpenAI’s text2vec-openai as the vectorizer.

2. Storing Embeddings in Weaviate

To store vectors, first generate embeddings using an embedding model (e.g., OpenAI or Hugging Face):

Copy

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate

embeddings = OpenAIEmbeddings()
vector_db = Weaviate(client=client, index_name="LangChainDocs", embeddings=embeddings)

Now, store some text data in Weaviate:

Copy

documents = ["This is a sample document.", "LangChain makes working with LLMs easier."]
vector_db.add_texts(texts=documents)

3. Performing Similarity Search

Retrieve documents similar to a given query:

Copy

query = "How does LangChain help with LLMs?"
results = vector_db.similarity_search(query, k=2)

for result in results:
    print(result.page_content)

This fetches the top 2 documents that are most semantically similar to the query.

Best Practices and Optimization

Use Efficient Vectorizers: Choose the right vectorizer (e.g., OpenAI, Cohere, Sentence Transformers) based on your use case.
Index Maintenance: Regularly update and clean up old embeddings to keep the index optimized.
Hybrid Search: Leverage Weaviate’s hybrid search capabilities for a combination of keyword and vector-based retrieval.
Cloud Deployment: For production, consider using Weaviate Cloud for scalability and reliability.

Conclusion

Weaviate provides a robust, open-source alternative to proprietary vector databases like Pinecone, offering hybrid search capabilities and flexibility. Its integration with LangChain makes it an excellent choice for scalable and efficient AI applications. With proper setup and optimization, you can leverage Weaviate to enhance search, recommendation, and retrieval-augmented generation (RAG) applications.

Previous04. Qdrant Next06. Milvus

Last updated 5 months ago