10. Korean morphological analyzers (Kiwi, Kkma, Okt) + BM25 retriever

Define a helper function that prints the retrieval results in a readable form.

def pretty_print(docs):
    for i, doc in enumerate(docs):
        if "score" in doc.metadata:
            print(f"[{i+1}] {doc.page_content} ({doc.metadata['score']:.4f})")
        else:
            print(f"[{i+1}] {doc.page_content}")

BM25Retriever with the Kiwi tokenizer

# Install required libraries
# !pip install -qU kiwipiepy konlpy langchain-teddynote

# Default BM25Retriever, used as the baseline for comparison
from langchain_community.retrievers import BM25Retriever

# BM25Retriever using a custom-implemented Korean morphological analyzer (Kiwi)
from langchain_teddynote.retrievers import KiwiBM25Retriever

sample_texts = [
    "Financial insurance is a financial product designed for long-term asset management and risk management.",
    "Financial savings product insurance is a special financial product that has a long-term savings purpose as well as a livestock product provision function.",
    "Financial savings product insurance is a special financial product that has a long-term savings purpose as well as a livestock product provision function.",
    "Financial group bombing insurance is a product that focuses on risk management rather than savings. It is suitable for customers who are willing to take high risks.",
]
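
Why the tokenizer matters can be seen in a small, self-contained sketch. This is not the BM25Retriever implementation: bm25_scores is a hypothetical stdlib-only helper using one common BM25 variant, the two Korean sentences and the query are made-up sample data, and char_bigrams is only a crude stand-in for a morphological analyzer such as Kiwi.

```python
import math
from collections import Counter


def bm25_scores(query, docs, tokenize, k1=1.5, b=0.75):
    """Score `query` against each doc with a common BM25 variant.

    `tokenize` is any callable mapping a string to a list of tokens;
    it is applied to both the query and the documents.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many docs does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def whitespace_tokens(text):
    """Baseline: split on whitespace only."""
    return text.split()


def char_bigrams(text):
    """Stand-in for a morphological analyzer (assumption, not Kiwi)."""
    s = text.replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


# Made-up sample data for illustration only.
docs = [
    "금융보험은 장기 자산 관리 상품입니다.",
    "저축은 꾸준히 하는 것이 좋습니다.",
]
query = "금융보험"

print("whitespace:", bm25_scores(query, docs, whitespace_tokens))
print("bigrams:   ", bm25_scores(query, docs, char_bigrams))
```

With whitespace splitting the particle stays attached to the noun (금융보험은), so the query 금융보험 matches no token and every score is zero. Splitting into smaller units recovers the match for the first sentence. A morphological analyzer makes that split linguistically rather than by fixed character windows.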

A feature added out of personal need: the retriever can compute a similarity score for each result and store it in the document's metadata under score.
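
The idea of carrying the score in metadata can be sketched without any retriever at all. In the snippet below, Doc is a hypothetical stand-in for a LangChain document, attach_scores is an illustrative helper, and the score values are placeholders, not real retrieval output.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def attach_scores(texts, scores):
    """Return docs sorted by descending score, with the score in metadata."""
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [Doc(page_content=t, metadata={"score": float(s)}) for t, s in ranked]


# Placeholder texts and scores, for illustration only.
scored_docs = attach_scores(["doc a", "doc b", "doc c"], [0.2, 1.3, 0.0])

# Same output format as the score branch of pretty_print.
for i, doc in enumerate(scored_docs):
    print(f"[{i+1}] {doc.page_content} ({doc.metadata['score']:.4f})")
```

Because the score travels with each document, a printing helper only needs to check whether "score" is present in the metadata.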

Setting the k value
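
The k value controls how many of the top-ranked documents are returned. In langchain_community's BM25Retriever it is a field on the retriever object. The selection step itself can be sketched as follows, where top_k is a hypothetical helper and the scores are placeholders.

```python
import heapq


def top_k(texts, scores, k=4):
    """Return the k highest-scoring texts, best first."""
    ranked = heapq.nlargest(k, zip(scores, texts), key=lambda pair: pair[0])
    return [text for _, text in ranked]


# Placeholder texts and scores, for illustration only.
texts = ["doc a", "doc b", "doc c", "doc d"]
scores = [0.2, 1.3, 0.0, 0.7]

print(top_k(texts, scores, k=2))
```

If k exceeds the number of documents, all of them are returned in ranked order.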

BM25Retriever using KoNLPy (Kkma, Okt)
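
A minimal sketch of obtaining a KoNLPy tokenizer is shown below. KoNLPy's Kkma and Okt taggers both expose a morphs method that returns a list of morpheme strings. The helper name load_morph_tokenizer, the fallback to whitespace splitting, and the sample sentence are assumptions made for this example so that it still runs where KoNLPy or a Java runtime is missing.

```python
def load_morph_tokenizer(name="okt"):
    """Return a callable mapping text to a list of tokens.

    Uses KoNLPy's Kkma or Okt when available; otherwise falls back
    to plain whitespace splitting (an assumption for portability).
    """
    if name not in ("kkma", "okt"):
        raise ValueError(f"unknown analyzer: {name}")
    try:
        from konlpy.tag import Kkma, Okt  # requires a Java runtime

        tagger = Kkma() if name == "kkma" else Okt()
        return tagger.morphs
    except Exception:  # KoNLPy or the JVM is unavailable
        return str.split


tokenize = load_morph_tokenizer("okt")

# Made-up sample sentence for illustration only.
tokens = tokenize("금융보험은 장기적인 자산 관리 상품입니다.")
print(tokens)
```

A callable like this can be passed as the preprocess_func argument of BM25Retriever.from_texts so that indexing and querying share the same tokenization.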
