IVQ 1-50

🔹 Core Concepts

  1. What is LangFuse and what problem does it solve? LangFuse is an open-source observability and evaluation platform for LLM applications. It helps developers trace, debug, score, and monitor LLM executions like agents, chains, and RAG workflows.

  2. What is a trace in LangFuse? A trace represents a full execution path of an LLM interaction or workflow. It captures spans, inputs, outputs, metadata, and evaluation results.

  3. What is an observation in LangFuse? An observation is a granular log entry within a trace; spans, generations (LLM calls), and events are all observation types, and each can include prompts, responses, and scores.

  4. What is a span in LangFuse and how does it relate to a trace? A span is a single step or unit within a trace. Spans can be nested, capturing parent-child relationships like subchains or tool invocations.

  5. What are the main components of a LangFuse trace?

    • Trace (top-level context)

    • Spans (units of work)

    • Observations (LLM I/O, tool usage)

    • Scores (evaluations)

    • Metadata (custom tags)


🔹 Logging, SDK, and Visualization

  1. How do you log LLM requests in LangFuse? Use the SDK:

    from langfuse import Langfuse

    langfuse = Langfuse()  # reads API keys from the environment
    trace = langfuse.trace(name="chat-session")
    trace.span(name="llm-call", input="prompt", output="response")
  2. What is the role of langfuse.log() in the SDK? It logs observations (input/output/errors/scores) manually. Not commonly used—prefer trace, span, and observation APIs.

  3. How are traces visualized in the LangFuse dashboard? As expandable tree graphs with timeline, metadata, LLM I/O, scores, and tags.

  4. What metadata can be attached to traces in LangFuse? Custom tags, user ID, session ID, environment, model type, latency, token counts, etc.

  5. How do you group and filter traces in LangFuse? Use dashboard filters based on tags, score thresholds, models used, user IDs, or execution time.

  6. How do you manually create a trace in LangFuse using the SDK? Call langfuse.trace() with a name and any optional attributes (user ID, session ID, metadata, tags):
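
    A minimal sketch, assuming the v2 Python SDK (exact keyword names may differ between SDK versions):

    from langfuse import Langfuse

    langfuse = Langfuse()                      # reads API keys from the environment
    trace = langfuse.trace(
        name="support-chat",
        user_id="user-123",                    # optional attribution
        metadata={"env": "dev"},               # optional custom properties
    )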

  7. How do you set custom properties on a trace in LangFuse? Use .update() or pass metadata on creation:
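
    A minimal sketch, assuming the v2 Python SDK; the metadata keys and tag values are illustrative:

    from langfuse import Langfuse

    langfuse = Langfuse()
    # set properties at creation time ...
    trace = langfuse.trace(name="chat", metadata={"env": "dev"}, tags=["experiment-a"])
    # ... or update an existing trace later
    trace.update(metadata={"prompt_version": "v2"})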

  8. Can you use LangFuse in both sync and async Python code? Yes. The SDK's logging calls are non-blocking (events are batched and sent in the background), so they work from both sync and async code.
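
    A rough sketch of use inside an async handler; call_llm is a hypothetical stand-in for a real async LLM client:

    import asyncio
    from langfuse import Langfuse

    langfuse = Langfuse()

    async def call_llm(prompt: str) -> str:
        await asyncio.sleep(0)                 # placeholder for a real async LLM call
        return "mock response"

    async def handle(prompt: str) -> str:
        trace = langfuse.trace(name="async-chat")
        response = await call_llm(prompt)
        trace.span(name="llm-call", input=prompt, output=response)
        return response

    asyncio.run(handle("Hello"))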

  9. Does LangFuse support API key-based authentication for the SDK? Yes, via the LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY environment variables, or by passing the keys at SDK initialization:
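
    A minimal sketch of both options (the key values are placeholders):

    from langfuse import Langfuse

    # Option 1: rely on LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY environment variables
    langfuse = Langfuse()

    # Option 2: pass keys (and host, e.g. for self-hosted deployments) at init
    langfuse = Langfuse(
        public_key="pk-lf-...",
        secret_key="sk-lf-...",
        host="https://cloud.langfuse.com",
    )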

  10. How do you use LangFuse’s SDK in a Jupyter Notebook? Install via pip install langfuse, initialize in a cell, and use the trace/span APIs as in a script.


🔹 Evaluations & Analysis

  1. What are score evaluations in LangFuse and how are they used? Scores are numeric evaluations (e.g., helpfulness=0.8) attached to spans or observations. They help benchmark prompt/model quality.
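
    A minimal sketch, assuming the v2 Python SDK; the score name and value are illustrative:

    from langfuse import Langfuse

    langfuse = Langfuse()
    trace = langfuse.trace(name="qa-session")
    trace.span(name="llm-call", input="question", output="answer")
    trace.score(name="helpfulness", value=0.8, comment="manually reviewed")
    # scores can also be attached at the observation level instead of the trace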

  2. How does LangFuse handle prompt and response evaluation? You can log manual scores or use built-in evaluation hooks with OpenAI or other LLMs to rate fluency, relevance, etc.

  3. How can LangFuse help with debugging hallucinations in LLM outputs? By tracing LLM I/O, comparing prompt versions, tagging errors, and adding feedback scores to problematic responses.

  4. How does LangFuse visualize nested chains or complex workflows? Through collapsible trace trees with timestamps, chain nesting, and span hierarchy.

  5. How do you track evaluation metrics like accuracy or latency? By attaching scores and metadata (like latency_ms) to spans or observations.

  6. How are scores and tags different in LangFuse?

  • Score = numeric evaluation (e.g., 0.9)

  • Tag = label (e.g., version:v2, low-quality)

  7. How do you store user feedback or ratings in LangFuse? Attach scores or metadata to observations using:
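
    A minimal sketch, assuming the v2 Python SDK; the feedback name and value are illustrative:

    from langfuse import Langfuse

    langfuse = Langfuse()
    trace = langfuse.trace(name="chat-session")
    # record user feedback against the trace by id (a comment is optional)
    langfuse.score(trace_id=trace.id, name="user-feedback", value=1, comment="thumbs up")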

  8. Can LangFuse help you A/B test multiple prompts or models? Yes—log metadata like prompt_version or model, then group traces by that tag.

  9. How does LangFuse support prompt versioning and comparisons? Log the prompt version as metadata or a tag, then filter and compare score distributions in the dashboard.


🔹 Agent Workflows, Integrations & FastAPI

  1. How do you integrate LangFuse with LangChain or LlamaIndex? LangFuse provides wrappers and tracing callbacks that hook into LangChain or LlamaIndex events.
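
    A minimal sketch of the LangChain callback integration, assuming the v2 Python SDK import path (langfuse.callback) and the langchain-openai package:

    from langchain_openai import ChatOpenAI
    from langfuse.callback import CallbackHandler

    handler = CallbackHandler()                # reads LANGFUSE_* environment variables
    llm = ChatOpenAI(model="gpt-4o-mini")
    # passing the handler as a callback traces the call in LangFuse
    result = llm.invoke("Hello", config={"callbacks": [handler]})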

  2. How does LangFuse support multi-step traces in agent workflows? Each agent step, tool call, or sub-chain can be logged as a nested span within a single trace.

  3. How can LangFuse be used to monitor tool usage inside LLM agents? Wrap each tool call with a span and optionally log inputs/outputs/latency.

  4. How do you log retries or fallbacks in LangFuse? Use child spans with tags like retry=true or fallback=promptB, and optionally log failure reason.

  5. How do you integrate LangFuse with a FastAPI or Flask backend? Create a middleware or use LangFuse in route handlers to trace request-based workflows:
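
    A minimal FastAPI sketch; the route and the echoed response are placeholders for a real LLM call:

    from fastapi import FastAPI
    from langfuse import Langfuse

    app = FastAPI()
    langfuse = Langfuse()

    @app.post("/chat")
    async def chat(prompt: str):
        trace = langfuse.trace(name="chat-request", metadata={"route": "/chat"})
        response = f"echo: {prompt}"           # stand-in for a real LLM call
        trace.span(name="llm-call", input=prompt, output=response)
        return {"response": response}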

  6. How do you monitor multiple agents or chains in a single trace? Use nested spans with clear naming:
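
    A minimal sketch of one trace with clearly named nested spans per agent and tool (the names are illustrative):

    from langfuse import Langfuse

    langfuse = Langfuse()
    trace = langfuse.trace(name="multi-agent-run")

    planner = trace.span(name="agent:planner", input="user goal")
    planner.span(name="tool:web-search", input="query", output="results")
    planner.end(output="plan")

    executor = trace.span(name="agent:executor", input="plan")
    executor.end(output="final answer")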

  7. How do you track tool usage in a ReAct or function-calling agent? Wrap each ReAct function/tool-call as a LangFuse span or observation.

  8. How can LangFuse help implement human-in-the-loop workflows? You can attach manual feedback (via API/UI), trace corrections, or trigger evaluations based on user inputs.


🔹 Advanced Usage, Performance, Hosting

  1. Can LangFuse be self-hosted? What are the trade-offs? Yes. Trade-offs include more control vs. overhead of managing Postgres, ClickHouse, and Next.js UI stack.

  2. How does LangFuse handle rate limits or high-frequency trace logging? The SDK uses batching and asynchronous upload, but bulk logging in high-throughput apps may need a rate-limiting strategy.

  3. What kind of alerting or anomaly detection does LangFuse support? No built-in alerting yet, but scores and custom tags can be exported to external tools such as Grafana or Slack for alerting.

  4. Can you export logs or traces from LangFuse for external analysis? Yes, via API or direct DB access (if self-hosted). LangFuse Cloud supports exports too.

  5. What is the performance overhead of using LangFuse in an app? Negligible for standard tracing (100–300µs per call). Async mode keeps main logic fast.

  6. How does LangFuse handle long-running or streaming tasks? You can create spans early and update them later, or track partial observations.
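
    A rough sketch for streaming: open the span when the stream starts and end it once the output is assembled (the token list stands in for a real stream):

    from langfuse import Langfuse

    langfuse = Langfuse()
    trace = langfuse.trace(name="streaming-chat")

    span = trace.span(name="streaming-generation", input="prompt")
    chunks = []
    for chunk in ["Hel", "lo", "!"]:           # stand-in for a real token stream
        chunks.append(chunk)
    span.end(output="".join(chunks))           # close the span with the final output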

  7. Can you search and filter traces based on custom metadata? Yes. The dashboard supports full-text and metadata filtering.

  8. Is LangFuse compatible with OpenAI, Anthropic, and other LLM APIs? Yes. LangFuse is model-agnostic and can log interactions with any LLM API.

  9. Does LangFuse support TypeScript or JavaScript SDKs? Yes. LangFuse also ships a JavaScript/TypeScript SDK, and the HTTP API can be used from any other language.

  10. How do you organize and label traces for different environments (dev, prod)? Add metadata like env=dev or env=prod to each trace or use SDK config presets.

  11. Can you dynamically add screenshots or file attachments to traces? No built-in support for attachments yet. Use metadata URLs pointing to external storage.

  12. How are parent-child relationships between spans handled? Creating a span from its parent span (rather than directly from the trace) links them automatically; this is how nested workflows are represented.

  13. What role does LangfuseSpan play and how do you use it? It represents a single span unit. Created using:
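
    A minimal sketch: the object returned when opening a span represents that span unit; the exact class name (e.g. LangfuseSpan) depends on the SDK version:

    from langfuse import Langfuse

    langfuse = Langfuse()
    trace = langfuse.trace(name="pipeline")
    span = trace.span(name="retrieval", input={"query": "..."})
    span.end(output={"documents": 3})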

  14. What kinds of inputs and outputs can be logged with LangFuse? Strings, structured JSON, model outputs, prompt templates, function args, etc.

  15. How do you redact sensitive data before logging? Preprocess input/output before passing to LangFuse. You control what gets sent.

  16. How do you simulate or test LangFuse logging without real LLMs? Call trace.span(..., input="fake", output="mock") and test logic as usual.

  17. What role does LangFuse play in evaluating RAG systems? It helps trace retrieval, generation, and evaluation steps, while scoring relevance and completeness.

  18. How can LangFuse help A/B test multiple prompts or models? Tag traces with prompt_version, model=claude-v2, etc., then analyze the score distribution per version.

