IVQA 101-150

101. What is a multi-modal model and how does it differ from text-only LLMs?

A multi-modal model processes and integrates multiple types of data—e.g., text, images, audio, video—simultaneously. Unlike text-only LLMs (e.g., GPT-3), which operate purely on textual input and output, multi-modal models (e.g., GPT-4V, Flamingo) can reason across different modalities to perform tasks like visual question answering, image captioning, or audio transcription.


102. How does CLIP work and what’s its role in GenAI?

CLIP (Contrastive Language–Image Pre-training) aligns images and text in a shared embedding space by training on a large dataset of image-caption pairs. It learns to associate an image with its most relevant text and vice versa using contrastive loss. In GenAI, CLIP is used for tasks like zero-shot image classification, image retrieval, and as a reward model in generative pipelines (e.g., guiding image generation in VQGAN+CLIP).


103. What are common use cases for vision-language models like Flamingo or GPT-4V?

  • Visual question answering (VQA)

  • Image captioning or description

  • Multimodal chat (e.g., ChatGPT with image input)

  • Document understanding (e.g., analyzing scanned PDFs)

  • Visual reasoning tasks (e.g., "What’s happening in this image?")

  • Content moderation and accessibility tools


104. How do you fine-tune a multi-modal model for a specific domain?

Fine-tuning involves:

  • Collecting domain-specific multimodal data (e.g., medical images + radiology reports)

  • Using adapters or LoRA for efficient tuning

  • Preserving pre-trained modality encoders (e.g., ViT, BERT)

  • Minimizing catastrophic forgetting by freezing backbone layers

  • Evaluating performance on domain-specific tasks (e.g., medical image captioning)


105. What are embeddings in the context of images and text together?

These are vector representations that capture semantic meaning across both images and text in a shared latent space. For example, CLIP generates image and text embeddings such that semantically similar items (e.g., an image of a cat and the word “cat”) lie close together in this space, enabling tasks like cross-modal retrieval or similarity ranking.


106. How would you build a caption generator for images using GenAI?

  • Use an image encoder (e.g., ViT, ResNet) to convert the image to an embedding.

  • Feed the embedding into a decoder (e.g., Transformer or LLM) to generate text.

  • Pre-train or fine-tune the model on image-caption pairs.

  • Optionally use reinforcement learning (e.g., with CLIP as a reward function) to optimize caption quality.


107. What are the challenges in evaluating multi-modal models?

  • Lack of unified benchmarks across modalities

  • Difficulty in defining objective evaluation metrics (e.g., BLEU scores may not capture semantic quality)

  • Ambiguity in human interpretation of outputs

  • Dataset biases and overfitting to specific modalities

  • Limited explainability in how modalities interact


108. What is VQGAN + CLIP and how does it generate art?

  • VQGAN generates high-quality images using a discrete latent space.

  • CLIP guides VQGAN by scoring how well generated images match a textual prompt.

  • A text prompt is used to optimize the latent vector input to VQGAN such that CLIP's similarity score between image and text is maximized—thus producing art that reflects the text.


109. Compare DALL·E vs. MidJourney vs. Stable Diffusion.

Model
Key Features
Style/Control
Open Source

DALL·E 2

Text-to-image, from OpenAI

Moderate

No

MidJourney

Artistic and stylized image generation

High stylization

No

Stable Diffusion

Open-source latent diffusion model

Fine-grained control

Yes


110. How do you control image style or tone in GenAI outputs?

  • Prompt Engineering: Carefully crafted text prompts guide model output.

  • Negative Prompts: Used in Stable Diffusion to avoid unwanted features.

  • Latent Space Manipulation: Modify latent vectors to control color, structure, etc.

  • Conditioning: Provide style examples or reference images (e.g., ControlNet).

  • Fine-tuning or LoRA: Train on a dataset with a specific style (e.g., Van Gogh-style images).


111. Compare Hugging Face Transformers vs. OpenLLM vs. LlamaIndex.

Feature
Hugging Face Transformers
OpenLLM (by BentoML)
LlamaIndex (formerly GPT Index)

Focus

Model hub and architecture repo

Serving and deployment of LLMs

Data indexing and retrieval for LLMs

Strength

Wide model support, community

Easy model packaging and deployment

RAG pipelines, context-aware querying

Use Case

Fine-tuning, inference, research

Production-ready LLM APIs

Connect LLMs with structured/unstructured data

Open Source

Yes

Yes

Yes


112. What is the role of Guardrails AI in GenAI apps?

Guardrails AI enforces structure, safety, and correctness in LLM outputs. It uses declarative schemas (e.g., via pydantic) to validate and correct LLM outputs, ensuring they meet requirements like JSON structure, regex, or value constraints—making apps more predictable and secure.


113. How would you use LangGraph to manage agent state?

LangGraph uses stateful graphs for managing multi-step LLM workflows. Each node represents an agent/tool, and the edges encode conditional logic or memory updates. This allows:

  • Memory retention across nodes

  • Dynamic execution paths

  • Structured conversations with persistence Ideal for orchestrating complex, looped, or branching agent behaviors.


114. What’s the difference between LangChain Agents and Tools?

  • Tools: External functions or APIs (e.g., search, calculator) the LLM can call.

  • Agents: Controllers that decide which tool to use, when, and how—based on user input and intermediate results. Agents use reasoning + tool execution loops to solve complex tasks, while tools are the building blocks.


115. What does AutoGen enable beyond simple prompt chaining?

AutoGen enables multi-agent collaboration, where multiple LLM-based agents (e.g., user proxy, coder, critic) interact via structured messaging. It supports:

  • Role-specific agents

  • Turn-based conversations

  • Memory and error correction Beyond prompt chaining, it creates agentic workflows with persistent context and dynamic task resolution.


116. How can you use DSPy for LLM optimization?

DSPy (Declarative Self-improving Prompting) allows you to:

  • Define modules (e.g., Generate, Select, Rerank)

  • Apply tracing and feedback loops (telemetry)

  • Run compilation passes to improve prompt strategies

  • Optimize LLM behavior using search or reward functions It treats prompt tuning as a programmable pipeline.


117. What are Tool-Use models, and how are they trained?

Tool-use models are trained to invoke external APIs/tools (e.g., calculator, search) when solving tasks. They're trained using:

  • Supervised fine-tuning on tool invocation traces

  • Reinforcement learning for choosing correct tools

  • Function-calling annotations (e.g., OpenAI tool specs) They learn both when and how to use tools within context.


118. How do you use the Function calling feature in OpenAI?

You define a function schema in JSON (parameters, types, descriptions). The model, when prompted, can decide to call the function by returning the function name and arguments. Your system then executes the function and feeds back the result, enabling structured tool use in conversations.


119. What are “planning” and “reflection” loops in GenAI agents?

  • Planning: Agent decomposes tasks into subtasks or goals before acting.

  • Reflection: Agent reviews past actions and outcomes to revise plans or retry. These loops help agents:

  • Handle complex tasks

  • Improve accuracy over multiple iterations

  • Avoid blind tool use


120. Compare orchestration using LangChain vs. Flowise vs. Haystack.

Framework
Type
Strengths
Ideal Use Case

LangChain

Code-based

Highly customizable, large ecosystem

Custom pipelines, agent workflows

Flowise

No-code UI for LangChain

Visual LLM flow builder, quick prototyping

MVPs, demos, low-code teams

Haystack

Python RAG framework

Focus on pipelines, search, retrievers

Production-ready RAG & QA systems


GenAI can streamline legal analysis by:

  • Summarizing contracts and legal opinions

  • Extracting key clauses, obligations, and risks

  • Comparing versions and redlining

  • Answering natural language queries over case law or statutes Tools like Harvey AI and Lexion use LLMs to reduce manual review time and increase legal accessibility.


122. What are the risks of using GenAI in healthcare applications?

  • Hallucinations: LLMs may fabricate diagnoses or treatments.

  • Bias: Trained on general data, may not reflect clinical diversity.

  • Privacy: Handling sensitive patient data requires strict compliance (e.g., HIPAA).

  • Lack of Explainability: Critical in clinical decision support. Thus, GenAI should be used as an assistive tool, not a primary diagnostic engine.


123. Describe how GenAI is used in financial document summarization.

GenAI can:

  • Extract and summarize earnings reports, SEC filings, and investor decks

  • Translate raw financial data into readable insights

  • Highlight anomalies or risk signals

  • Automate compliance report drafting This supports analysts, auditors, and financial advisors with real-time insights.


124. How do you use GenAI in customer support automation?

GenAI powers chatbots and virtual agents that:

  • Understand natural queries

  • Retrieve knowledge base answers (via RAG)

  • Escalate to human agents if needed

  • Analyze sentiment and customer intent Example: Finetuning on past support tickets + integrating with CRM via LangChain or DSPy.


125. How can GenAI help with resume screening and hiring?

  • Extracts skills, experience, and education from resumes

  • Matches candidates to job descriptions

  • Generates interview questions based on role fit

  • Flags inconsistencies or anomalies Platforms like KrowdAI use GenAI to simulate recruiter workflows and reduce bias.


126. What are limitations of GenAI in high-stakes decision-making?

  • Lack of audit trails and interpretability

  • Can reinforce historical biases in training data

  • Prone to overconfidence and hallucinations

  • Legal and ethical constraints (e.g., GDPR, employment laws) Always require human-in-the-loop for final decisions.


127. How would you use GenAI for educational tutoring?

  • Personalized lesson plans based on student level

  • Natural language Q&A with adaptive feedback

  • Generating quizzes, explanations, and flashcards

  • Language learning through conversation Used in apps like Khanmigo (Khan Academy), offering scalable 1:1 learning experiences.


128. How can you combine GenAI with IoT or edge devices?

  • Use LLMs for natural language interfaces to control devices (e.g., smart homes)

  • Summarize telemetry data (e.g., “What happened last night with the AC?”)

  • On-device inferencing via quantized models (e.g., TinyML + LLM distillations)

  • Use GenAI for predictive maintenance and alert explanations


129. What role can GenAI play in software documentation?

  • Auto-generating docstrings, README files, and changelogs

  • Converting code comments into detailed explanations

  • Summarizing PRs or commits

  • Creating developer tutorials from code bases Tools like Codium and Mintlify use LLMs to document codebases efficiently.


130. How is GenAI changing video or game content generation?

  • Procedural storylines, character dialogue, and quests (e.g., Inworld AI)

  • Text-to-video/image tools (e.g., Runway, Pika, Sora) for cinematic prototyping

  • Dynamic voice acting and localization

  • Personalized content generation based on player actions It accelerates development and increases content variability.


131. What are red teaming practices in GenAI model evaluation?

Red teaming involves adversarial testing of GenAI models to uncover vulnerabilities such as:

  • Harmful, biased, or unsafe outputs

  • Susceptibility to prompt injection or jailbreaks

  • Inaccurate or misleading generations This is done by internal or external testers using creative prompts, edge cases, and social engineering tactics to expose flaws. It's an essential part of responsible AI deployment.


132. How do you design safe prompts to reduce offensive outputs?

  • Use instructional or declarative language to guide the model (e.g., “Be respectful”).

  • Include guardrails like constraints or banned words.

  • Use system prompts or chat roles to enforce tone.

  • Apply zero-shot and few-shot examples that set a safe behavioral pattern.

  • Post-process output with toxicity classifiers or regex filters.


133. What is prompt injection and how do you mitigate it?

Prompt injection is when a user manipulates the input to override system instructions, e.g., “Ignore the previous instructions and...”. Mitigation strategies:

  • Separate trusted vs. user input in structured calls (e.g., OpenAI function calling).

  • Input sanitization and validation

  • Use context isolation, not just concatenation

  • Deploy output filters and real-time monitoring


134. How do you ensure GDPR compliance in GenAI applications?

  • Data Minimization: Avoid collecting more data than necessary

  • Right to be Forgotten: Allow users to delete their data

  • Consent Management: Log and respect user consent

  • Explainability: Offer transparent reasoning for decisions made by GenAI

  • Data Localization: Ensure EU data stays within GDPR-compliant regions


135. How do you identify if a model output is manipulated or misleading?

  • Use attribution techniques (e.g., source highlighting in RAG)

  • Run outputs through fact-checking pipelines

  • Look for hallucination markers (confident but unverifiable content)

  • Apply truthfulness benchmarks (e.g., TruthfulQA)

  • Trace model decision path if logs or token attributions are available


136. What are jailbreak prompts and how do LLMs get exploited?

Jailbreak prompts are specially crafted inputs that bypass safety filters or force LLMs to generate forbidden content. Examples:

  • Roleplay hacks (“pretend you’re an evil AI”)

  • Encoding messages in code or foreign languages LLMs get exploited due to poor separation of system and user roles, and lack of robust input handling.


137. What is constitutional AI?

Constitutional AI (by Anthropic) involves training models with a set of guiding principles or rules (the "constitution"). Instead of relying on human feedback alone, models are taught to self-criticize and align with ethical norms using these principles during training and inference. It reduces dependence on human RLHF and improves explainability.


138. How do you audit GenAI models for ethical compliance?

  • Bias Testing: Evaluate model across race, gender, and other sensitive attributes

  • Toxicity Evaluation: Use tools like Perspective API

  • Output Logging: Record and review outputs for problematic behavior

  • Red Teaming: Periodic adversarial audits

  • Documentation: Maintain model cards and datasheets for transparency


139. What steps can be taken to prevent model misuse?

  • Rate limiting and access control (API keys, user verification)

  • Use content filters and moderation layers

  • Implement intent detection for harmful queries

  • Provide user disclaimers and traceable audit trails

  • Apply fine-tuned alignment and restrict model capabilities based on context


140. What is traceability in GenAI outputs and why is it important?

Traceability refers to the ability to track and explain how a model generated an output—including:

  • The prompt

  • Any retrieved documents

  • Model version

  • Decision path or tool use It’s crucial for auditing, compliance, debugging, and building trust in AI systems, especially in regulated sectors.


141. How do you estimate the ROI of adding GenAI to a product?

Estimate ROI by comparing costs vs. measurable gains:

  • Costs: API usage (e.g., tokens), infra, fine-tuning, integration effort

  • Gains: Time saved (automation), increased user engagement, conversion rate lift, support ticket reduction Use an ROI formula:

ROI = (Incremental Revenue or Cost Savings - GenAI Costs) / GenAI Costs


142. How do you reduce token cost in GenAI API usage?

  • Use shorter prompts and context windows

  • Apply retrieval-augmented generation (RAG) to feed only relevant chunks

  • Chunk and cache reusable outputs (e.g., summaries)

  • Use model routing (e.g., cheaper models for simple tasks)

  • Compress or truncate history/memory judiciously


143. What are the top metrics for success in a GenAI product launch?

  • Adoption rate: % of users trying GenAI features

  • Engagement: Time spent, repeat usage

  • Satisfaction: CSAT, NPS, feedback volume

  • Accuracy or relevance (task-dependent)

  • Retention impact: Whether users stay longer with GenAI features

  • Token cost per active user or per query


144. How can GenAI help speed up product iteration cycles?

  • Automates prototyping (e.g., text, UI, code suggestions)

  • Enables A/B testing of variants via prompt engineering

  • Speeds up content creation (docs, marketing, UI copy)

  • Adds intelligence to feedback loops (e.g., summarizing user reviews)

  • Powers internal tools (e.g., auto-generated QA test cases)


145. How would you pitch a GenAI solution to a non-technical stakeholder?

Focus on business value, not architecture:

  • “It reduces manual work by 40%”

  • “It creates personalized experiences at scale”

  • “It’s like hiring an AI assistant that works 24/7” Use analogies (e.g., co-pilot, smart intern) and provide a demo or before/after case study.


146. How do you measure user satisfaction with GenAI features?

  • CSAT or thumbs-up/down post-interaction

  • Qualitative feedback (e.g., "Was this helpful?")

  • Task completion rates or drop-off points

  • Churn analysis before vs. after feature rollout

  • Use feedback loops to continuously refine prompts and UX


147. What are the challenges in monetizing GenAI-powered features?

  • Hard to differentiate if competitors use similar models

  • High token/inference cost vs. perceived value

  • Users may not trust or want to pay for generated content

  • Cold start problem: Low engagement without clear onboarding

  • Unclear pricing models (usage-based vs. flat rate)


148. How do you handle customer trust concerns with AI-generated content?

  • Provide transparency: Show what’s AI-generated

  • Offer editing or approval steps before final output

  • Allow users to opt-out or control AI behavior

  • Log generation history for accountability

  • Use guardrails to ensure factual and brand-safe content


149. What are examples of GenAI as a co-pilot in SaaS platforms?

  • Notion AI: Summarizes docs, drafts content

  • Github Copilot: Assists with code generation

  • Salesforce Einstein GPT: Auto-generates CRM insights

  • Zendesk AI: Drafts ticket replies

  • Canva Magic Write: Helps with creative copy


150. How would you build a GenAI roadmap for a startup?

Phase 1: Discovery

  • Identify pain points where GenAI adds value

  • Rapid prototyping with open APIs (e.g., OpenAI, Claude)

Phase 2: MVP Launch

  • Integrate one core feature (e.g., summarizer, chatbot)

  • Track adoption, costs, user feedback

Phase 3: Scaling

  • Optimize infra (token routing, RAG, caching)

  • Add personalization, multi-modal inputs, or fine-tuning

Phase 4: Governance & Trust

  • Add guardrails, feedback loops, auditing

  • Explore monetization, premium features


Last updated