IVQ 151-200
Q151. How does the chain rule apply to backpropagation in transformer-based LLMs?
Q152. How does matrix multiplication influence attention score computation in LLMs?
Q153. What role does the softmax derivative play in updating weights during LLM training?
Q154. How does gradient clipping help stabilize training in large-scale language models?
Q155. How do partial derivatives help fine-tune each layer in multi-head attention?
Q156. How is positional encoding applied in transformer architectures?
Q157. How does layer normalization affect the output in transformer blocks?
Q158. How are query, key, and value matrices derived from input embeddings?
Q159. How is multi-head attention different from single-head attention in transformers?
Q160. How are residual connections used in transformer layers to preserve gradient flow?
Q161. How does GPT-4o handle real-time cross-modal inputs during inference?
Q162. How does Claude 3 integrate vision and language for complex reasoning tasks?
Q163. How does Google DeepMind improve token alignment in multimodal training?
Q164. How do large models fuse audio, text, and images during joint representation learning?
Q165. How does OpenAI ensure modality balance during multimodal LLM pretraining?
Q166. What are the main categories of generative AI models?
Q167. How do encoder-only, decoder-only, and encoder-decoder models differ?
Q168. What types of pretraining objectives are used in foundation models?
Q169. How do vision-language models differ from pure language foundation models?
Q170. What distinguishes general-purpose foundation models from domain-specific ones?
Q171. How does LoRA improve parameter efficiency during fine-tuning?
Q172. How does QLoRA maintain performance while reducing memory usage?
Q173. How does adapter tuning preserve pre-trained knowledge in LLMs?
Q174. How does prefix tuning enable task adaptation with minimal updates?
Q175. How do PEFT techniques balance generalization and specialization in LLMs?
Q176. How does dense retrieval differ from sparse retrieval in RAG pipelines?
Q177. What role do embeddings play in document retrieval for RAG?
Q178. How is retrieved context integrated into the prompt for generation?
Q179. What are the advantages of RAG over closed-book LLMs?
Q180. How does RAG ensure relevance and factual accuracy during response generation?
Q181. How does expert routing work in a Mixture of Experts architecture?
Q182. What are the trade-offs between sparse and dense MoE models?
Q183. How does MoE reduce computational cost while maintaining performance?
Q184. How does token-to-expert mapping affect efficiency in MoE-based LLMs?
Q185. What challenges arise when training large-scale MoE models?
Q186. How does CoT prompting differ from standard prompting in LLMs?
Q187. What types of tasks benefit most from Chain-of-Thought reasoning?
Q188. How does CoT prompting improve multi-step arithmetic and logic problems?
Q189. How is CoT prompting combined with self-consistency for better accuracy?
Q190. What are the limitations of CoT prompting in complex reasoning tasks?
Q191. What are the key differences between classification and generation tasks in AI models?
Q192. How do discriminative models learn decision boundaries, and how does that contrast with generative models?
Q193. How does the training objective differ for discriminative vs. generative models?
Q194. When should you choose a generative model over a discriminative one in real-world applications?
Q195. How do models like BERT (discriminative) and GPT (generative) differ in architecture and use cases?
Q196. How do knowledge graphs enhance factual grounding in LLM responses?
Q197. What techniques are used to connect structured knowledge with unstructured LLM outputs?
Q198. How does entity linking between text and a knowledge graph benefit reasoning tasks?
Q199. How can knowledge graphs reduce hallucinations in generative models?
Q200. What are the challenges of integrating dynamic or evolving knowledge graphs with LLMs?