Overview: Criteria for picking the base LLM (size, capabilities, license)
Choosing the right foundation model is one of the most important decisions for your AI assistant. Your choice affects output quality, running costs, hardware requirements, and how you are legally allowed to use the model.
Here’s what you need to consider before loading your first model.
✅ Key Criteria
1️⃣ Model Size
Small (50M–1B parameters) ➜ Faster to run on CPUs or small GPUs. Good for lightweight tasks, simple Q&A, or prototypes.
Medium (1B–7B parameters) ➜ Balanced performance and cost. Runs well on modern consumer GPUs (8–16 GB VRAM).
Large (13B+ parameters) ➜ Higher fluency and accuracy, but needs more powerful GPUs or cloud hardware.
Tip: Bigger is not always better — choose what you can run comfortably for your target use case.
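The size tiers above map to memory in a simple way: a model's weight footprint is roughly its parameter count times the bytes per parameter (4 for fp32, 2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit). Here is a back-of-the-envelope sketch; real usage is higher because of activations and the KV cache, and the helper name is our own:

```python
# Rough VRAM estimate for holding model weights only.
# Rule of thumb: parameters × bytes per parameter
# (fp32 = 4 bytes, fp16/bf16 = 2, 8-bit = 1, 4-bit = 0.5).

def estimate_weight_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate GB needed just to store the weights."""
    return num_params * bytes_per_param / 1e9

for name, params in [("gpt2 (124M)", 124e6), ("7B model", 7e9), ("13B model", 13e9)]:
    fp16 = estimate_weight_gb(params, 2.0)
    four_bit = estimate_weight_gb(params, 0.5)
    print(f"{name}: ~{fp16:.1f} GB in fp16, ~{four_bit:.1f} GB in 4-bit")
```

So a 7B model needs about 14 GB in fp16 but only ~3.5 GB in 4-bit, which is why quantization (covered under Hardware Requirements below) matters so much on consumer GPUs.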
2️⃣ Capabilities & Specialization
General-purpose LLMs (like GPT‑2, LLaMA‑2, Mistral) ➜ Good for broad chat, creative text, and general reasoning.
Instruction-tuned LLMs ➜ Specifically trained to follow user instructions, give helpful answers, and maintain a polite tone. Look for models like LLaMA‑2‑Chat, Mistral‑Instruct, or Vicuna.
Domain-specific LLMs ➜ Some models are fine-tuned for coding, medical Q&A, or other niches. Examples: CodeLlama for programming, BioGPT for life sciences.
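One practical difference with instruction-tuned models: they expect prompts in the chat template they were trained with. As a minimal sketch (assuming Mistral-Instruct's `[INST] … [/INST]` convention; in practice, prefer the tokenizer's own `apply_chat_template` method from the `transformers` library):

```python
# Instruction-tuned models perform best when the prompt follows their
# training template. Mistral-Instruct wraps the user turn in
# [INST] ... [/INST] tags; the helper below is a hand-rolled sketch of
# that format for a single-turn prompt.

def format_mistral_instruct(user_message: str) -> str:
    """Wrap one user turn in Mistral's instruction tags."""
    return f"<s>[INST] {user_message} [/INST]"

prompt = format_mistral_instruct("Explain what a foundation model is.")
print(prompt)
```

Sending a raw, untemplated prompt to an instruction-tuned model often produces noticeably worse answers, so this step is worth getting right.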
3️⃣ License & Usage Rights
Always check the model’s license on the Hugging Face model card.
Some models allow commercial use (e.g., OpenLLaMA, Mistral), while others are research-only.
Using a permissive license saves future headaches if you plan to deploy or share your assistant.
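A useful habit is to screen candidates by license before downloading anything. The sketch below uses model IDs and license tags taken from the example table in this section (the `llama2` tag and helper names are our own labels; the Hugging Face model card is always the authoritative source, and `huggingface_hub.model_info` can fetch the tag programmatically):

```python
# Pre-flight license check: keep only candidate models whose license
# permits your intended (e.g., commercial) use. License strings below
# mirror the example table in this section -- verify on the model card.

PERMISSIVE = {"mit", "apache-2.0"}

candidates = {
    "gpt2": "mit",
    "mistralai/Mistral-7B-Instruct": "apache-2.0",
    "NousResearch/Llama-2-13b-chat": "llama2",  # custom Meta license, extra terms apply
}

def commercial_ok(license_id: str) -> bool:
    """True if the license tag is a well-known permissive one."""
    return license_id.lower() in PERMISSIVE

usable = [model for model, lic in candidates.items() if commercial_ok(lic)]
print(usable)
```

Filtering early like this keeps license surprises out of your deployment plans.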
4️⃣ Hardware Requirements
Does the model fit your hardware? Check your GPU VRAM, system RAM, and disk space before downloading.
Use quantized versions (e.g., 4‑bit or 8‑bit) if you need to run large models on limited hardware.
For big models, consider running inference on cloud GPU providers.
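Putting the size and quantization points together, a quick "will it fit?" check is just arithmetic. This is a rough sketch (the ~20% overhead factor is an assumption; real inference also needs memory for activations and the KV cache):

```python
# "Will it fit?" sketch: compare an estimated weight footprint against
# available VRAM at different quantization levels.

BYTES_PER_PARAM = {"fp16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

def fits(num_params: float, vram_gb: float, precision: str,
         overhead: float = 1.2) -> bool:
    """Check whether weights (plus ~20% assumed overhead) fit in vram_gb."""
    needed_gb = num_params * BYTES_PER_PARAM[precision] / 1e9 * overhead
    return needed_gb <= vram_gb

# A 13B model on a 12 GB consumer GPU:
print(fits(13e9, 12, "fp16"))   # 13B * 2 B * 1.2 = 31.2 GB -> False
print(fits(13e9, 12, "4-bit"))  # 13B * 0.5 B * 1.2 = 7.8 GB -> True
```

This is exactly why the 4-bit and 8-bit quantized variants mentioned above make 13B-class models viable on consumer cards.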
✅ Practical Examples
| Model | Type | Size | License | Notes |
| --- | --- | --- | --- | --- |
| `gpt2` | General | 124M | MIT | Good starter for demos. |
| `mistralai/Mistral-7B-Instruct` | Chat/instruction | 7B | Apache 2.0 | Strong mid-size chat model. |
| `NousResearch/Llama-2-13b-chat` | Chat/instruction | 13B | Meta LLaMA 2 license | Powerful, but needs a large GPU. |
| `TheBloke/CodeLlama-7B-GPTQ` | Code | 7B | LLaMA 2 license | Quantized for coding tasks. |
| `google/flan-t5-base` | Instruction | 250M | Apache 2.0 | Lightweight, multi-task. |
✅ Key Takeaway
Choose a model that:
Fits your hardware
Meets your performance needs
Matches your assistant’s purpose
Has a license you’re comfortable with
➡️ Next: you’ll load your chosen model and run your first prompt!