Overview: Criteria for picking the base LLM (size, capabilities, license)

Choosing the right foundation model is one of the most important decisions for your AI assistant. Your choice affects response quality, running cost, hardware requirements, and what you are legally allowed to do with the result.

Here’s what you need to consider before loading your first model.


Key Criteria


1️⃣ Model Size

  • Small (50M–1B parameters) ➜ Faster to run on CPUs or small GPUs. Good for lightweight tasks, simple Q&A, or prototypes.

  • Medium (1B–7B parameters) ➜ Balanced performance and cost. Runs well on modern consumer GPUs (8–16 GB VRAM).

  • Large (13B+ parameters) ➜ Higher fluency and accuracy, but needs more powerful GPUs or cloud hardware.

Tip: Bigger is not always better — choose what you can run comfortably for your target use case.
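
As a rough sanity check, you can estimate a model's inference footprint from its parameter count alone. The sketch below assumes FP16/BF16 weights (2 bytes per parameter) plus an assumed ~20% headroom for activations and the KV cache; real usage varies by workload.

```python
def estimate_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Rough inference footprint: weights x bytes/param, plus a ~20%
    margin (an assumption) for activations and the KV cache."""
    return n_params * bytes_per_param * 1.2 / 1e9

print(f"7B in FP16:   ~{estimate_memory_gb(7e9):.1f} GB")   # → ~16.8 GB
print(f"124M in FP16: ~{estimate_memory_gb(124e6):.2f} GB")  # → ~0.30 GB
```

By this estimate, a 7B model in FP16 already exceeds a 16 GB consumer GPU, which is why quantization (covered below) matters in practice.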


2️⃣ Capabilities & Specialization

  • General-purpose LLMs (like GPT‑2, Llama 2, Mistral) ➜ Good for broad chat, creative text, and general reasoning.

  • Instruction-tuned LLMs ➜ Specifically trained to follow user instructions, give helpful answers, and maintain a polite tone. Look for models like Llama‑2‑Chat, Mistral‑Instruct, or Vicuna.

  • Domain-specific LLMs ➜ Some models are fine-tuned for coding, medical Q&A, or other niches. Examples: CodeLlama for programming, BioGPT for life sciences.
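
Note that instruction-tuned models usually expect prompts in a specific template. As an illustration, the sketch below hand-writes the [INST] wrapper used by Llama‑2‑Chat and Mistral‑Instruct; in practice, prefer the tokenizer's apply_chat_template(), which reads the correct template from the model repo.

```python
def format_instruct_prompt(user_message: str) -> str:
    """Wrap a user message in [INST] tags (Llama-2/Mistral style; an
    assumption -- check the model card for your model's actual template)."""
    return f"[INST] {user_message.strip()} [/INST]"

print(format_instruct_prompt("Summarize this paragraph in one sentence."))
# → [INST] Summarize this paragraph in one sentence. [/INST]
```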


3️⃣ License & Usage Rights

  • Always check the model’s license on the Hugging Face model card.

  • Some models allow commercial use (e.g., OpenLLaMA, Mistral), while others are research-only.

  • Using a permissive license saves future headaches if you plan to deploy or share your assistant.
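
One lightweight way to bake the license check into your tooling is a simple allow-list. Everything below (the candidate dict, the license strings, the allow-list) is a hand-maintained assumption for illustration — the model card on the Hugging Face Hub remains the authoritative source.

```python
# Hand-maintained shortlist; license strings mirror the Hub's "license" tag
# (an assumption -- always verify against the model card itself).
CANDIDATES = {
    "gpt2": "mit",
    "mistralai/Mistral-7B-Instruct-v0.2": "apache-2.0",
    "meta-llama/Llama-2-13b-chat-hf": "llama2",  # custom license, restrictions apply
}

PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}

def commercially_safe(model_id: str) -> bool:
    """True only if the model is on our assumed permissive allow-list."""
    return CANDIDATES.get(model_id) in PERMISSIVE

print([m for m in CANDIDATES if commercially_safe(m)])
# → ['gpt2', 'mistralai/Mistral-7B-Instruct-v0.2']
```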


4️⃣ Hardware Requirements

  • Does the model fit your hardware? Check your GPU VRAM, system RAM, disk space, and CPU limits.

  • Use quantized versions (e.g., 4‑bit or 8‑bit) if you need to run large models on limited hardware.

  • For big models, consider running inference on cloud GPU providers.
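
Quantization shrinks the weights roughly in proportion to the bit width. A back-of-the-envelope sketch (weights only — activations and the KV cache add more on top):

```python
def quantized_size_gb(n_params: float, bits: int) -> float:
    """Approximate size of the weights alone at a given bit width."""
    return n_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"13B weights at {bits}-bit: ~{quantized_size_gb(13e9, bits):.1f} GB")
# → ~26.0 GB, ~13.0 GB, ~6.5 GB
```

By this estimate, a 4‑bit quantized 13B model fits on a single consumer GPU that could never hold the FP16 version.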


Practical Examples

| Example Model | Type | Params | License | Notes |
| --- | --- | --- | --- | --- |
| gpt2 | General | 124M | MIT | Good starter for demos. |
| mistralai/Mistral-7B-Instruct | Chat/instruction | 7B | Apache 2.0 | Strong mid-size chat model. |
| NousResearch/Llama-2-13b-chat | Chat/instruction | 13B | Meta Llama 2 license | Powerful, but needs a large GPU. |
| TheBloke/CodeLlama-7B-GPTQ | Code | 7B | Llama 2 license | Quantized for coding tasks. |
| google/flan-t5-base | Instruction | 250M | Apache 2.0 | Lightweight, multi-task. |


Key Takeaway

Choose a model that:

  • Fits your hardware

  • Meets your performance needs

  • Matches your assistant’s purpose

  • Has a license you’re comfortable with
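
The checklist above can be folded into a tiny selection helper. The candidate list and its VRAM/quality figures below are illustrative assumptions, not measured values:

```python
# Toy candidate list: VRAM needs and quality scores are rough assumptions.
CANDIDATES = [
    {"id": "google/flan-t5-base",                "vram_gb": 2,  "quality": 1},
    {"id": "mistralai/Mistral-7B-Instruct-v0.2", "vram_gb": 16, "quality": 3},
    {"id": "meta-llama/Llama-2-13b-chat-hf",     "vram_gb": 28, "quality": 4},
]

def pick_model(available_vram_gb: float) -> str:
    """Return the best-quality candidate that fits the available VRAM."""
    fitting = [c for c in CANDIDATES if c["vram_gb"] <= available_vram_gb]
    if not fitting:
        raise ValueError("No candidate fits -- consider quantization or cloud GPUs")
    return max(fitting, key=lambda c: c["quality"])["id"]

print(pick_model(16))  # → mistralai/Mistral-7B-Instruct-v0.2
```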


➡️ Next: you’ll load your chosen model and run your first prompt!
