Chapter 5.2 – How Large Language Models Are Trained (High Level)

A practical, non-academic look at how large language models are trained — and why that training shapes how they behave in real-world usage.

Posted Jan 28, 2026 Updated Mar 1, 2026

By Ravi Joshi

When I first started using LLMs seriously, I didn’t care how they were trained.

They worked — mostly — and that was enough.

But after a while, patterns started showing up:

Sometimes they were confident and wrong
Sometimes they hedged when they didn’t need to
Sometimes they refused reasonable requests
Sometimes they hallucinated details that sounded real

At first, this felt random.

Later, I realized: this behavior isn’t random — it’s shaped directly by how these models are trained.

Not just what data they see, but how they’re optimized and what they’re rewarded for.

That’s what this chapter is about — not the math, not the academic theory — but the training pipeline that turns raw text into something that feels conversational.

Aha! Moment:
The quirks and surprises in LLM behavior aren’t random—they’re a direct result of how these models are trained and rewarded.

The Big Picture First

At a high level, training an LLM happens in three major stages:

Pre-training – Learn language patterns from massive text corpora
Fine-tuning – Adapt behavior for specific tasks or domains
Alignment (often via RLHF) – Teach the model to respond helpfully, safely, and in line with what the user actually wants

Each stage shapes how the model behaves in production — and skipping any of them makes the system feel very different to use.

flowchart LR
    Data[Raw Text Data] --> Pretrain[Pre-training]
    Pretrain --> Finetune[Fine-tuning]
    Finetune --> Align[Alignment / RLHF]
    Align --> Model[Deployed LLM]
    style Data fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style Pretrain fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style Finetune fill:#ede7f6,stroke:#5e35b1,stroke-width:2px
    style Align fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style Model fill:#fce4ec,stroke:#ad1457,stroke-width:2px

Engineer’s reflection: Think of this like base AMI → hardened image → org-specific golden image → production rollout.

Let’s walk through each stage in practical terms.

Stage 1 — Pre-training: Learning Language at Scale

Pre-training is where the model learns:

Grammar
Sentence structure
Facts (implicitly)
How ideas relate across long contexts

The training data usually includes:

Books
Websites
Articles
Documentation
Code
Forums

The objective is simple: Predict the next token.

But doing this across trillions of tokens forces the model to learn surprisingly deep patterns — including reasoning-like behavior, summarization, translation, and explanation — even though none of those tasks were explicitly programmed.
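To make the objective concrete, here is a minimal sketch of next-token prediction using a toy bigram counter instead of a neural network. The corpus, the function names, and the whitespace tokenization are all illustrative assumptions; real pre-training uses learned subword tokenizers and transformer models, but the loss being minimized has the same shape: the negative log-probability of the token that actually came next.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    if total == 0:
        return {}  # token never seen with a successor
    return {tok: c / total for tok, c in followers.items()}

def avg_next_token_loss(counts, tokens):
    """Average negative log-likelihood of each actual next token."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_token_probs(counts, prev).get(nxt, 1e-9)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# Toy corpus (an assumption for illustration only)
corpus = "the cat sat on the mat the cat ate".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")   # "cat" seen twice, "mat" once
loss = avg_next_token_loss(model, corpus)
print(probs, round(loss, 3))
```

Nothing here tells the model what is true. It only learns which continuation is likely, which is worth keeping in mind when hallucinations come up later.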

This is where most of the compute and cost lives.

Infra analogy: Like building a base OS image by installing every major library and runtime: heavy, expensive, but reusable everywhere.

After pre-training, the model:

Knows language very well
Can generate fluent text
But isn’t particularly helpful yet

It can ramble, contradict itself, ignore instructions, or behave oddly in conversations.

That’s where the next stages matter.

Stage 2 — Fine-tuning: Shaping Task Behavior

Fine-tuning takes the pre-trained model and exposes it to smaller, curated datasets focused on specific behaviors, such as:

Answering questions
Writing code
Following instructions
Summarizing text
Maintaining tone

Instead of raw internet-scale data, this stage uses:

Prompt–response pairs
Domain-specific examples
Task-oriented datasets

This doesn’t teach the model new language — it teaches it how to use what it already knows in structured ways.
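As a rough sketch of what a prompt–response training example can look like, the snippet below joins a prompt and a response into one token sequence and marks which tokens count toward the loss. The special tokens (`<user>`, `<assistant>`, `<end>`), the whitespace tokenizer, and the function name are hypothetical. Masking out the prompt is a common choice, not a universal rule.

```python
def build_sft_example(prompt, response, tokenize=str.split):
    """Concatenate prompt and response tokens and mark which ones are trained on."""
    # Hypothetical role markers; real chat templates differ per model.
    prompt_tokens = ["<user>"] + tokenize(prompt) + ["<assistant>"]
    response_tokens = tokenize(response) + ["<end>"]
    tokens = prompt_tokens + response_tokens
    # 0 = ignored in the loss (prompt), 1 = contributes to the loss (response)
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example("Summarize this log", "Disk is full")
for tok, keep in zip(tokens, loss_mask):
    print(f"{tok:12} {'train' if keep else 'skip'}")
```

The objective is still next-token prediction. What changes is the data: the model is only graded on producing the response, given the prompt.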

Engineer’s reflection: Pre-training teaches the language and its patterns. Fine-tuning teaches workflows.

Without fine-tuning, you’d get a model that speaks well but collaborates poorly.

Stage 3 — Alignment: Teaching the Model How to Behave

Even after fine-tuning, models can:

Give unsafe advice
Produce biased outputs
Be overly verbose
Ignore user intent
Sound rude or unhelpful

Alignment training — often using RLHF (Reinforcement Learning from Human Feedback) — tries to correct this.

The rough flow looks like:

flowchart LR
    Prompt[User Prompt] --> LLM1[Model Responses]
    LLM1 --> Human[Human Ranking]
    Human --> Reward[Reward Model]
    Reward --> Train[Model Optimization]
    Train --> Improved[Aligned LLM]
    style Prompt fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style LLM1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style Human fill:#ede7f6,stroke:#5e35b1,stroke-width:2px
    style Reward fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style Improved fill:#fce4ec,stroke:#ad1457,stroke-width:2px

This stage adds little factual knowledge, but it substantially changes:

Helpfulness
Tone
Instruction-following
Safety
Refusal behavior

Engineer’s warning: Alignment doesn’t make the model smarter — it makes it nicer and safer to use.
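The "human ranking → reward model" step in the flow above can be sketched with a pairwise preference loss: the reward model is pushed to score the response humans preferred above the one they rejected. This is a toy version under stated assumptions. The two hand-made features (`is_polite`, `answers_question`), the preference pairs, and the linear scorer are all invented for illustration; production reward models are neural networks that score raw text.

```python
import math

def reward(weights, features):
    """Linear stand-in for a reward model: weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when chosen outscores rejected."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log(1 + math.exp(-margin))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -1 / (1 + exp(margin)); step against it
            scale = 1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Features: [is_polite, answers_question]; each pair is (preferred, rejected)
pairs = [
    ([1, 1], [0, 1]),  # polite + answers beats rude + answers
    ([1, 1], [1, 0]),  # polite + answers beats polite + no answer
    ([0, 1], [1, 0]),  # answering beats mere politeness
]
weights = train_reward_model(pairs, n_features=2)
print([round(w, 2) for w in weights])
```

The final optimization step then tunes the LLM to produce responses this scorer rates highly, which is why whatever the rankers preferred ends up baked into the model's behavior.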

Why Training Shapes Model Personality

Once you know this pipeline, a lot of LLM behavior makes sense:

Hallucinations → gaps in training data + a pre-training objective that rewards fluent, plausible continuations whether or not they are true
Over-cautious refusals → strong alignment rewards for safety
Verbosity → reward tuning favors helpfulness and completeness
Polite tone → alignment data heavily biases toward cooperative style

These aren’t bugs — they’re side effects of optimization goals.
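Here is a deliberately crude sketch of how an optimization goal can turn into a quirk like verbosity. The reward function below is hypothetical: it gives one point per key point covered plus a small bonus per word, standing in for a learned reward that happens to correlate with length. Picking the highest-scoring candidate then favors the longer answer even though both cover the same key point.

```python
def toy_reward(response, key_points):
    """Hypothetical reward: +1 per key point mentioned, plus a small bonus per word."""
    words = response.lower().split()
    coverage = sum(1 for point in key_points if point in words)
    return coverage + 0.05 * len(words)

candidates = [
    "Restart the service.",
    "Restart the service, then check the logs to confirm it recovered, "
    "and consider adding monitoring so you notice earlier next time.",
]

# Choose whichever candidate the reward scores highest
best = max(candidates, key=lambda r: toy_reward(r, ["restart"]))
print(best)
```

Real reward models are far subtler than a word count, but the mechanism is the same: whatever the reward correlates with, the model drifts toward.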

Engineer’s reflection: You don’t get “truth machines.” You get “reward-optimized text generators.”

Why Fine-Tuned Models Feel So Different

This also explains why:

Base models feel raw and weird
Chat models feel conversational
Code models feel structured
Domain models feel specialized

They’re usually not different architectures; they’re different training trajectories.

Same foundation. Different shaping forces.

Infra analogy: Same Kubernetes cluster, different admission controllers, policies, and workloads — behavior changes without changing the core platform.

What This Means for Practitioners

From an engineering perspective, this training setup implies:

LLMs don’t reason the way people do; their outputs reflect what training optimized them for
They reflect training data distributions, not reality
Alignment can trade some accuracy or directness for safety and usability
Fine-tuning biases behavior more than people expect
Prompting works like runtime configuration: it steers behavior without changing the model’s weights

This explains why:

Small prompt changes can produce large output differences
Tone can matter as much as content
Constraints usually improve results
Examples often outperform instructions

Engineer’s reflection: Prompting is configuration, not querying.
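Treating the prompt as configuration can be as simple as assembling it from parts: an instruction, explicit constraints, and a few worked examples. The helper below is a minimal sketch. Its name, the `Input:`/`Output:` layout, and the sample data are assumptions for illustration, not a format any particular model requires.

```python
def build_prompt(instruction, examples, query, constraints=()):
    """Assemble instruction, constraints, and worked examples into one prompt string."""
    lines = [instruction]
    for rule in constraints:
        lines.append(f"- {rule}")
    for sample_input, sample_output in examples:
        lines.append(f"Input: {sample_input}")
        lines.append(f"Output: {sample_output}")
    # The final, unanswered pair is what the model is asked to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_prompt(
    instruction="Classify the severity of each alert.",
    constraints=["Answer with one word: low, medium, or high."],
    examples=[
        ("Disk usage at 71%", "low"),
        ("Primary database unreachable", "high"),
    ],
    query="API latency doubled in the last hour",
)
print(prompt)
```

Because the model continues patterns it sees, the worked examples carry much of the configuration: they show the expected format and level of detail without describing them.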

What I Wish I Knew Earlier

Takeaway:
LLM training happens in layers: pre-training → fine-tuning → alignment
Pre-training teaches language, not helpfulness
Fine-tuning teaches structure and task behavior
Alignment teaches tone, safety, and cooperation
Most “model personality” comes from alignment, not intelligence

What’s Next?

➡ Series 5 – Chapter 5.3: Tokens, Context Windows, and Why Prompts Matter

In the next chapter, we’ll explore:

What tokens actually are
How context windows limit memory
Why models forget earlier parts of conversations
How prompt structure impacts output quality

Architectural Question: How can you design prompts and context to get the most reliable, useful output from LLMs—especially as context windows and tokenization limits come into play?

You now understand how LLMs are shaped by their training. Next, we’ll learn how to communicate with them effectively.

ai, llm, training, machine-learning

This post is licensed under CC BY 4.0 by the author.