Large Language Models Explained Briefly
3Blue1Brown — The single best 10-minute explainer ever produced. Visual, accurate, no jargon. If you only watch one thing on this page, watch this.
Open →Six foundational explainers for anyone who has never used before — what an actually is, why matter, what makes an different from a chatbot, what Claude Code is, how it differs from ChatGPT, and why we picked it over the alternatives. Read this first if any of those questions feels uncertain. Audience: complete beginner · ~20 min read · 16 curated external resources → ready to install? Skip to Day 1 → already installed? Try a Use Case
Read straight through, or skip to the section that matches the question you’re sitting on right now. The order goes concept → product → choice: what the underlying AI is, what it can and can’t remember, what changes when you give it tools, what the specific product is, how it relates to its siblings, and why we picked it.
Cut through the word “AI” first. AI is a marketing umbrella. It covers image generation, speech recognition, self-driving cars, recommendation engines, robotics — and the thing we’re actually here to talk about. On its own the word is almost useless.
The specific thing that exploded in late 2022, the thing reshaping how knowledge workers do their jobs — including ours — is one narrow slice called a Large Language , or . Other kinds of AI exist and matter. Throughout this hub, when we say AI, we mean LLM. Everything else is a different conversation.
An LLM is a program. A very large one. It was trained on a huge amount of text — books, websites, code, documentation, forums, manuals. Most of the public internet. During training it learned one thing, and only one thing, very well: given a sequence of words, predict the next word. That’s it. That’s the whole trick.
Now do that prediction billions of times, with hundreds of billions of internal parameters, and what emerges is a system that can answer questions, write code, summarise a document, draft an email — because all of those, when you look closely, are “what comes next” problems. You ask “what’s the capital of France” — the most statistically likely next words are “The capital of France is Paris.” You ask “write a Python function that sorts a list” — the most likely next are valid Python. You paste a five-page contract and ask for a summary — the most likely next words are a summary.
1. It is statistical, not thinking. The LLM has no beliefs, no opinions, no memory between conversations, no model of truth versus falsehood. It is optimising for plausibility — what looks like a reasonable next word — not correctness. Most of the time plausibility and correctness overlap. When you ask “what’s two plus two,” the plausible answer and the correct answer are both “four.” But they can diverge.
2. That’s why it hallucinates. A is when the model confidently produces something that sounds right and isn’t. A made-up function name. A fictional court case. A wrong date. A library that doesn’t exist. It’s not lying. Lying requires knowing the truth. It’s pattern-matching to what a plausible answer looks like. If you ask for a citation, citations have a shape — author, year, title, journal — and it will produce something with that shape, even if the citation doesn’t exist. The fix is not to trust it. The fix is to verify. Always.
3. A token is the unit. When the LLM predicts “the next word” it’s actually predicting the next token. A token is roughly a word, or a piece of a word. “Hello” is one token. “Hallucination” might be three. Punctuation is its own token. You’ll see token counts everywhere — pricing is per token, context limits are in tokens. A 200,000-token is roughly 150,000 words.
4. in, out. Everything you type — your question, the files Claude reads, the instructions you give — is the prompt. Everything it produces back is the completion. That’s the whole vocabulary.
5. It has no memory between sessions. Close the , reopen it tomorrow, the model has zero recollection of yesterday. Everything Claude “knows” about your project, your conventions, your past conversations is what’s currently in its context window (next step). Anything outside that, it cannot see. The knowledge baked into the model itself was also frozen at some point in the past — its “training cutoff.” Ask it about events after that date and it’ll either say it doesn’t know, or — worse — make something up that sounds right.
This is a wildly compressed summary. If you want to actually understand this, the resources at the end of this section are some of the best free explanations ever produced.
3Blue1Brown — The single best 10-minute explainer ever produced. Visual, accurate, no jargon. If you only watch one thing on this page, watch this.
Open →Andrej Karpathy — Andrej Karpathy was at OpenAI and Tesla. This is the deeper version. Covers what LLMs are, where they're going, security implications. Generally considered required watching in the field.
Open →3Blue1Brown — If the 10-minute version left you wanting more — this is the visual deep-dive into how transformers actually work.
Open →/compact squeezes the conversation into a summary.
/clear empties the glass entirely. When in doubt, clear.
If you remember one mechanical fact from this whole page, make it this: the can only see what fits in its , and once it’s full, things start falling out.
The glass-of-water metaphor:
Picture a glass of water. Every conversation fills the glass. Your first pours something in. The model’s reply pours something in. Every file the model reads, every command it runs — all pour into the same glass.
The glass has a fixed size. When it’s full, the oldest water spills out. The model literally stops being able to see the early part of the conversation.
This is why long sessions degrade. The model forgets what you told it an hour ago, because that water spilled.
This is why we have commands like /compact — squeeze the conversation down to a summary — and /clear — empty the glass and start fresh. When in doubt: clear and start over. Cheaper than fighting a polluted context.
The numbers:
A context window is measured in (think: ~3-4 characters of text per token, or about ¾ of a word on average). Modern Claude models hold ~200,000 tokens, which sounds like a lot until you realise a 500-line source file might cost you 2,000 tokens, and a session that reads ten files plus a long stack trace plus your back-and-forth chat can fill 50,000 before you notice.
Two failure modes happen when context fills up:
Practical implications:
/compact — summarises the conversation so far, freeing up tokens. Use it between unrelated tasks./clear — wipes the context entirely. Cheaper context = better answers. Don’t be precious about it.CLAUDE.md — a file you write at the project root that gets loaded every session. The persistent memory, paid for in tokens once per session instead of repeated every turn. (Day 1 → Step 5 walks through the why with a before/after visualisation, and shows what one looks like.)Trust nothing past the 70% context-fill mark without a review. If you start seeing the model invent things — that’s the signal. /compact or /clear and start fresh.
IBM Think — Vendor-neutral, technically careful explanation. Defines the term, explains the trade-offs, links to deeper reading.
Open →Perivitta Rajendran — Connects the context-window mechanic directly to hallucination behaviour — the practical "why does Claude start making things up?" answer.
Open →Rishabh Bhandari (Bootcamp / Medium) — Covers the "lost in the middle" phenomenon — why even large context windows don't solve forgetting cleanly.
Open →You’ve used ChatGPT. You typed a question, it typed an answer. That’s a chatbot loop. One in, one out. The is a brain in a jar — it can produce text but it can’t do anything in your world.
An is the same model with hands. Specifically: it’s been given a set of tools (read a file, write a file, run a , fetch a URL, query a …) and it’s wrapped in a loop that lets it decide which tool to call, call it, read the output, decide again, and continue — until it thinks the task is done or you stop it.
What changes when you flip from chatbot to agent:
Karpathy has a memorable framing: think of the agent as a new kind of operating system, with the as the kernel orchestrating tools, files, and processes. We’re not all the way there yet, but is one of the products closest to that pattern.
CLAUDE.md next timeForethought — Plain-English breakdown of agent autonomy + tool use vs scripted chatbot decision trees. Good for non-technical readers.
Open →MindStudio — Maps the agent-loop concept onto Anthropic's specific products. Explains why Claude Code "feels different" from calling the API yourself.
Open →One sentence: is a -based AI assistant that lives next to your files and can read them, edit them, and run commands — with your permission.
It is the (Step 3) wrapping the (Step 1) — specifically, ’s Claude family of — and running locally on your machine.
Four moving parts:
claude, you’re in a session. No website to log into, no browser tab. It runs locally on your machine.claude.com/code page. ChatGPT is a website. Claude.ai is a website. Claude Code is software that runs on your computer.That last part is the whole game. Chat assistants make you the bottleneck — you paste in, you copy out, you apply changes by hand. Claude Code removes the copy-paste loop entirely.
The mental shift from ChatGPT:
CLAUDE.md, and it integrates. You supervise.From operator to supervisor. That is the shift.
You stay in control: you approve risky actions, you read the diff before committing, you tell it to back off when it goes sideways.
Anthropic (official) — Anthropic's own beginner course — installation through advanced customisation. The most authoritative starting point. Free.
Open →Tech with Tim — Hands-on walkthrough of the actual CLI experience. Watch this before you install if you want to see what you're signing up for.
Open →Anthropic — Non-technical Anthropic staff showing real workflows. Useful for non-developers who want to see the breadth of use cases.
Open →This is the question almost every newcomer asks within the first day, and the names don’t help. Here’s the disambiguation.
Hold this distinction first: every AI vendor — , OpenAI, Google — sells two different things.
Same engine, completely different vehicle. Once you internalise the model-versus-product distinction, the rest of this step is just naming.
Claude is the model — the actual AI. It’s what answers the question, regardless of which interface you use. Anthropic ships several variants: Opus (most powerful), Sonnet (the daily driver), Haiku (fastest). When someone says “Claude said X,” they mean a Claude model produced X — but they may have reached it through any of three different surfaces.
| Surface | What it is | When to reach for it | What you pay |
|---|---|---|---|
| Claude.ai | The chatbot at claude.ai (and the desktop/mobile apps). A web tab with a text box. | Quick questions. Drafting text. Brainstorming. “Explain this concept.” Doesn’t touch your files. | Free tier, then ~$20/month (Pro), ~$100-200/month (Max) |
| A tool. Runs in your . Has filesystem access, runs commands, edits code. | Anything that needs to change files in a project. Refactors. New features. Debugging. CI/. | Included in Claude.ai Pro+. Or pay-per- via the API | |
| Anthropic API | Raw programmatic access — your code calls Claude over . | You’re building something that uses Claude inside it. Custom tools, internal . | Pay-per-token. |
| Claude Cowork | A variant of Claude Code aimed at non-developers. Research preview. | Non-technical folks who still want hands-on capability. | Same Claude.ai subscriptions. |
Compared to other vendors:
If you want the short version: Claude.ai is the browser tab; Claude Code is the terminal agent; the API is for programmers building tools. They all talk to the same models.
16x.engineer — Clearest single-page disambiguation of the three surfaces. Read this if the names are blurring together.
Open →RSL/A — Covers the full product family including Desktop and Cowork. Decision-tree shaped — match your job to the right surface.
Open →Honest answer: there isn’t a single “best” coding . There’s a constellation of tools with overlapping capabilities, and we picked because it’s the strongest fit for what we do. Some context.
| Tool | What it is | Why you might pick it | Why you might not |
|---|---|---|---|
| Claude Code | ’s agent, Claude . | Deepest reasoning. 200K context delivered reliably. Strong autonomous multi-step. -first composes with any editor. | Locked to Claude (no GPT/Gemini switching). Subscription cost can climb on heavy use. |
| Cursor | VS Code fork with built-in agent. | Visual IDE feel. Can switch between models. Fast inline edits. | Reports of context truncation under hood. IDE lock-in if you preferred your existing setup. |
| Cline | VS Code extension, full agent loop. | Lives inside your existing VS Code. Step-by-step approval mode. Open source. | Pay your own model bills. Less “deeply integrated” feel. |
| Aider | Terminal pair-programmer. Model-agnostic. | Git-native (every edit is a ). -efficient. Works with any model. | No visual interface. Smaller blast radius — less autonomous than Claude Code on big tasks. |
| Copilot | Inline + a separate chat. | Already on most dev machines. Tight VS Code integration. | Completion-shaped, not agent-shaped. Less suited to multi-file refactors. |
Why we picked Anthropic and Claude (not OpenAI or Google):
Why we landed on Claude Code at NBG specifically:
Many teams end up running two tools side-by-side: Claude Code for hard problems, Cursor or Copilot for fast inline edits. That’s a sensible end state. The hub is opinionated about Claude Code as the primary tool here, not the only one allowed to exist.
You’re now equipped with the mental models. Move on to Day 1 when you’re ready to actually install Claude Code and run your first session — five concrete steps, about thirty minutes, you’ll be productive by the end of it.
DEV Community — Most comprehensive 2026 landscape overview. Useful for understanding *what else exists* before deciding Claude Code is right.
Open →Builder.io — Head-to-head with the closest competitor. Honest about where Cursor wins (visual editing) and where Claude Code wins (autonomous multi-file work).
Open →Artificial Analysis — Third-party benchmark numbers — not vendor marketing. Useful for "is the hype real" questions.
Open →You've got the mental . Now install and run your first session — six concrete steps, about thirty minutes, productive by the end of it. Then point Claude at something real: fifteen worked use cases you can try the same evening.