Experiments
What Can RNNs Learn?
The Unreasonable Effectiveness in Action
Karpathy's char-rnn experiments demonstrated what RNNs can learn from raw text
In 2015, Andrej Karpathy trained a simple character-level RNN on various text corpora and was surprised by the results. The model learned not just to generate plausible text, but to capture the deep structure of each domain - from the rhythm of Shakespeare's iambic pentameter to the syntax of C code.
These experiments became legendary in the ML community because they showed that a relatively simple model could learn remarkably complex patterns. Let's explore what the model learned from each dataset.
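Before looking at the outputs, it helps to see how little machinery is involved. Below is a minimal sketch of the char-rnn setup in PyTorch (not Karpathy's original Torch/Lua code): every character is mapped to an integer, and the only training signal is cross-entropy on the next character. The file name, layer sizes, and hyperparameters are illustrative assumptions.

# Minimal char-level next-character prediction sketch (illustrative, not Karpathy's code)
import torch
import torch.nn as nn

text = open("input.txt", encoding="utf-8").read()    # e.g. the Shakespeare corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}            # char -> integer index
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.head(h), state                    # logits over the next character

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=2e-3)
seq_len = 100                                         # truncated BPTT window

for step in range(1000):
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i : i + seq_len].unsqueeze(0)            # input characters
    y = data[i + 1 : i + seq_len + 1].unsqueeze(0)    # same sequence shifted by one
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

Everything that follows in this section comes out of exactly this kind of objective: predict the next character, nothing more.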
The Experiments
Shakespeare
4.4MB: Complete works of Shakespeare (plays, sonnets, and poems)
PANDARUS: Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep. Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states. DUKE VINCENTIO: Well, your wit is in the care of side and that.
The model learns iambic pentameter rhythm, character names, stage directions, and dramatic dialogue structure.
Paul Graham Essays
1MB: Essays on startups, programming, and technology
The surprised in investors weren't going to raise money. I'm not talking about someone who would become those. Inexpense, and in the earliest stage, and often, and that's like to be the company. At some point they were being with the start to go to a most start, a lot more about it is that means that every founder is something that I was most, they had to be funding.
Captures the essay style, startup vocabulary, and argumentative structure typical of PG essays.
Wikipedia
100MB (Hutter Prize): Wikipedia XML dump with encyclopedic content and markup
Naturalism and decision for the majority of Arab countries' capitalisms and the Ede in which was also the formation of the material said by the species of the 1970s, the the Minede of Algeria's French colony. The ethnic first the [[Algeria]]n period of the [[Pede]] of the [[History of Algeria]]... <ref name="Pede"/>
Learns Wikipedia markup, citation format, cross-references, and encyclopedic writing style.
LaTeX (Algebraic Geometry)
16MB: Book on algebraic stacks/geometry with complex mathematical notation
\begin{proof}
We may assume that $\mathcal{I}$ is an abelian sheaf on $\mathcal{C}$.
\item Given a morphism $\Delta : \mathcal{F} \to \mathcal{I}$
is an injective and $\mathcal{I}$ is a quasi-coherent sheaf on $\mathcal{C}$,
then $\mathcal{I}$ is an abelian subcategory.
\end{proof}
Produces valid LaTeX syntax, mathematical notation, theorem environments, and proof structures.
Linux Kernel Source
474MB: C code from the Linux kernel codebase
/*
* Increment the size file of the new incorrect-Loss Ringer,
* expand_argument(in + 1, &(in), &flag_value);
* if (strcmp(buf, key) < 0)
* return -1;
*
* For the software, all and in-channel this value must be
* found to be less than the internal.
*/
static int __init init_module(void)
{
return driver_register(&example_driver);
}
Learns C syntax, kernel coding conventions, comment styles, and function signatures.
Baby Names
8,000 names: List of common baby names
Rsjsjdsjd Mavede Jsjdjejv Marrida Chaede Layede Trede
Early training produces random characters; with more training, the model generates plausible name-like sequences.
Training Dynamics: Watching Learning Unfold
How generated text evolves from random noise to coherent prose
One of the most fascinating aspects of training char-rnn is watching the model's output evolve over time. Here's how Shakespeare generation improves from iteration 100 to 2000:
Random babbling
iter 100: tyntd-Loss Ringer, expand weprof. In the sontain the a] of the for is dede a]gent-Loss Ringer
Model has learned spaces and basic character frequencies but no coherent structure.
Learning word boundaries
iter 300"The the shall to the serval, And all the stranger, the stranger." KING RICHARD III: And if the shall have a stranger, With the stranger that we have a stranger.
Words emerge, basic sentence structure appears, character names start forming.
Basic structure emerges
iter 500: KING RICHARD III: I would the state and the ground for the ground And the state of the world and the ground, And we have the state and the state. Second Citizen: Come, what is the ground for the state?
Dialogue structure, character names, and dramatic conventions are recognizable.
Coherent generation
iter 2000: KING RICHARD III: So shall the terror of my sojourn stand Upon the bloody brothers of the field; Where I will leave you to the heaven's guard, And live in peace, and think upon the dead. QUEEN ELIZABETH: As I intend to prosper and repent, So thrive I in my dangerous attempt.
Near-perfect iambic pentameter, coherent themes, proper dramatic structure.
Key Observation
The model learns structure hierarchically: first character frequencies, then word boundaries, then sentence structure, then paragraph organization, and finally semantic coherence. This mirrors how humans learn language - sounds → words → grammar → meaning.
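The snapshots above come from sampling the model at different training checkpoints. Here is a sketch of how such samples are typically drawn, assuming the CharRNN sketch from earlier in this section: feed a seed string, repeatedly sample from the softmax over the next character, and feed the result back in. The temperature parameter is an assumption of this sketch, not something quoted from the original experiments.

# Draw text from a trained checkpoint (assumes `model`, `stoi`, `chars` from the sketch above)
import torch

@torch.no_grad()
def sample(model, seed="KING", length=500, temperature=0.8):
    itos = {i: c for c, i in stoi.items()}
    out = list(seed)
    x = torch.tensor([[stoi[c] for c in seed]])
    state = None
    for _ in range(length):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        idx = torch.multinomial(probs, num_samples=1).item()
        out.append(itos[idx])
        x = torch.tensor([[idx]])          # feed the sampled character back in
    return "".join(out)

# Calling this on checkpoints saved at iterations 100, 300, 500, and 2000
# reproduces the kind of progression shown above.
print(sample(model))

Lower temperatures make the model stick to its highest-confidence patterns (compare the repetitive iter-500 sample), while higher temperatures give more varied but noisier text.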
Looking Inside: Interpretable Neurons
Individual neurons learn specific, human-interpretable functions
When Karpathy examined the hidden state activations, he discovered that individual neurons had learned specific, interpretable functions - without being explicitly programmed to do so. Here are some remarkable examples:
URL Detector Neuron
A neuron that activates specifically when inside URLs
http://www.google.com/search?q=hello
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[========= HIGH ACTIVATION =========]
normal text here and more text
[--- low activation throughout ---]
Visit http://example.com for more
^^^^^^^^^^^^^^^^^
[=== ACTIVATED ===]
Without explicit URL training, the network learned a neuron that detects URL context.
Quote State Neuron
Tracks whether currently inside or outside quotation marks
He said "hello world" and then left.
| |
ON -----> OFF
"This is a longer quote that spans
[============ ON ===================
multiple words" and ends here.
===============] OFF
The neuron maintains state across many characters, remembering if a quote is open.
Bracket Depth Cell
Tracks nested bracket/parenthesis depth in code
if (condition && (x > 0)) {
| | | |
1 2 1 0
function(arg1, (nested, (deep))) {
| | | | |
1 2 3 2 1
The network learns to count nesting levels, crucial for generating valid code.
Line Position Neuron
Encodes position within the current line (for indentation)
[BOL] def function():
  ^   ^^^^^^^^^^^^^^^^
  0   increasing activation
[BOL]     return value
  ^       ^^^^
  0       high (indented)
Helps the model maintain consistent indentation in generated code.
Why This Matters
These interpretable neurons demonstrate that neural networks don't just memorize - they learn abstractions. The URL detector neuron has learned the concept of "being inside a URL" from pure character sequences. This emergent abstraction is a key insight into how deep learning works.
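These visualizations are conceptually simple to reproduce: run text through a trained model, record the hidden state at every character, and watch one unit's value over time. Below is one way to probe for such units, assuming the CharRNN sketch from earlier in this section; the unit index 42 is purely an illustrative assumption (in practice interesting units are found by inspection, or by correlating activations with properties such as "inside quotes").

# Trace one hidden unit's activation across a string (assumes `model`, `stoi` from above)
import torch

@torch.no_grad()
def trace_unit(model, text, unit=42):        # unit 42 is a hypothetical example
    x = torch.tensor([[stoi[c] for c in text]])
    h, _ = model.rnn(model.embed(x))         # per-character hidden states, top LSTM layer
    acts = h[0, :, unit]                     # this unit's activation over time
    for ch, a in zip(text, acts.tolist()):
        bar = "#" * int(max(a, 0) * 10)      # crude text "heat map" of positive activation
        print(f"{ch!r:>6}  {a:+.2f}  {bar}")

trace_unit(model, 'He said "hello world" and then left.')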
Key Takeaways from the Experiments
The same architecture learns vastly different domains - no domain-specific engineering needed
Character-level models learn hierarchical structure: chars → words → sentences → documents
Training progression reveals learning dynamics: structure before semantics
Individual neurons develop interpretable functions (URL detection, quote tracking, bracket counting)
Model quality scales with data size and training time - a precursor to modern scaling laws
From char-rnn to GPT: The Legacy
These experiments, while using a now-outdated architecture, established principles that power today's language models:
Then (char-rnn, 2015)
• 3-layer LSTM, ~10M parameters
• Trained on corpora from 1MB to a few hundred MB of text
• Character-level prediction
• Could generate plausible text
Now (GPT-4, 2023+)
• Transformer, parameter count undisclosed (widely estimated at around a trillion or more)
• Trained on web-scale corpora of trillions of tokens
• Token-level (subword) prediction
• Emergent reasoning abilities
The core insight remains the same: train a model to predict the next token, and it will learn the structure of the domain. Scale it up, and remarkable capabilities emerge.
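The continuity is easy to see in code: whether the symbol set is ~65 characters or ~50,000 subword tokens, the loss is the same next-symbol cross-entropy. The tensors below are random placeholders; only their shapes matter.

# Same objective, different vocabulary size (placeholder tensors, illustrative only)
import torch
import torch.nn.functional as F

vocab_char, vocab_subword = 65, 50_000       # e.g. Shakespeare characters vs. a GPT-style vocab
T = 8                                        # sequence length

for V in (vocab_char, vocab_subword):
    logits = torch.randn(T, V)               # model's predictions for the next symbol
    targets = torch.randint(0, V, (T,))      # the symbols that actually came next
    loss = F.cross_entropy(logits, targets)
    print(f"vocab={V:>6}  next-symbol cross-entropy={loss.item():.2f}")

Everything else (attention instead of recurrence, vastly more parameters and data) changes how well this objective can be optimized, not what the objective is.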