Alice got very little. Her little one there. Nothing. No thing. No say. So Alice thought: Thing? Things? No things.
Just then rabbit came. Nothing so very great—Alice see rabbit before. Such rabbit said: Oh! Oh! Oh! Rabbit looked some little thing. His way. Then rabbit went down.
Alice got up. Alice never see rabbit look thing like thing. Alice went after him. Down. Down. Rabbit went into large round thing. Alice went down after rabbit. Down. Down. Long? thought Alice. Very long.
Alice was growing very tired of sitting by her sister on the bank, and of having nothing to do.
Just then a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that — nor did Alice think it so very strange to hear the Rabbit say to itself: “Oh dear! Oh dear! I shall be too late!” But when the Rabbit took a watch out of its waistcoat-pocket and looked at it, Alice started to her feet. It flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it.
Burning with curiosity, she ran across the field after it, and was just in time to see it pop down a large rabbit-hole under the hedge.
In another moment down went Alice after it, never once considering how in the world she was to get out again.
The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down what seemed to be a very deep well.
“Well!” thought Alice. “After such a fall as this, I shall think nothing of tumbling down stairs. How brave they’ll all think me at home!”
Down, down, down. Would the fall never come to an end?
However, Carroll's wordplay still cannot be reproduced. Coinages like "Curiouser and curiouser!" and the Mad Hatter's paradoxical logic: depicting those still requires more words than we have.
In the next installment, the 1000-word edition, we will chase the moment the wordplay comes back to life.
The Python code for this article
As last time, the analysis was done with Cursor × Python. Here are the main parts of the code.
import nltk
from collections import Counter
from nltk.corpus import stopwords

# One-time setup: nltk.download("punkt") and nltk.download("stopwords")

# Load the Alice source text (Project Gutenberg)
with open("alice.txt", "r", encoding="utf-8") as f:
    text = f.read().lower()

# Tokenize
tokens = nltk.word_tokenize(text)

# Remove stopwords and non-alphabetic tokens
stop_words = set(stopwords.words("english"))
filtered = [w for w in tokens if w.isalpha() and w not in stop_words]

# Extract the 500 most frequent words (a set, for fast membership tests)
freq = Counter(filtered)
top_500 = {word for word, count in freq.most_common(500)}

# Reconstruct the text using only those 500 words (mask everything else)
reconstructed = " ".join(w if w in top_500 else "___" for w in tokens)
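To see what the masking step does without needing NLTK or the full Gutenberg text, here is a minimal, self-contained sketch of the same idea on a toy token list (the sample tokens and vocabulary size are placeholders, not the article's actual data):

```python
from collections import Counter

# Toy token list standing in for the tokenized Alice text
tokens = "down down down the the fall".split()

# Keep only the 2 most frequent words; every other token is masked
vocab = {w for w, _ in Counter(tokens).most_common(2)}
reconstructed = " ".join(w if w in vocab else "___" for w in tokens)
print(reconstructed)  # down down down the the ___
```

With the real text, the same pattern scales up: the vocabulary becomes the top 500 words, and everything outside it collapses into `___` blanks.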