Here’s a clean, accurate spec distilled from that bookmarklet.
What it is
A self‑contained HTML→Markdown exporter (bookmarklet) tailored to the ChatGPT web UI. It clones the page DOM, converts assistant messages’ HTML to Markdown using an embedded Turndown implementation (with custom rules), treats user messages as plain text, and downloads a single .md file.
Inputs & assumptions
- Runtime: Executed in the active browser tab as a bookmarklet.
- Target DOM (ChatGPT):
  - Each message container matches `.text-message`.
  - Role is read from `data-message-author-role` (`"user"` vs `"assistant"`).
  - User content is in `.whitespace-pre-wrap`.
  - Assistant content is in `.markdown`.
  - Code blocks appear as `<pre>` elements with either:
    - `<div class="overflow-y-auto"><code class="language-…">…</code></div>`, and an optional header bar `.flex.items-center` that contains a language label; or
    - plain `<pre><code>…</code></pre>`.
- Title: Uses `document.title` as the exported document title and filename, falling back to `"ChatGPT Conversation"`.
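As a rough illustration of the selectors above, here is a minimal sketch of reading messages out of a cloned root. `extractMessages` and the returned `{ role, text }` / `{ role, html }` shape are hypothetical names, and it assumes `data-message-author-role` sits on the `.text-message` element itself, which the spec does not confirm.

```javascript
// Sketch only: extractMessages and the returned object shape are hypothetical.
// `root` is any object exposing querySelectorAll, e.g. a cloned document.body.
function extractMessages(root) {
  const messages = [];
  for (const node of root.querySelectorAll(".text-message")) {
    // Assumption: the role attribute lives on the message container itself.
    const role = node.getAttribute("data-message-author-role");
    if (role === "user") {
      // User messages are taken as plain text, with no HTML conversion.
      const el = node.querySelector(".whitespace-pre-wrap");
      messages.push({ role, text: el ? el.innerText : "" });
    } else {
      // Anything that is not "user" is treated as assistant HTML.
      const el = node.querySelector(".markdown");
      messages.push({ role, html: el ? el.innerHTML : "" });
    }
  }
  return messages;
}
```

Guarding on a missing child element keeps the loop from throwing when a selector stops matching after a UI change.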
Output
- Single Markdown file named `"{document.title}.md"`.
- Top matter (plain Markdown prose), then a sequence of message sections, then a Files section with a link back to the current page (`window.location.href`).
Processing pipeline (high‑level)
- Instantiate Turndown (`codeBlockStyle: "fenced"`, fence is `` ``` ``).
- Add custom rule for code blocks:
  - Filter: any `<pre>` that contains `code` or `div.overflow-y-auto`.
  - Language: tries `.flex.items-center` text (header bar) if present.
  - Content:
    - Prefer `.overflow-y-auto` text (typical ChatGPT code wrapper),
    - else fall back to nested `<code>` text.
  - Emit: a fenced block opening with `` ```{lang} `` followed by `{code}`.
- Clone `document.body` to a working DOM.
- Trim ChatGPT chrome:
  - Remove `div.text-xs.gap-1` elements (token/utility badges).
  - Remove a footer/overlay `.absolute.bottom-0` if present.
- Build Markdown:
  - Heading: `# {document.title}`
  - Two introductory paragraphs describing the export (mentions an “llms.txt specification”).
  - For each `.text-message`:
    - Role label: `"Human"` if `data-message-author-role === "user"`, else `"Assistant"`.
    - Section heading: `## {Role}`
    - User: take `.whitespace-pre-wrap` innerText (no HTML conversion).
    - Assistant: take `.markdown` innerHTML and run Turndown.
  - Files: append a bullet linking to the source page URL.
- Download via a transient `<a download>…` using `Blob` + `URL.createObjectURL`.
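The assembly steps above can be sketched roughly as follows. This is not the bookmarklet's code: `renderCodeBlock`, `roleLabel`, `buildMarkdown`, the `messages` shape, the `intro` parameter, the `## Files` heading level, and the link text are all assumptions made for illustration. The converter is passed in, so a real Turndown instance could be supplied as `(html) => turndownService.turndown(html)`.

```javascript
// Sketch only: every name here is hypothetical, not taken from the bookmarklet.

// Emit a fenced block; `lang` may be empty when no header bar was found.
function renderCodeBlock(lang, code) {
  const fence = "`".repeat(3);
  const body = code.replace(/\n+$/, ""); // drop trailing blank lines
  return "\n\n" + fence + (lang || "").trim() + "\n" + body + "\n" + fence + "\n\n";
}

// Map the data-message-author-role value to the section label.
function roleLabel(role) {
  return role === "user" ? "Human" : "Assistant";
}

// messages: [{ role, text }] for users, [{ role, html }] for the assistant.
// intro: the introductory paragraphs, supplied by the caller.
function buildMarkdown({ title, url, intro = [], messages, htmlToMarkdown }) {
  const parts = ["# " + (title || "ChatGPT Conversation"), ...intro];
  for (const message of messages) {
    parts.push("## " + roleLabel(message.role));
    parts.push(message.role === "user" ? message.text : htmlToMarkdown(message.html));
  }
  // Assumed layout for the Files section: a heading plus one link bullet.
  parts.push("## Files", "- [" + url + "](" + url + ")");
  return parts.join("\n\n") + "\n";
}
```

In the browser, the returned string would then be wrapped in a `Blob` and handed to `URL.createObjectURL` for the download step.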
Turndown config & notable rules
- Headings: setext for H1/H2, ATX (`###` …) for H3–H6.
- Lists: supports `<ul>/<ol>`; ordered items honor the `start` attribute.
- Inline: `*` bullets, `**strong**`, `_em_`, inline links `[]()` (no reference‑style unless switched).
- Images `![]()`, hard line breaks as `␠␠\n`.
- Code:
  - Inline code uses backticks, expands fence length if content contains backticks.
  - Blocks use fenced style (```) due to bookmarklet options.
- Whitespace normalization:
  - Collapses extraneous spaces/newlines around inline nodes.
  - Preserves pre/code unless intentionally converted.
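To make the inline-code rule concrete, here is a minimal sketch of the general technique: pick a backtick run longer than any run inside the content. It is an illustration, not Turndown's source, and `wrapInlineCode` is a hypothetical name.

```javascript
// Sketch only: wrapInlineCode is a hypothetical helper.
function wrapInlineCode(text) {
  // Find the longest run of consecutive backticks in the content.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // The delimiter must be one backtick longer than that run.
  const delimiter = "`".repeat(longest + 1);
  // Pad with a space when the content itself starts or ends with a backtick,
  // so the delimiter and the content do not merge.
  const pad = /^`|`$/.test(text) ? " " : "";
  return delimiter + pad + text + pad + delimiter;
}
```

Content without backticks keeps the usual single-backtick wrapper; the delimiter only grows when the content needs it.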
DOM selectors the script relies on (brittle points)
Purpose | Selector/Attribute | Notes |
---|---|---|
Message containers | `.text-message` | Must exist per message. |
Role detection | `data-message-author-role` | "user" → Human; otherwise Assistant. |
User text | `.whitespace-pre-wrap` | Extracts innerText. |
Assistant HTML | `.markdown` | Converted with Turndown. |
Code wrapper | `pre` containing `code` or `.overflow-y-auto` | Custom fenced rule. |
Code language label | `.flex.items-center` | Header strip; optional. |
Code content (preferred) | `.overflow-y-auto` | If absent, falls back to `<code>`. |
UI badges to remove | `div.text-xs.gap-1` | Removed from clone. |
Footer overlay to remove | `.absolute.bottom-0` | Removed if present. |
Edge cases & limitations
- Fragile to UI changes: Any change to ChatGPT class names or structure will break selection (e.g., `.text-message`, `.markdown`, `.overflow-y-auto`).
- Filename safety: Uses `document.title` verbatim as `download`. If title contains `/\:*?"<>|` or newlines, some platforms may mangle the name.
- Language detection: Header bar text may be empty/pretty‑printed (e.g., “JavaScript • Copy”). No parsing/sanitizing; it’s inserted after the backticks.
- User content fidelity: Using `innerText` means you lose inline formatting (links, bold) in user messages.
- Assistant conversion: Depends on Turndown defaults; complex HTML (tables, iframes) may degrade.
- Non‑ChatGPT pages: Will still run, but selectors may match nothing, producing a sparse file.
Security & privacy
- Runs entirely client‑side; no network I/O.
- Clones the DOM and serializes visible content; any secret shown on page will be exported.
Improvement opportunities (quick wins)
- Robust selectors: Fallback chains (e.g., try `.prose`, `[data-testid="markdown"]`, etc.) and guard if nodes are missing.
- Safer filename: Sanitize `document.title` to a filesystem‑safe basename; add date stamp.
- Language detection: Prefer `code[class*="language-"]` and strip UI fluff from header text.
- User HTML: If a user message contains formatting, consider Turndown on `.whitespace-pre-wrap` innerHTML while preserving plaintext for pure text.
- Table support: Add a Turndown plugin/rule for `<table>` to GitHub‑flavored Markdown tables.
- Provenance: Append metadata block (timestamp, URL, model name if available in DOM).
- Scope limiting: If multiple chats are on page, select within the main conversation region to avoid stray matches.
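Two of these quick wins are small enough to sketch. The helpers below are hypothetical (`queryFirst`, `safeFilename`), and the 100-character cap and the `YYYY-MM-DD` stamp format are arbitrary choices for illustration, not something the bookmarklet does today.

```javascript
// Sketch only: both helpers are hypothetical additions.

// Try each selector in order and return the first match, or null.
function queryFirst(root, selectors) {
  for (const selector of selectors) {
    const node = root.querySelector(selector);
    if (node) return node;
  }
  return null;
}

// Turn a page title into a filesystem-safe .md filename with a date stamp.
function safeFilename(title, date = new Date()) {
  const base =
    (title || "")
      .replace(/[\/\\:*?"<>|\r\n]+/g, " ") // characters some platforms reject
      .replace(/\s+/g, " ")                // collapse runs of whitespace
      .trim()
      .slice(0, 100) || "ChatGPT Conversation"; // fallback for empty titles
  const stamp = date.toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
  return base + " " + stamp + ".md";
}
```

For example, `queryFirst(node, [".markdown", ".prose", '[data-testid="markdown"]'])` would keep working if only one of those selectors still matches.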
Background that may help
- Turndown is a widely used HTML→Markdown engine. Custom rules (like the `pre` handler here) are the key to preserving code blocks and languages cleanly.
- ChatGPT’s DOM wrappers for code often vary; relying primarily on `code[class*="language-"]` is more stable than class names like `.overflow-y-auto`.
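One way to act on that, sketched under assumptions: read the language from the `<code>` element's class first, and only fall back to the header text. `detectLanguage` is a hypothetical helper, and treating a leading "Copy" token as UI text is a guess at what the header bar contains.

```javascript
// Sketch only: detectLanguage is a hypothetical helper.
// codeClass: the <code> element's className; headerText: header bar text.
function detectLanguage(codeClass, headerText) {
  // Preferred source: a language-xxx token in the class list.
  const match = /(?:^|\s)language-([\w+#.-]+)/.exec(codeClass || "");
  if (match) return match[1].toLowerCase();
  // Fallback: first token of the header, split on whitespace or bullets.
  const first = (headerText || "").trim().split(/[\s•]+/)[0] || "";
  // Assumption: a header starting with "Copy" carries no language label.
  return /^copy/i.test(first) ? "" : first.toLowerCase();
}
```

An empty return value simply yields a fence with no language tag.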
Native‑style rewrites of your ask (for reuse)
Style | Rewrite |
---|---|
Tone‑matched | “This is a bookmarklet. Can you analyze its spec?” |
Casual | “Got this bookmarklet—can you tell me what it does, exactly?” |
Academic | “Please provide a formal specification and behavioral analysis of this bookmarklet.” |
Gen‑Z | “Can u break down what this bookmarklet actually does, like for real?” |
Polite | “Could you analyze the specification of this bookmarklet, please?” |
Business | “Please review the bookmarklet and summarize its functional specification.” |
Minimalist | “Analyze this bookmarklet’s spec.” |
Socratic | “If this bookmarklet runs on a ChatGPT page, what evidence in the code shows how it extracts and formats content?” |
Poetic | “Unfold this folded script; tell me how it harvests words and binds them into Markdown.” |
Provocative | “If this bookmarklet mangles my chat, where does the spec fail?” |
Child‑friendly | “This tiny tool saves your chat as a text file. Can you explain how it works?” |
Old‑school | “Pray examine this bookmarklet and enumerate its operations.” |
Journalistic | “What does the bookmarklet capture, how does it process the page, and what file does it produce?” |
Tweetable | “Bookmarklet that turns ChatGPT chats into a Markdown file via custom Turndown rules. Spec breakdown?” |