Here’s a clean, accurate spec distilled from that bookmarklet.

What it is

A self‑contained HTML→Markdown exporter (bookmarklet) tailored to the ChatGPT web UI. It clones the page DOM, converts assistant messages’ HTML to Markdown using an embedded Turndown implementation (with custom rules), treats user messages as plain text, and downloads a single .md file.

Inputs & assumptions

  • Runtime: Executed in the active browser tab as a bookmarklet.

  • Target DOM (ChatGPT):

    • Each message container matches .text-message.

    • Role is read from data-message-author-role ("user" vs "assistant").

    • User content is in .whitespace-pre-wrap.

    • Assistant content is in .markdown.

    • Code blocks appear as <pre> elements with either:

      • <div class="overflow-y-auto"><code class="language-…">…</code></div>, and an optional header bar .flex.items-center that contains a language label; or
      • plain <pre><code>…</code></pre>.
  • Title: Uses document.title as the exported document title and filename fallback "ChatGPT Conversation".

Output

  • Single Markdown file named "{document.title}.md".
  • Top matter (plain Markdown prose), then a sequence of message sections, then a Files section with a link back to the current page (window.location.href).

Processing pipeline (high‑level)

  1. Instantiate Turndown (codeBlockStyle: "fenced", fence is ).

  2. Add custom rule for code blocks:

    • Filter: any <pre> that contains code or div.overflow-y-auto.

    • Language: tries .flex.items-center text (header bar) if present.

    • Content:

      • Prefer .overflow-y-auto text (typical ChatGPT code wrapper),
      • else fall back to nested <code> text.
    • Emit:

      ```{lang}
      {code}
  3. Clone document.body to a working DOM.

  4. Trim ChatGPT chrome:

    • Remove div.text-xs.gap-1 elements (token/utility badges).
    • Remove a footer/overlay .absolute.bottom-0 if present.
  5. Build Markdown:

    • Heading: # {document.title}

    • Two introductory paragraphs describing the export (mentions an “llms.txt specification”).

    • For each .text-message:

      • Role label: "Human" if data-message-author-role==="user", else "Assistant".
      • Section heading: ## {Role}
      • User: take .whitespace-pre-wrap innerText (no HTML conversion).
      • Assistant: take .markdown innerHTML and run Turndown.
    • Files: append a bullet linking to the source page URL.

  6. Download via a transient <a download>… using Blob + URL.createObjectURL.

Turndown config & notable rules

  • Headings: setext for H1/H2, ATX (###…) for H3–H6.

  • Lists: supports <ul>/<ol>; ordered items honor start attribute.

  • Inline:

    • * bullets, **strong**, _em_, inline links []() (no reference‑style unless switched).
    • Images ![alt](src "title"), hard line breaks as ␠␠\n.
  • Code:

    • Inline code uses backticks, expands fence length if content contains backticks.
    • Blocks use fenced style (```) due to bookmarklet options.
  • Whitespace normalization:

    • Collapses extraneous spaces/newlines around inline nodes.
    • Preserves pre/code unless intentionally converted.

DOM selectors the script relies on (brittle points)

Purpose Selector/Attribute Notes
Message containers .text-message Must exist per message.
Role detection data-message-author-role "user" → Human; otherwise Assistant.
User text .whitespace-pre-wrap Extracts innerText.
Assistant HTML .markdown Converted with Turndown.
Code wrapper pre containing code or .overflow-y-auto Custom fenced rule.
Code language label .flex.items-center Header strip; optional.
Code content (preferred) .overflow-y-auto If absent, falls back to <code>.
UI badges to remove div.text-xs.gap-1 Removed from clone.
Footer overlay to remove .absolute.bottom-0 Removed if present.

Edge cases & limitations

  • Fragile to UI changes: Any change to ChatGPT class names or structure will break selection (e.g., .text-message, .markdown, .overflow-y-auto).
  • Filename safety: Uses document.title verbatim as download. If title contains /\:*?"<>| or newlines, some platforms may mangle the name.
  • Language detection: Header bar text may be empty/pretty‑printed (e.g., “JavaScript • Copy”). No parsing/sanitizing; it’s inserted after the backticks.
  • User content fidelity: Using innerText means you lose inline formatting (links, bold) in user messages.
  • Assistant conversion: Depends on Turndown defaults; complex HTML (tables, iframes) may degrade.
  • Non‑ChatGPT pages: Will still run, but selectors may match nothing, producing a sparse file.

Security & privacy

  • Runs entirely client‑side; no network I/O.
  • Clones the DOM and serializes visible content; any secret shown on page will be exported.

Improvement opportunities (quick wins)

  1. Robust selectors: Fall back chains (e.g., try .prose, [data-testid=“markdown”], etc.) and guard if nodes are missing.
  2. Safer filename: Sanitize document.title to a filesystem‑safe basename; add date stamp.
  3. Language detection: Prefer code[class*="language-"] and strip UI fluff from header text.
  4. User HTML: If a user message contains formatting, consider Turndown on .whitespace-pre-wrap innerHTML while preserving plaintext for pure text.
  5. Table support: Add a Turndown plugin/rule for <table> to GitHub‑flavored Markdown tables.
  6. Provenance: Append metadata block (timestamp, URL, model name if available in DOM).
  7. Scope limiting: If multiple chats are on page, select within the main conversation region to avoid stray matches.

Background that may help

  • Turndown is a widely used HTML→Markdown engine. Custom rules (like the pre handler here) are the key to preserving code blocks and languages cleanly.
  • ChatGPT’s DOM wrappers for code often vary; relying primarily on code[class*="language-"] is more stable than class names like .overflow-y-auto.

Native‑style rewrites of your ask (for reuse)

Style Rewrite
Tone‑matched “This is a bookmarklet. Can you analyze its spec?”
Casual “Got this bookmarklet—can you tell me what it does, exactly?”
Academic “Please provide a formal specification and behavioral analysis of this bookmarklet.”
Gen‑Z “Can u break down what this bookmarklet actually does, like for real?”
Polite “Could you analyze the specification of this bookmarklet, please?”
Business “Please review the bookmarklet and summarize its functional specification.”
Minimalist “Analyze this bookmarklet’s spec.”
Socratic “If this bookmarklet runs on a ChatGPT page, what evidence in the code shows how it extracts and formats content?”
Poetic “Unfold this folded script; tell me how it harvests words and binds them into Markdown.”
Provocative “If this bookmarklet mangles my chat, where does the spec fail?”
Child‑friendly “This tiny tool saves your chat as a text file. Can you explain how it works?”
Old‑school “Pray examine this bookmarklet and enumerate its operations.”
Journalistic “What does the bookmarklet capture, how does it process the page, and what file does it produce?”
Tweetable “Bookmarklet that turns ChatGPT chats into a Markdown file via custom Turndown rules. Spec breakdown?”