Shaping LLMs Into Writing Collaborators

From Tool to Collaborator: Why Most People Get This Wrong

I’ve watched colleagues open an LLM interface, type a vague request, get generic output, and conclude “AI writing sucks.” They’re not wrong about the output—but they’re treating a sophisticated tool like a magic 8-ball. Give it no context, no direction, no feedback loop, and you’ll get exactly what you’d expect: the algorithm’s median default.

The actual problems:

  • Expecting it to read your mind without setup
  • Taking whatever it produces on the first try
  • Never defining tone, role, or purpose
  • Restarting from scratch every session, losing all continuity

With strategic design—what I call behavioral shaping—these platforms become less of a black box and more of a partner. I’ve made my primary LLM match my voice, not by outsourcing the writing, but by architecting the conditions in which it helps me write faster, deeper, and more personally.

This isn’t about endorsing any particular platform. It’s about understanding that if you take the time to learn how these tools actually work, you’ll likely find a place for them in your research and writing workflow—whichever platform you choose.

Author’s note: While I’ve tried to keep this generic, I primarily use ChatGPT Plus and Claude (free plan), so some bias in favor of these two platforms is inevitable.


The “LLMs Make Everything Worse” Crowd Has a Point

Academic Twitter is full of hot takes about how these tools are turning student essays into beige mush, how every blog post now sounds like it was written by the same uninspired algorithm. When people use these tools passively, that criticism lands.

Critics say:

“LLMs produce generic, lifeless text.”
“They encourage laziness or plagiarism.”
“Students lose their voice.”
“Everything becomes middlebrow default.”

All valid, when the tool is used passively. But here’s what frustrates me: the assumption that the problem is inherent to the technology, when really it’s about approach. With behavioral shaping, you:

  • Set tone, depth, and intellectual direction
  • Maintain your voice
  • Scaffold thinking instead of outsourcing it

In creative writing, you can configure your LLM to offer narrative turns while keeping control. In academic writing, it helps structure logic, propose counterarguments, or draft sections without writing the whole paper.

Once your LLM has seen enough of your writing within a scoped context, it can approximate your voice—at least enough to prevent jarring tonal shifts or unwanted genre leakage. This makes it easier to produce first drafts that are closer to your intent, even if they still require substantial revision.


What Is Behavioral Shaping?

Borrowed from behavioral psychology, shaping means reinforcing incremental steps toward a desired behavior through feedback. In the LLM context, you accomplish this through:

  • Instructions: defining behavioral parameters
  • Memory: preserving context across sessions
  • Projects or Scoped Contexts: compartmentalizing domains
  • Prompt strategies: guiding tone and structure
  • Iterative correction: recalibrating when output drifts

You’re not retraining the model—you’re designing its interaction conditions.


Platform Capabilities: What Each LLM Offers

Different platforms take different approaches to behavioral shaping. Understanding these differences helps you choose the right tool for your workflow:

ChatGPT (OpenAI) offers the most mature implementation: global custom instructions available even on the free tier, persistent memory across conversations, and robust Projects for paid subscribers. The free tier is surprisingly capable—custom instructions and memory make it viable for basic behavioral shaping, though you’ll hit the roughly 10-messages-per-5-hours limit quickly.[1][2][3]

Claude (Anthropic) takes a three-tiered approach: Profile Preferences (account-wide settings), Styles (tone and format customization), and Projects with custom instructions (Pro only). The free tier includes Profile Preferences and Styles, making it more capable than it initially appears. What the free tier lacks is persistent memory between conversations.[4][5][6]

Gemini (Google) recently democratized its Gems feature—custom AI assistants with specific instructions—rolling it out to free users in March 2025. This was a significant shift. Gems function like lightweight projects, though without the full structure of ChatGPT or Claude’s implementations. For developers, Google AI Studio offers free access to Gemini 2.5 Pro with system instructions, though it requires API knowledge and lacks a conversational interface.[7][8][9][10]

Grok (xAI) remains less mature but is evolving. It added persistent memory via vector embeddings in April 2025 and offers response style presets (Concise, Formal, Socratic) plus custom styles. Documentation is sparse compared to established competitors, and features like “workspaces” are mentioned but not fully documented.[11][12][13]

The comparison table below shows detailed feature availability across free and paid tiers.


Core Techniques for Behavioral Shaping

1. Setting Instructions

Every platform now offers some form of persistent instructions. The implementation varies—ChatGPT calls them “custom instructions,” Claude uses “Profile Preferences,” Gemini embeds them in “Gems”—but the principle remains the same: you’re establishing behavioral norms that persist across conversations.

What the LLM should know about you

“I’m a creative writer and philosophy professor. I blog as a personal essayist; I write philosophy formally.”

Note what I’m not including here. I’m a lawyer, and that affects my writing, but I don’t tell my LLM this because I want to avoid it attempting a “lawyerly” tone. It may infer this from content, but I don’t want it applied automatically. I tell it I’m a creative writer to establish boundaries; I tell it I’m a philosophy professor to shape tone for technically complex discussions. You need to make your own judgments about the voice and tone you’re looking for. Think of this like coaching a research assistant on how to help you write: “don’t do this, do that.”

How it should respond

“Use reflective tone for blog posts; formal and precise for academic writing. Avoid sentimentality and write in Markdown.”

These set your baseline. Platform-specific or per-conversation instructions can override them selectively.[14]
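If you ever work through the API instead of the chat UI, the same baseline travels as a system message with every request. A minimal sketch, assuming the official OpenAI Python SDK and an API key in the environment; the model name is purely illustrative:

```python
# A minimal sketch, assuming the official OpenAI Python SDK ("pip install openai")
# and an OPENAI_API_KEY set in the environment. The model name is illustrative.
from openai import OpenAI

client = OpenAI()

# The same baseline you'd paste into the custom-instructions UI,
# expressed as a system message sent with every request.
BASELINE = (
    "I'm a creative writer and philosophy professor. "
    "Use a reflective tone for blog posts; formal and precise for academic writing. "
    "Avoid sentimentality and write in Markdown."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[
        {"role": "system", "content": BASELINE},
        {"role": "user", "content": "Draft an opening about revisiting a coastal town."},
    ],
)
print(response.choices[0].message.content)
```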


2. Crafting Effective Instructions

Strong instructions are:

  • Concrete (not vague)
  • Situated (provide context)
  • Scoped (define application boundaries)

Examples

I write short, reflective blog posts in Markdown. Avoid formulaic openings—foreground observation.
In academic mode: help me test arguments, not summarize. Use formal tone and cite sources.
When outlining blog posts, produce 3 distinct angle options before drafting.

Poor instructions sound like:

“Help me write better.”
“Be smart.”

And the worst instruction:


3. Role & Perspective Prompting

Defining the model’s role is just as important as describing your own. The power dynamic you establish—student/teacher, assistant/expert—significantly affects output tone and content.

When seeking explanations: “You are a college professor in this subject; I am a college sophomore, and the subject is not my major. I’ve asked you for help understanding this subject.”

When collaborating on expert-level work: “You are a graduate student assisting me with writing. I’m a university professor and recognized expert in the field. I am writing for an academic journal that specializes in this field.”

The framing matters more than most people realize.
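If you reuse these framings often, it helps to keep them as named prefixes rather than retyping them. A sketch of that habit (the dictionary and helper below are my own illustration, not a platform feature):

```python
# Keep the role framings from this section as named, reusable prefixes
# so the power dynamic you establish is explicit and easy to swap.
ROLE_FRAMES = {
    "explain": (
        "You are a college professor in this subject; I am a college sophomore, "
        "and the subject is not my major. I've asked you for help understanding it."
    ),
    "collaborate": (
        "You are a graduate student assisting me with writing. I'm a university "
        "professor and recognized expert in the field, writing for an academic "
        "journal that specializes in it."
    ),
}

def frame_prompt(mode: str, task: str) -> str:
    """Prepend the chosen role framing to a task prompt."""
    return f"{ROLE_FRAMES[mode]}\n\n{task}"

# The same question, framed two ways, yields very different registers.
print(frame_prompt("explain", "Walk me through what phenomenology means."))
print(frame_prompt("collaborate", "Critique this paragraph on phenomenology."))
```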


4. Memory and Context Retention

ChatGPT’s implementation of memory is the most robust—it works globally on the free tier and can be scoped to specific Projects on paid plans. You can view and edit what it remembers via Settings → Personalization → Manage Memory.[2]

This is underutilized. My ChatGPT instance knows I use Hugo for blogging, that I work in Neovim with LazyVim, that I prefer side-by-side dual-language translations. I didn’t explicitly tell it these things in instructions—it learned them through conversation and retained them.

Crucially, you can specify whether memory should apply to a particular conversation. This matters for multi-domain users. I don’t want the same rules applied when writing a travelogue as when drafting a philosophy article. When my partner uses my account, his Project is configured to neither reference nor record to global memory.

Claude’s memory is Project-specific (Pro only), while Gemini’s is session-level with experimental personalization features. Grok uses vector embeddings rather than full transcripts, which is an interesting architectural choice.[12]


5. Projects and Scoped Contexts

Both ChatGPT (Plus/Pro) and Claude (Pro) now offer full Project implementations—scoped workspaces with their own memory, files, instructions, and goals. These let you maintain distinct behavioral contexts:

  • Personal Blog: Casual, reflective tone, Markdown, images
  • Philosophy Journal: Formal reasoning, citations, logical structure

Projects inherit global instructions unless you override them. The memory operates independently within each Project’s scope.[14]

Gemini’s Gems function similarly but with less structure. Google AI Studio requires manual prompt management—you’re essentially building your own scoping system.

For open-source models, you’re entirely responsible for context management, typically through frameworks like LangChain.[15]
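To make that concrete, here is a sketch of the bookkeeping you inherit with a local model. It is plain Python rather than LangChain, and the class name and word-based budget are my own simplifications:

```python
# A rough sketch of manual context management for open-source models.
# The word count stands in for real tokenization; a deployment would use
# the model's own tokenizer, or a framework like LangChain, instead.
from dataclasses import dataclass, field

@dataclass
class ScopedContext:
    system_prompt: str                      # plays the role of project instructions
    max_words: int = 3000                   # crude budget for retained history
    history: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Record a turn, then trim the oldest turns once over budget."""
        self.history.append({"role": role, "content": content})
        while sum(len(m["content"].split()) for m in self.history) > self.max_words:
            self.history.pop(0)

    def messages(self) -> list:
        """Rebuild the full prompt: instructions first, then retained history."""
        return [{"role": "system", "content": self.system_prompt}, *self.history]

blog = ScopedContext(system_prompt="Casual, reflective tone. Markdown. First person.")
blog.add("user", "Brainstorm three openings about revisiting a childhood beach town.")
# blog.messages() is what you would hand to whichever local model you run.
```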


6. Template-Based Instructions

When persistent features aren’t available or sufficient, maintain instruction templates as .md or .txt files:

  • casual-blog-instructions.md
  • academic-reasoning-instructions.md

Paste these at conversation start to simulate scoping. It’s less elegant than native Projects but functional.[16]
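A few lines of scripting can take the sting out of the ritual. A sketch, assuming the template filenames above; sending the result is left to whatever client or chat window you actually use:

```python
# Load a saved instruction template and prepend it to a session's first task,
# simulating the scoping that native Projects would otherwise provide.
from pathlib import Path

def start_session(template_path: str, first_task: str) -> str:
    instructions = Path(template_path).read_text(encoding="utf-8")
    return f"{instructions.strip()}\n\n{first_task}"

opening = start_session(
    "casual-blog-instructions.md",
    "Brainstorm 5 vivid openings about revisiting childhood landscapes.",
)
print(opening)  # paste this as the first message of a new conversation
```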


Instruction Examples Across Use Cases

Here are two ready-to-use instruction blocks demonstrating different approaches:

Casual personal blog style

You are my creative writing assistant.  
- Tone: relaxed, sensory, reflective.  
- Format: Markdown with headings and bullet lists.  
- Style: first-person narrative, occasional rhetorical questions.  
- Avoid formulaic intros, generic platitudes, or overly academic phrasing.  
First task: brainstorm 5 vivid openings about revisiting childhood landscapes.

Scholarly philosophy style

You are my academic assistant.  
- Tone: formal, analytical, cautious.  
- Format: 3-part essay outline with thesis, argument, counterargument, conclusion.  
- Provide citations in Chicago style.  
- Avoid personal opinions or emotive language.  
First task: outline a position defending philosophical skepticism about memory accuracy.



Technical Foundations

These behavioral shaping techniques parallel established AI training methods:

Reinforcement Learning from Human Feedback (RLHF): The foundational technique OpenAI and others use to align model behavior. When you provide feedback through instructions and corrections, you’re engaging in a user-level version of this process.[17]

Reward shaping: In training, this means adjusting reward signals to improve performance. Your instructions function analogously—rewarding certain outputs through specification and correction.[18]
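For reference, the Ng et al. result concerns potential-based shaping: adding a term derived from a potential function Φ over states changes how quickly an agent learns without changing which policies are optimal. Roughly:

$$
R'(s, a, s') = R(s, a, s') + \gamma\,\Phi(s') - \Phi(s)
$$

The analogy to instructions is loose, but the shape of the idea carries over: you add signal around the behavior you want rather than rebuilding the system that produces it.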

Prompt engineering: All major platforms now document best practices emphasizing clear, concrete, step-by-step instructions for accuracy.[19]

The difference is scale. You’re not retraining the model’s weights, but you are shaping its interaction patterns within your specific context.


Platform Comparison: Free vs. Paid Tiers

| Platform | Tier | Custom Instructions | Memory | Projects / Scoped Context | Key Limitations |
|---|---|---|---|---|---|
| ChatGPT (OpenAI) | Free | ✅ Yes (Global)[1] | ✅ Yes (Global)[2] | ❌ No Projects | ~10 messages/5 hours[3]; switches to mini model |
| ChatGPT (OpenAI) | Plus/Pro | ✅ Yes (Global & Per-Project)[20] | ✅ Yes (Global & Per-Project) | ✅ Yes (Full Projects with files & scoped memory)[21] | 160 messages/5 hours[3]; priority access |
| Claude (Anthropic) | Free | ✅ Yes (Profile Preferences + Styles)[4] | ❌ No persistent memory | ❌ No Projects | No context retention between chats |
| Claude (Anthropic) | Pro | ✅ Yes (Profile Preferences + Styles + Per-project instructions)[5] | ✅ Yes (Project-specific memory) | ✅ Yes (Unlimited Projects with custom instructions & knowledge uploads)[6] | 5x usage vs. free; $20/month or $200/year |
| Gemini (Google) | Free | ✅ Yes (via Gems - now free)[7] | ⚠️ Limited (Session + experimental personalization)[8] | ✅ Partial (Gems - now free)[9] | Session-level context only; Deep Research limited to few times/month |
| Gemini (Google) | Advanced | ✅ Yes (via Gems + file uploads) | ⚠️ Limited (Session + personalization + Search history)[8] | ✅ Partial (Gems with file context)[22] | No full project structure like ChatGPT/Claude |
| Google AI Studio (Google) | Free | ✅ Yes (System instructions per prompt)[10] | ❌ No persistent memory | ⚠️ Partial (Manual prompt management) | Developer-focused; no conversational UI; requires API knowledge |
| Grok (xAI) | Free | ⚠️ Unknown | ⚠️ Unknown | ❌ No Projects | Limited public documentation; feature availability unclear |
| Grok (xAI) | Premium | ⚠️ Partial (Response style presets: Concise, Formal, Socratic + custom styles)[11] | ✅ Yes (Persistent memory via vector embeddings)[12] | ⚠️ Mentioned (Custom workspaces referenced but not fully documented)[13] | Less mature than competitors; evolving feature set |
| OSS Models | Any | ❌ Depends on UI wrapper | ❌ None by default | 🧷 User-managed (via LangChain, etc.) | Fully dependent on deployment; no native features |

Note: This comparison reflects capabilities as of October 2025. Google rolled out Gems to free users in March 2025, xAI added persistent memory to Grok in April 2025, and ChatGPT extended memory to free tier in June 2025. The competitive landscape continues to evolve rapidly.


Practical Demonstration: Same Prompt, Different Instructions

To illustrate behavioral shaping in practice, I generated responses using identical prompts but different instruction sets: the two instruction blocks shown above.

Each response was generated with no memory, global instructions, or project context—only the instruction block provided inline.

Prompt (Identical for Both)

Write an opening paragraph about returning to a coastal town after many years.

Side-by-Side Output Comparison

Reflective Travelogue

I hadn’t realized how much the salt had embedded itself in memory until I stepped off the train and caught the first sour-sweet drift of brine on the air. The town hadn’t changed so much as settled—like an old quilt that’s softened at the edges. Same narrow boardwalk, same rust-streaked signage above the bakery, but quieter now, and perhaps smaller. I walked toward the harbor without really meaning to, noticing how my footsteps echoed more now, as if the place remembered me, too.

Philosophical Journal Article

The experience of returning to a previously familiar coastal town after a prolonged absence raises questions about the persistence of place in relation to memory and identity. While the spatial coordinates may remain fixed, the phenomenological content of the place—its textures, associations, and emotional valence—has been irrevocably altered by temporal distance. This encounter thus becomes a site for examining the dialectic between continuity and rupture in personal narrative, particularly when filtered through sensory recall and the unreliability of nostalgia.

Analysis

| Feature | Travelogue | Philosophy |
|---|---|---|
| Voice | First-person, narrative | Third-person, analytic |
| Tone | Lyrical, reflective | Formal, conceptual |
| Focus | Sensory memory, atmosphere | Identity, phenomenology, time |
| Purpose | Draws the reader in emotionally | Frames the topic for academic analysis |

The same prompt, processed through different behavioral parameters, produces fundamentally different outputs. This demonstrates why understanding instruction design matters—you’re not just asking for content, you’re architecting the conditions under which that content emerges.


Choosing Your Platform

The “best” platform depends on your specific workflow:

For persistent memory across all conversations: ChatGPT’s free tier is surprisingly robust. You get global custom instructions and memory without paying, though message limits will constrain heavy usage.

For robust project structure: ChatGPT Plus/Pro or Claude Pro. Both offer full-featured Projects, with Claude’s implementation slightly more flexible in how it handles per-project custom instructions. ChatGPT offers a larger context window, allows much longer individual chats, and can reference other chats.

For developer/API access: Google AI Studio provides free access to Gemini 2.5 Pro with system instructions, if you’re comfortable working programmatically.

For emerging features: Keep an eye on Grok. It’s less mature but evolving rapidly, and its vector embedding approach to memory is architecturally interesting.

The key insight: whichever platform you choose, the principles of behavioral shaping remain consistent. Learn how your chosen LLM handles instructions, memory, and context, then architect your interaction accordingly.



  1. Custom Instructions for ChatGPT - OpenAI, 2023.

  2. Memory and new controls for ChatGPT - OpenAI, 2025.

  3. GPT-5 in ChatGPT - OpenAI Help Center.

  4. Understanding Claude’s Personalization Features - Anthropic Help Center.

  5. Ibid.

  6. Claude Pricing - Anthropic.

  7. New Gemini app features, available to try at no cost - Google Blog, March 2025.

  8. Ibid.

  9. How to use Gems - Google Gemini Help.

  10. Gemini Developer API Pricing - Google AI for Developers.

  11. How to Use Grok 4: Modes, Memory & Prompts Explained for Beginner - Medium, September 2025.

  12. Grok Memory Explained: How xAI’s Chatbot Remembers You - Grok AI Model, June 2025.

  13. xAI Upgrades Grok with Personalized Memory and Custom Workspaces - Maginative, April 2025.

  14. Inheritance of Instructions: Project-level instructions typically override only the fields they redefine; global settings fill in the rest. Memory works inside Projects unless disabled.

  15. Context Management in OSS LLMs: With models like LLaMA and Mistral, you manage contexts manually—via prompt repetition, chaining, or frameworks like LangChain.

  16. Simulating Scoped Behavior with Templates: In the absence of native Projects/memory features, reusable prompt templates allow consistent behavior across sessions.

  17. RLHF research - OpenAI. OpenAI describes this process as key to making models “more helpful and truthful.”

  18. Ng et al., 1999: Reward Shaping.

  19. Prompt engineering guide - OpenAI.

  20. Using Projects in ChatGPT - OpenAI Help Center.

  21. How to Use Projects in ChatGPT - How-To Geek, January 2025.

  22. The Ultimate Guide to Google Gemini Gems - Medium, September 2024.