Prompt Trainer

Core Feature

Train prompts with human feedback using A/B preference collection and AI-powered suggestions

Overview

The Prompt Trainer is PromptAsCode's flagship feature, bringing RLHF (Reinforcement Learning from Human Feedback) principles to prompt engineering. You train a prompt by rating its outputs; the system learns patterns from your preferences and turns them into AI-powered improvement suggestions.

Why RLHF for Prompts?

Just as RLHF revolutionized AI model training (GPT-4, Claude, etc.), applying these principles to prompt engineering creates prompts that consistently produce outputs aligned with human preferences.

  • Rate Outputs: A/B preference collection
  • Learn Patterns: AI finds what you prefer
  • Get Suggestions: Automated improvements

What is RLHF?

RLHF Explained

Reinforcement Learning from Human Feedback (RLHF) is the technique that made modern AI assistants like GPT-4 and Claude so effective at following instructions and being helpful.

The key insight: instead of trying to define "good" output mathematically, you let humans compare outputs and say which one they prefer. Over many comparisons, the system learns what humans actually want.
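
Formally, pairwise comparisons are usually modeled with a Bradley-Terry-style formulation: each output gets a latent quality score, and the probability that a rater prefers one output over another depends only on the difference between their scores. The sketch below illustrates that idea in Python; it is background on RLHF in general, not a description of the Prompt Trainer's internals.

import math

def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry / logistic model: P(rater prefers A over B)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# If output A has latent score 1.2 and output B has 0.4, the model
# predicts A is preferred about 69% of the time.
print(round(preference_probability(1.2, 0.4), 2))  # 0.69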

RLHF for Prompts

The Prompt Trainer applies this same principle (sketched in code after the list):
  1. Generate: Run your prompt to get multiple outputs
  2. Compare: See two outputs side-by-side (A/B test)
  3. Prefer: Select which output you prefer (or tie/neither)
  4. Learn: System identifies patterns in your preferences
  5. Improve: Get suggestions that align with your preferences
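
Expressed as code, the loop amounts to generating a pair, recording a choice, and repeating. The sketch below is a minimal, self-contained illustration; generate_output and collect_preference are stand-in stubs, not PromptAsCode functions.

import random
from dataclasses import dataclass, field

@dataclass
class PreferenceRecord:
    output_a: str
    output_b: str
    choice: str  # "A", "B", "tie", or "neither"

@dataclass
class TrainingSession:
    prompt: str
    records: list = field(default_factory=list)

def generate_output(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"{prompt} -> sample #{random.randint(0, 9999)}"

def collect_preference(a: str, b: str) -> str:
    # Stand-in for the A/B rating UI; here the choice is random.
    return random.choice(["A", "B", "tie", "neither"])

session = TrainingSession(prompt="Explain what RLHF is")
for _ in range(5):
    a, b = generate_output(session.prompt), generate_output(session.prompt)  # Generate
    choice = collect_preference(a, b)                                        # Compare + Prefer
    session.records.append(PreferenceRecord(a, b, choice))                   # Record for learning

print(f"Collected {len(session.records)} preferences")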

How to Use

  1. Create a Training Session - Enter your prompt and give the session a name, then configure the model and parameters (see the configuration sketch after this list).
  2. Generate Output Pairs - The system generates two outputs (A and B) from your prompt.
  3. Rate Preferences - Select which output you prefer: A, B, Tie (both equal), or Neither (both bad).
  4. Repeat & Iterate - Continue rating pairs; more ratings mean better pattern recognition.
  5. Review Learned Patterns - See what the system has learned about your preferences.
  6. Get Improvement Suggestions - Receive AI-powered prompt improvements based on your preference patterns.
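
Step 1 amounts to pairing your prompt with generation settings. The field names below are hypothetical, chosen only to show the kind of configuration a session typically carries (model, temperature, number of pairs); they are not PromptAsCode's actual schema.

# Hypothetical session configuration; illustrative field names only.
session_config = {
    "name": "Support reply tone v2",
    "prompt": "Write a reply to the customer message below...",
    "model": "gpt-4o",           # whichever model the session should target
    "temperature": 0.9,          # higher values make outputs A and B differ more
    "pairs_per_round": 10,       # how many A/B pairs to rate in one sitting
}
print(session_config["name"])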

Training Flow

The Training Loop

┌─────────────────────────────────────────────┐
│              TRAINING SESSION               │
├─────────────────────────────────────────────┤
│                                             │
│    Your Prompt ──► Generate A & B           │
│                       │                     │
│                       ▼                     │
│              ┌─────────────────┐            │
│              │   Compare A/B   │            │
│              └────────┬────────┘            │
│                       │                     │
│         ┌─────────────┼─────────────┐       │
│         ▼             ▼             ▼       │
│    [Prefer A]    [Prefer B]   [Tie/Neither] │
│         │             │             │       │
│         └─────────────┼─────────────┘       │
│                       │                     │
│                       ▼                     │
│              Preference Recorded            │
│                       │                     │
│                       ▼                     │
│              Pattern Learning               │
│                       │                     │
│                       ▼                     │
│              Next Pair (repeat)             │
│                                             │
└─────────────────────────────────────────────┘

Preference Collection

The preference interface shows two outputs side-by-side for comparison:

Output A

First generated response. Click "Prefer A" or press 1 if this is better.

Output B

Second generated response. Click "Prefer B" or press 2 if this is better.

Rating Options

  • Prefer A / Prefer B: One output is clearly better
  • Tie: Both outputs are equally good
  • Neither: Both outputs are unsatisfactory

How to Rate Effectively

  • Focus on the specific quality you care about (accuracy, tone, format)
  • Be consistent in your criteria across ratings
  • Use "Neither" when both outputs miss the mark - this is valuable signal
  • Don't overthink - your first instinct is usually right

Pattern Learning

After collecting preferences, the system analyzes patterns in your choices:

Learned Patterns Example

Preference Analysis (47 ratings)
================================

STRONGLY PREFERRED:
✓ Structured responses with bullet points
✓ Code examples with comments
✓ Concise explanations (< 200 words)
✓ Technical accuracy over simplicity

AVOIDED:
✗ Long prose paragraphs
✗ Hedging language ("might", "perhaps")
✗ Generic advice without specifics
✗ Responses starting with "I"

NEUTRAL:
○ Emoji usage
○ Formal vs casual tone

Minimum Ratings

  • 10 ratings: Basic patterns emerge
  • 25 ratings: Reliable pattern detection
  • 50+ ratings: Highly confident patterns
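
Conceptually, pattern detection can be as simple as counting how often outputs with a given trait win their comparisons. The sketch below is an illustrative approximation, not the Trainer's actual algorithm; it counts a tie as half a win and skips "Neither" ratings and pairs where both outputs share the trait, which is one reasonable convention among several.

# Illustrative pattern detection: how often does a trait win its A/B comparisons?
# A simplified approximation, not PromptAsCode's actual algorithm.

def uses_bullets(text: str) -> bool:
    return text.lstrip().startswith("-")

def trait_preference_rate(ratings, has_trait) -> float:
    """ratings: (output_a, output_b, choice) tuples, choice in {"A", "B", "tie", "neither"}."""
    wins, total = 0.0, 0
    for a, b, choice in ratings:
        if has_trait(a) == has_trait(b) or choice == "neither":
            continue  # no signal: trait appears on both sides (or neither), or both outputs rejected
        total += 1
        if choice == "tie":
            wins += 0.5                        # tie counts as half a win
        elif has_trait(a if choice == "A" else b):
            wins += 1.0                        # the preferred output carried the trait
    return wins / total if total else 0.0

ratings = [
    ("- point one\n- point two", "One long paragraph...", "A"),
    ("Another long paragraph...", "- a\n- b\n- c", "B"),
    ("- x\n- y", "Plain prose answer", "tie"),
]
print(f"bullet-point win rate: {trait_preference_rate(ratings, uses_bullets):.0%}")  # 83%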

AI Suggestions

Based on learned patterns, the trainer suggests prompt improvements:

Example Suggestions

Based on your preferences, consider these changes:

1. ADD FORMAT INSTRUCTION
   Original: "Explain the concept"
   Suggested: "Explain the concept using bullet
   points with code examples"
   Why: You preferred structured responses in 89% of
   comparisons

2. ADD CONCISENESS CONSTRAINT
   Suggested addition: "Keep explanations under 200
   words unless the topic requires more detail"
   Why: You consistently preferred shorter, focused
   responses

3. REMOVE HEDGING
   Suggested addition: "Be direct and confident.
   Avoid hedging words like 'might' or 'perhaps'"
   Why: You avoided outputs with uncertain language
   in 94% of cases
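
One straightforward way to turn learned patterns into suggestions is to feed them back to a model as a "meta-prompt" that asks for concrete edits. The sketch below only shows how such a meta-prompt might be assembled; whether the Trainer works exactly this way internally is an assumption, and the model call itself is left as a comment.

# Assembling a hypothetical improvement meta-prompt from learned patterns.

def build_improvement_prompt(prompt: str, preferred: list, avoided: list) -> str:
    preferred_lines = "\n".join(f"- {p}" for p in preferred)
    avoided_lines = "\n".join(f"- {a}" for a in avoided)
    return (
        "Rewrite the prompt below so its outputs match these observed preferences.\n\n"
        f"Prompt:\n{prompt}\n\n"
        f"Raters preferred:\n{preferred_lines}\n\n"
        f"Raters avoided:\n{avoided_lines}\n\n"
        "Suggest concrete edits and briefly explain each one."
    )

meta_prompt = build_improvement_prompt(
    prompt="Explain the concept",
    preferred=["structured responses with bullet points", "explanations under 200 words"],
    avoided=["hedging language such as 'might' or 'perhaps'"],
)
print(meta_prompt)  # send this to your model of choice to get suggestions back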

Keyboard Shortcuts

Speed up your training with keyboard shortcuts:

  • 1: Prefer Output A
  • 2: Prefer Output B
  • T: Mark as Tie
  • N: Mark as Neither
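
If you ever script a rating flow of your own, the mapping behind these shortcuts is trivial. The sketch below simply mirrors the list above; it is illustrative and not the app's source code.

# Hypothetical hotkey-to-choice mapping mirroring the shortcuts above.
HOTKEYS = {"1": "A", "2": "B", "t": "tie", "n": "neither"}

def handle_key(key: str):
    """Translate a keypress into a preference choice (None if unmapped)."""
    return HOTKEYS.get(key.lower())

print(handle_key("T"))  # tie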

AI Expert Use Cases

Production Prompt Fine-Tuning

Use the Trainer to systematically improve prompts for production applications. The data-driven approach ensures improvements are based on actual preferences, not intuition.

Team Alignment

Have multiple team members rate outputs to build consensus on what "good" looks like. The patterns become a shared understanding of quality.

User Preference Research

Collect preferences from actual users to understand what they value. Build prompts that produce outputs your users will appreciate.

Continuous Improvement

Periodically run training sessions on production prompts to identify drift and opportunities for improvement over time.

Tips & Best Practices

Pro Tips

  • Aim for at least 25 ratings before trusting patterns
  • Use keyboard shortcuts to speed up rating (1/2/T/N)
  • Be consistent in your rating criteria
  • Take breaks if ratings become automatic - fresh eyes matter
  • Rate diverse inputs to cover different scenarios
  • Review patterns after every 20-30 ratings

Common Pitfalls

  • Rating too fast: Quick, careless ratings add noise
  • Inconsistent criteria: Changing what you value mid-session
  • Too few ratings: Patterns aren't reliable under 10 ratings
  • Ignoring "Neither": This signal is valuable - use it!