What the sliders actually do
Model parameters change how the AI chooses the next words. They do not rewrite your scenario, repair weak memory, or make every model behave the same. They shape pacing, variation, repetition, and how adventurous the prose feels.
The main controls
These settings work together. Think of them as steering the model's word choice, not as quality switches.
Temperature
Controls how bold or predictable the writing feels.
- Too low can feel dry or get stuck.
- Too high can invent facts or lose track of who is in the scene.
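For readers who want to see the mechanics, here is a minimal Python sketch of how Temperature is commonly applied. It assumes the standard approach of dividing the model's raw word scores by the temperature before converting them to probabilities. The function name and the three example scores are made up for illustration, and individual NanoGPT backends may differ in detail.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.5)  # sharper: top word dominates
warm = softmax_with_temperature(logits, 1.5)  # flatter: long shots gain ground
print(cool)
print(warm)
```

At the lower setting the top-scoring word takes a larger share of the probability, which is why low values read as predictable. At the higher setting the unlikely words get more chances, which is where both the boldness and the invented facts come from.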
Top P
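Limits the pool of possible words before the model chooses one. Only the most likely words, up to a combined probability of Top P, stay in the pool.
- Top P and Temperature both increase variety as they go up, so their effects stack.
- If both are high, the model may drift into incoherent or off-topic text.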
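A minimal Python sketch of Top P filtering, assuming the usual "nucleus sampling" definition: rank words by probability and keep the smallest group whose combined probability reaches the Top P value. The function name, the example words, and their probabilities are invented for illustration.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely words until their combined probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for word, p in ranked:
        kept[word] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    # Renormalize so the surviving probabilities sum to 1 again.
    return {word: p / total for word, p in kept.items()}

# Hypothetical next-word probabilities.
probs = {"door": 0.5, "window": 0.3, "ceiling": 0.15, "banana": 0.05}
pool = top_p_filter(probs, 0.75)
print(sorted(pool))  # the words that stay in the pool
```

With Top P at 0.75, only "door" and "window" stay in this example pool. Raising Top P toward 1.0 lets "ceiling" and eventually "banana" back in.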
Max Tokens
Sets how much the model may write in one reply.
- Long replies can stall if the model narrates around the point.
- Short replies can feel abrupt if the scene needs atmosphere.
Presence Penalty
Encourages the model to introduce new details instead of staying on the same idea.
- Too high can make the AI throw in random new elements.
- Use gently for continuity-heavy stories.
Frequency Penalty
Discourages repeated words and phrases.
- Too high can make prose sound strained.
- Better for style cleanup than story direction.
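The difference between Presence Penalty and Frequency Penalty is easier to see in code. The sketch below assumes the additive formula documented for OpenAI-style APIs: a word that has already appeared loses a flat amount (presence) plus an amount that grows with each repeat (frequency). Whether a given NanoGPT model applies exactly this formula is an assumption, and the function name and example values are hypothetical.

```python
from collections import Counter

def apply_penalties(logits, generated_words, presence_penalty, frequency_penalty):
    """Lower the scores of words that already appeared in the reply."""
    counts = Counter(generated_words)
    adjusted = dict(logits)
    for word, count in counts.items():
        if word in adjusted:
            # Presence: flat cost for appearing at all.
            # Frequency: extra cost for every occurrence.
            adjusted[word] -= presence_penalty + frequency_penalty * count
    return adjusted

# Hypothetical scores: three words the model likes equally.
logits = {"rain": 2.0, "sword": 2.0, "smile": 2.0}
so_far = ["rain", "rain", "rain", "sword"]
adjusted = apply_penalties(logits, so_far, presence_penalty=0.5, frequency_penalty=0.25)
print(adjusted)  # {'rain': 0.75, 'sword': 1.25, 'smile': 2.0}
```

"smile" has not been used, so it keeps its full score. "sword" pays the flat presence cost once, while "rain" pays more because it has already appeared three times.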
Repetition Penalty
A stronger anti-loop control used by some NanoGPT models.
- Some models need it. Others get worse with it.
- Raise slowly, especially on prose-heavy models.
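One common way a repetition penalty is implemented in open-source samplers is multiplicative: the score of any word that has already appeared is divided by the penalty if positive, or multiplied by it if negative. The sketch below assumes that variant. The models offered through NanoGPT may implement it differently, and the function name and values are illustrative only.

```python
def apply_repetition_penalty(logits, generated_words, penalty):
    """Scale down the scores of words that were already used."""
    seen = set(generated_words)
    adjusted = {}
    for word, score in logits.items():
        if word in seen:
            # Positive scores shrink; negative scores move further below zero.
            score = score / penalty if score > 0 else score * penalty
        adjusted[word] = score
    return adjusted

# Hypothetical scores; "rain" and "fog" were already written.
logits = {"rain": 2.0, "fog": -1.0, "sun": 1.0}
adjusted = apply_repetition_penalty(logits, ["rain", "fog"], penalty=1.25)
print(adjusted)
```

Because the penalty scales the whole score rather than subtracting a fixed amount, small increases can have a large effect on strongly preferred words, which is one reason to raise it slowly.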
Top K
Caps how many word options are considered.
- Leave it alone unless a model feels noisy.
- Too low can make every reply feel similar.
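As a rough sketch, Top K can be pictured as keeping only the K most likely words and discarding the rest before one is chosen. The function name and example probabilities below are made up for illustration.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely words, then renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {word: p / total for word, p in ranked}

# Hypothetical next-word probabilities.
probs = {"door": 0.5, "window": 0.3, "ceiling": 0.15, "banana": 0.05}
pool = top_k_filter(probs, 2)
print(sorted(pool))  # ['door', 'window']
```

A very small K leaves the model the same handful of safe choices every time, which is why replies start to feel alike.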
Min P
Filters out word choices that are very unlikely compared with the model's top pick.
- Useful when a model collapses into strange wording.
- Too high can flatten creativity.
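A minimal sketch of Min P, assuming the definition used by most open-source samplers: a word stays in the pool only if its probability is at least Min P times the probability of the single most likely word. The function name and numbers are hypothetical.

```python
def min_p_filter(probs, min_p):
    """Drop words whose probability falls below min_p times the top probability."""
    threshold = min_p * max(probs.values())
    kept = {word: p for word, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {word: p / total for word, p in kept.items()}

# Hypothetical next-word probabilities.
probs = {"door": 0.5, "window": 0.3, "ceiling": 0.15, "banana": 0.05}
pool = min_p_filter(probs, 0.2)  # threshold is 0.2 * 0.5 = 0.1
print(sorted(pool))  # ['ceiling', 'door', 'window']
```

Because the cutoff scales with the top word, Min P trims more aggressively when the model is confident and less when it is unsure. Set too high, it leaves only the obvious choices.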
Why one model hates another model's settings
Each model has its own training style, default sampling behavior, context handling, and tendency toward repetition. Parameters amplify those traits. They do not affect every model evenly.
A model built for banter may work beautifully at higher Temperature, but the same setting on a lore-heavy model can make it invent family history or jump to the wrong character.
A model that handles huge context may need lower Temperature and fewer novelty penalties. It already has a lot to track, so pushing it to add more can scatter the scene.
A model that repeats itself may improve with Repetition Penalty or Frequency Penalty. A cleaner model can become awkward if those same penalties are too high.
A prose-heavy model often benefits from a higher Max Tokens setting and moderate variety. If replies become all atmosphere and no movement, lower Max Tokens or raise Presence Penalty slightly.
Roleplay tuning examples
Slow-burn romance
- Temperature: moderate to high
- Top P: high enough for emotional nuance
- Presence Penalty: low to moderate
- Max Tokens: medium or long
Investigation or mystery
- Temperature: lower to moderate
- Top P: moderate
- Presence Penalty: low
- Max Tokens: medium
Combat or chase scene
- Temperature: moderate
- Top P: moderate
- Presence Penalty: moderate
- Max Tokens: short or medium
Loop repair
- Lower Temperature slightly
- Raise Frequency Penalty gently
- Raise Repetition Penalty gently
- Prompt the model to cut the scene to a new action
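The loop repair steps can be pictured as small nudges to the current settings. The sketch below is illustrative only: the setting names, the starting values, and the step size of 0.05 are assumptions, not recommended numbers for any particular model.

```python
def loop_repair(settings, step=0.05):
    """Return a copy of the settings nudged to help break a repetition loop."""
    repaired = dict(settings)
    # Lower Temperature slightly, but never below zero.
    repaired["temperature"] = round(max(0.0, settings["temperature"] - step), 2)
    # Raise both anti-repetition controls gently.
    repaired["frequency_penalty"] = round(settings["frequency_penalty"] + step, 2)
    repaired["repetition_penalty"] = round(settings["repetition_penalty"] + step, 2)
    return repaired

# Hypothetical current settings.
current = {"temperature": 0.9, "frequency_penalty": 0.1, "repetition_penalty": 1.0}
repaired = loop_repair(current)
print(repaired)
```

Applying one small step at a time and testing a few replies between steps makes it easier to tell which change actually broke the loop.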