<aside>
Why Prompt Compression Matters
<aside>
- Context-window limits (e.g., roughly 8k–128k tokens across GPT-4 variants).
- Performance: Compressed prompts can focus attention on key information.
- Cost: Fewer tokens = cheaper API calls.
- Latency: Smaller prompts mean faster inference.
</aside>
</aside>
<aside>
Prompt Compression Techniques
<aside>
1. Semantic Summarization
- Reduce a long prompt by summarizing its key meaning, not just trimming its text.
- Tools: LLMs themselves (e.g., "Summarize this prompt, preserving the task instructions").
Example
Original:
Write a friendly email to a customer thanking them for their purchase, offering help, and suggesting related products.
Compressed:
Friendly thank-you email to customer with help + upsell.
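A minimal sketch of this idea using the OpenAI Python SDK (v1.x); the model name and the compress_prompt wrapper are illustrative assumptions, not a fixed recipe:
```python
# Minimal sketch: LLM-based prompt compression (OpenAI Python SDK v1.x).
# The model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def compress_prompt(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to shorten a prompt while keeping its instructions."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Summarize this prompt in as few tokens as possible, "
                f"preserving every task instruction:\n\n{prompt}"
            ),
        }],
    )
    return response.choices[0].message.content.strip()

original = ("Write a friendly email to a customer thanking them for their "
            "purchase, offering help, and suggesting related products.")
print(compress_prompt(original))
```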
</aside>
<aside>
2. Template Abstraction
- Replace verbose descriptions with a reusable template name.
- Often used in codegen and agent systems.
Example
Instead of:
“Write Python code to parse a CSV file and calculate average values of columns”
Use:
#csv_avg_template
Define this template in memory or a context repository.
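One lightweight way to "define the template in memory" is a plain dictionary that expands handles before the prompt is sent; the registry and handle names below are illustrative:
```python
# Minimal sketch: a template registry that expands short handles like
# "#csv_avg_template" into full prompts before sending them to the model.
TEMPLATES = {
    "#csv_avg_template": (
        "Write Python code to parse a CSV file and calculate "
        "average values of its numeric columns."
    ),
}

def expand(prompt: str) -> str:
    """Replace any registered template handle with its full text."""
    for handle, body in TEMPLATES.items():
        prompt = prompt.replace(handle, body)
    return prompt

print(expand("#csv_avg_template"))
```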
</aside>
<aside>
3. Keyword Tokenization
- Use keywords, variables, or shorthand to capture structure.
Example
"task:email | tone:friendly | action:thank, help, upsell”
This format can be parsed deterministically (see the sketch below) or interpreted by the model directly, given training or few-shot examples.
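A minimal parser sketch for this shorthand; the field names and the expansion sentence are illustrative:
```python
# Minimal sketch: parse "key:value | key:value" shorthand into a dict,
# then expand it back into a natural-language prompt.
def parse_shorthand(s: str) -> dict:
    fields = {}
    for part in s.split("|"):
        key, _, value = part.partition(":")
        fields[key.strip()] = [v.strip() for v in value.split(",")]
    return fields

spec = parse_shorthand("task:email | tone:friendly | action:thank, help, upsell")
# {'task': ['email'], 'tone': ['friendly'], 'action': ['thank', 'help', 'upsell']}

prompt = (f"Write a {spec['tone'][0]} {spec['task'][0]} that will "
          f"{', '.join(spec['action'])} the customer.")
print(prompt)
```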
</aside>
<aside>
4. Latent Prompt Embedding (Advanced)
- Use embedding vectors as prompts or "soft tokens" (a technique used in prompt tuning).
- Available in some fine-tuning setups (e.g., prompt tuning or prefix tuning via libraries such as Hugging Face PEFT).
- Not directly accessible in vanilla ChatGPT, but available via API-level or research integrations.
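For intuition, here is a minimal PyTorch sketch of soft tokens: a small matrix of learnable embeddings prepended to a frozen model's input embeddings. The dimensions and initialization are illustrative assumptions:
```python
# Minimal sketch of soft-prompt tuning in PyTorch: learnable "soft token"
# embeddings are prepended to the input embeddings of a frozen model.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_soft_tokens: int = 20, embed_dim: int = 768):
        super().__init__()
        # The only trainable parameters: one vector per soft token.
        self.soft = nn.Parameter(torch.randn(n_soft_tokens, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) from the frozen model.
        batch = token_embeds.size(0)
        prefix = self.soft.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

sp = SoftPrompt()
dummy = torch.randn(2, 10, 768)   # stand-in for real input embeddings
print(sp(dummy).shape)            # torch.Size([2, 30, 768])
```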
</aside>
<aside>
5. Reflexive Prompting
- Let the model rewrite its own prompt more concisely.
Example
“Rewrite this prompt using fewer tokens but preserve its functionality: [prompt]”
This works surprisingly well and can be iterated multiple times.
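A sketch of that iteration loop; ask_llm stands in for whatever single-call chat wrapper you use (a hypothetical helper, not a real API):
```python
# Minimal sketch: iterate the "rewrite yourself" instruction until the
# prompt stops shrinking. `ask_llm(text) -> str` is a hypothetical wrapper.
def reflexive_compress(prompt: str, ask_llm, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        shorter = ask_llm(
            "Rewrite this prompt using fewer tokens but preserve its "
            f"functionality:\n\n{prompt}"
        )
        if len(shorter) >= len(prompt):  # stop once it no longer shrinks
            break
        prompt = shorter
    return prompt
```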
</aside>
<aside>
6. Modular Prompting
- Split long prompts into chunks/modules, referencing only what’s needed dynamically.
Example Modules:
system_instructions
task_description
context_snippet
Then prompt:
“Use system_instructions + task_description to generate a response using context_snippet.”
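A minimal sketch of module storage and assembly; the module bodies are illustrative:
```python
# Minimal sketch: store prompt modules in a dict and assemble only the
# ones a given task needs. Module names mirror the list above.
MODULES = {
    "system_instructions": "You are a concise, helpful assistant.",
    "task_description": "Answer the user's question in two sentences.",
    "context_snippet": "The user is asking about prompt compression.",
}

def build_prompt(*names: str) -> str:
    """Concatenate the requested modules in order."""
    return "\n\n".join(MODULES[n] for n in names)

print(build_prompt("system_instructions", "task_description", "context_snippet"))
```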
</aside>
<aside>
7. Code Tokenization
- Use ASTs, pseudocode, or comments to represent logic succinctly.
Example
Verbose:
“Write a function that loops over a list and prints each value squared.”
Compressed:
def square_print(lst): print(*[x**2 for x in lst], sep="\n")
</aside>
<aside>
8. Knowledge Distillation for Prompts
- Train a smaller "student" prompt (or model) to replicate the behavior of a larger "teacher" prompt (or model).
- Effective in fine-tuned workflows.
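A sketch of the data-collection step, assuming a hypothetical ask_llm wrapper: run the long teacher prompt, then pair its outputs with the short student prompt as fine-tuning examples:
```python
# Minimal sketch: build a distillation dataset by running the verbose
# "teacher" prompt and pairing its outputs with a compressed "student"
# prompt. `ask_llm(text) -> str` is a hypothetical API wrapper; the
# prompt texts are illustrative.
TEACHER_PROMPT = ("Write a friendly email to a customer thanking them for "
                  "their purchase, offering help, and suggesting related "
                  "products. Customer details: {details}")
STUDENT_PROMPT = "Thank-you email w/ help + upsell. Details: {details}"

def build_distillation_set(examples, ask_llm):
    """Return (student_input, teacher_output) fine-tuning pairs."""
    pairs = []
    for details in examples:
        target = ask_llm(TEACHER_PROMPT.format(details=details))
        pairs.append((STUDENT_PROMPT.format(details=details), target))
    return pairs
```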
</aside>
</aside>
<aside>
🛠️
Tools
- LangChain: for prompt templates and compression workflows.
- Promptfoo: test and compare prompt variations.
- OpenAI Functions + Memory: maintain compressed instructions across turns.
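For example, a minimal LangChain template (import path per langchain-core; adjust to your installed version):
```python
# Minimal sketch: a reusable LangChain PromptTemplate.
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Friendly thank-you email to {customer} with help + upsell."
)
print(template.format(customer="Alex"))
```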
</aside>
<aside>
✅ Topic Completed!
🌟 Great work! You’re one step closer to your goal.
Ready to Move On →
</aside>
<aside>
- [ ] I have revised the topic at least once
- [ ] I want to practice more on this topic
- [ ] I have practiced enough and feel confident
- [ ] I need to revisit this topic later
</aside>