TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Ask HN: Can we solve AI prompt injection attacks with an indented data format?

1 pointsby alexrusticabout 1 year ago
Hi HN ! I&#x27;m Alex, a tech enthusiast. I have an idea that I can&#x27;t test and that concerns an area in which I am not an expert. I am making this post to find out to what extent this idea is relevant to the state of the art.<p>From what little I know, raw user inputs are not directly submitted to LLMs. Typically, user input is carefully wrapped in a special format before being sent to the LLM. The format usually has tags, including special tags to tell the AI, for example, which topic is prohibited.<p>As with SQL injection, an attacker can craft malicious user input by introducing special tags. Input sanitization can be seen as a solution, but it seems that it isn&#x27;t enough. Anyway, it doesn&#x27;t seem very intuitive, I think a document intended to be read by an LLM should also be very human-readable. I also wonder what happens when an attacker uses obscure Unicode characters to forge a string that looks like a special tag.<p>Instead of using an XML-like language, my idea is to use a format that seamlessly interweave human-readable structured data with prose within a single document. Also, the format must natively support indentation to remove the need for input sanitization, thereby eliminating an entire class of injection attacks.<p>I am the author of Braq, a data format that seems to be a good candidate.<p>The idea to better structure a prompt is described in this Markdown section: https:&#x2F;&#x2F;github.com&#x2F;pyrustic&#x2F;braq?tab=readme-ov-file#ai-prompts<p>And here, ChatML from OpenAI: https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=34988748<p>As mentioned above, I can&#x27;t test this idea. Therefore, I&#x27;m asking to you: Can we solve AI prompt injection attacks with an indented data format ?

2 comments

alexrusticabout 1 year ago
The backspace escape character (<a href="https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;6792812&#x2F;the-backspace-escape-character-b-unexpected-behavior" rel="nofollow">https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;6792812&#x2F;the-backspace-es...</a>) might be a good candidate for successfully creating a valid section in a document.<p>In a ChatML document, this character can also help destroy the closing tag of an instruction node.<p>But this can only work if the escape character is actually &#x27;executed&#x27;.
wmfabout 1 year ago
I don&#x27;t understand how indentation can remove the need for input sanitization since the input can definitely include brackets, spaces, tabs, and newline characters.<p>You might be able to test this by fine-tuning a local LLM to understand your format then breaking it.
评论 #39721345 未加载