prompt-guard

Pattern library

27 attack patterns across 8 categories, version 2026.06.04. This is the curated core that ships with every plan and updates regularly — not a static keyword list.

Instruction override

  • highio.ignore-previousAttempt to ignore or override prior/system instructions
  • highio.forget-everythingReset/forget-everything override
  • mediumio.new-instructionsClaims a new/updated set of instructions supersedes the system prompt
  • highio.disregard-policyDisregard safety policy / guidelines

Role / jailbreak personas

  • criticalrj.danDAN / 'do anything now' jailbreak persona
  • highrj.developer-mode'Developer mode' jailbreak
  • highrj.named-jailbreaksKnown jailbreak persona names (STAN, DUDE, BetterDAN, etc.)
  • highrj.act-as-unrestrictedRoleplay as an unrestricted / unfiltered AI
  • highrj.no-restrictions-aiAsserts the AI now has no restrictions/filters
  • mediumrj.jailbreak-wordExplicit jailbreak request

System-prompt exfiltration

  • highspe.reveal-system-promptRequest to reveal / repeat the system prompt or hidden instructions
  • mediumspe.verbatimAsks to output instructions verbatim / word-for-word
  • mediumspe.everything-aboveAsks to print everything above the current message

Delimiter injection

  • highdi.fake-system-tagInjected system/role delimiter tokens
  • highdi.begin-system-promptFabricated 'BEGIN/END SYSTEM PROMPT' framing
  • mediumdi.role-impersonationImpersonates a system/admin/developer speaker

Data exfiltration

  • criticalde.send-to-urlInstructs the model/agent to send data to an external URL
  • highde.curl-fetchEmbedded curl/fetch/webhook exfiltration call
  • highde.markdown-image-exfilMarkdown image used to smuggle data into a URL (zero-click exfil)

Encoding / evasion

  • mediumee.zero-widthZero-width / invisible characters used to hide instructions
  • mediumee.base64-instructionAsks the model to decode and execute base64/hex/rot13 content
  • lowee.long-base64-blobLarge base64 blob (possible hidden payload)

Refusal suppression

  • highrs.do-not-refusePressures the model not to refuse / warn
  • mediumrs.must-complyAsserts the model must comply regardless of policy
  • mediumrs.hypothetical-bypassHypothetical/fiction framing used to bypass safety

Tool / agent hijack

  • highth.invoke-destructive-toolDirects an agent to invoke destructive tools/commands
  • mediumth.override-tool-allowlistAttempts to expand or ignore the agent's tool allow-list