Special-Character Adversarial Attacks:
A Forensic Analysis
Tracing hidden manipulations in large language models through Unicode, encoding, and tokenization forensics.
Goal: This section shows how each attack type works and what forensic traces it leaves.
- How the attacker hides the manipulation
- Where the forensic evidence lives
- How analysts detect and fix it
Overview
This project investigates how character-level adversarial attacks exploit Unicode and encoding vulnerabilities in open-source language models, focusing on forensic detection at the token and encoding layers.
AI & Forensics Glossary
Learn every technical term used in the research paper.
Academic and plain-language definitions for faster study.
Examples of Hidden Manipulations
Unicode Forensics
Hidden zero-width characters or Cyrillic look-alikes (homoglyphs).
Example: Pаssword (the first “a” is Cyrillic U+0430, not Latin U+0061).
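A quick way to expose such look-alikes is to print each character's code point and official Unicode name; a minimal Python sketch (the inspect helper is illustrative, not from the paper):

```python
import unicodedata

def inspect(text):
    """Print each character with its code point and official Unicode name."""
    for ch in text:
        print(f"{ch!r}  U+{ord(ch):04X}  {unicodedata.name(ch, 'UNKNOWN')}")

inspect("P\u0430ssword")  # the '\u0430' reports as CYRILLIC SMALL LETTER A
```

Run on the example above, every Latin letter reports a LATIN ... name while the impostor reports CYRILLIC SMALL LETTER A.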
Encoding Forensics
Base64 or hex strings concealing readable text.
Example: U29ja2V0IHBhc3N3b3Jk → “Socket password”.
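Decoding the string confirms the hidden content; a one-liner with Python's standard base64 module:

```python
import base64

payload = "U29ja2V0IHBhc3N3b3Jk"
decoded = base64.b64decode(payload).decode("utf-8")
print(decoded)  # Socket password
```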
Tokenization Forensics
Perturbations that fragment words into unexpected subword tokens.
Example: he@@llo splits unpredictably at subword boundaries.
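Real subword tokenizers (e.g. BPE) need a trained model, but even a plain regex word tokenizer shows how an invisible character fragments a token; a minimal sketch:

```python
import re

clean = "hello"
poisoned = "he\u200bllo"  # zero-width space (U+200B) hidden inside

# A simple word tokenizer: the invisible character splits the "word" in two.
print(re.findall(r"\w+", clean))     # ['hello']
print(re.findall(r"\w+", poisoned))  # ['he', 'llo']
```

Subword tokenizers behave analogously: the perturbed input maps to a different, often longer, token sequence than the clean text.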
Attack Taxonomy
Unicode Control
Invisible characters and direction markers.
Forensic: Look for zero-width chars (\u200b).
Detect: cat -v, hexdump.
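Beyond cat -v and hexdump, the same scan is easy to script; a sketch that flags format-category (Cf) characters, the class that covers zero-width spaces/joiners and bidirectional override marks:

```python
import unicodedata

def find_invisibles(text):
    """List (index, code point, name) for every format-category (Cf)
    character, which covers zero-width and bidi override marks."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "UNNAMED"))
            for i, c in enumerate(text)
            if unicodedata.category(c) == "Cf"]

print(find_invisibles("run\u200bthis\u202e"))
# [(3, 'U+200B', 'ZERO WIDTH SPACE'), (8, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```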
Homoglyph Confusion
Cross-script letter swaps.
Forensic: Compare Unicode points.
Detect: Script-detection tools.
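One lightweight script-detection heuristic takes the first word of each character's Unicode name as its script; a sketch (a name-prefix approximation, not the full Unicode Script property):

```python
import unicodedata

def scripts(text):
    """Approximate the set of scripts in text via Unicode name prefixes."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            found.add(name.split(" ")[0])  # e.g. 'LATIN', 'CYRILLIC'
    return found

word = "P\u0430ssword"  # second character is Cyrillic U+0430
print(sorted(scripts(word)))  # ['CYRILLIC', 'LATIN'] -> mixed scripts
```

A single word mixing two scripts is a strong homoglyph signal.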
Structural Perturbation
Spacing & punctuation anomalies.
Forensic: Normalization mismatch.
Detect: Re-tokenization and diffing.
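The normalization-mismatch check itself is a single comparison; a sketch using NFKC, which folds compatibility characters such as the ﬁ ligature:

```python
import unicodedata

raw = "\ufb01le"  # starts with the 'fi' ligature U+FB01, visually 'file'
normalized = unicodedata.normalize("NFKC", raw)

# Any difference between raw and normalized text is a forensic signal.
print(raw == normalized)  # False
print(normalized)         # file
```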
Encoding Obfuscation
Base64 / hex hiding content.
Forensic: Random-looking strings.
Detect: Decode & inspect.
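Decoding should be gated so arbitrary strings are not mis-flagged; a defensive sketch (the try_decode helper and the minimum-length threshold are illustrative choices):

```python
import base64
import re

# Candidate must look like Base64: long enough, valid alphabet, optional padding.
B64_RE = re.compile(r"^[A-Za-z0-9+/]{8,}={0,2}$")

def try_decode(candidate):
    """Decode only if the string looks like Base64 and yields printable UTF-8."""
    if not B64_RE.match(candidate):
        return None
    try:
        text = base64.b64decode(candidate, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    return text if text.isprintable() else None

print(try_decode("U29ja2V0IHBhc3N3b3Jk"))  # Socket password
print(try_decode("not base64!!"))          # None
```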
The chart below shows forensic difficulty across attack types.
[Chart: Category Vulnerabilities. Data from Sarabamoun (2025), arXiv.]
Defense Framework
1. Pre-tokenization Normalization
Normalize Unicode, strip zero-width characters, detect script mixing.
2. Encoding Validation
Detect and decode Base64/hex strings safely.
3. Security-Aware Training
Include adversarial Unicode/encoding samples during training.
4. Runtime Monitoring
Log token-level anomalies and flag sudden spikes in their frequency.
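Layers 1 and 4 above can be combined into a small input gate; a minimal sketch (sanitize and audit are hypothetical names, and a real system would log rather than print):

```python
import unicodedata

def sanitize(text):
    """Layer 1: Unicode-normalize, then strip invisible format (Cf) characters."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def audit(text):
    """Layer 4: flag inputs whose sanitized form differs from the raw input."""
    clean = sanitize(text)
    if clean != text:
        print(f"ANOMALY: sanitization changed input "
              f"({len(text)} -> {len(clean)} chars)")
    return clean

audit("Pa\u200bssword")  # prints an anomaly line; returns 'Password'
```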
Download: Example Poisoned Dataset
This synthetic dataset illustrates Unicode, encoding, tokenization, and labeling anomalies used in the research paper.
It is small, safe, and intended only for forensic demonstration.
Conclusion: Forensic Insight
Character-level adversarial attacks leave detectable digital fingerprints.
By analyzing Unicode, encoding, and tokenization behavior, forensic analysts can reconstruct how an AI model was manipulated and why.
This unified forensic approach strengthens both AI safety and digital investigation practices.