Special-Character Adversarial Attacks:
A Forensic Analysis
Tracing hidden manipulations in large language models through Unicode, encoding, and tokenization forensics.
Goal: This section shows how each attack type works and what forensic traces it leaves.
- How the attacker hides the manipulation
- Where the forensic evidence lives
- How analysts detect and fix it
Overview
This project investigates how character-level adversarial attacks exploit Unicode and encoding vulnerabilities in open-source language models, focusing on forensic detection at the token and encoding layers.
AI & Forensics Glossary
Learn every technical term used in the research paper.
Academic and plain-language definitions for faster study.
Examples of Hidden Manipulations
Unicode Forensics
Hidden zero-width characters or Cyrillic look-alikes (homoglyphs).
Example: Pаssword (the first “a” is Cyrillic U+0430, not Latin U+0061).
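A quick way to expose such look-alikes is to print each character's code point and official Unicode name; a minimal Python sketch (the inspect helper is illustrative, not from the paper):

```python
import unicodedata

def inspect(text):
    """Print each character with its code point and official Unicode name."""
    for ch in text:
        print(f"{ch!r}  U+{ord(ch):04X}  {unicodedata.name(ch, 'UNKNOWN')}")

inspect("P\u0430ssword")  # the '\u0430' reports as CYRILLIC SMALL LETTER A
```

Run on the example above, every Latin letter reports a LATIN ... name while the impostor reports CYRILLIC SMALL LETTER A.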
Encoding Forensics
Base64 or hex strings concealing readable text.
Example: U29ja2V0IHBhc3N3b3Jk → “Socket password”.
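Decoding the string confirms the hidden content; a one-liner with Python's standard base64 module:

```python
import base64

payload = "U29ja2V0IHBhc3N3b3Jk"
decoded = base64.b64decode(payload).decode("utf-8")
print(decoded)  # Socket password
```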
Tokenization Forensics
Perturbations that fragment words into unexpected subword tokens.
Example: he@@llo splits unpredictably at subword boundaries.
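Real subword tokenizers (e.g. BPE) need a trained model, but even a plain regex word tokenizer shows how an invisible character fragments a token; a minimal sketch:

```python
import re

clean = "hello"
poisoned = "he\u200bllo"  # zero-width space (U+200B) hidden inside

# A simple word tokenizer: the invisible character splits the "word" in two.
print(re.findall(r"\w+", clean))     # ['hello']
print(re.findall(r"\w+", poisoned))  # ['he', 'llo']
```

Subword tokenizers behave analogously: the perturbed input maps to a different, often longer, token sequence than the clean text.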
Attack Taxonomy
Unicode Control
Invisible characters and direction markers.
Forensic: Look for zero-width chars (\u200b).
Detect: cat -v, hexdump.
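Beyond cat -v and hexdump, the same scan is easy to script; a sketch that flags format-category (Cf) characters, the class that covers zero-width spaces/joiners and bidirectional override marks:

```python
import unicodedata

def find_invisibles(text):
    """List (index, code point, name) for every format-category (Cf)
    character, which covers zero-width and bidi override marks."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "UNNAMED"))
            for i, c in enumerate(text)
            if unicodedata.category(c) == "Cf"]

print(find_invisibles("run\u200bthis\u202e"))
# [(3, 'U+200B', 'ZERO WIDTH SPACE'), (8, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```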
Homoglyph Confusion
Cross-script letter swaps.
Forensic: Compare Unicode points.
Detect: Script-detection tools.
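One lightweight script-detection heuristic takes the first word of each character's Unicode name as its script; a sketch (a name-prefix approximation, not the full Unicode Script property):

```python
import unicodedata

def scripts(text):
    """Approximate the set of scripts in text via Unicode name prefixes."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            found.add(name.split(" ")[0])  # e.g. 'LATIN', 'CYRILLIC'
    return found

word = "P\u0430ssword"  # second character is Cyrillic U+0430
print(sorted(scripts(word)))  # ['CYRILLIC', 'LATIN'] -> mixed scripts
```

A single word mixing two scripts is a strong homoglyph signal.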
Structural Perturbation
Spacing & punctuation anomalies.
Forensic: Normalization mismatch.
Detect: Re-tokenization and diffing.
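The normalization-mismatch check itself is a single comparison; a sketch using NFKC, which folds compatibility characters such as the ﬁ ligature:

```python
import unicodedata

raw = "\ufb01le"  # starts with the 'fi' ligature U+FB01, visually 'file'
normalized = unicodedata.normalize("NFKC", raw)

# Any difference between raw and normalized text is a forensic signal.
print(raw == normalized)  # False
print(normalized)         # file
```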
Encoding Obfuscation
Base64 / hex hiding content.
Forensic: Random-looking strings.
Detect: Decode & inspect.
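Decoding should be gated so arbitrary strings are not mis-flagged; a defensive sketch (the try_decode helper and the minimum-length threshold are illustrative choices):

```python
import base64
import re

# Candidate must look like Base64: long enough, valid alphabet, optional padding.
B64_RE = re.compile(r"^[A-Za-z0-9+/]{8,}={0,2}$")

def try_decode(candidate):
    """Decode only if the string looks like Base64 and yields printable UTF-8."""
    if not B64_RE.match(candidate):
        return None
    try:
        text = base64.b64decode(candidate, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    return text if text.isprintable() else None

print(try_decode("U29ja2V0IHBhc3N3b3Jk"))  # Socket password
print(try_decode("not base64!!"))          # None
```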
The chart below shows forensic difficulty across attack types.
[Chart: Category Vulnerabilities. Data from Sarabamoun (2025), arXiv.]
Defense Framework
1. Pre-tokenization Normalization
Normalize Unicode, strip zero-width characters, detect script mixing.
2. Encoding Validation
Detect and decode Base64/hex strings safely.
3. Security-Aware Training
Include adversarial Unicode/encoding samples during training.
4. Runtime Monitoring
Log token-level anomalies and flag sudden spikes in their frequency.
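Layers 1 and 4 above can be combined into a small input gate; a minimal sketch (sanitize and audit are hypothetical names, and a real system would log rather than print):

```python
import unicodedata

def sanitize(text):
    """Layer 1: Unicode-normalize, then strip invisible format (Cf) characters."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def audit(text):
    """Layer 4: flag inputs whose sanitized form differs from the raw input."""
    clean = sanitize(text)
    if clean != text:
        print(f"ANOMALY: sanitization changed input "
              f"({len(text)} -> {len(clean)} chars)")
    return clean

audit("Pa\u200bssword")  # prints an anomaly line; returns 'Password'
```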
Download: Example Poisoned Dataset
This synthetic dataset illustrates Unicode, encoding, tokenization, and labeling anomalies used in the research paper.
It is small, safe, and intended only for forensic demonstration.
Conclusion: Forensic Insight
Character-level adversarial attacks leave detectable digital fingerprints.
By analyzing Unicode, encoding, and tokenization behavior, forensic analysts can reconstruct how an AI model was manipulated and why.
This unified forensic approach strengthens both AI safety and digital investigation practices.