Head-to-head comparison of PII detection tools on accuracy, multilingual support, latency, and cost. Tested on 1,200 annotated samples across 9 languages.
- Across 9 entity types and 9 languages
- Single model, no language routing
- INT8 on 4-vCPU, no GPU needed
- Self-hosted on a single VPS
Evaluated on 1,200 multilingual PII examples with ground-truth annotations. Higher is better.
F1 scores from a 1,200-sample multilingual test set with 9 entity types.
| Language | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| English | 0.931 | 0.80 | 0.83 | 0.82 |
| Chinese | 0.918 | 0.31 | 0.71 | 0.68 |
| Vietnamese | 0.912 | 0.28 | 0.42 | 0.55 |
| Malay | 0.895 | 0.25 | 0.38 | 0.48 |
| Indonesian | 0.901 | 0.30 | 0.61 | 0.58 |
| Tamil | 0.878 | 0.15 | 0.35 | 0.40 |
| Thai | 0.885 | 0.22 | 0.52 | 0.55 |
| Hindi | 0.892 | 0.20 | 0.58 | 0.62 |
| Korean | 0.905 | 0.18 | 0.65 | 0.70 |
Presidio scores reflect default recognizers without custom per-locale rules. spaCy uses the best available model per language.
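The per-language scores can be macro-averaged into a single headline number. A minimal sketch, assuming each language is weighted equally; the result is consistent with the 0.902 overall F1 reported for PII Engineer v1.3:

```python
# Macro-averaged F1: unweighted mean of the per-language scores
# from the table above (each language weighted equally).
pii_engineer_f1 = {
    "English": 0.931, "Chinese": 0.918, "Vietnamese": 0.912,
    "Malay": 0.895, "Indonesian": 0.901, "Tamil": 0.878,
    "Thai": 0.885, "Hindi": 0.892, "Korean": 0.905,
}

macro_f1 = sum(pii_engineer_f1.values()) / len(pii_engineer_f1)
print(round(macro_f1, 3))  # 0.902
```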
| Entity Type | F1 | Precision | Recall |
|---|---|---|---|
| email_address | 0.970 | 0.98 | 0.96 |
| phone_number | 0.968 | 0.97 | 0.96 |
| government_id | 0.920 | 0.94 | 0.90 |
| bank_account_number | 0.915 | 0.93 | 0.90 |
| street_address | 0.891 | 0.90 | 0.88 |
| date_of_birth | 0.887 | 0.91 | 0.87 |
| passport_number | 0.880 | 0.90 | 0.86 |
| license_plate | 0.833 | 0.85 | 0.82 |
| person_name | 0.823 | 0.84 | 0.81 |
Evaluated on PII Engineer v1.3 with the INT8 encoder. The 8-stage post-processing pipeline improves raw F1 from 0.779 to 0.902.
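The F1 column is the harmonic mean of precision and recall. Because the table rounds precision and recall to two decimals, reconstructed F1 values can differ from the reported ones in the third decimal; the `email_address` row reproduces exactly:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# email_address row: P = 0.98, R = 0.96
print(round(f1(0.98, 0.96), 3))  # 0.970
```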
Tested on a 4-vCPU AMD cloud instance, no GPU. Input: 50-word text with mixed PII.
| System | p50 | p99 | RAM | GPU Required |
|---|---|---|---|---|
| Presidio (regex only) | 3ms | 12ms | 200MB | No |
| Presidio + spaCy | 80ms | 250ms | 1.8GB | No |
| spaCy (transformer) | 120ms | 350ms | 1.5GB | Optional |
| PII Engineer (INT8) | 180ms | 400ms | 700MB | No |
| AWS Comprehend | 200ms | 800ms | N/A | N/A (managed) |
| GPT-4 | 1500ms | 4000ms | N/A | N/A (managed) |
Presidio's regex-only mode misses person names and addresses. With the spaCy backend, its latency approaches PII Engineer's. GPT-4 requires external API calls with per-token billing.
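Percentile figures like these can be collected with a short script. This is a sketch, not the harness used for the table; `measure_once` is a placeholder to be replaced with one timed request against the system under test:

```python
import time
import statistics

def measure_once() -> float:
    """Placeholder for a single timed request; swap the sleep for a
    real detection call and return elapsed milliseconds."""
    start = time.perf_counter()
    time.sleep(0.001)  # stand-in for the actual detection call
    return (time.perf_counter() - start) * 1000

samples = sorted(measure_once() for _ in range(200))
# statistics.quantiles with n=100 returns the 1st..99th percentile cut points.
cuts = statistics.quantiles(samples, n=100)
p50, p99 = cuts[49], cuts[98]
print(f"p50={p50:.1f}ms p99={p99:.1f}ms")
```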
| System | Monthly Cost | At 1M requests/mo | Self-Hosted |
|---|---|---|---|
| PII Engineer | $42 (VPS) | $42 | Yes |
| Presidio | $42 (VPS) | $42 | Yes |
| spaCy | $42 (VPS) | $42 | Yes |
| AWS Comprehend | Pay-per-use | ~$1,000 | No |
| Google DLP | Pay-per-use | ~$1,500 | No |
| GPT-4 | Pay-per-token | ~$3,000+ | No |
Self-hosted costs assume a 4-vCPU AMD VPS at $42/month. Managed service costs vary by region and volume.
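The break-even point between a fixed-cost VPS and pay-per-use pricing is simple arithmetic. A sketch using the approximate figures above; managed prices vary by region and volume, so treat the per-request rates as assumptions:

```python
VPS_MONTHLY = 42.00  # 4-vCPU AMD VPS, USD/month

# Per-request rates implied by the approximate cost at 1M requests/mo.
per_request = {
    "AWS Comprehend": 1_000 / 1_000_000,  # ~$0.0010
    "Google DLP":     1_500 / 1_000_000,  # ~$0.0015
    "GPT-4":          3_000 / 1_000_000,  # ~$0.0030
}

for system, rate in per_request.items():
    breakeven = VPS_MONTHLY / rate  # requests/mo above which the VPS is cheaper
    print(f"{system}: VPS cheaper above {breakeven:,.0f} requests/mo")
```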
| Feature | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| Languages (single model) | 50+ | ~10 locales | 1 per model | 12 |
| PII-specific labels | Yes (30+ types) | Yes | No (generic NER) | Yes |
| GPU required | No | No | Optional | N/A |
| Self-hosted | Yes | Yes | Yes | No |
| Single binary deploy | Yes (Rust) | No (Python) | No (Python) | N/A |
| REST API included | Yes | Optional | No (library) | Yes |
| Open source | Apache-2.0 | MIT | MIT | No |
| Model size (all langs) | 620MB | 500MB+ (with NER) | 2GB+ (5 langs) | N/A |
| Add new language | Already covered | Write recognizers | Train new model | Not possible |
| Maintenance effort | Low | High (per locale) | Medium | None (managed) |
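PII Engineer ships with a REST API; the response shape and field names below are illustrative assumptions, not a documented contract. A sketch of redacting text from a hypothetical detection response:

```python
import json

# Hypothetical response shape from the bundled REST API; field names
# here are assumptions for illustration only.
response = json.loads("""{
  "entities": [
    {"type": "person_name",  "start": 8,  "end": 20, "score": 0.97},
    {"type": "phone_number", "start": 24, "end": 39, "score": 0.99}
  ]
}""")

text = "Contact Nguyen Van A at +84 912 345 678"

# Redact detected spans right-to-left so earlier offsets stay valid.
for ent in sorted(response["entities"], key=lambda e: e["start"], reverse=True):
    text = text[:ent["start"]] + f"[{ent['type'].upper()}]" + text[ent["end"]:]

print(text)  # Contact [PERSON_NAME] at [PHONE_NUMBER]
```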
All benchmarks were conducted on a standardized test set of 1,200 manually annotated samples across 9 languages and 9 PII entity types. The dataset covers real-world text patterns.
Each system was tested with default configurations unless noted. Presidio was tested with built-in recognizers (no custom rules). spaCy used the best available transformer model per language. AWS Comprehend and GPT-4 were tested via their respective APIs.
All latency measurements were taken on a 4-vCPU AMD Premium instance (DigitalOcean, SGP1 region) with input texts averaging 50 words.