
Benchmarks

Head-to-head comparison of PII detection tools on accuracy, multilingual support, latency, and cost. Tested on 1,200 annotated samples across 9 languages.

Macro F1 score: 0.902 (across 9 entity types and 9 languages)
Languages: 50+ (single model, no language routing)
Average latency: 250ms (INT8 on 4 vCPUs, no GPU needed)
Monthly cost: $42 (self-hosted on a single VPS)

Overall Accuracy (F1 Score)

Evaluated on 500 multilingual PII examples with ground-truth annotations. Higher is better.

Category                           PII Engineer  GPT-4  spaCy  Presidio  AWS Comprehend
English PII                        0.88          0.85   0.83   0.80      0.82
Multilingual PII (non-English)     0.86          0.78   0.64   0.44      0.52
Structured PII (Phone, ID, Email)  0.93          0.87   0.52   0.91      0.75

spaCy: the English and multilingual scores are for spaCy + rules; the structured PII score is reported for spaCy alone.
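Structured PII has fixed formats that pattern matching can target, which plausibly explains why Presidio's structured score (0.91) sits much closer to the top than its multilingual score (0.44). The sketch below is a hypothetical regex-only detector, not Presidio's actual recognizers; both patterns are simplified assumptions. It reports the email address and phone number but never the person name, because no pattern describes one.

```python
import re

# Simplified, illustrative patterns (assumptions, not Presidio's built-in recognizers).
PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Optional leading '+', then 8 to 15 digits that may be separated by spaces or hyphens.
    "phone_number": re.compile(r"\+?\d(?:[ -]?\d){7,14}"),
}


def detect_structured(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found by the regex patterns."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group()))
    return found


text = "Contact Tan Wei Ming at wei.ming@example.com or +65 9123 4567."
for label, value in detect_structured(text):
    print(label, value)
# The name "Tan Wei Ming" is never reported: no pattern describes it.
```

This is also why a regex-only mode is fast: it scans the text once per pattern and runs no model.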

Accuracy by Language

F1 scores from a 1,200-sample multilingual test set with 9 entity types.

Language    PII Engineer  Presidio  spaCy  AWS Comprehend
English     0.931         0.80      0.83   0.82
Chinese     0.918         0.31      0.71   0.68
Vietnamese  0.912         0.28      0.42   0.55
Malay       0.895         0.25      0.38   0.48
Indonesian  0.901         0.30      0.61   0.58
Tamil       0.878         0.15      0.35   0.40
Thai        0.885         0.22      0.52   0.55
Hindi       0.892         0.20      0.58   0.62
Korean      0.905         0.18      0.65   0.70

Presidio scores reflect default recognizers without custom per-locale rules. spaCy uses the best available model per language.
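Averaging the nine PII Engineer scores in the per-language table with equal weight gives 0.902, the headline macro F1. Assuming that is how the headline figure was computed (the page does not spell it out), the arithmetic is a plain unweighted mean:

```python
# Per-language F1 for PII Engineer, copied from the per-language table.
PII_ENGINEER_F1 = {
    "English": 0.931,
    "Chinese": 0.918,
    "Vietnamese": 0.912,
    "Malay": 0.895,
    "Indonesian": 0.901,
    "Tamil": 0.878,
    "Thai": 0.885,
    "Hindi": 0.892,
    "Korean": 0.905,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean: every language counts equally, whatever its sample count."""
    return sum(scores.values()) / len(scores)


print(f"{macro_average(PII_ENGINEER_F1):.3f}")  # 0.902
```

If languages had unequal sample counts, a sample-weighted (micro) average could differ; the page does not list per-language counts.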

Per-Entity Accuracy

Entity type          F1     Precision  Recall
email_address        0.970  0.98       0.96
phone_number         0.968  0.97       0.96
government_id        0.920  0.94       0.90
bank_account_number  0.915  0.93       0.90
street_address       0.891  0.90       0.88
date_of_birth        0.887  0.91       0.87
passport_number      0.880  0.90       0.86
license_plate        0.833  0.85       0.82
person_name          0.823  0.84       0.81

Evaluated on PII Engineer v1.3 with the INT8 encoder. An 8-stage post-processing pipeline raises F1 from 0.779 (raw) to 0.902.
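F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the per-entity table's two-decimal precision and recall reproduces each listed F1 to within rounding; for example, 0.98 and 0.96 give 0.970 for email_address. A minimal check:

```python
# (precision, recall, reported F1) per entity type, from the per-entity table.
PER_ENTITY = {
    "email_address": (0.98, 0.96, 0.970),
    "phone_number": (0.97, 0.96, 0.968),
    "government_id": (0.94, 0.90, 0.920),
    "bank_account_number": (0.93, 0.90, 0.915),
    "street_address": (0.90, 0.88, 0.891),
    "date_of_birth": (0.91, 0.87, 0.887),
    "passport_number": (0.90, 0.86, 0.880),
    "license_plate": (0.85, 0.82, 0.833),
    "person_name": (0.84, 0.81, 0.823),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for entity, (p, r, reported) in PER_ENTITY.items():
    # Small gaps are expected: P and R are rounded to two decimals in the table.
    print(f"{entity:20s} computed={f1(p, r):.3f} reported={reported:.3f}")
```

The largest gap (about 0.003, for phone_number) is consistent with that two-decimal rounding.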

Latency

Tested on a 4-vCPU AMD cloud instance, no GPU. Input: 50-word text with mixed PII.

System                 p50     p99     RAM    GPU required
Presidio (regex only)  3ms     12ms    200MB  No
Presidio + spaCy       80ms    250ms   1.8GB  No
spaCy (transformer)    120ms   350ms   1.5GB  Optional
PII Engineer (INT8)    180ms   400ms   700MB  No
AWS Comprehend         200ms   800ms   N/A    N/A (managed)
GPT-4                  1500ms  4000ms  N/A    N/A (managed)

Presidio's regex-only mode misses person names and addresses. With the spaCy backend, Presidio's latency rises to 80ms p50 and 250ms p99, closer to PII Engineer's 180ms and 400ms. GPT-4 requires API calls with per-token billing.
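The p50 and p99 columns are the 50th and 99th percentiles of per-request latency. The page does not say which percentile definition was used; the sketch below assumes the common nearest-rank method and uses made-up timings, only to show how the two figures are read off a batch of measurements.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical timings (ms) for 100 requests: 95 between 170 and 189 ms, plus a slow tail.
samples = [170.0 + (i % 20) for i in range(95)] + [300.0, 320.0, 350.0, 380.0, 400.0]
print(f"p50={percentile(samples, 50):.0f}ms p99={percentile(samples, 99):.0f}ms")
# p50=179ms p99=380ms
```

The five slow requests leave p50 untouched while p99 lands in the tail, which is why the two columns are reported separately.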

Cost Comparison

System          Monthly cost   At 1M requests/mo  Self-hosted
PII Engineer    $42 (VPS)      $42                Yes
Presidio        $42 (VPS)      $42                Yes
spaCy           $42 (VPS)      $42                Yes
AWS Comprehend  Pay-per-use    ~$1,000            No
Google DLP      Pay-per-use    ~$1,500            No
GPT-4           Pay-per-token  ~$3,000+           No

Self-hosted costs assume a 4-vCPU AMD VPS at $42/month. Managed service costs vary by region and volume.
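Because the self-hosted price is flat, cost per request falls as volume grows. The sketch below redoes the arithmetic with the figures in the cost table ($42/month for a VPS, ~$1,000/month for AWS Comprehend at 1M requests) and assumes a 30-day month when converting monthly volume to an average request rate:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def cost_per_1k_requests(monthly_cost_usd: float, requests_per_month: int) -> float:
    """Monthly spend divided across requests, expressed per 1,000 requests."""
    return monthly_cost_usd / requests_per_month * 1000


def average_rate_rps(requests_per_month: int) -> float:
    """Mean request rate if traffic were spread evenly over the month."""
    return requests_per_month / SECONDS_PER_MONTH


monthly_requests = 1_000_000
print(f"self-hosted VPS: ${cost_per_1k_requests(42, monthly_requests):.3f} per 1k requests")
print(f"AWS Comprehend (~$1,000): ${cost_per_1k_requests(1000, monthly_requests):.2f} per 1k requests")
print(f"average load: {average_rate_rps(monthly_requests):.2f} requests/s")
```

At about 0.39 requests/s on average, 1M requests/month is light load for a service with a 180ms p50; capacity planning should still be based on peak traffic, which a monthly average does not capture.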

Feature Comparison

Feature                   PII Engineer     Presidio           spaCy             AWS Comprehend
Languages (single model)  50+              ~10 locales        1 per model       12
PII-specific labels       Yes (30+ types)  Yes                No (generic NER)  Yes
GPU required              No               No                 Optional          N/A
Self-hosted               Yes              Yes                Yes               No
Single binary deploy      Yes (Rust)       No (Python)        No (Python)       N/A
REST API included         Yes              Optional           No (library)      Yes
Open source               Apache-2.0       MIT                MIT               No
Model size (all langs)    620MB            500MB+ (with NER)  2GB+ (5 langs)    N/A
Add new language          Already covered  Write recognizers  Train new model   Not possible
Maintenance effort        Low              High (per locale)  Medium            None (managed)

Methodology

All benchmarks were conducted on a standardized test set of 1,200 manually annotated samples across 9 languages and 9 PII entity types. The dataset covers real-world text patterns including:

Each system was tested with default configurations unless noted. Presidio was tested with built-in recognizers (no custom rules). spaCy used the best available transformer model per language. AWS Comprehend and GPT-4 were tested via their respective APIs.
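The page does not state how predicted entities were matched against the manual annotations. A common choice is strict matching, where a prediction counts as a true positive only if its entity type and character offsets all equal a gold span. A minimal sketch under that assumption; the span format and the example offsets are hypothetical:

```python
from collections import Counter

# Hypothetical span format: (entity_type, start_offset, end_offset).
Span = tuple[str, int, int]


def strict_prf(gold: list[Span], predicted: list[Span]) -> tuple[float, float, float]:
    """Precision, recall, and F1 under exact type-and-offset matching."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    tp = sum((gold_counts & pred_counts).values())  # multiset intersection
    fp = sum(pred_counts.values()) - tp
    fn = sum(gold_counts.values()) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("person_name", 8, 19), ("email_address", 23, 43), ("phone_number", 47, 60)]
predicted = [
    ("person_name", 8, 15),     # truncated name: wrong end offset, so one FP and one FN
    ("email_address", 23, 43),  # exact match: TP
    ("phone_number", 47, 60),   # exact match: TP
]
print(strict_prf(gold, predicted))  # precision, recall, and F1 are all about 0.667 here
```

Relaxed schemes (for example, crediting any overlap with the right type) would score the truncated name as a hit, so the matching rule can move reported F1 noticeably, especially for person_name and street_address.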

All latency measurements were taken on a 4-vCPU AMD Premium instance (DigitalOcean, SGP1 region) with input texts averaging 50 words.

Try PII Engineer

Open source, self-hosted, no GPU required. Run it locally in 60 seconds.

cargo build --release && cargo run --release
