Case study
BarkScan
A search engine for internet-connected devices — built to scan, fingerprint, and index the live internet on a continuous cycle.
Visit BarkScan →
Internet-wide scanning is a category dominated by a few entrenched players: Shodan, Censys, ZoomEye. They each own significant portions of the data set, and the market has converged on per-query pricing models that make exploratory work cost-prohibitive. For security researchers, threat hunters, and infrastructure teams, getting timely visibility into the global internet — what services are exposed where, what software versions are running, what changes occurred since yesterday — is harder than it should be.
BarkScan was built to address that gap: a new internet-wide scanner with continuous coverage, fast fingerprinting, and a query interface designed for researchers. The hard parts were scanning at scale without becoming a network nuisance, fingerprinting accurately across heterogeneous services, and serving the resulting dataset with sub-second query latency on a budget that did not require enterprise pricing.
The scanner is a Go-based distributed system that performs internet-wide TCP probes followed by service-specific protocol fingerprinting. The probe layer is rate-limited per source IP, per destination AS, and globally — both to be a good network citizen and to avoid triggering automated blocking. Fingerprinting is layered: cheap protocol detection first, expensive deep banner grabs only when justified by what the cheap layer found.
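The tiered rate limiting described above can be sketched as stacked token buckets, where a probe goes out only if the global, per-AS, and per-source-IP buckets all have capacity. This is a minimal illustration, not BarkScan's actual implementation; the type names, rates, and bucket-selection logic here are assumptions for the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket is a minimal token-bucket rate limiter.
type tokenBucket struct {
	mu     sync.Mutex
	tokens float64
	max    float64
	rate   float64 // tokens refilled per second
	last   time.Time
}

func newBucket(rate, burst float64) *tokenBucket {
	return &tokenBucket{tokens: burst, max: burst, rate: rate, last: time.Now()}
}

// allow consumes a token if one is available.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.max {
		b.tokens = b.max
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// probeLimiter stacks three tiers: global, per destination AS,
// and per source IP. All three must pass for a probe to go out.
type probeLimiter struct {
	mu              sync.Mutex
	global          *tokenBucket
	perAS           map[int]*tokenBucket
	perSrc          map[string]*tokenBucket
	asRate, srcRate float64
}

func (p *probeLimiter) allow(srcIP string, destAS int) bool {
	p.mu.Lock()
	as, ok := p.perAS[destAS]
	if !ok {
		as = newBucket(p.asRate, p.asRate)
		p.perAS[destAS] = as
	}
	src, ok := p.perSrc[srcIP]
	if !ok {
		src = newBucket(p.srcRate, p.srcRate)
		p.perSrc[srcIP] = src
	}
	p.mu.Unlock()
	// Simplification: a denied probe still burns tokens from the
	// tiers checked before the denial; a production limiter would
	// reserve and return them.
	return p.global.allow() && as.allow() && src.allow()
}

func main() {
	lim := &probeLimiter{
		global: newBucket(1000, 1000),
		perAS:  map[int]*tokenBucket{},
		perSrc: map[string]*tokenBucket{},
		asRate: 10, srcRate: 100,
	}
	fmt.Println(lim.allow("192.0.2.1", 64500)) // prints true
}
```

The per-AS tier is the one that matters most for staying unblocked: it caps how hard any single network is probed regardless of how much global headroom remains.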
Storage and query are the parts that benefit most from architectural care. Banner data and historical scans live in object storage (Cloudflare R2) for durability and cost, while the queryable fingerprint dataset lives in ClickHouse — which handles the high-cardinality, time-series, full-text-ish queries that internet scan data demands. Postgres serves the metadata and user accounts; the heavy lifting is in ClickHouse.
Continuous scanning is the operational hard part. The internet is large; even with parallelism, one full pass takes time, and you have to decide what to re-scan more often and what to leave for the next full cycle. We split the scanning workload by importance: core infrastructure ports get re-scanned weekly, esoteric ports monthly. Change detection runs continuously against the existing fingerprint dataset, so newly exposed services surface within hours.
BarkScan provides continuous internet-wide scan coverage with a query interface designed for researchers and security teams. The platform handles the operational realities of internet-wide scanning — rate limiting, courteous probing, distributed scanner pools — that determine whether a scanner can sustain coverage long-term or gets blackholed by upstream networks.
Query latency stays low (single-digit milliseconds for the common patterns) because the fingerprint dataset is structured for ClickHouse at ingestion time, not retrofitted afterwards. Banner archives in object storage stay cheap to retain indefinitely without inflating the queryable dataset.
The scan-and-index pipeline composes well: new fingerprint rules can be deployed without re-scanning, change detection runs continuously, and the queryable surface area can be extended without invalidating prior data.
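Deploying new fingerprint rules without re-scanning works because the raw banners are archived: a new rule set is simply replayed over stored banners. A minimal sketch of that replay step, assuming a hypothetical substring-based `Rule` type (real rules would be regex- and protocol-aware, but the pipeline shape is the same):

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical fingerprint rule: a substring to look for
// in a stored banner and the product label to emit on a match.
type Rule struct {
	Match   string
	Product string
}

// reclassify runs a rule set over already-archived banners, keyed by
// host:port, so deploying a new rule never requires re-probing hosts.
// First matching rule wins.
func reclassify(banners map[string]string, rules []Rule) map[string]string {
	out := make(map[string]string)
	for host, banner := range banners {
		for _, r := range rules {
			if strings.Contains(banner, r.Match) {
				out[host] = r.Product
				break
			}
		}
	}
	return out
}

func main() {
	banners := map[string]string{
		"203.0.113.10:80": "Server: nginx/1.24.0",
		"203.0.113.11:22": "SSH-2.0-OpenSSH_9.6",
	}
	rules := []Rule{
		{Match: "nginx", Product: "nginx"},
		{Match: "OpenSSH", Product: "openssh"},
	}
	fmt.Println(reclassify(banners, rules)["203.0.113.11:22"]) // prints openssh
}
```

Because the replay only reads from the archive and writes new fingerprint rows, prior scan data is never invalidated — a new rule simply back-fills classifications for banners collected before it existed.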
Email us a paragraph about what you are building. We respond within one business day.
[email protected]