Every cyberattack begins with reconnaissance. Before an adversary sends a phishing email, exploits a vulnerability, or attempts to breach your network, they spend time gathering information about your organization from publicly available sources. This process is known as Open Source Intelligence (OSINT) -- the collection and analysis of information from public, freely accessible data.

The uncomfortable truth is that attackers often know more about your external attack surface than you do. They enumerate your subdomains, find leaked credentials in breach databases, discover API keys committed to public repositories, and map your employees' digital footprints across social media platforms. They do this systematically, using the same tools and techniques available to anyone with an internet connection.

The good news: defenders can use these exact same techniques proactively. By conducting OSINT against your own organization, you can discover what an attacker would find -- and remediate exposures before they are exploited. This article walks through the core OSINT techniques relevant to defensive security and shows you how to build a practical workflow for mapping your organization's attack surface.

The Attack Surface Problem

Modern organizations have sprawling digital footprints that extend far beyond the systems managed by their IT department. Consider what a typical mid-sized company exposes to the public internet:

  • Forgotten subdomains -- old staging servers, decommissioned marketing microsites, test environments that were never properly shut down
  • Leaked credentials -- employee email and password combinations appearing in third-party data breaches
  • Exposed APIs and services -- internal tools accidentally reachable from the internet, admin panels without proper authentication
  • Code in public repositories -- developers pushing internal configurations, API keys, or infrastructure details to public GitHub repositories
  • Employee information on social media -- organizational charts, technology stacks, internal processes revealed through LinkedIn posts, conference talks, and job postings
  • Cloud storage misconfigurations -- publicly readable S3 buckets, Azure Blob containers, or Google Cloud Storage objects

Most organizations lack a complete inventory of these exposures. Traditional vulnerability scanners only cover known assets. OSINT fills the gap by discovering unknown unknowns -- the assets and exposures you did not know existed.

OSINT Techniques for Defense

Domain and DNS Reconnaissance

The first step in any OSINT engagement is mapping the target's domain infrastructure. Subdomain enumeration reveals the full scope of an organization's web-facing assets, often uncovering systems that are not listed in any internal asset inventory.

Certificate Transparency (CT) logs are one of the most reliable sources for subdomain discovery. Publicly trusted TLS certificates are, in practice, recorded in CT logs because major browsers require it, and services like crt.sh let you query the logged certificates for a given domain. This frequently reveals staging environments (staging.example.com), internal tools (jira.example.com), and legacy systems (old-app.example.com) that administrators may have forgotten about.
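
As a minimal sketch of the crt.sh lookup described above, the following Python queries crt.sh's JSON output and collects unique hostnames. The function names are illustrative, and the code assumes each returned record carries a newline-separated name_value field, which is how crt.sh currently formats its JSON results.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    """Build the crt.sh JSON query URL for all names under `domain`."""
    # "%." is a wildcard prefix; quote() encodes "%" as "%25".
    return "https://crt.sh/?q=" + urllib.parse.quote(f"%.{domain}") + "&output=json"


def extract_subdomains(records: list[dict], domain: str) -> set[str]:
    """Collect unique hostnames under `domain` from crt.sh JSON records."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for record in records:
        # name_value may hold several names separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # wildcard certificate: keep the base name
            if name == domain or name.endswith(suffix):
                found.add(name)
    return found


def fetch_subdomains(domain: str, timeout: float = 30.0) -> set[str]:
    """Query crt.sh over the network for a domain you are authorized to assess."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as response:
        return extract_subdomains(json.load(response), domain)
```

Calling fetch_subdomains("example.com") would return a set of hostnames to feed into later steps. The suffix check deliberately rejects look-alike names such as evil-example.com.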

Beyond CT logs, DNS brute-forcing and permutation scanning can identify subdomains that do not appear in any certificate. WHOIS history reveals past registrant information, hosting changes, and associated domains. Reverse DNS lookups on known IP ranges can uncover services running on non-standard hostnames.
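
To illustrate DNS brute-forcing with permutations, here is a small sketch using only the Python standard library. The word and environment lists are placeholders you would replace with your own naming conventions, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, Optional


def candidate_hosts(domain: str, words: Iterable[str], envs: Iterable[str]) -> list[str]:
    """Generate word, word-env, and env-word permutations under `domain`."""
    envs = list(envs)
    labels = []
    for word in words:
        labels.append(word)
        for env in envs:
            labels.append(f"{word}-{env}")
            labels.append(f"{env}-{word}")
    # dict.fromkeys removes duplicates while keeping generation order.
    return [f"{label}.{domain}" for label in dict.fromkeys(labels)]


def resolve(host: str) -> Optional[str]:
    """Return an IPv4 address for `host`, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


def brute_force(
    hosts: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> dict[str, str]:
    """Map each resolvable candidate hostname to its address."""
    results = {}
    for host in hosts:
        address = resolver(host)
        if address is not None:
            results[host] = address
    return results
```

If the domain uses wildcard DNS, every candidate will resolve. In that case, first resolve a random label and discard candidates that return the same address.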

Key tools:

  • subfinder -- fast passive subdomain enumeration using dozens of sources
  • amass -- comprehensive attack surface mapping with DNS, scraping, and API integrations
  • crt.sh -- free Certificate Transparency log search

Email and Credential Exposure

Credential stuffing attacks remain one of the most effective initial access vectors, and they rely on credentials leaked in third-party data breaches. If an employee reuses their corporate email and password on a service that later gets breached, that credential may end up in breach databases that circulate publicly.

Have I Been Pwned (HIBP) maintains one of the largest publicly accessible databases of breached accounts. Organizations can use the HIBP API to check whether any of their corporate email addresses appear in known breaches. The results indicate which breaches contain the address and what data types were exposed (passwords, phone numbers, physical addresses, etc.).
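
A hedged sketch of such a check against the HIBP v3 breachedaccount endpoint follows. It assumes you hold an HIBP API key, and the user-agent string and function names are placeholders. The summary helpers work on the Name and DataClasses fields of HIBP's breach objects.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def summarize_breaches(breaches: list[dict]) -> dict[str, list[str]]:
    """Map each breach name to the sorted data classes it exposed."""
    return {b["Name"]: sorted(b.get("DataClasses", [])) for b in breaches}


def password_exposed(summary: dict[str, list[str]]) -> bool:
    """True if any breach in the summary lists passwords among exposed data."""
    return any("Passwords" in classes for classes in summary.values())


def check_email(email: str, api_key: str) -> dict[str, list[str]]:
    """Look up one address; HTTP 404 means it appears in no known breach."""
    # truncateResponse=false asks for full breach objects, not just names.
    url = HIBP_URL + urllib.parse.quote(email) + "?truncateResponse=false"
    request = urllib.request.Request(
        url,
        headers={"hibp-api-key": api_key, "user-agent": "internal-osint-audit"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return summarize_breaches(json.load(response))
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return {}
        raise
```

The API is rate limited, so a real script would pause between addresses and handle HTTP 429 responses. Addresses where password_exposed returns True are natural candidates for a forced reset.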

Beyond breach databases, tools like Holehe can determine which online services a given email address is registered on. This reveals the third-party attack surface -- services where employees may have accounts and where a compromise could lead to lateral access or social engineering opportunities.

Google dorking (using advanced search operators) can also reveal email addresses, login pages, and credential-related exposures. Queries like site:example.com filetype:pdf or "@example.com" password can surface documents and pages that inadvertently expose sensitive information.
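
For repeatable reviews, queries like these can be generated per domain rather than typed by hand. The sketch below builds the two queries mentioned above plus two further illustrative ones (inurl: and intitle: are standard Google operators, but the specific queries are assumptions, not a vetted list).

```python
from urllib.parse import quote_plus


def build_dorks(domain: str) -> list[str]:
    """Return search queries aimed at documents and login pages for `domain`."""
    return [
        f"site:{domain} filetype:pdf",
        f'"@{domain}" password',
        f"site:{domain} inurl:login",
        f'site:{domain} intitle:"index of"',
    ]


def search_url(query: str) -> str:
    """Turn a query into a URL an analyst can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

The URLs are meant for manual review in a browser. Automated scraping of search results is a separate matter and is typically restricted by the search engine's terms.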

Key tools:

  • HIBP API -- check emails against known breaches
  • Holehe -- identify registered services for an email
  • Google dorking -- advanced search operators for targeted discovery

Code and Repository Leaks

Developers accidentally push secrets to public repositories with alarming frequency. API keys, database connection strings, cloud credentials, internal URLs, and configuration files regularly appear in public GitHub, GitLab, and Bitbucket repositories. Even if a secret is later deleted from a repository, it often remains in the Git history.

Effective repository scanning involves two approaches. First, search for your organization's domain name, internal hostnames, and known project names across public code hosting platforms using GitHub search operators such as org:yourcompany password, "example.com" api_key, or filename:.env "DB_PASSWORD". Second, run secret-scanning tools against your own repositories to identify any credentials that may have been committed.
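
To show the idea behind the second approach, here is a deliberately small regex-based scanner. It is far simpler than GitLeaks or TruffleHog and does not reflect how those tools are implemented; the three rules are illustrative assumptions and will produce both false positives and misses.

```python
import re

# Illustrative rules only; real scanners ship hundreds of tuned patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "env_password": re.compile(r"(?i)^\s*\w*PASSWORD\w*\s*=\s*\S+"),
}


def scan_text(text: str, source: str = "<memory>") -> list[tuple[str, int, str]]:
    """Return (source, line_number, rule_name) for every line matching a rule."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((source, number, rule))
    return hits
```

Running scan_text over the contents of each tracked file covers the current state of a repository. Feeding it the output of git log -p --all extends the same check to content that was later deleted but still lives in history.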

This extends beyond your organization's official repositories. Employees may have personal GitHub accounts where they have pushed work-related code, contract developers may have forked internal projects, and former employees may retain copies of proprietary code.

Key tools:

  • GitLeaks -- scan Git repositories for secrets and credentials
  • TruffleHog -- deep credential scanning across Git history
  • GitHub search operators -- manual and targeted code discovery

Social Media and Employee OSINT

People are often the most informative -- and most overlooked -- component of an organization's attack surface. Employee profiles on LinkedIn reveal organizational structure, technology stacks, ongoing projects, and internal tooling. Job postings disclose what software and frameworks the organization uses. Conference presentations and blog posts by employees can expose internal architecture details.

Username enumeration tools can map an individual's presence across hundreds of platforms, revealing personal accounts that may be linked to corporate identities. Metadata in shared documents (PDFs, images, office files) can expose internal usernames, software versions, file paths, and even GPS coordinates.
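
As one concrete case of document metadata, modern Office files (.docx, .xlsx, .pptx) are ZIP archives that store author fields in docProps/core.xml. The sketch below reads two of those fields with the standard library. It is not a substitute for exiftool or FOCA, and the function name and the choice of fields are illustrative.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def office_metadata(data: bytes) -> dict[str, str]:
    """Read author fields from docProps/core.xml inside an OOXML file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    result = {}
    for key, path in fields.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            result[key] = element.text
    return result
```

Running this over documents published on your own website shows which internal usernames are visible to anyone who downloads them.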

For defenders, the goal is not to invade employee privacy but to understand what information is publicly accessible and could be used in a targeted social engineering attack. An attacker who knows an employee's role, recent projects, personal interests, and social media presence can craft a highly convincing spear phishing email.

Key tools:

  • Sherlock -- username enumeration across 400+ platforms
  • WhatsMyName -- cross-platform username search
  • Metadata extraction -- exiftool, FOCA for document metadata

Infrastructure Exposure

Internet-wide scanners like Shodan and Censys continuously scan publicly reachable IP addresses, cataloging open ports, running services, SSL certificates, HTTP headers, and banner information. Searching these databases for your organization's IP ranges, domain names, or SSL certificate details shows you much of what an attacker can see.

Common findings include: databases exposed to the internet without authentication (MongoDB, Elasticsearch, Redis), administrative interfaces accessible without VPN (phpMyAdmin, Jenkins, Grafana), development services left running on production servers, outdated software versions with known vulnerabilities, and default credentials on network devices.
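
Once scanner results are exported, a first triage pass can flag services on ports associated with the findings above. This sketch assumes each record is a dictionary with ip_str and port keys, the field names used in Shodan search matches. The port table lists default ports only and is an illustrative assumption, since services often run elsewhere.

```python
# Default ports for services named above; a match is a hint, not proof.
RISKY_PORTS = {
    27017: "MongoDB",
    9200: "Elasticsearch",
    6379: "Redis",
    8080: "Jenkins or another admin UI",
    3000: "Grafana",
}


def triage(matches: list[dict]) -> list[tuple[str, int, str]]:
    """Return (ip, port, likely_service) for records on ports worth reviewing."""
    flagged = []
    for match in matches:
        port = match.get("port")
        if port in RISKY_PORTS:
            flagged.append((match.get("ip_str", "?"), port, RISKY_PORTS[port]))
    return sorted(flagged)
```

Each flagged entry still needs manual confirmation that the service is what the port suggests and that it is reachable without authentication.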

Cloud storage enumeration is another critical area. Misconfigured S3 buckets, Azure containers, and GCP storage objects are routinely discovered containing sensitive data. Automated tools can brute-force common bucket names based on your organization's name and known naming conventions.
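
The name-guessing step can be sketched as follows for S3. The suffix list is a placeholder, and the status interpretation (200 publicly readable, 403 exists but access denied, 404 no such bucket) reflects commonly observed S3 behavior and should be treated as an assumption to verify. Tools like cloud_enum and S3Scanner cover more providers and edge cases.

```python
import urllib.error
import urllib.request


def bucket_candidates(org: str, suffixes: list[str]) -> list[str]:
    """Generate likely bucket names from an organization name and suffixes."""
    names = [org]
    for suffix in suffixes:
        names += [f"{org}-{suffix}", f"{suffix}-{org}", f"{org}{suffix}"]
    return names


def s3_url(bucket: str) -> str:
    """Virtual-hosted-style URL for a bucket."""
    return f"https://{bucket}.s3.amazonaws.com/"


def classify_s3_status(status: int) -> str:
    """Interpret an HTTP status from an unauthenticated bucket request."""
    return {200: "public", 403: "exists-private", 404: "not-found"}.get(status, "unknown")


def probe(bucket: str, timeout: float = 10.0) -> str:
    """Send one unauthenticated HEAD request and classify the response."""
    request = urllib.request.Request(s3_url(bucket), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_s3_status(response.status)
    except urllib.error.HTTPError as error:
        return classify_s3_status(error.code)
```

A bucket that exists under a guessed name is not necessarily yours, so confirm ownership before treating it as an internal finding.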

Key tools:

  • Shodan -- search engine for internet-connected devices
  • Censys -- internet-wide scanning and asset discovery
  • Cloud bucket enumeration tools -- cloud_enum, S3Scanner

Dark Web Monitoring

Beyond the surface web, breached data, stolen credentials, and organizational intelligence are regularly traded on dark web forums, paste sites, and Telegram channels. Monitoring these sources provides early warning of compromised credentials, planned attacks, or data leaks that have not yet been publicly reported.

Dark web monitoring typically involves searching for your organization's domain in credential dumps, monitoring for mentions of your company name or key personnel, and watching for leaked internal documents. While some of this can be done manually using Tor, most organizations benefit from automated monitoring services that continuously scan these sources and alert on new findings.
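
Searching a credential dump for your domain can be as simple as the sketch below, which assumes the dump is plain text with one record per line. It returns only the affected addresses and deliberately does not capture the passwords next to them.

```python
import re
from typing import Iterable


def find_corporate_accounts(lines: Iterable[str], domain: str) -> list[str]:
    """Return sorted, lowercased addresses at exactly `domain` found in `lines`."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain)
        # Reject longer hostnames such as example.com.evil.org.
        + r"(?![A-Za-z0-9-])(?!\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    found = set()
    for line in lines:
        for address in pattern.findall(line):
            found.add(address.lower())
    return sorted(found)
```

Passing an open file object as lines keeps memory use flat even for very large dumps, since the file is read one line at a time.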

Building an OSINT Workflow

Effective OSINT is not a one-time exercise. It requires a structured, repeatable workflow that can be run regularly to detect new exposures as they appear.

1. Define Scope -- Identify all primary and secondary domains, corporate email patterns (e.g., firstname.lastname@company.com), IP ranges, key personnel (executives, IT staff), brand names, and any known cloud infrastructure. Document this as your target scope.
2. Automated Enumeration -- Run subdomain discovery, credential breach checks, repository scanning, and infrastructure searches across all in-scope assets. Automate this with scripts that can be scheduled to run weekly or monthly.
3. Manual Investigation -- Review automated findings for false positives and dig deeper into interesting results. Verify that discovered subdomains are actually owned by your organization. Investigate the context of credential leaks. Confirm whether exposed services are intentionally public.
4. Risk Assessment -- Prioritize findings by impact and exploitability. An exposed database with customer data is critical. An old blog subdomain with no sensitive content is low priority. Map findings to your risk framework and assign severity levels.
5. Remediation -- Address findings systematically: rotate leaked credentials, decommission forgotten servers, remove secrets from repositories (and Git history), restrict access to internal services, and update DNS records for decommissioned assets.
6. Continuous Monitoring -- Establish ongoing monitoring for new credential leaks, new subdomains, new code commits, and new infrastructure exposures. Integrate OSINT feeds into your SIEM or alerting system. Re-run the full workflow quarterly at minimum.
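
The risk assessment and continuous monitoring steps can be sketched together: score each finding by impact and exploitability, then report only what is new since the previous run. The 1-to-5 scales and the multiplicative score are illustrative assumptions, not a standard, and should be replaced with your own risk framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    asset: str
    issue: str
    impact: int          # 1 (low) to 5 (critical), assigned during review
    exploitability: int  # 1 (hard) to 5 (trivial)

    @property
    def score(self) -> int:
        """Simple priority score: higher means fix sooner."""
        return self.impact * self.exploitability


def new_findings(previous: set[Finding], current: set[Finding]) -> list[Finding]:
    """Findings present in this run but not the last, highest score first."""
    return sorted(current - previous, key=lambda f: (-f.score, f.asset))
```

Persisting each run's findings (for example as JSON) and passing the last two runs to new_findings gives you a short, prioritized list to send to your alerting system instead of the full inventory every time.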

Real-World Example: What OSINT Reveals

Scenario: "Company X" OSINT Assessment

Consider a mid-sized Austrian software company that commissions a proactive OSINT assessment. The following is a representative composite of what such an engagement typically uncovers:

Subdomain enumeration using Certificate Transparency logs and passive DNS revealed 47 subdomains. Among them was staging-api.companyx.at -- a staging server running an outdated version of a REST API framework with known remote code execution vulnerabilities. The server had been provisioned two years earlier for a client demo and was never decommissioned. It had no WAF, no access restrictions, and was running with debug mode enabled, exposing detailed stack traces to anyone who triggered an error.

Credential breach analysis through HIBP found that 12 corporate email addresses appeared across three separate data breaches (a compromised SaaS platform, a breached forum, and a leaked marketing database). Four of these included plaintext or weakly hashed passwords. Cross-referencing with Holehe showed that several of these email addresses were registered on services that did not enforce multi-factor authentication.

Repository scanning on GitHub revealed that a former developer's personal account contained a forked repository with a .env file that included a valid API key for the company's payment processing gateway and an internal database connection string. The repository had been public for eight months.

Impact: The staging server was immediately taken offline. All breached credentials were force-reset and MFA was mandated. The exposed API key was revoked and the payment provider was notified. The former developer was contacted and the repository was made private. Total time to discover these issues through OSINT: approximately four hours.

This scenario is not unusual. In our experience, most organizations have at least one forgotten server, several breached credentials, and some form of code or configuration leak in public repositories. The difference between a secure organization and a breached one is often whether someone looked for these exposures before an attacker did.

Free Tools You Can Use Today

You do not need a professional engagement to begin mapping your attack surface. A.KHAT provides several free tools on this website that cover key OSINT use cases:

  • Email Leak Checker -- Check if an email address appears in known data breaches. Uses the HIBP database to identify credential exposures.
  • DNS Lookup -- Query DNS records for any domain. Discover MX records, TXT records (including SPF/DKIM/DMARC), and subdomains.
  • WHOIS Lookup -- View domain registration details, registrar history, and contact information for any domain.
  • Dark Web Monitor -- Check if your domain or email address appears in dark web credential dumps and breach databases.
  • SSL/TLS Checker -- Analyze the SSL/TLS configuration of any server. Identify certificate issues, weak cipher suites, and protocol vulnerabilities.
  • Technology Detector -- Identify the technologies, frameworks, and services running on any website.
  • Security Scorecard -- Get a comprehensive security assessment of any domain, including header analysis, SSL rating, and configuration checks.
  • URL Scanner -- Analyze URLs for potential phishing indicators, malicious redirects, and suspicious content.

These tools provide a starting point for understanding your organization's exposure. For a comprehensive assessment that combines automated scanning with expert analysis, consider a professional OSINT engagement.

Conclusion

OSINT is not just for attackers. The same techniques, tools, and data sources that adversaries use during reconnaissance are available to defenders. The difference is timing: if you find your exposures first, you can fix them before they are exploited.

The key principles to remember:

  • Assume everything is findable. If a system is on the internet, someone will discover it. If a credential is in a breach database, someone will try it. If a secret is in a Git commit, someone will extract it.
  • Automate what you can. Subdomain enumeration, breach monitoring, and repository scanning can all run on a schedule. Invest in automation so that continuous monitoring does not require continuous manual effort.
  • Investigate the gaps. Automated tools miss things. Complement automated scanning with periodic manual investigation, especially for social media exposure, code repository analysis, and dark web monitoring.
  • Make it a habit. Attack surfaces change constantly. New subdomains are created, new breaches are disclosed, new code is pushed to repositories. OSINT is not a one-time project -- it is an ongoing practice.

The bottom line: Organizations that proactively map their own attack surface using OSINT techniques are significantly harder to compromise. You cannot defend what you do not know exists. Start discovering your exposures today.