Every cyberattack begins with reconnaissance. Before an adversary sends a phishing email, exploits a vulnerability, or attempts to breach your network, they spend time gathering information about your organization from publicly available sources. This process is known as Open Source Intelligence (OSINT) -- the collection and analysis of information from public, freely accessible data.
The uncomfortable truth is that attackers often know more about your external attack surface than you do. They enumerate your subdomains, find leaked credentials in breach databases, discover API keys committed to public repositories, and map your employees' digital footprints across social media platforms. They do this systematically, using the same tools and techniques available to anyone with an internet connection.
The good news: defenders can use these exact same techniques proactively. By conducting OSINT against your own organization, you can discover what an attacker would find -- and remediate exposures before they are exploited. This article walks through the core OSINT techniques relevant to defensive security and shows you how to build a practical workflow for mapping your organization's attack surface.
The Attack Surface Problem
Modern organizations have sprawling digital footprints that extend far beyond the systems managed by their IT department. Consider what a typical mid-sized company exposes to the public internet:
- Forgotten subdomains -- old staging servers, decommissioned marketing microsites, test environments that were never properly shut down
- Leaked credentials -- employee email and password combinations appearing in third-party data breaches
- Exposed APIs and services -- internal tools accidentally reachable from the internet, admin panels without proper authentication
- Code in public repositories -- developers pushing internal configurations, API keys, or infrastructure details to public GitHub repositories
- Employee information on social media -- organizational charts, technology stacks, internal processes revealed through LinkedIn posts, conference talks, and job postings
- Cloud storage misconfigurations -- publicly readable S3 buckets, Azure Blob containers, or Google Cloud Storage objects
Most organizations lack a complete inventory of these exposures. Traditional vulnerability scanners only cover known assets. OSINT fills the gap by discovering unknown unknowns -- the assets and exposures you did not know existed.
OSINT Techniques for Defense
Domain and DNS Reconnaissance
The first step in any OSINT engagement is mapping the target's domain infrastructure. Subdomain enumeration reveals the full scope of an organization's web-facing assets, often uncovering systems that are not listed in any internal asset inventory.
Certificate Transparency (CT) logs are one of the most reliable sources for subdomain discovery. Publicly trusted SSL/TLS certificates are, in practice, recorded in CT logs because major browsers require it, and services like crt.sh let you search the logged certificates for a given domain. This frequently reveals staging environments (staging.example.com), internal tools (jira.example.com), and legacy systems (old-app.example.com) that administrators may have forgotten about.
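As a minimal sketch of working with CT data, the snippet below extracts unique hostnames from crt.sh's JSON output (requested with a URL such as https://crt.sh/?q=%25.example.com&output=json). It assumes each record carries a `name_value` field holding one or more newline-separated names; the function name and the sample records are invented for illustration.

```python
import json

def hostnames_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Return sorted unique hostnames under `domain` from crt.sh JSON output."""
    names = set()
    for record in json.loads(raw_json):
        # One certificate can cover several names, separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Reduce a wildcard entry such as "*.example.com" to its base name.
            if name.startswith("*."):
                name = name[2:]
            # Keep only names that belong to the domain under review.
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

# Illustrative sample, not real crt.sh data.
sample = json.dumps([
    {"name_value": "staging.example.com\nwww.example.com"},
    {"name_value": "*.example.com"},
    {"name_value": "JIRA.example.com"},
    {"name_value": "unrelated.example.org"},
])

if __name__ == "__main__":
    for host in hostnames_from_crtsh(sample, "example.com"):
        print(host)
```

Deduplicating and lowercasing matters here because the same hostname usually appears in many certificates over time.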
Beyond CT logs, DNS brute-forcing and permutation scanning can identify subdomains that do not appear in any certificate. WHOIS history reveals past registrant information, hosting changes, and associated domains. Reverse DNS lookups on known IP ranges can uncover services running on non-standard hostnames.
- subfinder -- fast passive subdomain enumeration using dozens of sources
- amass -- comprehensive attack surface mapping with DNS, scraping, and API integrations
- crt.sh -- free Certificate Transparency log search
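Permutation scanning, mentioned above, can be sketched as combining labels you already know with common affixes and then testing whether the resulting names resolve. The example below covers only the candidate generation step; the function name, the three naming patterns, and the word lists are assumptions, not the behavior of any particular tool.

```python
from itertools import product

def permute_subdomains(known_labels, affixes, domain):
    """Build candidate hostnames by combining known labels with affixes."""
    candidates = set()
    for label, affix in product(known_labels, affixes):
        candidates.add(f"{label}-{affix}.{domain}")  # e.g. api-dev.example.com
        candidates.add(f"{affix}-{label}.{domain}")  # e.g. dev-api.example.com
        candidates.add(f"{affix}.{label}.{domain}")  # e.g. dev.api.example.com
    return sorted(candidates)

if __name__ == "__main__":
    # Hypothetical inputs: labels found via CT logs, plus a tiny affix list.
    for name in permute_subdomains(["api", "app"], ["dev", "old"], "example.com"):
        print(name)
```

Each candidate still needs a DNS lookup (for instance with `socket.getaddrinfo`) to confirm that it exists, and that step should only be run against domains you are authorized to assess.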
Email and Credential Exposure
Credential stuffing attacks remain one of the most effective initial access vectors, and they rely on leaked credentials from third-party data breaches. If an employee reuses their corporate email and password on a service that later gets breached, that credential may appear in breach databases that are openly shared or traded.
Have I Been Pwned (HIBP) maintains one of the largest publicly accessible databases of breached accounts. Organizations can use the HIBP API to check whether any of their corporate email addresses appear in known breaches. The results indicate which breaches contain the address and what data types were exposed (passwords, phone numbers, physical addresses, etc.).
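A minimal sketch of such a check is shown below. It assumes the HIBP v3 `breachedaccount` endpoint, which requires an API key in the `hibp-api-key` header and a user agent; the helper names, the user agent string, and the sample response are hypothetical. The sketch only builds the request and summarizes a response body, so it runs without network access.

```python
import json
import urllib.parse
import urllib.request

HIBP_URL = (
    "https://haveibeenpwned.com/api/v3/breachedaccount/{account}"
    "?truncateResponse=false"
)

def build_request(email: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a breach lookup for one email address."""
    url = HIBP_URL.format(account=urllib.parse.quote(email))
    return urllib.request.Request(url, headers={
        "hibp-api-key": api_key,
        "user-agent": "internal-exposure-audit",  # hypothetical application name
    })

def summarize_breaches(response_body: str) -> dict:
    """Map each breach name to the data classes it exposed."""
    return {
        breach["Name"]: breach.get("DataClasses", [])
        for breach in json.loads(response_body)
    }

# Illustrative response body, not real breach data.
sample_response = json.dumps([
    {"Name": "ExampleSaaS", "DataClasses": ["Email addresses", "Passwords"]},
    {"Name": "ExampleForum", "DataClasses": ["Email addresses"]},
])

if __name__ == "__main__":
    print(build_request("alice@example.com", "YOUR-API-KEY").full_url)
    print(summarize_breaches(sample_response))
```

To send the request you would pass it to `urllib.request.urlopen`. An address with no known breaches is expected to come back as HTTP 404, which `urlopen` raises as an `HTTPError`, so a real script needs to handle that case and respect the API's rate limits.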
Beyond breach databases, tools like Holehe can determine which online services a given email address is registered on. This reveals the third-party attack surface -- services where employees may have accounts and where a compromise could lead to lateral access or social engineering opportunities.
Google dorking (using advanced search operators) can also reveal email addresses, login pages, and credential-related exposures. Queries like site:example.com filetype:pdf or "@example.com" password can surface documents and pages that inadvertently expose sensitive information.
- HIBP API -- check emails against known breaches
- Holehe -- identify registered services for an email
- Google dorking -- advanced search operators for targeted discovery
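The dork queries described in this section are easy to generate per domain so that the same checks can be repeated later. The sketch below builds a small query list and the matching search URLs; the function names and the exact selection of queries are assumptions, and the list is deliberately short.

```python
from urllib.parse import urlencode

def build_dorks(domain: str) -> list[str]:
    """Return a few search queries for documents and credential mentions."""
    return [
        f"site:{domain} filetype:pdf",
        f"site:{domain} filetype:xlsx",
        f"site:{domain} inurl:login",
        f'"@{domain}" password',
    ]

def search_url(query: str) -> str:
    """Turn a query into a URL that can be opened in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    for dork in build_dorks("example.com"):
        print(dork, "->", search_url(dork))
```

Opening the URLs manually is usually the safer choice, since search engines tend to block automated querying.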
Code and Repository Leaks
Developers accidentally push secrets to public repositories with alarming frequency. API keys, database connection strings, cloud credentials, internal URLs, and configuration files regularly appear in public GitHub, GitLab, and Bitbucket repositories. Even if a secret is later deleted from a repository, it often remains in the Git history.
Effective repository scanning involves two approaches. First, search for your organization's domain name, internal hostnames, and known project names across public code hosting platforms using GitHub search operators such as org:yourcompany password, "example.com" api_key, or filename:.env "DB_PASSWORD". Second, run secret-scanning tools against your own repositories to identify any credentials that may have been committed.
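To illustrate the second approach, here is a heavily simplified secret scanner. Dedicated tools ship far larger and better tuned rule sets; the three patterns, the rule names, and the sample file content below are assumptions chosen for illustration (the AWS key is the placeholder value from AWS documentation).

```python
import re

# A tiny, illustrative rule set. Real scanners use many more patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "password_assignment": re.compile(
        r"\b\w*(?:password|api_key|secret)\w*\s*[=:]\s*\S+", re.IGNORECASE
    ),
}

def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings

# Illustrative .env content.
sample_env = """\
APP_NAME=demo
DB_PASSWORD=hunter2
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
"""

if __name__ == "__main__":
    for lineno, rule in scan_text(sample_env):
        print(f"line {lineno}: {rule}")
```

Because deleted secrets stay in the Git history, the same function could be applied to the output of `git log -p --all` rather than only to the current working tree.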
This extends beyond your organization's official repositories. Employees may have personal GitHub accounts where they have pushed work-related code, contract developers may have forked internal projects, and former employees may retain copies of proprietary code.
- GitLeaks -- scan Git repositories for secrets and credentials
- TruffleHog -- deep credential scanning across Git history
- GitHub search operators -- manual and targeted code discovery
Social Media and Employee OSINT
People are often the most informative -- and most overlooked -- component of an organization's attack surface. Employee profiles on LinkedIn reveal organizational structure, technology stacks, ongoing projects, and internal tooling. Job postings disclose what software and frameworks the organization uses. Conference presentations and blog posts by employees can expose internal architecture details.
Username enumeration tools can map an individual's presence across hundreds of platforms, revealing personal accounts that may be linked to corporate identities. Metadata in shared documents (PDFs, images, office files) can expose internal usernames, software versions, file paths, and even GPS coordinates.
For defenders, the goal is not to invade employee privacy but to understand what information is publicly accessible and could be used in a targeted social engineering attack. An attacker who knows an employee's role, recent projects, personal interests, and social media presence can craft a highly convincing spear phishing email.
- Sherlock -- username enumeration across 400+ platforms
- WhatsMyName -- cross-platform username search
- Metadata extraction -- exiftool and FOCA for document metadata
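Document metadata can also be inspected with nothing but the standard library. The sketch below reads the author fields from an Office Open XML file (.docx, .xlsx, .pptx), which is a ZIP archive containing `docProps/core.xml`. The function names are hypothetical, and the demo document is built in memory purely so the example is self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def office_authors(file_obj) -> dict:
    """Read creator and last-modified-by from an Office Open XML file."""
    with zipfile.ZipFile(file_obj) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NS),
    }

def demo_document() -> io.BytesIO:
    """Build a minimal in-memory archive that mimics a document's metadata part."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>jdoe</dc:creator>"
        "<cp:lastModifiedBy>CORP\\asmith</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer

if __name__ == "__main__":
    print(office_authors(demo_document()))
```

Running this over documents published on your own website shows which internal usernames and naming conventions are already visible to outsiders.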
Infrastructure Exposure
Internet-wide scanners like Shodan and Censys continuously scan the publicly reachable IPv4 address space, cataloging open ports, running services, SSL certificates, HTTP headers, and banner information. Searching these databases for your organization's IP ranges, domain names, or SSL certificate details shows much of what an attacker can see.
Common findings include: databases exposed to the internet without authentication (MongoDB, Elasticsearch, Redis), administrative interfaces accessible without VPN (phpMyAdmin, Jenkins, Grafana), development services left running on production servers, outdated software versions with known vulnerabilities, and default credentials on network devices.
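Findings like these can be triaged automatically once scan results are exported. The sketch below flags records that hit the default ports of some of the services named above. The record format (dicts with `ip` and `port` keys) is a hypothetical export format, not the schema of Shodan or Censys, and a port number only suggests a service rather than proving it.

```python
# Default ports for some services named above. A service moved to another
# port is missed, and port 8080 or 3000 may host something else entirely.
DEFAULT_PORTS = {
    27017: "MongoDB",
    9200: "Elasticsearch",
    6379: "Redis",
    8080: "Jenkins",
    3000: "Grafana",
}

def flag_exposures(records, allowed=frozenset()):
    """Return (ip, port, service) for records on a sensitive default port.

    `allowed` holds (ip, port) pairs that are intentionally exposed.
    """
    findings = []
    for record in records:
        key = (record["ip"], record["port"])
        service = DEFAULT_PORTS.get(record["port"])
        if service and key not in allowed:
            findings.append((record["ip"], record["port"], service))
    return sorted(findings)

# Illustrative records using documentation-only IP addresses.
sample_records = [
    {"ip": "203.0.113.10", "port": 443},
    {"ip": "203.0.113.10", "port": 27017},
    {"ip": "203.0.113.25", "port": 8080},
    {"ip": "203.0.113.30", "port": 6379},
]

if __name__ == "__main__":
    for finding in flag_exposures(sample_records):
        print(finding)
```

The `allowed` set keeps known, intentional exposures out of the report so that repeated runs highlight only what needs attention.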
Cloud storage enumeration is another critical area. Misconfigured S3 buckets, Azure containers, and GCP storage objects are routinely discovered containing sensitive data. Automated tools can brute-force common bucket names based on your organization's name and known naming conventions.
- Shodan -- search engine for internet-connected devices
- Censys -- internet-wide scanning and asset discovery
- Cloud bucket enumeration tools -- cloud_enum, S3Scanner
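The bucket name guessing described above starts with a candidate list. The sketch below generates one from an organization name and a few keywords; the function name, the separators, and the keyword list are assumptions, and dedicated tools use much larger word lists.

```python
def bucket_candidates(org, keywords, separators=("-", "")):
    """Combine an organization name with keywords into likely bucket names."""
    names = set()
    for keyword in keywords:
        for sep in separators:
            names.add(f"{org}{sep}{keyword}".lower())
            names.add(f"{keyword}{sep}{org}".lower())
    # S3 bucket names must be between 3 and 63 characters long.
    return sorted(name for name in names if 3 <= len(name) <= 63)

if __name__ == "__main__":
    # Hypothetical organization name and keywords.
    for name in bucket_candidates("acme", ["backup", "dev"]):
        print(name)
```

Checking whether a candidate exists and is publicly readable is a separate step, and it should be limited to names that plausibly belong to your own organization.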
Dark Web Monitoring
Beyond the surface web, breached data, stolen credentials, and organizational intelligence are regularly traded on dark web forums, paste sites, and Telegram channels. Monitoring these sources provides early warning of compromised credentials, planned attacks, or data leaks that have not yet been publicly reported.
Dark web monitoring typically involves searching for your organization's domain in credential dumps, monitoring for mentions of your company name or key personnel, and watching for leaked internal documents. While some of this can be done manually using Tor, most organizations benefit from automated monitoring services that continuously scan these sources and alert on new findings.
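The first of these tasks, searching a credential dump for your domain, can be sketched in a few lines. The example assumes the common `email:password` line format; the function name and the sample lines are invented. It returns only the affected addresses and discards the passwords, since storing them would create a new exposure.

```python
def corporate_hits(dump_lines, domain):
    """Return the unique addresses at `domain` found in email:password lines."""
    suffix = "@" + domain.lower()
    hits = set()
    for line in dump_lines:
        email, separator, _password = line.strip().partition(":")
        # Skip lines that do not follow the expected email:password layout.
        if separator and email.lower().endswith(suffix):
            hits.add(email.lower())
    return sorted(hits)

# Illustrative dump content.
sample_dump = [
    "Alice@Example.com:pw1",
    "bob@other.org:pw2",
    "alice@example.com:pw3",
    "not a credential line",
    "mallory@notexample.com:pw4",
]

if __name__ == "__main__":
    print(corporate_hits(sample_dump, "example.com"))
```

Matching on the full `@domain` suffix avoids false positives from lookalike domains such as notexample.com.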
Building an OSINT Workflow
Effective OSINT is not a one-time exercise. It requires a structured, repeatable workflow that can be run regularly to detect new exposures as they appear.
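One building block of such a workflow is comparing each run against the previous one, so that only changes need review. The sketch below stores a snapshot of discovered hostnames as JSON and reports what appeared or disappeared since the last run; the function names and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def diff_snapshot(current: set, snapshot_path: Path) -> dict:
    """Compare `current` with the stored snapshot, then update the snapshot."""
    if snapshot_path.exists():
        previous = set(json.loads(snapshot_path.read_text()))
    else:
        previous = set()
    snapshot_path.write_text(json.dumps(sorted(current)))
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }

def demo():
    """Simulate two consecutive runs against a temporary snapshot file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "subdomains.json"
        first = diff_snapshot({"www.example.com"}, path)
        second = diff_snapshot({"www.example.com", "staging.example.com"}, path)
    return first, second

if __name__ == "__main__":
    for result in demo():
        print(result)
```

Scheduling a script like this (for example with cron) and alerting only on the "new" entries keeps the recurring effort low.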
Real-World Example: What OSINT Reveals
Scenario: "Company X" OSINT Assessment
The following is a representative composite of what a proactive OSINT assessment for a mid-sized Austrian software company typically uncovers:
Subdomain enumeration using Certificate Transparency logs and passive DNS revealed 47 subdomains. Among them was staging-api.companyx.at -- a staging server running an outdated version of a REST API framework with known remote code execution vulnerabilities. The server had been provisioned two years earlier for a client demo and was never decommissioned. It had no WAF, no access restrictions, and was running with debug mode enabled, exposing detailed stack traces to anyone who triggered an error.
Credential breach analysis through HIBP found that 12 corporate email addresses appeared across three separate data breaches (a compromised SaaS platform, a breached forum, and a leaked marketing database). Four of these included plaintext or weakly hashed passwords. Cross-referencing with Holehe showed that several of these email addresses were registered on services that did not enforce multi-factor authentication.
Repository scanning on GitHub revealed that a former developer's personal account contained a forked repository with a .env file that included a valid API key for the company's payment processing gateway and an internal database connection string. The repository had been public for eight months.
Impact: The staging server was immediately taken offline. All breached credentials were force-reset and MFA was mandated. The exposed API key was revoked and the payment provider was notified. The former developer was contacted and the repository was made private. Total time to discover these issues through OSINT: approximately four hours.
This scenario is not unusual. In our experience, most organizations have at least one forgotten server, several breached credentials, and some form of code or configuration leak in public repositories. The difference between a secure organization and a breached one is often whether someone looked for these exposures before an attacker did.
Free Tools You Can Use Today
You do not need a professional engagement to begin mapping your attack surface. A.KHAT provides several free tools on this website that cover key OSINT use cases:
- Email Leak Checker -- Check if an email address appears in known data breaches. Uses the HIBP database to identify credential exposures.
- DNS Lookup -- Query DNS records for any domain. Discover MX records, TXT records (including SPF/DKIM/DMARC), and subdomains.
- WHOIS Lookup -- View domain registration details, registrar history, and contact information for any domain.
- Dark Web Monitor -- Check if your domain or email address appears in dark web credential dumps and breach databases.
- SSL/TLS Checker -- Analyze the SSL/TLS configuration of any server. Identify certificate issues, weak cipher suites, and protocol vulnerabilities.
- Technology Detector -- Identify the technologies, frameworks, and services running on any website.
- Security Scorecard -- Get a comprehensive security assessment of any domain, including header analysis, SSL rating, and configuration checks.
- URL Scanner -- Analyze URLs for potential phishing indicators, malicious redirects, and suspicious content.
These tools provide a starting point for understanding your organization's exposure. For a comprehensive assessment that combines automated scanning with expert analysis, consider a professional OSINT engagement.
Conclusion
OSINT is not just for attackers. The same techniques, tools, and data sources that adversaries use during reconnaissance are available to defenders. The difference is timing: if you find your exposures first, you can fix them before they are exploited.
The key principles to remember:
- Assume everything is findable. If a system is on the internet, someone will discover it. If a credential is in a breach database, someone will try it. If a secret is in a Git commit, someone will extract it.
- Automate what you can. Subdomain enumeration, breach monitoring, and repository scanning can all run on a schedule. Invest in automation so that continuous monitoring does not require continuous manual effort.
- Investigate the gaps. Automated tools miss things. Complement automated scanning with periodic manual investigation, especially for social media exposure, code repository analysis, and dark web monitoring.
- Make it a habit. Attack surfaces change constantly. New subdomains are created, new breaches are disclosed, new code is pushed to repositories. OSINT is not a one-time project -- it is an ongoing practice.