The External Recon Braindump

July 28, 2025
8 min read

This blog was written as a writing sample when applying for jobs a while ago. I figured I might as well post it anyway.

Intro

No matter what discipline of offensive security you pursue, recon is everything. It’s always cool to read about flashy new exploits and vulnerability classes, but some of the most critical vulnerabilities are found in the places that people have long since forgotten. Comparing hacking to a military operation or a physical break-in is an overplayed trope, but when it comes to recon, I can’t help but think of an Ocean’s 11-style heist: you spend the majority of your time “casing the joint” before you even begin to make plans.

Analogy aside, let’s take a look at one of the most common scenarios for recon in hacking: bug bounty. Programs will often have defined scopes that can clue you in on where you need to look. However, there are also programs where you have a lot more freedom, most commonly with some kind of wildcard like *.company.com. Suppose we’re trying to find a list of endpoints across a company’s public-facing web applications, only knowing the name of the company. Where do you start?

Finding Our Target(s)

Start Simple, Google!

Before you go in with your nmap or your gobuster or whatever other tool you want, you need to figure out what domains you’re targeting in the first place, and, preferably, confirm that those domains are actually owned by the company you’re attacking. Danny Ocean didn’t go headfirst into the closest casino he laid his eyes on; he specifically picked three casinos owned by his friend’s rival.

Before we go right into performing recon on a single company, it’s worth taking the time to understand the corporate structure. If your target company is large (e.g., Coca-Cola, LVMH), there’s a good chance they own many, many subsidiaries. Doing research using websites like Crunchbase or Wikipedia to learn about the business can be surprisingly helpful in understanding your attack surface. But, for the purposes of this blog, let’s focus on one target. As of writing, Tesla has a broad bug bounty program, so we’ll focus on them (always be sure to verify you’re authorized to test something!).

Tesla is a very big company, so it’s no surprise that their main domain is tesla.com, but if we weren’t sure, we could always verify with Google Dorks, which help us tailor our Google searches.

site:linkedin.com tesla
site:github.com tesla
inurl:tesla

Diving Deeper: ASNs, DNS, and WHOIS

However, not every domain that Tesla owns is going to say “HELLO WE ARE TESLA” in the URL or even on the front page, so we need a way to figure out what the company actually owns. Fortunately, the internet already has a solution for us! The Internet Assigned Numbers Authority (IANA) coordinates global IP addressing, delegating address blocks to regional internet registries (RIRs) like ARIN and RIPE, which publish records of who holds what. Essentially, the blocks of IP addresses that an organization announces get grouped under an autonomous system number (ASN), and there are plenty of public resources to query this information.
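
To make this concrete, here’s a rough sketch of pulling the prefixes an ASN announces from a route registry like RADb (AS394161 is used here as a stand-in for one of Tesla’s ASNs; verify the number yourself before relying on it):

$ whois -h whois.radb.net -- '-i origin AS394161' | grep -E '^route'

Each route object that comes back is an IP range we can carry into the DNS steps below.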

IP addresses alone don’t paint a full picture, though, especially when one IP could host multiple websites. Here, we can leverage the Domain Name System (DNS) to see which IP addresses map to which domains. dnsrecon can help us automate the reverse, that is, using IP addresses to find domain names. Below, we’re using one of the IP ranges we identified from a Tesla ASN and checking those IPs against Google’s public DNS server at 8.8.8.8.

$ dnsrecon -r 2.18.48.0/24 -n 8.8.8.8

On top of checking IP addresses against domain names and vice versa, we can also check who owns a domain. The WHOIS protocol lets us query registrar databases that record who registered a domain in the first place.

$ whois tesla.com

Some domains might obfuscate this information for privacy reasons, but for the ones that don’t, you could use reverse WHOIS searches to find other domains registered by the same email/registrant.

Now, this was already a lot. You could do all of this manually, but there are plenty of projects out there that have attempted to automate many of these steps. The most popular example is OWASP Amass, but there are plenty of others, each with their own pros and cons.
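
As a quick sketch of what that looks like, a basic Amass run against our target could be as simple as the following (flags vary between Amass versions, and older releases also shipped an intel subcommand for ASN-driven discovery, so check your version’s help):

$ amass enum -d tesla.com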

At the end of the day, not every IP address will map to a web domain, and even if we do all of this enumeration, there’s still a chance that we miss an ASN or a domain because of the weirdness of mergers and acquisitions, or some other reason. It’s important to remember that recon is not a one-off step: OSINT and enumeration are things you perform continuously while hunting for vulnerabilities, so if you find a new domain that you’ve never seen in these results before, go back and rerun these steps.

Casing the Joints

So we have a list of domains, now what? Well, like any good heist movie, we need to extensively scope out the spots we’ve picked out. In our case, even though we have a list of domains, we’ve barely scratched the surface!

Subdomain Enumeration

If you’ve been around information security and bug bounty circles for long enough, you’ve seen any number of tools used to identify subdomains, and for good reason! In a big company, it is a challenge to maintain an accurate inventory of your publicly-facing assets, so if an attacker can find a long-forgotten domain that’s vulnerable, that’s gold. As such, many people have written many tools to enumerate these subdomains using various means.

We could probably write a whole blog about subdomain enumeration by itself because of how many different ways researchers have been able to optimize their searches. Most of these tools perform some brute forcing and attempts at zone transfers, but not all of them will look at TLS/SSL certificates, results returned by VirusTotal, or even the Internet Archive!
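
As one concrete example of the certificate angle, certificate transparency logs can be searched through crt.sh; a rough one-liner (assuming crt.sh’s JSON output and jq installed) looks like this:

$ curl -s 'https://crt.sh/?q=%25.tesla.com&output=json' | jq -r '.[].name_value' | sort -u

The %25 is just a URL-encoded % wildcard, so this returns every name the logs have seen on a certificate issued for *.tesla.com.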

By now, we likely have an extremely large list of subdomains, but none of the methods we’ve discussed so far necessarily verify that all of these domains are active. Instead of verifying that each of these domains is live by opening them in the browser, we can use gowitness or eyewitness to not only verify if a domain is accessible, but also get screenshots of the landing pages. Once we’ve narrowed down our list of domains to the ones that are active, we can finally start searching for endpoints.
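
A minimal sketch of that screenshotting step with EyeWitness might look like the following, where subdomains.txt is a hypothetical file containing our enumerated domains, one per line (flags can differ between releases, so check --help first):

$ ./EyeWitness.py --web -f subdomains.txt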

The Search For Endpoints

Like the other techniques we’ve discussed so far, we could either brute force for endpoints, or we could find references to endpoints in source code or search engines. Both of these approaches are valid, and in fact, should be used together for the most effective results.

As with subdomain enumeration, many tools have been written to brute force endpoints (e.g., feroxbuster, gobuster, ffuf), and they all work in roughly the same way. Using one of the many wordlists from SecLists, we can run a command like the following to recursively brute force directories:

$ feroxbuster -u https://www.tesla.com -w /opt/SecLists/Discovery/Web-Content/raft-large-directories.txt -t 16

In the event that brute force isn’t effective, you should start digging into the web pages themselves. Static JavaScript on web apps frequently contains references to other endpoints on the website. It may be minified or obfuscated in many instances, but if it’s not, BishopFox’s jsluice tool can be extremely handy to extract URLs. Alternatively, this JavaScript one-liner from renniepak can be pasted into the dev tools console to extract URLs in the client-side code.

javascript:(function(){var scripts=document.getElementsByTagName("script"),regex=/(?<=(\"|\'|\`))\/[a-zA-Z0-9_?&=\/\-\#\.]*(?=(\"|\'|\`))/g;const results=new Set;for(var i=0;i<scripts.length;i++){var t=scripts[i].src;""!=t&&fetch(t).then(function(t){return t.text()}).then(function(t){var e=t.matchAll(regex);for(let r of e)results.add(r[0])}).catch(function(t){console.log("An error occurred: ",t)})}var pageContent=document.documentElement.outerHTML,matches=pageContent.matchAll(regex);for(const match of matches)results.add(match[0]);function writeResults(){results.forEach(function(t){document.write(t+"<br>")})}setTimeout(writeResults,3e3);})();
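
Circling back to jsluice, a minimal invocation against a script we’ve saved locally could look like this (main.js is a hypothetical filename, and the jq filter assumes jsluice’s JSON-lines output with a url field):

$ curl -s https://www.tesla.com/main.js -o main.js
$ jsluice urls main.js | jq -r .url | sort -u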

Some of this crawling can also be automated with the help of hakrawler and gau.
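
Both are pipeline-friendly; a rough sketch with default flags might be as simple as:

$ echo https://www.tesla.com | hakrawler
$ gau tesla.com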

Note: I would have 100% expanded on this section but the application limited me to only 1500 words. I might revisit this later to add some more detail.

Conclusion

If you’ve followed along with each of these steps, you’ve seen that there’s a ton of work that goes into searching for assets on the internet, and this wasn’t even a comprehensive list! In fact, there are plenty of things we left out here, such as enumerating GitHub organizations or finding web applications hosted in the cloud, because recon is inherently deep and would make this blog go on forever. Knowing how deep the recon rabbit hole goes, there are plenty of projects out there that seek to automate the entire effort end to end.

That said, there’s still plenty of room to innovate here. The most successful attackers aren’t solely using these tools and wordlists we’ve mentioned here; they’re building their own tools and wordlists by thinking about what no one else thinks of. With the advent of AI, you might be able to make an even more efficient workflow that triages outputs for you.

In the Ocean’s movies, people told Danny Ocean and his crew that what they were doing was impossible, but there’s a reason they have multiple movies. Recon isn’t about running one tool and hoping for the best; it’s about continuously re-evaluating scope and making the impossible seem possible.