Crawling
Crawling, often called spidering, is the automated process of systematically browsing the World Wide Web. Just as a spider moves along the strands of its web, a web crawler follows links from one page to another, collecting information as it goes. Crawlers are bots that apply predefined rules to discover and index web pages. The results can make those pages searchable through search engines, or serve other purposes such as data analysis and web reconnaissance.
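The link-following loop described above can be sketched as a breadth-first crawl. The following is a minimal Python sketch using only the standard library. The names crawl, fetch_html, and LinkExtractor are hypothetical, and the same-host restriction and max_pages cap are illustrative assumptions rather than a reference design. It also ignores robots.txt, rate limiting, and non-HTML content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlsplit
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and decode it as UTF-8 (a simplifying assumption)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host.

    Returns the URLs that were fetched successfully, in visiting order.
    """
    host = urlsplit(start_url).netloc
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    seen = {start_url}          # every URL ever queued, to avoid repeats
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network and HTTP errors (URLError is an OSError): skip the page
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so the same page
            # is not queued twice under different anchors.
            absolute, _ = urldefrag(urljoin(url, href))
            # Links to other hosts (or mailto:, javascript:) have a different
            # netloc and are not followed.
            if urlsplit(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The fetch parameter exists so the traversal logic can be exercised against canned pages. Passing a function that looks URLs up in a dictionary (and raises OSError for unknown ones) makes crawl walk that small in-memory "site" without touching the network.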
.Well-Known URIs
The .well-known standard, defined in RFC 8615, reserves a standardized directory at the root of a website. This location, accessible via the /.well-known/ path on a web server, centralizes a website's critical metadata, including configuration files and information about its services, protocols, and security mechanisms.
By providing a consistent location for such data, .well-known simplifies discovery for web browsers, applications, and security tools: a client can locate and retrieve a specific configuration file simply by constructing the appropriate URL. For instance, to find a website's security contact information, a client would request https://example.com/.well-known/security.txt.
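Constructing such a URL is mechanical: take the site's scheme and host, and append /.well-known/ plus the registered suffix. Below is a minimal Python sketch; the helper names well_known_url and probe are hypothetical, and probe simply reports the HTTP status of a HEAD request.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def well_known_url(site, suffix):
    """Build the /.well-known/ URL for `suffix` at the root of `site`.

    Any path, query, or fragment in `site` is discarded, because
    well-known URIs live at the root of the origin (RFC 8615).
    """
    parts = urlsplit(site)
    return f"{parts.scheme}://{parts.netloc}/.well-known/{suffix.lstrip('/')}"


def probe(site, suffix, timeout=5):
    """Return the HTTP status for the well-known URI, or None on network failure."""
    req = Request(well_known_url(site, suffix), method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return None      # DNS failures, refused connections, timeouts
```

For example, well_known_url("https://example.com/blog/post?id=3", "security.txt") returns "https://example.com/.well-known/security.txt". Some servers answer HEAD requests with 405, in which case a GET request is needed to confirm whether the file exists.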
The Internet Assigned Numbers Authority (IANA) maintains a registry of .well-known URIs, each serving a specific purpose defined by various specifications and standards. Below is a table highlighting a few notable examples:
URI Suffix | Description | Status | Reference
security.txt | Contains contact information for security researchers to report vulnerabilities. | Permanent | RFC 9116
change-password | Provides a standard URL for directing users to a password change page. | Provisional | https://w3c.github.io/webappsec-change-password-url/#the-change-password-well-known-uri
openid-configuration | Defines configuration details for OpenID Connect, an identity layer on top of the OAuth 2.0 protocol. | Permanent | http://openid.net/specs/openid-connect-discovery-1_0.html
assetlinks.json | Used for verifying ownership of digital assets (e.g., apps) associated with a domain. | Permanent | https://github.com/google/digitalassetlinks/blob/master/well-known/specification.md
mta-sts.txt | Specifies the policy for SMTP MTA Strict Transport Security (MTA-STS) to enhance email security. | Permanent | RFC 8461
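Once a file such as security.txt has been retrieved, its contents can be interpreted programmatically. The following minimal Python sketch parses the "Field: value" lines of an unsigned security.txt. The helpers parse_security_txt and missing_required are hypothetical, and OpenPGP-signed files are assumed not to occur. It relies on two points from RFC 9116: field names are case-insensitive, and Contact and Expires are required.

```python
def parse_security_txt(text):
    """Parse security.txt content into {field-name: [values]}.

    Field names are lower-cased because RFC 9116 treats them as
    case-insensitive; repeated fields such as Contact keep every value.
    Blank lines and '#' comment lines are skipped. OpenPGP-signed
    files are NOT handled by this sketch.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in values stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def missing_required(fields):
    """Return the required fields (Contact, Expires) that are absent."""
    return sorted({"contact", "expires"} - set(fields))
```

During reconnaissance, the Contact values reveal who handles vulnerability reports, and a missing or past Expires value suggests the file may no longer be maintained.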