Crawling
Crawling, often called spidering, is the automated process of systematically browsing the World Wide Web. Much as a spider moves across its web, a web crawler follows links from one page to another, collecting information as it goes. Crawlers are bots that use predefined algorithms to discover and index web pages, making them accessible through search engines or usable for other purposes such as data analysis and web reconnaissance.
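To make the link-following idea concrete, here is a minimal breadth-first crawler sketch. It illustrates the general technique only and is not how any particular search engine works. Assumptions: `fetch` is a hypothetical caller-supplied function that returns a page's HTML (or None if it cannot be retrieved), the crawl stays on the starting host, and robots.txt, rate limiting, and non-HTML content are all ignored. The names `crawl` and `LinkExtractor` are illustrative.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 50) -> list[str]:
    """Breadth-first crawl restricted to the start URL's host.

    `fetch` is a hypothetical hook returning HTML for a URL, or None on failure.
    Returns the URLs that were successfully fetched, in visiting order.
    """
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so a page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real HTTP requests.
    pages = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Ext</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="team#top">Team</a>',
        "https://example.com/team": "<p>No links here</p>",
    }
    print(crawl("https://example.com/", pages.get))
    # ['https://example.com/', 'https://example.com/about', 'https://example.com/team']
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same function can be exercised against an in-memory site, as in the demo, or wired to a real HTTP client.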
The .well-known
standard, defined in , serves as a standardized directory within a website's root domain. This designated location, typically accessible via the /.well-known/
path on a web server, centralizes a website's critical metadata, including configuration files and information related to its services, protocols, and security mechanisms.
By establishing a consistent location for such data, .well-known simplifies discovery and access for various stakeholders, including web browsers, applications, and security tools. A client can automatically locate and retrieve a specific configuration file simply by constructing the appropriate URL. For instance, to find a website's security contact details, a client would request https://example.com/.well-known/security.txt.
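Constructing such a URL only requires the site's scheme and host, because well-known URIs always sit at the root of the origin. A minimal sketch using the standard library's `urllib.parse`; the helper name `well_known_url` is hypothetical, and it assumes `suffix` is a registered name such as `security.txt` (a leading `/.well-known/` is tolerated and stripped).

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_url(site_url: str, suffix: str) -> str:
    """Build the /.well-known/ URL for `suffix` on the same scheme and host as `site_url`.

    Any path, query, or fragment in `site_url` is discarded, since well-known
    URIs live at the root of the origin.
    """
    parts = urlsplit(site_url)
    # Accept either "security.txt" or "/.well-known/security.txt".
    name = suffix.removeprefix("/.well-known/").lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/" + name, "", ""))


if __name__ == "__main__":
    print(well_known_url("https://example.com/blog/post?id=7", "security.txt"))
    # https://example.com/.well-known/security.txt
```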
The Internet Assigned Numbers Authority (IANA) maintains a registry of .well-known URIs, each serving a specific purpose defined by various specifications and standards. The table below highlights a few notable examples:
| URI Suffix | Description | Status | Reference |
|---|---|---|---|
| security.txt | Contains contact information for security researchers to report vulnerabilities. | Permanent | RFC 9116 |
| change-password | Provides a standard URL for directing users to a password change page. | Provisional | https://w3c.github.io/webappsec-change-password-url/#the-change-password-well-known-uri |
| openid-configuration | Defines configuration details for OpenID Connect, an identity layer on top of the OAuth 2.0 protocol. | Permanent | http://openid.net/specs/openid-connect-discovery-1_0.html |
| assetlinks.json | Used for verifying ownership of digital assets (e.g., apps) associated with a domain. | Permanent | https://github.com/google/digitalassetlinks/blob/master/well-known/specification.md |
| mta-sts.txt | Specifies the policy for SMTP MTA Strict Transport Security (MTA-STS) to enhance email security. | Permanent | RFC 8461 |
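During web reconnaissance, the suffixes in the table above can be checked directly. The sketch below is one possible approach under several assumptions: it probes only those five suffixes, treats any status below 400 as "present", and uses the standard library's `urllib`. The `status` parameter is a hypothetical hook for swapping in another HTTP client or a stub. Two caveats: `urlopen` follows redirects, so a `change-password` redirect is reported with its final status, and RFC 8461 serves `mta-sts.txt` from the `mta-sts.` subdomain, so probing the main host may miss it.

```python
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Suffixes taken from the table above (not the full IANA registry).
SUFFIXES = ["security.txt", "change-password", "openid-configuration",
            "assetlinks.json", "mta-sts.txt"]


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the final HTTP status of a GET request, or None on a network error."""
    try:
        # urlopen follows redirects, so this is the status after any redirect.
        with urlopen(Request(url, method="GET"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code
    except OSError:  # URLError, timeouts, DNS and connection failures
        return None


def probe_well_known(site_url: str,
                     status: Callable[[str], Optional[int]] = http_status) -> dict[str, int]:
    """Map each well-known URL that answers with a status below 400 to that status.

    Note: mta-sts.txt is normally hosted on mta-sts.<domain> (RFC 8461), so
    checking it on the main host, as done here, may produce a false negative.
    """
    parts = urlsplit(site_url)
    found: dict[str, int] = {}
    for suffix in SUFFIXES:
        url = urlunsplit((parts.scheme, parts.netloc, "/.well-known/" + suffix, "", ""))
        code = status(url)
        if code is not None and code < 400:
            found[url] = code
    return found
```

Calling `probe_well_known("https://example.com")` issues one request per suffix; only probe hosts you are authorized to test.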