Scrapy (ReconSpider)

We will leverage Scrapy and a custom spider tailored for reconnaissance on inlanefreight.com. If you are interested in more information on crawling/spidering techniques, refer to the "Using Web Proxies" module, which is also part of CBBH.

Installing Scrapy

Before we begin, ensure you have Scrapy installed on your system. If you don't, you can easily install it using pip, the Python package installer:

pip3 install scrapy

On distributions that mark the system Python as externally managed (common on recent Debian-based systems), pip will refuse to install packages globally. In that case, either pass --break-system-packages or, preferably, use a virtual environment as shown below:

pip3 install scrapy --break-system-packages
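To confirm the installation, you can print the installed version using Scrapy's command-line tool:

scrapy version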

Installing Scrapy in a Virtual Environment

┌─[root@parrot]─[/home/z3tssu/HTB]
└──╼ #python3 -m venv scrapy_env
┌─[root@parrot]─[/home/z3tssu/HTB]
└──╼ #source scrapy_env/bin/activate
(scrapy_env) ┌─[root@parrot]─[/home/z3tssu/HTB]
└──╼ #pip install scrapy

Start the Virtual Environment to Run ReconSpider

  1. In the directory where you created the virtual environment, run the following command:

source scrapy_env/bin/activate

This activates the virtual environment; the pip install scrapy command run inside it downloads and installs Scrapy along with its dependencies, preparing the environment for running our spider.
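When you are finished, the standard deactivate command leaves the virtual environment; re-running the source command above returns you to it in a later session:

deactivate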

ReconSpider

First, download the custom Scrapy spider, ReconSpider, and extract it to your current working directory.
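For example, assuming the archive is still hosted at the URL used by the HTB Academy module (adjust the link if it has moved), the download and extraction could look like this:

wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip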

With the files extracted, you can run ReconSpider.py against your target.
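Assuming the spider takes the starting URL as its first command-line argument, a typical run looks like this:

python3 ReconSpider.py http://inlanefreight.com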

Replace inlanefreight.com with the domain you want to spider. The spider will crawl the target and collect valuable information.

results.json

After running ReconSpider.py, the data will be saved in a JSON file, results.json.
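Since results.json is plain JSON, you can pretty-print it with Python's built-in json.tool module to review what the spider collected:

python3 -m json.tool results.json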
