Robots have some rules to follow, or to break if they are nasty. Those rules are defined in the robots.txt file of the particular web site.
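A robots.txt file is just a plain-text list of rules per user agent. A made-up example (not taken from any real site) might look like this:

```
User-agent: *
Disallow: /private/
Allow: /
Crawl-delay: 10
```

Here, any crawler (`*`) may fetch everything except paths under /private/, and is asked to wait 10 seconds between requests.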
I'll start by creating a stub Python script that accesses a web page and prints it to the console.
First, I need to install a module: requests. To do that, I need the Python package manager, pip:
Download get-pip.py from https://bootstrap.pypa.io/ and run it.
To install a package:
Navigate to the folder where pip.exe is located.
Run pip install requests
Now, requests can be imported and used.
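The stub script can now be sketched. This is a minimal version, assuming a `fetch` helper name and DI.se as the target URL, both of which are just my choices here:

```python
import requests  # third-party: installed above with `pip install requests`

def fetch(url: str) -> str:
    """Download a page and return its HTML as text."""
    response = requests.get(url, timeout=10)  # fail rather than hang forever
    response.raise_for_status()               # raise on HTTP errors (4xx/5xx)
    return response.text

# Usage (requires network access):
#   html = fetch("https://www.di.se/")
#   print(html[:500])
```

The timeout and `raise_for_status()` call are not strictly needed for a stub, but they make failures visible instead of silent.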
In my case, I'll use DI.se. The corresponding robots.txt file will be analysed. If that page allows it, I'll download the stock list and analyse the stocks in the list.