How to Download a Proxylist .txt


I got a proxy list with proxybroker.

I converted each entry from the format <Proxy US 0.00s [] 104.131.6.78:80> to 104.131.6.78:80 with grep.
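
The conversion above can also be sketched in Python rather than grep. This is only an illustration: the names HOST_PORT and extract_host_port are made up for this example, and the pattern assumes every entry contains an IPv4 address followed by a port. It does roughly what grep -oE would do with a similar pattern.

```python
import re

# Matches an IPv4 address followed by a port, e.g. 104.131.6.78:80.
# Assumption: proxybroker entries always carry the address in this shape.
HOST_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")


def extract_host_port(line):
    """Return the first ip:port found in a proxybroker output line, or None."""
    match = HOST_PORT.search(line)
    return match.group(0) if match else None


print(extract_host_port("<Proxy US 0.00s [] 104.131.6.78:80>"))
# prints 104.131.6.78:80
```

Applying the function to every line of the proxybroker output and writing out the non-None results gives one ip:port per line.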


All the proxies are stored in proxy.csv in the following format.

I wrote my crawler following the webpage "Multiple Proxies".

Here is the structure of my spider, test.py.

The following error occurs when I run the spider with scrapy runspider test.py:

Connection was refused by other side: 111: Connection refused.

With the same proxies obtained from proxybroker, I used my own code instead of Scrapy to download the set of URLs.
To keep it simple, all broken proxy IPs are kept rather than removed.
The following code snippet only tests whether each proxy IP can be used; it does not try to download the whole URL set perfectly.
The program structure is as follows.
Many URLs can be downloaded with the proxies grabbed by proxybroker.
It is clear that:

many of the proxy IPs grabbed by proxybroker are usable, and many of them are free and stable;
there is a bug in my Scrapy code.

How can I fix the bug in my Scrapy code?

vezunchik


it_is_a_literature


1 Answer

Try using scrapy-proxies.

In your settings.py you can make changes like this:
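
A sketch of what such settings might look like, based on the option names documented in the scrapy-proxies README. The path /path/to/proxy/list.txt is a placeholder, the retry values are only examples, and the assumption that each line of the list has the form http://host:port should be checked against the README for the installed version.

```python
# settings.py (fragment)

# Retry often, since free proxies fail frequently.
RETRY_TIMES = 10
# Retry on most error codes, since proxies fail for different reasons.
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 90,
    "scrapy_proxies.RandomProxy": 100,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
}

# Placeholder path: a text file with one proxy per line,
# assumed to look like http://104.131.6.78:80
PROXY_LIST = "/path/to/proxy/list.txt"

# 0 = pick a different random proxy from the list for every request.
PROXY_MODE = 0
```

If the proxy file holds bare ip:port entries, as in the question, each line would likely need the http:// prefix added before the middleware can use it.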

Hopefully this helps; it solved the same problem for me.

Jaffer Wilson


