How to Catch Baidu Spiders Through IP Addresses

This article explains how to read a website's log and use IP addresses to work out the Baidu spider's crawling pattern. For every webmaster, getting indexed by Baidu is the most important concern, and understanding how the Baidu spider crawls, so as to improve indexing, is something you need to master. Many websites currently run on virtual hosting that provides logs. The logs are kept in the logfiles folder in the root directory of the web site; each file is named by date with a .txt extension and is a plain text file.

There are already many introductions to spotting spiders by examining HTTP requests and the returned responses, so that method is not covered here. Many hosts now do not provide logs in a format that analysis software can read. More often, the log looks something like this:

, 03:28:34, GET, /goods.php,, 20034696, 390

First, 03:28:34 is the access time.

Second, GET /goods.php: GET is the request method, and /goods.php is the page that was accessed.

Third is the source IP address that accessed the web site.

Fourth, 200 means the access was successful.

Fifth, 34696390 represents the record size.
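
The field layout above can be sketched as a small parser. This is only an illustration: it assumes the fields are separated by whitespace and appear in the order time, method, page, source IP, status, size. The sample line, the IP 202.108.0.1 in it, and the names `LogRecord` and `parse_line` are made up for the example, so adjust them to whatever your host's log actually looks like.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    time: str    # access time, e.g. "03:28:34"
    method: str  # request method, e.g. "GET"
    page: str    # page accessed, e.g. "/goods.php"
    ip: str      # source IP of the visitor
    status: int  # status code; 200 means success
    size: int    # record size


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into named fields; return None if it does not fit."""
    parts = line.split()
    if len(parts) != 6:
        return None
    time, method, page, ip, status, size = parts
    if not (status.isdigit() and size.isdigit()):
        return None
    return LogRecord(time, method, page, ip, int(status), int(size))


# Hypothetical line in the assumed layout (the IP is invented for the example).
record = parse_line("03:28:34 GET /goods.php 202.108.0.1 200 34696390")
print(record)
```

Lines that do not have exactly six fields are skipped rather than guessed at, which keeps header lines or truncated records from polluting the results.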

That is the log format. How do you analyze it? One look is enough to give you a headache: every site's log runs to more than 1 MB, and reading through thousands of records will make anyone dizzy. So here is a tip. After long-term observation, I found that the source IP addresses of Baidu's spider servers all fall within one network segment. In other words, they all start with 202.108, in the form 202.108.X.X. This segment is located in the Beijing Netcom building and belongs to the national Internet backbone, and this section of IPs has now disappeared.

Then open your log and use Ctrl+F to check whether any IP from this segment appears. If it does, look at the access times, and you can work out the pattern of when the Baidu spider visits your web site. That is a useful guide for timing your updates.
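
The Ctrl+F search described above can also be done with a short script. The following is a minimal sketch, not a definitive tool: it assumes each log line contains a time in HH:MM:SS form and an IPv4 address somewhere in the line, and the function name `spider_visits_by_hour` and the sample lines are hypothetical. It keeps only lines whose IP starts with 202.108. and counts them by hour.

```python
import re
from collections import Counter
from typing import Iterable

PREFIX = "202.108."  # network segment named in the article
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")  # first IPv4-looking token
TIME_RE = re.compile(r"\b(\d{2}):\d{2}:\d{2}\b")      # captures the hour


def spider_visits_by_hour(lines: Iterable[str]) -> Counter:
    """Count log lines from the 202.108.X.X segment, grouped by hour."""
    counts = Counter()
    for line in lines:
        ip = IP_RE.search(line)
        time = TIME_RE.search(line)
        if ip and time and ip.group(1).startswith(PREFIX):
            counts[time.group(1)] += 1
    return counts


# Invented sample lines; a real log would be read from a file instead.
sample = [
    "03:28:34 GET /goods.php 202.108.0.1 200 34696390",
    "03:41:02 GET /index.php 202.108.0.2 200 1200",
    "09:15:10 GET /index.php 10.0.0.5 200 1200",
]
hours = spider_visits_by_hour(sample)
for hour, n in sorted(hours.items()):
    print(f"{hour}:00  {n} visit(s)")
```

To run it on a real log, pass an open file object instead of the sample list, since a file can be iterated line by line. The hours with the highest counts suggest when the spider tends to visit.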
