Five crawlers, each targeting a different site.

#### Difficulties

  • damai
    • ajax
  • douban
    • captcha
    • IP blocking
  • zhihu
    • dynamic page
  • weibo
    • POST data includes a random id
  • songtaste

#### Solutions

  • douban
    • fetch the captcha image and enter the characters manually
    • set an interval between requests, or use a proxy
  • zhihu
    • use selenium2 with phantomjs instead of urllib2, so the page's JavaScript is executed
  • weibo
    • capture the random id and send it with the POST data
  • songtaste
    • the simplest of the five
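
Two of the solutions above (an interval between requests or a proxy for douban, and capturing the random id for weibo) can be sketched as follows. This is a minimal illustration, not the code from these crawlers: it uses the Python 3 standard library rather than urllib2, and the names `Throttle`, `make_opener`, and `extract_hidden_value` are hypothetical. It also assumes the random id appears as a hidden form field in the page HTML, which the list above does not confirm.

```python
import re
import time
import urllib.request


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request, None before the first

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def make_opener(proxy_url=None):
    """Return a urllib opener, optionally routed through the given proxy."""
    handlers = []
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def extract_hidden_value(html, field_name):
    """Return the value of an <input> named `field_name`, or None if absent.

    Assumption: the `name` attribute precedes `value` and both use double quotes.
    """
    pattern = r'<input[^>]*name="%s"[^>]*value="([^"]*)"' % re.escape(field_name)
    match = re.search(pattern, html)
    return match.group(1) if match else None


# Usage sketch (the URL list and the field name "rid" are placeholders):
#
#   throttle = Throttle(interval=2.0)
#   opener = make_opener()            # or make_opener("http://127.0.0.1:8080")
#   for url in urls:
#       throttle.wait()
#       html = opener.open(url).read().decode("utf-8", errors="replace")
#       token = extract_hidden_value(html, "rid")
```

A regular expression is enough for a single known field; for anything less regular, an HTML parser would be the safer choice.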