Free download of Web Scraping with Python: Data Extraction from the Modern Web, 3rd Edition on IT Book

Web Scraping with Python: Data Extraction from the Modern Web, 3rd Edition, by Ryan Mitchell

Title:

Web Scraping with Python

Author:

Ryan Mitchell

Publisher:

O'Reilly Media

Publication date:

2024

File size:

11.9MB


About the book: "Web Scraping with Python: Data Extraction from the Modern Web, 3rd Edition"

If programming is magic, web scraping is surely a form of wizardry.

By writing a simple program, you can extract data from the web and use it for analysis, research, or building applications. This book has been fully updated and is a step-by-step guide to automatically extracting information from the web using Python.

Part One: Web Scraping Fundamentals

Sending a request to a web server
Handling the response that comes back
Interacting with websites automatically
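
The request and response steps listed above can be sketched with Python's standard library alone. This is only an illustration and is not taken from the book: the helper names `fetch` and `extract_title`, the `TitleParser` class, and the `example-scraper/0.1` User-Agent string are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url, timeout=10):
    """Send a GET request and return the decoded response body."""
    # The User-Agent value is a made-up placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_title(html):
    """Handle the response: pull the page title out of the HTML text."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

A call such as `extract_title(fetch(url))` chains the two steps for a URL of your choice. Keeping fetching and parsing in separate functions makes the parsing step easy to try on a saved HTML string without touching the network.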

Part Two: Advanced and Practical Tools

Tools for handling a range of web scraping scenarios
Working with complex HTML pages
Building web crawlers with Scrapy

What will you learn?

Parsing HTML pages with BeautifulSoup and other tools
Storing and processing extracted data
Working with different document types such as PDF, JSON, and XML
Cleaning and normalizing poorly formatted data
Extracting text data and working with natural language (NLP)
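
As a rough illustration of the "cleaning and normalizing" item above, the sketch below tidies a scraped string using only the standard library. It is not the book's method: the function name `clean_text` is hypothetical, and the choice of NFKC normalization is an assumption that may not suit every dataset.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalize Unicode forms and collapse whitespace in scraped text."""
    # NFKC folds compatibility characters (non-breaking spaces,
    # full-width letters) into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse any run of spaces, tabs, or newlines into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For example, `clean_text("  Web\u00a0Scraping \n with\tPython ")` yields a single-spaced string with no leading or trailing whitespace.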

Suitable for:

Python programmers, data analysts, back-end developers, and anyone interested in collecting web data who wants hands-on practice building data extraction tools.

Table of Contents

Preface
I. Building Scrapers
1. How the Internet Works
Networking
HTML
CSS
JavaScript
Watching Websites with Developer Tools
2. The Legalities and Ethics of Web Scraping
Trademarks, Copyrights, Patents, Oh My!
Trespass to Chattels
The Computer Fraud and Abuse Act
robots.txt and Terms of Service
Three Web Scrapers
3. Applications of Web Scraping
Classifying Projects
E-commerce
Academic Research
Product Building
Travel
Sales
SERP Scraping
4. Writing Your First Web Scraper
Installing and Using Jupyter
Connecting
An Introduction to BeautifulSoup
5. Advanced HTML Parsing
Another Serving of BeautifulSoup
Regular Expressions
Regular Expressions and BeautifulSoup
Accessing Attributes
Lambda Expressions
You Don’t Always Need a Hammer
6. Writing Web Crawlers
Traversing a Single Domain
Crawling an Entire Site
Crawling Across the Internet
7. Web Crawling Models
Planning and Defining Objects
Dealing with Different Website Layouts
Structuring Crawlers
Thinking About Web Crawler Models
8. Scrapy
Installing Scrapy
Writing a Simple Scraper
Spidering with Rules
Creating Items
Outputting Items
The Item Pipeline
Logging with Scrapy
More Resources
9. Storing Data
Media Files
Storing Data to CSV
MySQL
Email
II. Advanced Scraping
10. Reading Documents
Document Encoding
Text
CSV
PDF
Microsoft Word and .docx
11. Working with Dirty Data
Cleaning Text
Working with Normalized Text
Cleaning Data with Pandas
12. Reading and Writing Natural Languages
Summarizing Data
Markov Models
Natural Language Toolkit
Additional Resources
13. Crawling Through Forms and Logins
Python Requests Library
Submitting a Basic Form
Radio Buttons, Checkboxes, and Other Inputs
Submitting Files and Images
Handling Logins and Cookies
Other Form Problems
14. Scraping JavaScript
A Brief Introduction to JavaScript
Ajax and Dynamic HTML
Executing JavaScript in Python with Selenium
Additional Selenium WebDrivers
Handling Redirects
A Final Note on JavaScript
15. Crawling Through APIs
A Brief Introduction to APIs
Parsing JSON
Undocumented APIs
Combining APIs with Other Data Sources
More About APIs
16. Image Processing and Text Recognition
Overview of Libraries
Processing Well-Formatted Text
Reading CAPTCHAs and Training Tesseract
Retrieving CAPTCHAs and Submitting Solutions
17. Avoiding Scraping Traps
A Note on Ethics
Looking Like a Human
Common Form Security Features
The Human Checklist
18. Testing Your Website with Scrapers
An Introduction to Testing
Python unittest
Testing with Selenium
19. Web Scraping in Parallel
Processes Versus Threads
Multithreaded Crawling
Multiple Processes
Multiprocess Crawling—Another Approach
20. Web Scraping Proxies
Why Use Remote Servers?
Tor
Remote Hosting
Web Scraping Proxies
Additional Resources
Index
About the Author

Specifications

Book title

Web Scraping with Python

Author

Ryan Mitchell

Publisher