We’re looking for experienced data engineers or web scraping specialists to join our team. You’ll be responsible for building and maintaining scraping scripts and data pipelines that collect large volumes of data from public and partner websites. This role is hands-on and focused on reliable data extraction using Python and related tools.
Responsibilities
- Write and maintain web scrapers for public and dynamic websites (including JavaScript-heavy pages).
- Use tools like Scrapy, Selenium, Playwright, or BeautifulSoup to extract and structure data.
- Handle common scraping challenges such as CAPTCHAs, IP blocks, rate limits, and bot protection.
- Store and process data using SQL and/or NoSQL databases.
- Monitor scraper performance and data quality, and apply fixes or updates as needed.
- Work with APIs and handle data from different formats (HTML, JSON, CSV, XML).
- Write clean and reusable Python code.
- Use Git for version control and Docker for packaging and deployment.
- Communicate with the team about progress, issues, and requirements.
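To give a flavor of the day-to-day work: the core loop is parsing HTML into structured records. The sketch below uses only Python's standard-library `html.parser` on a hypothetical sample page; real scrapers on this team would use Scrapy, Playwright, or BeautifulSoup as listed above.

```python
from html.parser import HTMLParser  # stdlib stand-in for the scraping libraries named above

# Hypothetical sample page; in practice this HTML would come from an HTTP response.
SAMPLE_HTML = """
<ul>
  <li class="item" data-id="1">Widget A</li>
  <li class="item" data-id="2">Widget B</li>
</ul>
"""

class ItemParser(HTMLParser):
    """Collects {"id", "name"} records from <li class="item"> elements."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self._current_id = attrs.get("data-id")

    def handle_data(self, data):
        # Only capture text that belongs to a matched <li>.
        if self._current_id is not None and data.strip():
            self.items.append({"id": self._current_id, "name": data.strip()})
            self._current_id = None

parser = ItemParser()
parser.feed(SAMPLE_HTML)
print(parser.items)
# → [{'id': '1', 'name': 'Widget A'}, {'id': '2', 'name': 'Widget B'}]
```

The same records would then be written to a SQL or NoSQL store, which is where the database bullets above come in.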
Requirements
- 3+ years of experience with web scraping or backend data extraction.
- Strong Python skills, especially with scraping libraries (Scrapy, Playwright, Selenium, etc.).
- Experience dealing with websites that have anti-scraping protections.
- Familiarity with REST APIs, HTML parsing, and browser automation.
- Comfortable using databases (PostgreSQL, MySQL, MongoDB, or similar).
- Familiarity with Docker, Git, and basic cloud deployment (AWS, GCP, or similar).
- Ability to troubleshoot scraping failures and keep scripts working over time.
Nice to Have
- Experience with distributed scraping or large-scale scraping projects.
- Familiarity with Airflow or similar workflow tools.
- Experience using proxies and rotating user agents.
- Understanding of scraping-related legal and ethical considerations.
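As a small illustration of the user-agent rotation mentioned above: varying the `User-Agent` header per request is one common tactic. This is a minimal stdlib sketch with a hypothetical agent pool; production setups typically combine it with proxy rotation and fresher, realistic agent strings.

```python
import random
from urllib.request import Request

# Hypothetical pool; real projects keep a larger, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def build_request(url: str) -> Request:
    """Attach a randomly chosen User-Agent so successive requests vary."""
    return Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})

req = build_request("https://example.com/")
# urllib stores header keys capitalized as "User-agent".
print(req.get_header("User-agent"))
```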
How to Apply
Interested in this role?
Send your CV and links to relevant work (e.g., GitHub or portfolio) to recruiting@technexus.io.