Kroll, Inc. – extensive scraping of demanding web content

Tom Potanski

Last updated on May 25, 2023 | 4 min read

We offer premium technology services for business.

DevsData LLC is a boutique software & recruitment agency, with Google-level engineers and a vast network of senior expert contractors.

You name the source and the data to extract, and we will build a tailor-made solution for your needs.

Product pricing & details
Finance
Social Media
Real Estate Data
Media News

Battle-tested at web scraping

Our engineers have an in-depth understanding of complex databases and broad experience in processing them.

We can extract data from many challenging sources, including websites protected against scraping. To achieve this, we use advanced techniques such as:

Simulating human browsing with Selenium WebDriver
Premium mobile proxies and VPNs
Automatic CAPTCHA solving (except for Google reCAPTCHA)
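The pacing and proxy side of these techniques can be sketched in plain Python. This is a minimal sketch, not our production setup: the proxy URLs are hypothetical placeholders, and the delay range (2.5 s plus or minus 1.5 s, floored at 0.5 s) is an illustrative assumption.

```python
import itertools
import random
from typing import Iterator, Optional

# Hypothetical proxy endpoints; a real pool would come from the proxy provider.
PROXIES = [
    "http://proxy-1.example:8000",
    "http://proxy-2.example:8000",
    "http://proxy-3.example:8000",
]


def proxy_cycle(proxies: list[str]) -> Iterator[str]:
    """Yield proxies round-robin so consecutive sessions use different exit IPs."""
    return itertools.cycle(proxies)


def human_delays(
    n: int, mean: float = 2.5, jitter: float = 1.5, seed: Optional[int] = None
) -> list[float]:
    """Return n pauses in seconds, uniform in [mean - jitter, mean + jitter], floored at 0.5 s.

    Irregular gaps between page loads look less like a bot than a fixed interval.
    """
    rng = random.Random(seed)
    return [max(0.5, rng.uniform(mean - jitter, mean + jitter)) for _ in range(n)]


if __name__ == "__main__":
    proxies = proxy_cycle(PROXIES)
    for delay in human_delays(3, seed=42):
        print(f"next page via {next(proxies)} after {delay:.2f} s")
```

With Selenium, each pause would typically go into `time.sleep()` between `driver.get()` calls. A proxy is usually set when the browser is launched (for Chrome, via the `--proxy-server` argument), so rotating proxies generally means starting a new browser session.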

Over the years, our engineers have gained extensive hands-on experience

Creating and maintaining several small scrapers on short notice

We designed and maintained several small scrapers for a company project on short notice. Our task was to extract the data as quickly as possible and filter it down to the essential information. One of these projects was a Natural Language Processing scraping engine for a London-based hedge fund, which scraped news articles and scored them against precise criteria set by the client.

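The client's scoring criteria are not described here, so this sketch uses a simple stand-in: weighted keyword counts. The terms, weights, and threshold are hypothetical, each term is assumed to be a single lowercase word, and a real NLP engine would likely use richer features than raw word matches.

```python
import re
from collections import Counter

# Hypothetical criteria: term -> weight. Negative weights penalise an article.
CRITERIA = {"acquisition": 3.0, "merger": 3.0, "lawsuit": -2.0, "earnings": 1.5}


def score_article(text: str, criteria: dict[str, float] = CRITERIA) -> float:
    """Score an article as the weighted count of criterion terms it contains."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(weight * counts[term] for term, weight in criteria.items())


def rank_articles(
    articles: list[str], criteria: dict[str, float] = CRITERIA, threshold: float = 0.0
) -> list[str]:
    """Keep articles scoring above the threshold, highest score first."""
    scored = [(score_article(a, criteria), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for score, article in scored if score > threshold]


if __name__ == "__main__":
    news = ["Lawsuit filed.", "Merger announced.", "Weather is nice."]
    print(rank_articles(news))
```

Keeping the criteria in a plain mapping means the client can adjust terms and weights without touching the scraping code.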
Scraping and processing confidential data

We created a scraper for a US-based client. The task was to collect the responses returned by a protected form on the website for 300 million SSNs while sending as few requests as possible. The obvious choice of scraping technology was the low-level requests package. The system ran on ten small machines on Google Cloud.

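The source does not say how the workload was divided among the ten machines. One simple approach, shown here only as an assumption, is to split the ID space into contiguous, nearly equal ranges and give each machine one range, so no two machines send the same request.

```python
def partition(total: int, workers: int) -> list[tuple[int, int]]:
    """Split IDs 0..total-1 into `workers` contiguous half-open ranges [start, end).

    The first `total % workers` ranges get one extra ID, so sizes differ by at most one.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # Hypothetical usage: each of ten machines processes only its own slice.
    for machine, (start, end) in enumerate(partition(300_000_000, 10)):
        print(f"machine {machine}: IDs {start:,} to {end - 1:,}")
```

Static ranges need no coordination between machines; the trade-off is that a slow machine is not relieved by faster ones, which a shared work queue would handle.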
Scraping data stored deep inside HTML

We worked on extracting data from Filmweb, the second-largest movie database in the world. The goal was to collect all data about every movie and TV series on the website with as few requests as possible. Because the data was buried deep inside the HTML, we used Beautiful Soup to extract the essential information and the relevant parts of each page.
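Beautiful Soup makes this kind of extraction concise, for example with CSS selectors via `select()`. To keep the sketch dependency-free, it swaps in Python's built-in `html.parser` and collects the text of every element carrying a given class, however deeply it is nested. The markup and class names are hypothetical, not Filmweb's actual HTML, and the parser assumes reasonably well-formed tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `target_class`."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.buffer: list[str] = []
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element inside the match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse whitespace left over from the page's indentation.
            self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html: str, target_class: str) -> list[str]:
    """Return the cleaned text of each element with `target_class`, in page order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```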

Communication is the key

We always make sure we are on the same page as our clients, because we strongly believe that communication is the key to fruitful cooperation.

Most of our specialists work remotely from our European office; however, we are open to permanent, cross-border relocation of selected engineers. For longer projects, we usually begin a full-time engagement with two weeks of on-site onboarding at the client’s office.

We have helped maintain and modify many scraping engines.

Scraping unstructured data from Wikipedia

We built a scraper that crawled Wikipedia to collect a data set of movies and television series and their casts. The biggest challenge in this project was that the content was unstructured, so links to relevant subpages could appear anywhere on a page. Scrapy, which remembers visited subpages and schedules the pages still to be visited, was the most efficient tool for the job.

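The bookkeeping Scrapy handles for you, remembering visited pages and queuing new ones, can be sketched in a few lines. This is a simplified illustration, not Scrapy's implementation: the link graph is an in-memory stand-in for fetching pages, the article paths are hypothetical, and the queue is breadth-first, whereas Scrapy by default schedules requests in LIFO (depth-first) order and lets you configure this.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 1000) -> list[str]:
    """Visit pages breadth-first from `start`, fetching each page at most once.

    `get_links(url)` stands in for downloading a page and extracting its links.
    Returns the pages in the order they were visited.
    """
    seen = {start}          # every URL ever queued: the memory of known pages
    queue = deque([start])  # pages scheduled but not yet visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # links can appear anywhere; skip repeats
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link graph: films link to actors and actors link back to films.
LINKS = {
    "/wiki/Film_A": ["/wiki/Actor_X", "/wiki/Film_B"],
    "/wiki/Actor_X": ["/wiki/Film_A", "/wiki/Film_B"],
    "/wiki/Film_B": ["/wiki/Actor_Y"],
    "/wiki/Actor_Y": [],
}

if __name__ == "__main__":
    print(crawl("/wiki/Film_A", lambda url: LINKS.get(url, [])))
```

Marking a URL as seen when it is queued, rather than when it is visited, keeps the same page from being scheduled twice even if many pages link to it.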
Boosting the efficiency of an existing scraping engine

We helped maintain and modify the company’s scraping engine, which collected profile data about people and companies from about ten confidential sources. The data had already been purchased, so our task was to collect information that was not yet available for purchase or to extend the existing data.

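Extending purchased data amounts to filling gaps in existing records without overwriting them. This sketch assumes records are keyed by a shared identifier and that purchased values take precedence; the IDs and field names are hypothetical.

```python
def enrich(purchased: dict[str, dict], scraped: dict[str, dict]) -> dict[str, dict]:
    """Merge scraped profiles into purchased ones without overwriting purchased values.

    A scraped value is used only when the purchased record lacks the field or
    holds None or an empty string; profiles missing entirely are added as new.
    The inputs are not modified.
    """
    merged = {key: dict(record) for key, record in purchased.items()}
    for key, record in scraped.items():
        target = merged.setdefault(key, {})
        for field, value in record.items():
            if target.get(field) in (None, ""):
                target[field] = value
    return merged


if __name__ == "__main__":
    bought = {"c1": {"name": "Acme", "city": None}}
    found = {"c1": {"name": "ACME Corp", "city": "Boston"}, "c2": {"name": "Beta"}}
    print(enrich(bought, found))
```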
Gathering data from numerous websites

Our client needed data on clothing products, focusing mainly on their categorization and prices. The data came from about 30 websites that varied both in the depth of information they offered and in their protection against scraping.
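With around 30 differently structured sources, one common pattern is a per-site field mapping that turns each site's raw record into one shared schema. The site names and field names below are hypothetical, and the price parser handles only common separator conventions such as `1,299.00`, `1.299,50` and `49,99`.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    site: str
    name: str
    category: str
    price: Decimal


# Hypothetical per-site mappings: shared field -> that site's raw field name.
FIELD_MAPS = {
    "shop-a.example": {"name": "title", "category": "cat", "price": "price"},
    "shop-b.example": {"name": "productName", "category": "section", "price": "cost"},
}


def parse_price(raw: str) -> Decimal:
    """Parse strings like '$1,299.00', '1.299,50 zł' or '49,99 €' into a Decimal.

    Raises decimal.InvalidOperation if no number can be recovered.
    """
    digits = re.sub(r"[^\d.,]", "", raw)
    if "," in digits and "." in digits:
        # Whichever separator appears last is the decimal separator.
        if digits.rfind(",") > digits.rfind("."):
            digits = digits.replace(".", "").replace(",", ".")
        else:
            digits = digits.replace(",", "")
    elif "," in digits:
        # If exactly two digits follow the last comma, treat it as a decimal comma.
        head, _, tail = digits.rpartition(",")
        if len(tail) == 2:
            digits = head.replace(",", "") + "." + tail
        else:
            digits = digits.replace(",", "")
    return Decimal(digits)


def normalize(site: str, raw: dict[str, str]) -> Product:
    """Map one site's raw record onto the shared Product schema."""
    fields = FIELD_MAPS[site]
    return Product(
        site=site,
        name=raw[fields["name"]].strip(),
        category=raw[fields["category"]].strip().lower(),
        price=parse_price(raw[fields["price"]]),
    )
```

Adding a new source then means adding one mapping entry rather than a new parsing pipeline; `Decimal` avoids the rounding surprises of floats when comparing prices.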


Tom Potanski Managing Director

Tom is a passionate and experienced technology leader with 12 years of commercial experience in software and technology. His focus is on merging business with technology to help American clients find top technical talent in Europe and Latin America. He leverages industry insights and strategic thinking to connect companies with the right professionals, building lasting client relationships.


