What We Do
Content Grabber is used for web scraping and web automation. It can extract content from almost any website and save it as structured data in a format of your choice, including Excel reports, XML, CSV and most databases.
We recognize not every client is the same, so we give you options that can be tailored to your needs.
Do It Yourself
You can run Content Grabber on your own PCs or servers to extract content from as many websites as you like.
You have control, so there are no restrictions or monthly data fees. Content Grabber’s visual interface is easy to use, powerful and rich in features. Content Grabber was initially developed for corporations that need improved performance and reliability, yet it is so intuitive that it’s suitable even for absolute beginners.
Performance & Scalability
Content Grabber was designed from the very beginning with performance and scalability as the top priority. Multi-threading is used wherever appropriate to limit common web scraping bottlenecks such as web page retrieval.
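The benefit of multi-threaded page retrieval can be illustrated with a short, generic sketch. This is plain Python, not the Content Grabber API; the simulated download delay and page names are invented for the example:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated page downloads: each "fetch" just waits, standing in for
# network latency -- the dominant cost in real web scraping.
PAGES = ["page-1", "page-2", "page-3", "page-4"]

def fetch(page: str) -> str:
    time.sleep(0.2)          # pretend this is the network round trip
    return f"<html>{page}</html>"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, PAGES))
elapsed = time.perf_counter() - start

# The four 0.2 s "downloads" overlap, so total time stays close to
# 0.2 s instead of the 0.8 s a sequential loop would take.
print(f"fetched {len(results)} pages in {elapsed:.2f}s")
```

The point is that threads spend their time waiting on the network in parallel, which is exactly the bottleneck the paragraph above describes.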
Optimized web browsers
Web browsers are used to load and parse web pages, and Content Grabber has a range of different browsers to achieve maximum performance in every scenario – from a fully dynamic web browser to the ultra-fast HTML5 parser-only browser. Different types of browsers can be used on the same website, and Content Grabber will normally use many browsers at the same time – all running multi-threaded.
All web scraping tools spend most of their time waiting for new web pages to load, so it’s important to optimize this process. Content Grabber will automatically optimize page loads, but will also allow you to get under the hood to fine tune every aspect of the process.
Web scraping is notoriously unreliable and will often fail because of problems you have no control over. We understand that reliability is extremely important in many situations, so we have tackled this difficult issue head on and added strong support for debugging, error handling and logging.
Content Grabber has one of the best debuggers of any web automation software, which will help you build reliable agents by ensuring that every issue that can be resolved at design time is resolved at design time.
Many web scraping errors are unavoidable even with the best-designed agents, and this is where error handling comes into play. One example is an unreliable website that suddenly starts returning only error pages and requires a web browser restart before it functions again.
Many dynamic websites have bugs causing errors that are impossible to handle gracefully. Dynamic websites are small applications running in your web browser, and they may crash, hang, leak memory or cause many other fatal issues.
Content Grabber uses a health monitor process that looks for problems in the running web browsers, and restarts browsers that have run into trouble. A restarted web browser will continue from the point where it failed, so in most situations, this will not cause any interruption to the web scraping process.
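The restart-and-resume behaviour described above can be sketched generically. This is plain Python, not Content Grabber internals; the failure rate, page names and checkpoint mechanism are all invented for illustration:

```python
import random

random.seed(7)  # fixed seed so the example is deterministic

PAGES = [f"page-{i}" for i in range(1, 6)]

def scrape(page: str) -> str:
    """Stand-in for loading a page in a browser; occasionally 'crashes'."""
    if random.random() < 0.3:
        raise RuntimeError(f"browser hung on {page}")
    return f"data from {page}"

results = []
checkpoint = 0  # index of the next page to process

# Supervisor loop: on failure, "restart the browser" and resume from
# the checkpoint instead of starting the whole run over.
while checkpoint < len(PAGES):
    try:
        results.append(scrape(PAGES[checkpoint]))
        checkpoint += 1
    except RuntimeError as err:
        print(f"restarting after error: {err}")

print(results)
```

Because progress is tracked outside the failing component, a crash only costs the retry of one page, mirroring how a restarted browser continues from the point of failure.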
Logging & notifications
Some website errors may occur very rarely, and may be impossible to catch during debugging. An example could be CAPTCHA protection that appears after hours of web scraping, or simply a broken Internet connection. Content Grabber can log all activity and errors, including the full HTML of web pages that are causing problems. This makes it much easier to identify runtime errors and take appropriate action to resolve these.
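The value of capturing the full page HTML alongside an error can be shown with a small generic sketch. All names here (the extraction function, the log structure) are invented for illustration and are not the Content Grabber logging API:

```python
# Minimal sketch of error logging that keeps the full page source.
failure_log = []  # in a real agent this would be a file or database

def extract_price(html: str) -> str:
    """Toy extraction step: fails when the expected element is missing."""
    marker = 'class="price"'
    if marker not in html:
        raise ValueError("price element not found")
    return html.split(marker, 1)[1].split("<", 1)[0].strip(">")

def scrape_page(url, html):
    try:
        return extract_price(html)
    except ValueError as err:
        # Record the error together with the full page HTML, so rare
        # runtime failures (e.g. a surprise CAPTCHA page) can be
        # diagnosed long after they happened.
        failure_log.append({"url": url, "error": str(err), "html": html})
        return None

ok = scrape_page("https://example.com/item/1",
                 '<p class="price">9.99</p>')
bad = scrape_page("https://example.com/item/2",
                  "<html><body>Please solve this CAPTCHA</body></html>")

print(ok, len(failure_log))
```

With the offending HTML preserved in the log, the CAPTCHA page is immediately recognisable during post-mortem analysis, even though the error itself occurred only once.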
Notifications can alert an administrator to specific problems, such as missing web content or other errors.
Content Grabber can email status reports to an administrator when errors or notifications have occurred during web scraping.
The Content Grabber agent editor has a typical point-and-click user interface: you click on the content you want to extract, or on the buttons and links you want to follow.
The agent editor sets itself apart from the crowd with its built-in smarts that automatically detect and configure all commands. It will automatically create lists of content and links, handle pagination and web forms, download or upload files, and configure any other action you perform on a web page. At the same time, you always have the option to manually fine tune the commands, so Content Grabber gives you both simplicity and control.
The Content Grabber agent editor is so simple to use that it can easily be used by beginners, and the built-in smarts enable users to quickly build large numbers of web scraping agents.
Data is everything when it comes to web scraping. Content Grabber allows you to load data from any source and use it in your agents for anything you need. You can also export extracted data to almost anywhere. This flexibility is key – enabling your technology to grow with your business.
Once data has been extracted and exported, it can be distributed by email, FTP or a custom defined destination.
Agent Management Tools
Content Grabber is designed to manage hundreds of agents in a professional web scraping environment with development, testing and production servers.
Logs, schedules and status information for all agents can be managed in one centralized location, and all proxies, database connections and script libraries can be managed on a per server basis.
No one wants to write scripts to get things done, and with Content Grabber you rarely have to. However, if you have unusual requirements, or you need to fine-tune a process, it’s nice to know the ability is there.
Content Grabber has a fully-fledged built-in script editor with IntelliSense that is more than capable when building smaller scripts.
Distribute Executable Agents Royalty Free
Build royalty free self-contained web scraping agents that can run anywhere without the Content Grabber software. A self-contained agent is a single executable file that is easy to send or copy anywhere, and has a multitude of powerful configuration options.
You are free to sell or give away your self-contained agents, and you can add promotional messages and advertisements to the agents’ user interface. Content Grabber imagery and adverts are also included. Note: if you want to white-label your self-contained agent, you will need the Premium Edition of Content Grabber.
You can run agents from the command-line by using the Content Grabber command-line program. With this you can specify command-line parameters that can easily be used as input data by your agents.
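The pattern of feeding command-line parameters into an agent as input data can be sketched in generic Python. The flags shown here are invented for illustration and are not Content Grabber's actual command-line options:

```python
import argparse

# Hypothetical flags -- not Content Grabber's real command-line syntax.
parser = argparse.ArgumentParser(description="Run a scraping agent")
parser.add_argument("--agent", required=True, help="agent name to run")
parser.add_argument("--url", required=True, help="start URL for the agent")
parser.add_argument("--max-pages", type=int, default=100,
                    help="stop after this many pages")

# Parse an explicit argument list so the sketch is self-contained; a
# real command-line program would call parser.parse_args() on sys.argv.
args = parser.parse_args(
    ["--agent", "ProductAgent",
     "--url", "https://example.com",
     "--max-pages", "50"]
)

print(f"running {args.agent} from {args.url}, limit {args.max_pages} pages")
```

The parsed values then act as input data for the run, which is the same idea as passing parameters to an agent on the command line.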
The Premium Edition includes all of the features listed above under the Professional Edition, as well as those below.
Visual Studio 2018 Integration
Content Grabber can integrate with Visual Studio 2018 for the most powerful script editing, debugging and unit testing features.
Custom Display Templates
The standard configuration screens of a self-contained agent include promotional messages for Content Grabber. Custom HTML display templates allow you to remove these promotional messages and add your own designs to the screens – effectively allowing you to white-label your self-contained agent.
Command-line (royalty free distribution)
The command-line program can run without the Content Grabber software and can be distributed royalty free.
The Content Grabber API can be used to add web automation capabilities to your own desktop and web applications.
The dedicated web API has minimal dependencies and requires no special security privileges, so it’s very easy to use in a web environment. The web API does require access to the Content Grabber Windows service, which is part of the Content Grabber software and must be installed on the web server or on a server accessible to the web server.
Royalty Free Runtime
The Content Grabber runtime can be used to run and edit agents, but it cannot create new agents, or add and remove agent commands.
The runtime can be distributed with your own applications royalty free.