A wildly popular browser automation tool, garnering over 17.8k stars!
Anyone who has written web crawler scripts with Selenium knows the biggest fear: the target website changes its layout, the XPath selectors you wrote stop matching, and the selector code has to be debugged and modified all over again.
You finally get it working, and two days later the script crashes because the site moved a button. Many developers have likely been through this frustrating cycle more than once.
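To make that fragility concrete, here is a minimal sketch that uses only Python's standard library rather than Selenium itself. The two page snippets and the selector are made up for illustration; the point is simply that a path-based selector depends on the exact nesting of the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup before and after a redesign (illustration only).
OLD_PAGE = (
    "<body><div><form>"
    "<input name='q'/><button>Search</button>"
    "</form></div></body>"
)
NEW_PAGE = (
    "<body><div><nav/><section><form>"
    "<input name='q'/><button>Search</button>"
    "</form></section></div></body>"
)

# A structure-dependent selector of the kind that tends to get hard-coded.
BRITTLE_XPATH = "./div/form/button"


def find_button(page_source: str, xpath: str):
    """Parse the markup and return the first element matching the path, or None."""
    root = ET.fromstring(page_source)
    return root.find(xpath)


old_hit = find_button(OLD_PAGE, BRITTLE_XPATH)
new_hit = find_button(NEW_PAGE, BRITTLE_XPATH)

print("old layout match:", old_hit is not None)
print("new layout match:", new_hit is not None)
```

The button itself never changed, but wrapping the form in one extra `section` is enough for the second lookup to come back empty.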
Recently, I came across a wildly popular open-source automation tool on GitHub: Skyvern, which offers a different solution.
Instead of relying on hard-coded selectors, it uses AI to identify and understand page content from screenshots, much as a person takes in everything on a webpage before acting.
In layman's terms, it can "understand" where the search box and buttons are on the page, instead of memorizing their locations by name.
Next, let's take a closer look at Skyvern.
Understand the webpage, not just memorize its location
At its core, Skyvern is an open-source AI agent that combines a vision model with a large language model, so it can understand page content and break away from traditional automation methods.
It takes a screenshot, uses the vision model to analyze the page structure and work out which elements are clickable buttons and which are input fields, and then uses the language model to understand the page content and decide what to do next. That is how the automated operations are carried out.
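The loop described above can be sketched roughly as follows. This is a toy illustration, not Skyvern's actual code: `detect_elements` and `choose_action` are hypothetical stand-ins for the vision model and the language model, and they return canned answers so the sketch runs without a browser or an API key.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    kind: str   # "input" or "button"
    label: str  # human-readable description of the element


@dataclass
class Action:
    verb: str       # "type" or "click"
    target: str     # label of the element to act on
    text: str = ""  # text to enter, used only by "type"


def detect_elements(screenshot: bytes) -> List[Element]:
    """Stub for the vision model: pretend these were found in the screenshot."""
    return [Element("input", "search box"), Element("button", "search button")]


def choose_action(goal: str, elements: List[Element],
                  history: List[Action]) -> Optional[Action]:
    """Stub for the language model: type the goal, then click, then stop."""
    if not history:
        box = next(e for e in elements if e.kind == "input")
        return Action("type", box.label, goal)
    if len(history) == 1:
        button = next(e for e in elements if e.kind == "button")
        return Action("click", button.label)
    return None  # the goal is considered complete


def run_agent(goal: str, max_steps: int = 5) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = b"fake-png-bytes"          # a real agent would capture the browser here
        elements = detect_elements(screenshot)  # vision step: what is on the page?
        action = choose_action(goal, elements, history)  # language step: what next?
        if action is None:
            break
        history.append(action)  # a real agent would also execute the action in the browser
    return history


actions = run_agent("iPhone 17")
for action in actions:
    print(action)
```

Notice that nothing in the loop refers to an XPath or a CSS selector. Elements are found again from a fresh screenshot on every step, which is why a moved button does not break the run.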
Compared with the traditional automation approach of Selenium and Playwright, Skyvern offers the following advantages:
Give instructions using natural language
Skyvern not only adapts to changes in page layout, it also accepts crawling tasks written in natural language.
We no longer need to write complex code; we can simply tell it what task needs to be performed.
For example, send Skyvern: "Search for iPhone 17 on JD.com and add it to your cart."
Upon receiving the task, it automatically breaks it down into multiple steps: opening the website, finding the search box, entering keywords, clicking search, identifying the product, and clicking add to cart.
It's that simple. Just give the task in natural language. Even people who don't know programming can easily automate the process.
Support complex workflows
In addition to executing single tasks, Skyvern also allows us to customize multiple steps to build an automated workflow.
It also supports functions such as loops, conditional statements, and file parsing, basically covering most automation scenarios.
For example, to download invoices in batches, you can design a process: first log in to the website, filter out invoices after a certain date, extract the invoice list, and then click to download them one by one.
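To show the shape of such a process, here is a small plain-Python sketch of the invoice example. It is not Skyvern's workflow format; the `Invoice` class, the sample data, and `run_invoice_workflow` are all hypothetical, and the "steps" are only recorded as log lines instead of being performed in a browser.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Invoice:
    number: str
    issued: date


# Made-up data standing in for the invoice list extracted from the page.
INVOICES = [
    Invoice("INV-001", date(2025, 1, 15)),
    Invoice("INV-002", date(2025, 3, 2)),
    Invoice("INV-003", date(2025, 4, 20)),
]


def run_invoice_workflow(invoices: List[Invoice], cutoff: date) -> List[str]:
    log = ["login"]  # step 1: log in to the website
    # step 2: conditional filter, keep only invoices issued after the cutoff date
    recent = [inv for inv in invoices if inv.issued > cutoff]
    # step 3: extract the invoice list
    log.append(f"extract {len(recent)} invoices")
    # step 4: loop over the list and download the invoices one by one
    for inv in recent:
        log.append(f"download {inv.number}")
    return log


steps = run_invoice_workflow(INVOICES, cutoff=date(2025, 2, 1))
for step in steps:
    print(step)
```

The condition (the date filter) and the loop (one download per invoice) are the two building blocks mentioned above; a real workflow would replace each log line with an action on the page.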
Even more impressively, Skyvern can handle login verification and integrates with password managers such as Bitwarden and 1Password.
When an account login is required during the execution of automated tasks, the system automatically fills in the relevant information to log in.
In addition, it has a built-in CAPTCHA solver, which can handle image CAPTCHAs encountered during the login process, making it quite powerful.
Finally, let's learn how to use it. The project's README document contains a detailed local deployment and installation guide, which supports Docker for quick deployment.
Those who are less comfortable with programming can use the cloud-hosted version instead, which is ready to use right out of the box.
In conclusion
If you frequently need to handle repetitive web page operations, such as batch form filling or periodically scraping data, Skyvern might really be able to help you.
However, it's worth noting that this tool relies on large models such as GPT-4o at its core, and calls to these models incur API fees every time a task runs.
https://github.com/Skyvern-AI/skyvern
That concludes today's sharing. Thank you for taking the time to read. See you next time, Respect!