k1lib.zircon module
Browser automation tool. This is kinda like Selenium, but way more awesome.
How it works: I’ve developed a Chrome extension that communicates with one of my servers, and the functions here also communicate with that server. After installing the extension, you can open up a bunch of Chrome windows, then use this module to “attach” to a specific window. Then, using the methods provided here, you can execute any piece of code as if you were in Chrome’s console.
This works already for some of my projects, but it takes too much time to document everything, and I have so many other things to do, so if you’re interested, ping me at 157239q@gmail.com and I’ll finalize this module. Some examples of what this can do:
(yes, I’m still addicted to touhou and it’s slowly destroying my life)
- class k1lib.zircon.Browser
Bases: object
- async scan(groupPath: str | list[str] = None)
Scans for all attached Extensions in the system. Example:
    b = zircon.newBrowser()
    await b.scan()                    # grab metadata about every Extension that's ready
    await b.scan("touhou")            # grab metadata for Extensions in `touhou` group only
    await b.scan(["touhou", "mint2"]) # grab metadata for Extensions in `touhou` or `mint2` groups
The result might look something like this:
    {'_ext_254175259_1701581542_5': {
        'basics': {'title': '', 'url': 'https://zircon.mlexps.com/'},
        'tabInfo': {'tabId': 872280718, 'data': {'extId': '_ext_254175259_1701581542_5', 'groupPath': 'touhou'}},
        'lastUpdated': 1701584077.7896328,
        'lastPing': 1701587171.1521692},
     '_ext_254175259_1701581542_11': {
        'basics': {'title': '', 'url': 'https://zircon.mlexps.com/'},
        'tabInfo': {'tabId': 872280722, 'data': {'extId': '_ext_254175259_1701581542_11', 'groupPath': 'touhou'}},
        'lastUpdated': 1701583562.7153254,
        'lastPing': 1701587171.2784846}}
- Parameters:
groupPath – (optional) If specified, only returns metadata for Extensions in that group (or, if a list is given, in any of the listed groups)
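Since the scan result is a plain dict keyed by extension id, post-processing it is ordinary Python. A minimal sketch, using a hypothetical helper `tabsByGroup` (not part of zircon) that groups tab ids by `groupPath`, with the two entries from the example result above as sample data:

```python
from collections import defaultdict

def tabsByGroup(scanResult: dict) -> dict:
    """Hypothetical helper: map each groupPath to the tabIds of its Extensions."""
    groups = defaultdict(list)
    for meta in scanResult.values():
        tab = meta["tabInfo"]
        groups[tab["data"]["groupPath"]].append(tab["tabId"])
    return dict(groups)

# sample data, copied from the scan() result above
result = {
    "_ext_254175259_1701581542_5": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280718, "data": {"extId": "_ext_254175259_1701581542_5", "groupPath": "touhou"}},
        "lastUpdated": 1701584077.7896328, "lastPing": 1701587171.1521692},
    "_ext_254175259_1701581542_11": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280722, "data": {"extId": "_ext_254175259_1701581542_11", "groupPath": "touhou"}},
        "lastUpdated": 1701583562.7153254, "lastPing": 1701587171.2784846},
}
print(tabsByGroup(result))  # {'touhou': [872280718, 872280722]}
```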
- async goto(url, timeout=15)
Navigates the page to url, then waits for confirmation that the extension has been loaded on the new page. Typical waiting times for this confirmation:
https://mlexps.com: 5.2s
https://www.amazon.com: 6.2s
https://www.youtube.com: 9.4s
That’s quite a spread, so I figure 15s is a reasonable middle ground.
- Parameters:
url – url to navigate the page to
timeout – how long to wait for confirmation that the extension has been loaded on the new page. If nothing has been received after this many seconds, the method returns regardless
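The “wait for confirmation, but return regardless after timeout seconds” behavior is a common asyncio pattern. A minimal sketch of that pattern only, assuming a hypothetical asyncio.Event that gets set when the extension confirms the page load (this is not zircon’s actual implementation):

```python
import asyncio

async def waitForConfirmation(confirmed: asyncio.Event, timeout: float = 15) -> bool:
    """Hypothetical: wait until `confirmed` is set, giving up after `timeout` seconds.
    Returns True if confirmation arrived, False if we timed out (without raising)."""
    try:
        await asyncio.wait_for(confirmed.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False  # return regardless, like goto() does

async def demo():
    fast, slow = asyncio.Event(), asyncio.Event()
    # simulate an extension that confirms the new page after 0.05s
    asyncio.get_running_loop().call_later(0.05, fast.set)
    ok1 = await waitForConfirmation(fast, timeout=1)
    ok2 = await waitForConfirmation(slow, timeout=0.1)  # never confirmed
    return ok1, ok2

print(asyncio.run(demo()))  # (True, False)
```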
- class k1lib.zircon.Element(browser, selector: dict, extras: dict = None)
Bases: object
- __init__(browser, selector: dict, extras: dict = None)
Represents a specific element in the current browser
- Parameters:
extras – extra metadata, only used for nicer display
- async value(chain: str)
Gets the value of some property of the current element. Example:
    browser = ...
    # returns value of `document.querySelector("body").innerHTML`
    await browser.querySelector("body").value(".innerHTML")
    # returns value of `document.querySelector("h1").style.color`
    await browser.querySelector("h1").value(".style.color")
- Parameters:
chain – resolving chain to the property, appended to the element (e.g. ".innerHTML" or ".style.color")
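The comments in the example spell out the JavaScript expression each call corresponds to. A tiny sketch that builds that expression string, assuming a plain CSS selector string (a simplification, since Element stores its selector as a dict) and a hypothetical helper name `valueExpr`:

```python
import json

def valueExpr(selector: str, chain: str) -> str:
    """Hypothetical: the JS expression value(chain) is described as evaluating.
    json.dumps() turns the selector into a properly quoted JS string literal."""
    return f"document.querySelector({json.dumps(selector)}){chain}"

print(valueExpr("body", ".innerHTML"))   # document.querySelector("body").innerHTML
print(valueExpr("h1", ".style.color"))   # document.querySelector("h1").style.color
```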
- async setValue(chain: str, value)
Sets the value of one of the element’s properties.
See also: value()
- Parameters:
chain – resolving chain to the property
value – anything json-dumpable
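Because value has to be json-dumpable, one way to picture setValue() is as a JS assignment whose right-hand side is json.dumps(value). A hypothetical sketch (`setValueExpr` is not part of zircon, and the selector is a plain string for simplicity) that also shows json.dumps rejecting a value that can’t be sent, here a Python set:

```python
import json

def setValueExpr(selector: str, chain: str, value) -> str:
    """Hypothetical: build a JS assignment. json.dumps() both quotes the value
    correctly and raises TypeError for values that aren't JSON-serializable."""
    return f"document.querySelector({json.dumps(selector)}){chain} = {json.dumps(value)}"

print(setValueExpr("h1", ".style.color", "red"))
# document.querySelector("h1").style.color = "red"
try:
    setValueExpr("h1", ".dataset.tags", {"a", "b"})  # sets aren't JSON-serializable
except TypeError as e:
    print("rejected:", e)
```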
- async func(chain: str, args=None)
Executes any function on this element. Example:
    browser = ...
    await browser.querySelector("#someBtn").func(".click")
- Parameters:
chain – resolving chain to the function
args – (optional) tuple of json-dumpable objects to pass to the function
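func() can be pictured as a JS call on the element, with each item of args json-dumped into an argument. A hypothetical sketch (`funcExpr` is not part of zircon, and the selector is a plain string here):

```python
import json

def funcExpr(selector: str, chain: str, args=None) -> str:
    """Hypothetical: build the JS call for func(chain, args).
    Each element of the args tuple becomes one JSON-encoded argument."""
    argList = ", ".join(json.dumps(a) for a in (args or ()))
    return f"document.querySelector({json.dumps(selector)}){chain}({argList})"

print(funcExpr("#someBtn", ".click"))
# document.querySelector("#someBtn").click()
print(funcExpr("input", ".setAttribute", ("placeholder", "Search")))
# document.querySelector("input").setAttribute("placeholder", "Search")
```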
- async inputText(value, mode=0)
Inputs text into this element (assumed to be an input box or text area). You’d think this is pretty simple, but it’s surprisingly complicated: tons of people do tons of different things, with lots of different frontend frameworks. That means there is no single method that works everywhere, which is where the “mode” param comes in. Currently, zircon provides these modes:
0: sets “.value” of the element
1: sends KeyboardEvent events to the element, typing out each character
2: uses document.execCommand(“insertText”); works on Facebook/Instagram
- Parameters:
value – the text value to input
mode – explained above
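To make the three modes concrete, here is a hypothetical sketch of the kind of JavaScript each one might run. It is not zircon’s code: `el` is assumed to be a JS variable already holding the target element. Note that in browsers, synthetic keyboard events generally don’t insert characters by themselves (they mainly trigger the page’s key handlers), and frameworks that track input state may not notice a direct .value assignment, which is part of why no single mode works everywhere.

```python
import json

def inputTextJs(value: str, mode: int = 0) -> str:
    """Hypothetical: JS snippet each inputText() mode might run on element `el`."""
    text = json.dumps(value)  # safe JS string literal
    if mode == 0:  # set .value directly
        return f"el.value = {text};"
    if mode == 1:  # dispatch one KeyboardEvent per character
        return (f"for (const ch of {text}) {{ "
                "el.dispatchEvent(new KeyboardEvent('keydown', {key: ch, bubbles: true})); }")
    if mode == 2:  # insert at the caret via document.execCommand (deprecated, still supported)
        return f"el.focus(); document.execCommand('insertText', false, {text});"
    raise ValueError(f"unknown mode: {mode}")

for m in (0, 1, 2):
    print(m, inputTextJs("hi", m))
```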
- async parentC(minWidth=0, minHeight=0, deltaX=0, deltaY=0, maxTries=30, takeAfter=True)
Grabs an ancestor of this element (its parent, grandparent, and so on) that meets the specified conditions. “parentC” can be thought of as “complex parent”. Example:
    await e.parent() # most straightforwardly, gets the immediate parent
    await e.parentC(minWidth=600, takeAfter=False)
The second line requires some explaining. Let’s say there are these nested elements: A -> B -> C -> D -> E, where A is the outermost one and E is the current element. The second line walks up the parents one at a time (D, then C, then B, ...) and checks whether each one is at least minWidth=600 wide. Say B is the first parent that is. Then parentC returns B if takeAfter is True, else it returns C (the last parent that’s still too narrow).
There are more selectors:
  - minWidth: grab a parent that’s at least this wide
  - minHeight: same as minWidth, but for height
  - deltaY: if positive, grab a parent that has y’ > y + deltaY (y is the current element’s y). If negative, grab a parent that has y’ < y + deltaY
  - deltaX: same as deltaY, but for x
These, together with childrenC(), should help you navigate around locally.
- Parameters:
minWidth – if specified, finds the smallest parent that is bigger than this
maxTries – the number of consecutive parents to try out if minWidth or minHeight is specified
takeAfter – if True, take the first parent that satisfies the constraints, else take the parent just shy of that (the last one that doesn’t)
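The A -> B -> C -> D -> E walk-through can be simulated without a browser. A sketch of the search as described above, not zircon’s implementation: `Node` is a hypothetical stand-in for a DOM element, “at least this wide” is used for minWidth/minHeight, and returning the starting element when takeAfter=False and the immediate parent already qualifies is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: bounding box plus a link to its parent."""
    name: str
    x: float
    y: float
    width: float
    height: float
    parent: Optional["Node"] = None

def parentC(e: Node, minWidth=0, minHeight=0, deltaX=0, deltaY=0, maxTries=30, takeAfter=True):
    """Sketch of the parentC search described above (not zircon's actual code)."""
    def meets(p: Node) -> bool:
        if p.width < minWidth or p.height < minHeight: return False
        if deltaY > 0 and not p.y > e.y + deltaY: return False
        if deltaY < 0 and not p.y < e.y + deltaY: return False
        if deltaX > 0 and not p.x > e.x + deltaX: return False
        if deltaX < 0 and not p.x < e.x + deltaX: return False
        return True
    prev, cur = e, e.parent
    for _ in range(maxTries):
        if cur is None: return None                 # ran out of parents
        if meets(cur): return cur if takeAfter else prev
        prev, cur = cur, cur.parent                 # keep walking up
    return None                                     # gave up after maxTries parents

# A -> B -> C -> D -> E, where A is outermost and E is the current element
A = Node("A", 0, 0, 1000, 900)
B = Node("B", 0, 50, 800, 700, A)
C = Node("C", 0, 100, 500, 400, B)
D = Node("D", 0, 150, 300, 200, C)
E = Node("E", 0, 200, 100, 50, D)
print(parentC(E, minWidth=600).name)                   # B
print(parentC(E, minWidth=600, takeAfter=False).name)  # C
print(parentC(E, deltaY=-120).name)                    # B, first parent with y < 200 - 120
```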
- class k1lib.zircon.Locator(name: str, topleft: int, bottomright: int, text: str = '', tag: str = '', klass: str = '', nChildren: int = 0)
Bases: object
- class k1lib.zircon.BrowserGroup(groupPath: str | list[str], limit: int = 3)
Bases: object
- __init__(groupPath: str | list[str], limit: int = 3)
Constructs a browser group.
- Parameters:
groupPath – what group of browsers do you want to take control over?
limit – only take over this many browsers
- async execute(aFn, timeout=20)
Executes the specified async function repeatedly whenever a browser frees up. Example:
    import asyncio
    from collections import deque
    from k1lib import zircon

    linksToVisit = deque([
        'https://en.touhouwiki.net/wiki/Reimu_Hakurei',
        'https://en.touhouwiki.net/wiki/Marisa_Kirisame',
        'https://en.touhouwiki.net/wiki/Touhou_Project',
        'https://en.touhouwiki.net/wiki/Imperishable_Night',
        'https://en.touhouwiki.net/wiki/Perfect_Cherry_Blossom',
        'https://en.touhouwiki.net/wiki/Embodiment_of_Scarlet_Devil',
        'https://en.touhouwiki.net/wiki/Subterranean_Animism',
        'https://en.touhouwiki.net/wiki/Mountain_of_Faith',
        'https://en.touhouwiki.net/wiki/Phantasmagoria_of_Flower_View',
        'https://en.touhouwiki.net/wiki/Hakurei_Shrine',
        'https://en.touhouwiki.net/wiki/Touhou_Wiki:Projects',
        'https://en.touhouwiki.net/wiki/Yukari_Yakumo',
        'https://en.touhouwiki.net/wiki/Undefined_Fantastic_Object',
        'https://en.touhouwiki.net/wiki/Aya_Shameimaru',
        'https://en.touhouwiki.net/wiki/Sakuya_Izayoi',
        'https://en.touhouwiki.net/wiki/Immaterial_and_Missing_Power',
        'https://en.touhouwiki.net/wiki/Sanae_Kochiya'
    ])
    data = []

    async def crawl(b: "zircon.Browser"):
        # put here because it seems to resolve lots of problems that I have
        # when browser instances are scheduled too close together
        await asyncio.sleep(1)
        # if it seems like there's no more data to process, then raise zircon.BrowserCancel().
        # The current browser will then never be scheduled to run this function again
        if len(linksToVisit) == 0: raise zircon.BrowserCancel()
        url = linksToVisit.popleft()
        try:
            # do your normal web crawling stuff here
            await b.goto(url)
            title = await (await b.querySelector("title")).value("innerHTML")
            # save data somewhere
            data.append([url, title])
        except asyncio.CancelledError:
            linksToVisit.append(url) # cancelled (e.g. timed out): try again later, but let the cancellation through
            raise
        except Exception:
            linksToVisit.append(url) # try again later

    bg = zircon.BrowserGroup("public", 5)
    # bg = zircon.BrowserGroup(["public", "starcraft"], 5) # or can also be this
    await bg.execute(crawl)
The last command will run the crawl function over and over again, as long as there’s a free browser to do it. Also, by default, this will only use inactive browsers (ones that no Python client has sent commands to for a while; configurable at settings.zircon.conflictDuration).
Notice how I wrapped all browser interactions inside a try-except block? If an error appears (say, the connection is lost while the system is trying to restore it) and you don’t handle it, .execute() will raise that same error and cancel all current tasks. So if you want to design something that runs for a long time, catch the error and reschedule the job for later.
- Parameters:
aFn – async function to be executed
timeout – if the function takes longer than this amount of time, cancel the task and make the browser available again for future runs. Can be None, but I’d advise against that
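The scheduling behavior described above (keep calling the function while a browser is free, drop a browser on BrowserCancel, free it again after a timeout, and cancel everything on an unhandled error) can be sketched with plain asyncio. This is an illustration under assumptions, not zircon’s implementation: `BrowserCancel` and `execute` here are hypothetical stand-ins, browsers are plain strings, and the “only use inactive browsers” filter is not modeled.

```python
import asyncio
from collections import deque

class BrowserCancel(Exception):
    """Hypothetical stand-in for zircon.BrowserCancel."""

async def execute(browsers, aFn, timeout=20):
    """Sketch of an execute()-style scheduler: one worker per browser calls aFn(browser)
    repeatedly. BrowserCancel retires that browser; a timeout only cancels that one call;
    any other error propagates and cancels every other worker."""
    async def worker(b):
        while True:
            try:
                await asyncio.wait_for(aFn(b), timeout)
            except BrowserCancel:
                return    # never schedule this browser again
            except asyncio.TimeoutError:
                continue  # call was cancelled; browser is free for the next run
    tasks = [asyncio.create_task(worker(b)) for b in browsers]
    try:
        await asyncio.gather(*tasks)  # re-raises the first unhandled error
    finally:
        for t in tasks: t.cancel()    # on error, stop the remaining workers too

async def demo():
    todo, done = deque(range(10)), []
    async def crawl(b):
        await asyncio.sleep(0.01)     # stand-in for goto()/querySelector() work
        if not todo: raise BrowserCancel()
        done.append((b, todo.popleft()))
    await execute(["b1", "b2", "b3"], crawl, timeout=1)
    return sorted(item for _, item in done)

print(asyncio.run(demo()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Using one long-lived worker per browser keeps the “a browser frees up, so schedule the next call” logic trivial; a shared queue of idle browsers would work equally well.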