k1lib.zircon module
Browser automation tool. This is kinda like selenium, but way more awesome.
How it works is that I’ve developed a Chrome extension that can communicate with one of my servers, and the functions here also communicate with it. After installing the extension, you can open up a bunch of Chrome windows, then using this module, you can “attach” to a specific window. Then, using the methods provided here, you can execute arbitrary pieces of code as if you were in Chrome’s console.
This works already for some of my projects, but it takes too much time to document everything, and I have so many other things to do, so if you’re interested, ping me at 157239q@gmail.com and I’ll finalize this module. Some examples of what this can do:
(yes, I’m still addicted to touhou and it’s slowly destroying my life)
- class k1lib.zircon.Browser
Bases: object
- async scan(groupPath: str | list[str] = None)
Scans for all attached Extensions in the system. Example:
    b = zircon.newBrowser()
    await b.scan() # grab metadata about every Extension that's ready
    await b.scan("touhou") # grab metadata for Extensions in `touhou` group only
    await b.scan(["touhou", "mint2"]) # grab metadata for Extensions in `touhou` or `mint2` groups
The result might look something like this:
    {
        '_ext_254175259_1701581542_5': {
            'basics': {'title': '', 'url': 'https://zircon.mlexps.com/'},
            'tabInfo': {'tabId': 872280718, 'data': {'extId': '_ext_254175259_1701581542_5', 'groupPath': 'touhou'}},
            'lastUpdated': 1701584077.7896328,
            'lastPing': 1701587171.1521692
        },
        '_ext_254175259_1701581542_11': {
            'basics': {'title': '', 'url': 'https://zircon.mlexps.com/'},
            'tabInfo': {'tabId': 872280722, 'data': {'extId': '_ext_254175259_1701581542_11', 'groupPath': 'touhou'}},
            'lastUpdated': 1701583562.7153254,
            'lastPing': 1701587171.2784846
        }
    }
- Parameters
groupPath – (optional) a group name, or a list of group names. If specified, only returns metadata for Extensions that belong to one of those groups
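The scan result is a plain dict keyed by extension id, so it is easy to post-process. Here is a minimal sketch that works on a sample shaped like the result above and needs no running browser. The helper names `group_by_path` and `recently_pinged`, the `max_age` parameter, and the reading of `lastPing` as a Unix timestamp in seconds are all assumptions made for illustration, not part of zircon.

```python
from collections import defaultdict

# Sample data shaped like the scan() result shown above
scan_result = {
    "_ext_254175259_1701581542_5": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280718,
                    "data": {"extId": "_ext_254175259_1701581542_5", "groupPath": "touhou"}},
        "lastUpdated": 1701584077.7896328,
        "lastPing": 1701587171.1521692,
    },
    "_ext_254175259_1701581542_11": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280722,
                    "data": {"extId": "_ext_254175259_1701581542_11", "groupPath": "touhou"}},
        "lastUpdated": 1701583562.7153254,
        "lastPing": 1701587171.2784846,
    },
}

def group_by_path(result: dict) -> dict:
    """Maps each groupPath to the list of extension ids registered under it."""
    groups = defaultdict(list)
    for ext_id, meta in result.items():
        groups[meta["tabInfo"]["data"]["groupPath"]].append(ext_id)
    return dict(groups)

def recently_pinged(result: dict, now: float, max_age: float = 60.0) -> list:
    """Returns ids whose lastPing is at most max_age seconds before `now`.
    Assumes lastPing is a Unix timestamp in seconds (hypothetical reading)."""
    return [ext_id for ext_id, meta in result.items()
            if now - meta["lastPing"] <= max_age]
```

In a live session you would pass the dict returned by `await b.scan()` instead of the hard-coded sample.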
- async goto(url, timeout=15)
Navigates the page to the given url, then waits for confirmation that the page has changed. Typical waiting times:
https://mlexps.com: 5.2s
https://www.amazon.com: 6.2s
https://www.youtube.com: 9.4s
That’s quite a spread, so I figure 15s is a reasonable default
- Parameters
url – url to navigate the page to
timeout – hangs until it receives confirmation that the extension has been loaded on the new page. If nothing has been received after this many seconds, returns regardless
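The "wait for confirmation, but return regardless after `timeout` seconds" behaviour can be modelled with plain asyncio. This is only a sketch of the pattern, not zircon's implementation: the `asyncio.Event` standing in for the extension's confirmation, the function name `wait_for_confirmation`, and the boolean return value are all hypothetical.

```python
import asyncio

async def wait_for_confirmation(confirmed: asyncio.Event, timeout: float) -> bool:
    """Waits until `confirmed` is set, giving up after `timeout` seconds.
    Returns True if the confirmation arrived in time, False otherwise."""
    try:
        await asyncio.wait_for(confirmed.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False  # return regardless, like goto() is described to do

async def demo():
    fast, never = asyncio.Event(), asyncio.Event()
    # Simulate a page that confirms after 50 ms, and one that never confirms
    asyncio.get_running_loop().call_later(0.05, fast.set)
    return (await wait_for_confirmation(fast, timeout=0.5),
            await wait_for_confirmation(never, timeout=0.1))
```

Running `asyncio.run(demo())` exercises both the confirmed path and the timed-out path.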
- async querySelectorAll(selector: str) -> list[k1lib.zircon.Element]
- async locate(s: str) -> list[k1lib.zircon.Element]
Locates the given text somewhere on the page and returns plausible matching Elements
- class k1lib.zircon.Element(browser, selector: dict, extras: Optional[dict] = None)
Bases: object
- __init__(browser, selector: dict, extras: Optional[dict] = None)
Represents a specific element in the current browser
- Parameters
extras – extra metadata, for nice displaying
- async value(chain: str)
Gets the value of some property of the current element. Example:
    browser = ...
    # returns value of `document.querySelector("body").innerHTML`
    await browser.querySelector("body").value(".innerHTML")
    # returns value of `document.querySelector("h1").style.color`
    await browser.querySelector("h1").value(".style.color")
- Parameters
chain – resolving chain to the property
- async setValue(chain: str, value)
Sets the value of one of the element’s properties
See also: value()
- Parameters
chain – resolving chain to the property
value – anything json-dumpable
- async func(chain: str, args=None)
Executes any function on this element. Example:
    browser = ...
    await browser.querySelector("#someBtn").func(".click")
- Parameters
chain – resolving chain to the function
args – tuple of json-dumpable objects
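To make the idea of a "resolving chain" concrete, here is a small Python model of how a chain string such as ".style.color" maps onto nested property access for value(), setValue() and func(). It is an illustration only: the real resolution happens inside the browser, and the helper names (`split_chain`, `get_value`, `set_value`, `call_func`) and the `SimpleNamespace` stand-in for a DOM element are hypothetical. Both the ".innerHTML" and "innerHTML" spellings appear in this page's examples, so the sketch accepts either.

```python
from functools import reduce
from types import SimpleNamespace

def split_chain(chain: str) -> list:
    """'.style.color' -> ['style', 'color']; a leading dot is optional."""
    return [part for part in chain.split(".") if part]

def get_value(root, chain: str):
    """Follows each name in the chain, starting from root."""
    return reduce(getattr, split_chain(chain), root)

def set_value(root, chain: str, value) -> None:
    """Resolves everything but the last name, then assigns to the last one."""
    *parents, last = split_chain(chain)
    setattr(reduce(getattr, parents, root), last, value)

def call_func(root, chain: str, args=None):
    """Resolves the chain to a callable and calls it with the given args tuple."""
    return get_value(root, chain)(*(args or ()))

# Stand-in for an <h1> element (not a real DOM node)
h1 = SimpleNamespace(innerHTML="Hello", style=SimpleNamespace(color="red"), log=[])
h1.click = lambda: h1.log.append("clicked")
```

For instance, `get_value(h1, ".style.color")` reads the nested property, `set_value(h1, ".style.color", "blue")` overwrites it, and `call_func(h1, ".click")` invokes the stand-in click handler.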
- async inputText(value)
Inputs text into this element (assuming it’s an input box or text area). Also dispatches an ‘input’ event, which many pages rely on to notice that the value has changed
- async parentC(minWidth=0, minHeight=0, deltaX=0, deltaY=0, maxTries=30, takeAfter=True)
Grabs a nested parent element of this element that meets the specified conditions. “parentC” can be thought of as “complex parent”. Example:
    await e.parent() # most straightforwardly, gets the immediate parent
    await e.parentC(minWidth=600, takeAfter=False)
The second line requires some explaining. Let’s say there are these elements, nested from outermost to innermost: A -> B -> C -> D -> E. Assume E is the current element. The second line walks up through the parent elements (D, then C, then B, ...), checking whether each one’s width is at least minWidth. Say B is the first one that is wide enough. Then it returns element B if takeAfter is True, else it returns element C, the last parent that fell short.
There are more selectors:
- minWidth: grab parent that’s at least this wide
- minHeight: same as minWidth, but for height
- deltaY: if positive, grab parent that has y’ > y + deltaY (y is the current element’s y). If negative, grab parent that has y’ < y + deltaY
- deltaX: same as deltaY, but for x
These, together with childrenC(), should help you navigate around locally.
- Parameters
minWidth – if specified, finds the smallest parent that is at least this wide
maxTries – the number of consecutive parents to try out if minWidth or minHeight is specified
takeAfter – if True, take the first parent that meets the constraints, else take the parent just shy of that
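The A -> B -> C -> D -> E walk described above can be modelled on a toy tree. This is a rough sketch of the minWidth/takeAfter logic only, assuming a hypothetical `Node` class with a `width` and a `parent`. It is not zircon's code, it ignores minHeight, deltaX and deltaY, and returning `None` when nothing qualifies within `max_tries` is a guess at the failure behaviour.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Toy stand-in for a DOM element (hypothetical, not a zircon class)."""
    name: str
    width: float
    parent: Optional["Node"] = None

def parent_c(node: Node, min_width: float = 0, max_tries: int = 30,
             take_after: bool = True) -> Optional[Node]:
    """Walks up the parents until one is at least min_width wide.
    Returns that parent if take_after, else the node visited just before it."""
    previous, current = node, node.parent
    for _ in range(max_tries):
        if current is None:
            return None  # ran out of parents (assumed behaviour)
        if current.width >= min_width:
            return current if take_after else previous
        previous, current = current, current.parent
    return None  # gave up after max_tries parents (assumed behaviour)

# A is outermost, E is the current (innermost) element
a = Node("A", 1000)
b = Node("B", 700, a)
c = Node("C", 300, b)
d = Node("D", 200, c)
e = Node("E", 100, d)
```

With these widths, `parent_c(e, min_width=600)` lands on B, and passing `take_after=False` yields C instead, matching the description above.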
- async childrenC(minWidth=0, minHeight=0, maxWidth=inf, maxHeight=inf) -> List[Element]
Recursively grabs all children, and returns all elements that are within the specified bounds. Note that if A is the parent of B, and both meet the conditions, then only A is returned.
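The "only the outermost match is returned" rule is easiest to see on a toy tree. Below is a minimal sketch, assuming a hypothetical `Box` class with a width, a height and a list of children. It models the behaviour described above and is not the library's implementation.

```python
from dataclasses import dataclass, field
from math import inf

@dataclass
class Box:
    """Toy stand-in for a DOM element (hypothetical, not a zircon class)."""
    name: str
    width: float
    height: float
    children: list = field(default_factory=list)

def children_c(node: Box, min_width=0, min_height=0,
               max_width=inf, max_height=inf) -> list:
    """Collects descendants within the bounds, without descending into a match."""
    matches = []
    for child in node.children:
        fits = (min_width <= child.width <= max_width
                and min_height <= child.height <= max_height)
        if fits:
            matches.append(child)  # stop here: nested matches are not reported
        else:
            matches.extend(children_c(child, min_width, min_height,
                                      max_width, max_height))
    return matches

# "img" also fits the bounds used below, but it sits inside "card1", which matches first
page = Box("page", 1000, 800, [
    Box("sidebar", 200, 600, [Box("link", 180, 20)]),
    Box("main", 780, 600, [
        Box("card1", 300, 200, [Box("img", 280, 150)]),
        Box("card2", 300, 200),
    ]),
])
```

Calling `children_c(page, min_width=250, max_width=400)` on this tree returns the two cards and skips the image nested inside the first one.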
- async querySelectorAll(selector: str) -> list[k1lib.zircon.Element]
- async locate(s: str) -> list[k1lib.zircon.Element]
- class k1lib.zircon.Locator(name: str, topleft: int, bottomright: int, text: str = '', tag: str = '', klass: str = '', nChildren: int = 0)
Bases: object
- class k1lib.zircon.BrowserGroup(groupPath: str | list[str], limit: int = 3)
Bases: object
- __init__(groupPath: str | list[str], limit: int = 3)
Constructs a browser group.
- Parameters
groupPath – what group of browsers do you want to take control over?
limit – only take over this many browsers
- async execute(aFn, timeout=20)
Executes the specified async function repeatedly whenever a browser frees up. Example:
    import asyncio
    from collections import deque
    from k1lib import zircon

    linksToVisit = deque([
        'https://en.touhouwiki.net/wiki/Reimu_Hakurei',
        'https://en.touhouwiki.net/wiki/Marisa_Kirisame',
        'https://en.touhouwiki.net/wiki/Touhou_Project',
        'https://en.touhouwiki.net/wiki/Imperishable_Night',
        'https://en.touhouwiki.net/wiki/Perfect_Cherry_Blossom',
        'https://en.touhouwiki.net/wiki/Embodiment_of_Scarlet_Devil',
        'https://en.touhouwiki.net/wiki/Subterranean_Animism',
        'https://en.touhouwiki.net/wiki/Mountain_of_Faith',
        'https://en.touhouwiki.net/wiki/Phantasmagoria_of_Flower_View',
        'https://en.touhouwiki.net/wiki/Hakurei_Shrine',
        'https://en.touhouwiki.net/wiki/Touhou_Wiki:Projects',
        'https://en.touhouwiki.net/wiki/Yukari_Yakumo',
        'https://en.touhouwiki.net/wiki/Undefined_Fantastic_Object',
        'https://en.touhouwiki.net/wiki/Aya_Shameimaru',
        'https://en.touhouwiki.net/wiki/Sakuya_Izayoi',
        'https://en.touhouwiki.net/wiki/Immaterial_and_Missing_Power',
        'https://en.touhouwiki.net/wiki/Sanae_Kochiya'
    ])
    data = []

    async def crawl(b: "zircon.Browser"):
        # put here because it seems to resolve lots of problems that I have
        # when browser instances are scheduled too close together
        await asyncio.sleep(1)
        # if it seems like there's no more data to process, then raise zircon.BrowserCancel().
        # The current browser will never be scheduled to execute this function again
        if len(linksToVisit) == 0: raise zircon.BrowserCancel()
        url = linksToVisit.popleft()
        try:
            # do your normal web crawling stuff here
            await b.goto(url)
            title = await (await b.querySelector("title")).value("innerHTML")
            # save data somewhere
            data.append([url, title])
        except: linksToVisit.append(url) # try again later

    bg = zircon.BrowserGroup("public", 5)
    # bg = zircon.BrowserGroup(["public", "starcraft"], 5) # or can also be this
    await bg.execute(crawl)
The last command will run the crawl function over and over again, as long as there’s a free browser to do it. Also, by default, this will only use inactive browsers (ones that no Python client has sent commands to for a while, configurable at settings.zircon.conflictDuration).
Notice how I wrapped all browser interactions inside a try-except block? If an error appears (say the connection is lost and the system is trying to restore it) and you don’t handle it, .execute() will raise that same error and cancel all current tasks. So if you want to design something that will run for a long time, catch the error and schedule the job again for later
- Parameters
aFn – async function to be executed
timeout – if the function takes longer than this amount of time, then cancel the task and make the browser available in the future again. Can be None, but I’d advise against that
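The scheduling behaviour described above (keep re-running the function on whichever browser is free, retire a browser on BrowserCancel, drop a job that exceeds the timeout, and cancel everything on any other error) can be approximated with plain asyncio. This is a sketch under stated assumptions, not how zircon is implemented: `run_on_browsers` is a hypothetical name, the local `BrowserCancel` class is a stand-in for zircon.BrowserCancel, and the "browsers" are just strings.

```python
import asyncio
from collections import deque

class BrowserCancel(Exception):
    """Stand-in for zircon.BrowserCancel: raised by a job to retire its browser."""

async def run_on_browsers(browsers, a_fn, timeout=20):
    """Runs a_fn(browser) repeatedly on every browser until each one is retired."""
    async def worker(browser):
        while True:
            try:
                await asyncio.wait_for(a_fn(browser), timeout)
            except BrowserCancel:
                return    # this browser is never scheduled again
            except asyncio.TimeoutError:
                continue  # job cancelled for taking too long; browser is free again

    tasks = [asyncio.create_task(worker(b)) for b in browsers]
    try:
        await asyncio.gather(*tasks)
    except Exception:
        for task in tasks:
            task.cancel()  # any other error cancels all current tasks
        raise

async def demo():
    jobs, done = deque(["a", "b", "c", "d", "e"]), []

    async def job(browser):
        if not jobs:
            raise BrowserCancel()
        item = jobs.popleft()
        await asyncio.sleep(0.01)  # pretend to crawl a page
        done.append((browser, item))

    await run_on_browsers(["browser-1", "browser-2"], job, timeout=1)
    return done
```

Running `asyncio.run(demo())` processes all five items across the two fake browsers and returns once both have raised `BrowserCancel`.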