k1lib.zircon module

Browser automation tool. This is kinda like selenium, but way more awesome.

It works through a Chrome extension I’ve developed that communicates with one of my servers; the functions here talk to that same server. After installing the extension, you can open up a bunch of Chrome windows and use this module to “attach” to a specific one. Then, using the methods provided here, you can execute arbitrary pieces of code as if you were in Chrome’s console.

This already works for some of my projects, but documenting everything takes too much time and I have many other things to do. If you’re interested, ping me at 157239q@gmail.com and I’ll finalize this module. Some examples of what this can do:

(yes, I’m still addicted to touhou and it’s slowly destroying my life)

k1lib.zircon.newBrowser() → Browser

Creates a new browser

class k1lib.zircon.Browser

Bases: object

close()
async send(d) → res
async scan(groupPath: str | list[str] = None)

Scans for all attached Extensions in the system. Example:

b = zircon.newBrowser()
await b.scan()                    # grab metadata about every Extension that's ready
await b.scan("touhou")            # grab metadata for Extensions in `touhou` group only
await b.scan(["touhou", "mint2"]) # grab metadata for Extensions in `touhou` or `mint2` groups

The result might look something like this:

{'_ext_254175259_1701581542_5': {'basics': {'title': '',
   'url': 'https://zircon.mlexps.com/'},
  'tabInfo': {'tabId': 872280718,
   'data': {'extId': '_ext_254175259_1701581542_5', 'groupPath': 'touhou'}},
  'lastUpdated': 1701584077.7896328,
  'lastPing': 1701587171.1521692},
 '_ext_254175259_1701581542_11': {'basics': {'title': '',
   'url': 'https://zircon.mlexps.com/'},
  'tabInfo': {'tabId': 872280722,
   'data': {'extId': '_ext_254175259_1701581542_11', 'groupPath': 'touhou'}},
  'lastUpdated': 1701583562.7153254,
  'lastPing': 1701587171.2784846}}
Parameters

groupPath – (optional) If specified, only returns metadata for Extensions that belong to the specified group, or to any of the groups if a list is given
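As a rough sketch of how the returned dictionary could be consumed, the snippet below groups extension IDs by `groupPath` and drops entries whose `lastPing` is too old. It runs on a hard-coded sample shaped like the result above. The helper name `liveExtsByGroup` and the 60-second cutoff are assumptions for illustration and not part of zircon, and it assumes `lastPing` is a Unix timestamp in seconds, which the values suggest.

```python
from collections import defaultdict

# Sample shaped like the scan() result shown above
scanResult = {
    "_ext_254175259_1701581542_5": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280718,
                    "data": {"extId": "_ext_254175259_1701581542_5", "groupPath": "touhou"}},
        "lastUpdated": 1701584077.7896328,
        "lastPing": 1701587171.1521692,
    },
    "_ext_254175259_1701581542_11": {
        "basics": {"title": "", "url": "https://zircon.mlexps.com/"},
        "tabInfo": {"tabId": 872280722,
                    "data": {"extId": "_ext_254175259_1701581542_11", "groupPath": "touhou"}},
        "lastUpdated": 1701583562.7153254,
        "lastPing": 1701587171.2784846,
    },
}

def liveExtsByGroup(scanResult: dict, now: float, maxAge: float = 60.0) -> dict:
    """Hypothetical helper: maps groupPath -> extIds pinged within `maxAge` seconds of `now`."""
    groups = defaultdict(list)
    for extId, meta in scanResult.items():
        if now - meta["lastPing"] <= maxAge:  # assumes lastPing is in epoch seconds
            groups[meta["tabInfo"]["data"]["groupPath"]].append(extId)
    return dict(groups)

fresh = liveExtsByGroup(scanResult, now=1701587180.0)  # a few seconds after the last pings
stale = liveExtsByGroup(scanResult, now=1701587300.0)  # more than 60 seconds later
```

With a live browser, the `scanResult` literal would be replaced by the value of `await b.scan()`.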

async pickExt(extId: str)
async pickExtFromGroup(groupName: str)
async goto(url, timeout=15)

The time this spends waiting for page change confirmation varies quite a bit, so I figure 15s is a reasonable middle ground

Parameters
  • url – url to navigate the page to

  • timeout – blocks until confirmation arrives that the extension has loaded on the new page. If nothing has been received after this many seconds, returns regardless
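The "wait for confirmation, but return regardless after the timeout" behaviour can be sketched with plain asyncio. This is only an illustration of the pattern and not zircon's actual implementation: `waitForConfirmation` and the `asyncio.Event` standing in for the extension's confirmation are hypothetical, and whether goto() reports which of the two cases happened is not documented.

```python
import asyncio

async def waitForConfirmation(confirmed: asyncio.Event, timeout: float) -> bool:
    """Waits until `confirmed` is set or `timeout` seconds pass.
    Returns True if confirmed, False on timeout; a timeout never raises."""
    try:
        await asyncio.wait_for(confirmed.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False

async def demo():
    fast, never = asyncio.Event(), asyncio.Event()
    # simulate a confirmation arriving after 10 ms
    asyncio.get_running_loop().call_later(0.01, fast.set)
    first = await waitForConfirmation(fast, timeout=1)       # confirmed in time
    second = await waitForConfirmation(never, timeout=0.05)  # gives up after 50 ms
    return first, second

results = asyncio.run(demo())
```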

async querySelector(selector: str) → Element
async querySelectorAll(selector: str) → list[k1lib.zircon.Element]
async window() → Element
async document() → Element
async locate(s: str) → list[k1lib.zircon.Element]

Locates the given text somewhere on the page and returns plausible matching Elements

async locate2(locator: Locator, depth: int = 20, width: int = 100)
async scrollDown(timeout=120, step=3000, sleep=5)

Scrolls down <step> pixels every <sleep> seconds, until the page can’t scroll any further or the elapsed time exceeds <timeout>
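The loop described above might look roughly like the sketch below. It is not zircon's code: `scrollUntilDone` and `FakePage` are hypothetical names, and detecting "can't scroll any further" by comparing the scroll position before and after each step is an assumption.

```python
import asyncio
import time

async def scrollUntilDone(scrollBy, getY, timeout=120, step=3000, sleep=5) -> int:
    """Calls scrollBy(step) every `sleep` seconds until the position stops
    changing or `timeout` seconds elapse. Returns how many scrolls moved the page."""
    start = time.monotonic()
    moved = 0
    while time.monotonic() - start < timeout:
        before = await getY()
        await scrollBy(step)
        if await getY() == before:
            break  # position did not change, so the bottom was reached
        moved += 1
        await asyncio.sleep(sleep)
    return moved

class FakePage:
    """Stand-in for a real page whose scroll position is capped at maxY."""
    def __init__(self, maxY: int):
        self.y = 0
        self.maxY = maxY

    async def scrollBy(self, dy: int):
        self.y = min(self.y + dy, self.maxY)

    async def getY(self) -> int:
        return self.y

page = FakePage(maxY=7000)
moved = asyncio.run(scrollUntilDone(page.scrollBy, page.getY, timeout=5, step=3000, sleep=0))
```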

class k1lib.zircon.Element(browser, selector: dict, extras: Optional[dict] = None)

Bases: object

__init__(browser, selector: dict, extras: Optional[dict] = None)

Represents a specific element in the current browser

Parameters

extras – extra metadata, used for nicer display

async value(chain: str)

Gets the value of some property of the current element. Example:

browser = ...
# returns value of `document.querySelector("body").innerHTML`
await (await browser.querySelector("body")).value(".innerHTML")
# returns value of `document.querySelector("h1").style.color`
await (await browser.querySelector("h1")).value(".style.color")
Parameters

chain – resolving chain to the property
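To make the idea of a "resolving chain" concrete, here is a Python analogy that follows a dotted chain attribute by attribute. zircon presumably resolves the chain inside the browser, so this is only a sketch of the concept. `resolveChain` is a hypothetical helper, and tolerating a missing leading dot is a choice made for this sketch, not documented zircon behaviour.

```python
from types import SimpleNamespace

def resolveChain(root, chain: str):
    """Follows a dotted chain such as ".style.color", one attribute at a time."""
    obj = root
    for name in chain.split("."):
        if name:  # skip the empty piece produced by a leading dot
            obj = getattr(obj, name)
    return obj

# Stand-in for a DOM element with a nested `style` object
h1 = SimpleNamespace(innerHTML="Hello", style=SimpleNamespace(color="red"))

color = resolveChain(h1, ".style.color")
html = resolveChain(h1, ".innerHTML")
```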

async setValue(chain: str, value)

Sets the value of one of this element’s properties

See also: value()

Parameters
  • chain – resolving chain to the property

  • value – anything json-dumpable

async func(chain: str, args=None)

Executes any function on this element. Example:

browser = ...
await (await browser.querySelector("#someBtn")).func(".click")
Parameters
  • chain – resolving chain to the function

  • args – tuple of json-dumpable objects

async inputText(value)

Inputs text into this element (assuming it’s an input box or text area). Also dispatches an ‘input’ event, which many page scripts and frameworks rely on to notice the change

async parent()

Grabs the direct parent of this element. Short, sweet and simple

async parentC(minWidth=0, minHeight=0, deltaX=0, deltaY=0, maxTries=30, takeAfter=True)

Grabs a nested parent element of this element that meets the specified conditions. “parentC” can be thought of as “complex parent”. Example:

await e.parent() # most straightforwardly, gets the immediate parent
await e.parentC(minWidth=600, takeAfter=False)

The second line requires some explaining. Let’s say there are these elements: A -> B -> C -> D -> E, where E is the current element. The second line walks up through the parent elements one at a time, checking whether each one’s width is at least minWidth. Say “B” is the first parent that qualifies: parentC then returns element B if takeAfter is True, else it returns element C.

There’re more selectors:

  • minWidth: grab parent that’s at least this wide

  • minHeight: same as minWidth, but for height

  • deltaY: if positive, grab parent that has y’ > y + deltaY (y is the current element’s y). If negative, grab parent that has y’ < y + deltaY

  • deltaX: same as deltaY, but for x

These together with childrenC() should help you to navigate around locally.

Parameters
  • minWidth – if specified, finds the closest parent that is at least this wide

  • maxTries – the number of consecutive parents to try out if minWidth or minHeight is specified

  • takeAfter – if True, take the parent bigger than the constraints, else take the parent just shy of that
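The A -> B -> C -> D -> E walk described above can be sketched on a toy tree. This covers only the minWidth, maxTries and takeAfter behaviour as the text describes it. `Node` and `parentByWidth` are hypothetical, and returning None when no ancestor qualifies is an assumption, since the real behaviour in that case is not documented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    width: int
    parent: Optional["Node"] = None

def parentByWidth(node: Node, minWidth: int, maxTries: int = 30,
                  takeAfter: bool = True) -> Optional[Node]:
    """Walks up at most `maxTries` ancestors and stops at the first one that is
    at least `minWidth` wide. Returns that ancestor if takeAfter is True,
    else the element just below it on the path."""
    below = node
    for _ in range(maxTries):
        candidate = below.parent
        if candidate is None:
            return None  # ran out of ancestors (assumed behaviour)
        if candidate.width >= minWidth:
            return candidate if takeAfter else below
        below = candidate
    return None

# A -> B -> C -> D -> E, with E the current element
a = Node("A", 1200)
b = Node("B", 800, a)
c = Node("C", 500, b)
d = Node("D", 300, c)
e = Node("E", 100, d)

wide = parentByWidth(e, minWidth=600)                     # first ancestor >= 600 px
narrow = parentByWidth(e, minWidth=600, takeAfter=False)  # the one just below it
```

Here `wide` is B and `narrow` is C, matching the walkthrough above.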

async children() → List[Element]

Grabs all direct children of this element.

async childrenC(minWidth=0, minHeight=0, maxWidth=inf, maxHeight=inf) → List[Element]

Recursively grabs all children, and returns all elements that are within the specified bounds. Note that if A is the parent of B, and both meet the conditions, then only A is returned.
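The "only the outermost match is returned" rule can be sketched as a recursive descent that stops at the first matching element on each branch. `Box` and `childrenWithin` are hypothetical, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass, field
from math import inf

@dataclass
class Box:
    name: str
    width: float
    height: float
    children: list = field(default_factory=list)

def childrenWithin(box: Box, minWidth=0, minHeight=0, maxWidth=inf, maxHeight=inf) -> list:
    """Returns the outermost descendants whose size lies within the bounds.
    Once an element matches, its own descendants are not searched."""
    found = []
    for child in box.children:
        if minWidth <= child.width <= maxWidth and minHeight <= child.height <= maxHeight:
            found.append(child)  # match: keep it and skip everything below it
        else:
            found.extend(childrenWithin(child, minWidth, minHeight, maxWidth, maxHeight))
    return found

page = Box("page", 1000, 2000, [
    Box("sidebar", 200, 2000, [Box("ad", 200, 200)]),
    Box("card", 400, 300, [Box("thumb", 350, 250)]),  # both card and thumb fit the bounds
])
names = [b.name for b in childrenWithin(page, minWidth=300, maxWidth=500, maxHeight=400)]
```

Both `card` and `thumb` satisfy the bounds, but only `card` ends up in `names` because it is the parent.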

async querySelector(selector: str) → Element
async querySelectorAll(selector: str) → list[k1lib.zircon.Element]
async snake() → str
async locate(s: str) → list[k1lib.zircon.Element]
async locate2(locator: Locator, depth=20, width=100)
class k1lib.zircon.Locator(name: str, topleft: int, bottomright: int, text: str = '', tag: str = '', klass: str = '', nChildren: int = 0)

Bases: object

addChild(child: Locator)
json()
static fromJson(d)
static builder()
plot()
exception k1lib.zircon.BrowserCancel

Bases: Exception

class k1lib.zircon.BrowserGroup(groupPath: str | list[str], limit: int = 3)

Bases: object

__init__(groupPath: str | list[str], limit: int = 3)

Constructs a browser group.

Parameters
  • groupPath – what group of browsers do you want to take control over?

  • limit – only take over this many browsers

async execute(aFn, timeout=20)

Executes the specified async function repeatedly whenever a browser frees up. Example:

import asyncio
from collections import deque
from k1lib import zircon

linksToVisit = deque([
    'https://en.touhouwiki.net/wiki/Reimu_Hakurei',
    'https://en.touhouwiki.net/wiki/Marisa_Kirisame',
    'https://en.touhouwiki.net/wiki/Touhou_Project',
    'https://en.touhouwiki.net/wiki/Imperishable_Night',
    'https://en.touhouwiki.net/wiki/Perfect_Cherry_Blossom',
    'https://en.touhouwiki.net/wiki/Embodiment_of_Scarlet_Devil',
    'https://en.touhouwiki.net/wiki/Subterranean_Animism',
    'https://en.touhouwiki.net/wiki/Mountain_of_Faith',
    'https://en.touhouwiki.net/wiki/Phantasmagoria_of_Flower_View',
    'https://en.touhouwiki.net/wiki/Hakurei_Shrine',
    'https://en.touhouwiki.net/wiki/Touhou_Wiki:Projects',
    'https://en.touhouwiki.net/wiki/Yukari_Yakumo',
    'https://en.touhouwiki.net/wiki/Undefined_Fantastic_Object',
    'https://en.touhouwiki.net/wiki/Aya_Shameimaru',
    'https://en.touhouwiki.net/wiki/Sakuya_Izayoi',
    'https://en.touhouwiki.net/wiki/Immaterial_and_Missing_Power',
    'https://en.touhouwiki.net/wiki/Sanae_Kochiya'
])
data = []

async def crawl(b:"zircon.Browser"):
    # put here because it seems to resolve lots of problems that I have
    # when browser instances are scheduled too close together
    await asyncio.sleep(1)

    # if there's no more data to process, raise zircon.BrowserCancel().
    # The current browser will then never be scheduled to run this function again
    if len(linksToVisit) == 0: raise zircon.BrowserCancel()


    url = linksToVisit.popleft()
    try: # do your normal web crawling stuff here
        await b.goto(url)
        title = await (await b.querySelector("title")).value("innerHTML")

        # save data somewhere
        data.append([url, title])
    except: linksToVisit.append(url) # try again later

bg = zircon.BrowserGroup("public", 5)
# bg = zircon.BrowserGroup(["public", "starcraft"], 5) # or can also be this
await bg.execute(crawl)

The last command will run the crawl function over and over again, as long as there’s a free browser to do it. Also, by default, this will only use inactive browsers (ones that no Python client has sent commands to for a while; the duration is configurable at settings.zircon.conflictDuration).

Notice how I wrapped all browser interactions inside a try-except block? If an error appears (say the connection is lost and the system is trying to restore it) and you don’t handle it, .execute() will throw that same error and cancel all current tasks. So if you want to design something that runs for a long time, catch the error and schedule the job again for later.

Parameters
  • aFn – async function to be executed

  • timeout – if the function takes longer than this amount of time, then cancel the task and make the browser available in the future again. Can be None, but I’d advise against that
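For intuition, the scheduling behaviour described above (rerun the function whenever a browser is free, retire a browser on cancel, drop tasks that exceed the timeout, propagate any other error and cancel the rest) can be sketched with plain asyncio. This is not how zircon implements execute(), which is not shown here. `runGroup`, `Cancel` and the string "browsers" are hypothetical stand-ins.

```python
import asyncio
from collections import deque

class Cancel(Exception):
    """Stand-in for zircon.BrowserCancel."""

async def runGroup(browsers, aFn, timeout=20):
    """Runs aFn(browser) repeatedly on every browser until aFn raises Cancel for it."""
    async def worker(browser):
        while True:
            try:
                await asyncio.wait_for(aFn(browser), timeout)
            except Cancel:
                return  # this browser is never scheduled again
            except asyncio.TimeoutError:
                pass    # task took too long; the browser becomes available again

    tasks = [asyncio.ensure_future(worker(b)) for b in browsers]
    try:
        await asyncio.gather(*tasks)  # any other error propagates to the caller
    finally:
        for t in tasks:
            t.cancel()  # no-op for finished workers, stops the rest on error

jobs = deque(range(7))
done = []

async def work(browser):
    if not jobs:
        raise Cancel()
    job = jobs.popleft()
    await asyncio.sleep(0.01)  # pretend to crawl a page
    done.append((browser, job))

asyncio.run(runGroup(["b0", "b1", "b2"], work, timeout=1))
```

After the run, every job has been processed exactly once, spread across the three stand-in browsers.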