k1lib.k1ui module

k1ui is another project, written in Java, that aims to record and manipulate the screen, keyboard and mouse. That project's interface on its own is clunky, so this module provides a Python interface to ease its use.

Not quite fully developed yet though, because I’m lazy.

k1lib.k1ui.main.get(path)[source]

Sends a get request to the Java server. Example:

k1ui.get("mouse/200/300") # move mouse to (200, 300)
class k1lib.k1ui.main.WsSession(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]

Bases: object

__init__(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]

Creates a websocket connection with the server, with some callback functions

The callback functions (both are async, by the way) will be passed this WsSession object as the first argument. You can use its ws attribute to send messages like this:

# this will send a signal to the server to close the session
sess.ws.send(json.dumps({"type": "close"}))
# this will send a signal to the server requesting the current screenshot. Result will be deposited into eventCb
sess.ws.send(json.dumps({"type": "screenshot"}))
# this will execute a single event
sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyTyped", "javaKeyCode": 0, ...}}))

Complete, minimal example:

events = []
async def eventCb(sess, event): events.append(event)
async def mainThreadCb(sess):
    sess.stream(300) # starts a stream with output screen width of 300px
    await asyncio.sleep(2)
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyPressed", "javaKeyCode": 65, "timestamp": 0}}))
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyReleased", "javaKeyCode": 65, "timestamp": 0}}))
    await asyncio.sleep(10); sess.close()
await k1ui.WsSession(eventCb, mainThreadCb).run()

This code communicates with the server continuously for 12 seconds, capturing all events in the meantime and saving them into the events list. It starts a UDP stream to capture screenshots continuously; after 2 seconds, it sends 2 events to the server, trying to type the letter “A”. Finally, it waits for another 10 seconds and then terminates the connection.

This interface is quite low-level, and is the basis for all other functionality, such as record() and execute() below.

Parameters
  • eventCb – (async) will be called whenever there’s a new event

  • mainThreadCb – (async) will be called after setting up everything

  • streamWidth – specifies the width of the UDP stream, in pixels

stream(width)[source]

Starts a stream with a particular output width. The lower the width, the higher the fps and vice versa

async run()[source]

Connects to the Java server, sets things up and runs mainThreadCb

close()[source]

Closes the connection with the Java server

async execute(events)[source]

Executes a series of events
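
A rough sketch, to be called inside an async callback such as mainThreadCb, where sess is the WsSession passed in (key code 65 is taken from the earlier example and assumed to mean the letter “A”):

# press and release the "A" key through an existing session
await sess.execute([
    {"type": "keyPressed",  "javaKeyCode": 65, "timestamp": 0},
    {"type": "keyReleased", "javaKeyCode": 65, "timestamp": 0},
])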

k1lib.k1ui.main.selectArea(x, y, w, h)[source]

Selects an area of the screen to focus on
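
A hypothetical sketch (the coordinates are made up, assuming a 1920x1080 screen):

k1ui.selectArea(0, 0, 960, 540) # focus on the top-left quarter of the screen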

async k1lib.k1ui.main.record(t=None, keyCode=None, streamWidth=300, f=<k1lib.cli.utils.iden object>)[source]

Records activities. Examples:

events = await k1ui.record(t=5) # records for 5 seconds
events = await k1ui.record(keyCode=5) # records until "Escape" is pressed
events = await k1ui.record() # records until interrupt signal is sent to the process

Note: these examples only work in Jupyter notebooks, where await can be used at the top level. For regular Python processes, check out the official asyncio docs (https://docs.python.org/3/library/asyncio-task.html)

Parameters
  • t – record duration

  • keyCode – key to stop the recording

  • streamWidth – whether to open the UDP stream and capture screenshots at this width or not

  • f – extra event post processing function

async k1lib.k1ui.main.execute(events: List[dict])[source]

Executes some events
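
For instance, this can replay events captured earlier by record() (a sketch, assuming a notebook where await works at the top level):

events = await k1ui.record(t=5) # record activities for 5 seconds
await k1ui.execute(events)      # then replay them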

class k1lib.k1ui.main.Recording(events)[source]

Bases: object

property duration
addTracks(*tracks) Recording[source]

Adds tracks to the Recording

removeTracks(*tracks) Recording[source]

Removes tracks from the Recording
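
A sketch of moving a track from one Recording to another (hypothetical, assuming r1 and r2 are existing Recordings):

track = r1.sel1(klass=k1ui.WordTrack) # grab the first WordTrack
r1.removeTracks(track)                # take it out of one recording
r2.addTracks(track)                   # and add it to the other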

zoom(t1=None, t2=None)[source]

Zooms into a particular time range. If either bounds are not specified, they will default to the start and end of all events.
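
A sketch with made-up time bounds:

r.zoom(2, 8) # zoom into seconds 2..8 of the recording
r.zoom(t2=8) # t1 defaults to the start of all events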

Parameters

t1 – time values are relative to the recording’s start time

sel(t1=None, t2=None, klass=None) List[Track][source]

Selects a subset of tracks using several filters.

For selecting time, assuming we have a track that looks like this (x, y are t1, t2):

# |-1--|   |-2-|
#    |---3---|
#  x     y

Then, tracks 1 and 3 are selected. Time values are relative to recording’s start time
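
A sketch with hypothetical time bounds:

r.sel(t1=2, t2=10)                       # tracks overlapping seconds 2..10
r.sel(klass=k1ui.CharTrack)              # all CharTracks
r.sel(t1=2, t2=10, klass=k1ui.CharTrack) # combine both filters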

Parameters
  • t1 – choose tracks that happen after this time

  • t2 – choose tracks that happen before this time

  • klass – choose specific track class

sel1(**kwargs) List[Track][source]

Like sel(), but this time gets the first element only.

time0() List[float][source]

Start and end recording times. Start time is zero

timeUnix() List[float][source]

Start and end recording times. Both are absolute unix times

events() List[dict][source]

Reconstructs events from the Recording’s internal data. The events are lossy though:

events = ... # events recorded
r = k1ui.Recording(events)
assert r.events() != events # this is the lossy part. Don't expect the produced events match exactly with each other
copy() Recording[source]

Creates a clone of this recording

addTime(t: float, duration: float) Recording

Inserts a specific duration into a specific point in time. More clearly, this transforms this:

# |-1--|   |-2-|
#    |---3---|
#         ^ insert duration=3 here

Into this:

# |-1--|      |-2-|
#    |---3------|

Tracks that partly overlap with the range will have their start/end times modified, and potentially some of the Track’s internal data deleted:

  • Tracks whose only start and end times are modified: Char, Word, Click, Wheel

  • Tracks whose internal data are also modified: Contour, Stream
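
A sketch with made-up numbers:

r.addTime(3, 2) # insert 2 seconds of empty time at the 3-second mark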

Parameters
  • t – where to insert the duration, relative to Recording’s start time

  • duration – how long (in seconds) to insert

formWords() Recording

Tries to merge nearby CharTracks that look like the user is typing something, if they make sense together. Assuming the user types “a”, then “b”, then “c”, this should be able to detect the intent that the user is trying to type “abc”, and replace the 3 CharTracks with a WordTrack. Example:

# example recording, run in notebook cell to see interactive interface
r = k1ui.Recording.sample(); r
# run in another notebook cell and compare difference
r.formWords()
refine(enabled: List[int] = [1, 1, 0]) Recording

Perform sensible default operations to refine the Recording. This currently includes:

    1. Splitting ContourTracks into multiple smaller tracks using click events

    2. Forming words from nearby CharTracks

    3. Removing open-close CharTracks. Basically, CharTracks that don’t have a begin or end time
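
A sketch:

r = k1ui.Recording.sample()
r.refine()                  # default: split contours by clicks and form words
r.refine(enabled=[1, 1, 1]) # also drop open-close CharTracks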

Parameters

enabled – list of integers, one per feature above: 1 to turn that feature on, 0 to turn it off

removeTime(t1: float, t2: float) Recording

Deletes time from t1 to t2 (relative to Recording’s start time). All tracks lying completely inside this range will be deleted. More clearly, it transforms this:

# |-1--|  |-2-|   |-3-|
#    |---4---|  |-5-|
#        ^       ^ delete between these carets

Into this:

# |-1--|   |-3-|
#    |-4-||5-|

Tracks that partly overlap with the range will have their start/end times modified, and potentially some of the Track’s internal data deleted:

  • Tracks whose only start and end times are modified: Char, Word, Click, Wheel

  • Tracks whose internal data are also modified: Contour, Stream
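
A sketch with made-up time bounds:

r.removeTime(2, 5) # delete seconds 2..5; tracks lying completely inside that range are dropped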

static sample() Recording

Creates a Recording from sampleEvents()

static sampleEvents() List[dict]

Grabs the built-in example events. The result is really long, so beware: it can crash your notebook if you try to display it.

class k1lib.k1ui.main.Track(startTime, endTime)[source]

Bases: object

__init__(startTime, endTime)[source]

Time values are absolute unix time.

time0() List[float][source]

Start and end track times. Times are relative to track’s start time

time0Rec() List[float][source]

Start and end track times. Times are relative to recording’s start time

timeUnix() List[float][source]

Start and end track times. Times are absolute unix times

concurrent() List[Track][source]

Grabs all tracks that are concurrent with this track
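
A sketch, assuming r is an existing Recording:

track = r.sel1(klass=k1ui.ClickTrack) # grab some track
track.concurrent()                    # e.g. the ContourTrack and StreamTrack active while the click happened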

events() List[dict][source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right
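
A sketch with made-up offsets:

track.move(-2)  # shift the whole track 2 seconds earlier
track.move(0.5) # shift it 0.5 seconds later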

nextTrack() Track

Grabs the next track (ordered by start time) in the recording

class k1lib.k1ui.main.CharTrack(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]

Bases: Track

__init__(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]

Representing 1 key pressed and released.

Parameters
  • keyText – text to display to user, like “Enter”

  • keyCode – event’s “javaKeyCode”

  • mods – list of 3 booleans, whether ctrl, shift or alt is pressed

static parse(events) List[CharTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

class k1lib.k1ui.main.WordTrack(text, times: List[float])[source]

Bases: Track

__init__(text, times: List[float])[source]

Representing normal text input. This is not created from events directly. Rather, it’s created from scanning over CharTracks and merging them together

events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.ContourTrack(coords)[source]

Bases: Track

__init__(coords)[source]

Representing mouse trajectory (“mouseMoved” event).

Parameters

coords – numpy array with shape (#events, [x, y, unix time])

static parse(events) List[ContourTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

movePoint(x, y, start=True)

Moves the contour’s start/end point to another location, smoothly scaling all intermediate points along the way.

Parameters

start – if True, move the start point, else move the end point
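
A sketch with made-up coordinates, mirroring the split() example below:

r = k1ui.Recording.sample()
ct = r.sel1(klass=k1ui.ContourTrack)
ct.movePoint(500, 300)              # drag the contour's start point to (500, 300)
ct.movePoint(800, 600, start=False) # drag its end point to (800, 600)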

split(times: List[float])

Splits this contour track by multiple timestamps relative to recording’s start time. Example:

r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).split([5])
splitClick(clickTracks: Optional[List[ClickTrack]] = None)

Splits this contour track by click events. Essentially, the click events chop this contour into multiple segments. Example:

r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).splitClick()
Parameters

clickTracks – if not specified, use all ClickTracks from the recording

class k1lib.k1ui.main.ClickTrack(coords: ndarray, times: List[float])[source]

Bases: Track

__init__(coords: ndarray, times: List[float])[source]

Representing a mouse pressed and released event

static parse(events) List[ClickTrack][source]
isClick(threshold=1)[source]

Whether this ClickTrack represents a single click.

Parameters

threshold – if the Manhattan distance between start and end is less than this amount, then declare it a single click
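
A sketch:

r = k1ui.Recording.sample()
ct = r.sel1(klass=k1ui.ClickTrack)
ct.isClick()            # True if the mouse barely moved between press and release
ct.isClick(threshold=5) # allow a bit more mouse travel and still call it a click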

events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.WheelTrack(coords: ndarray, times: List[float])[source]

Bases: Track

__init__(coords: ndarray, times: List[float])[source]

Representing a mouse wheel moved event

static parse(events) List[WheelTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.StreamTrack(frames: ndarray, times: ndarray)[source]

Bases: Track

__init__(frames: ndarray, times: ndarray)[source]

Representing screenshots from the UDP stream

static parse(events) List[StreamTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

k1lib.k1ui.main.distNet() Module[source]

Grabs a pretrained network that might be useful in distinguishing between screens. Example:

net = k1ui.distNet()
net(torch.randn(16, 3, 192, 192)) # returns tensor of shape (16, 10)
class k1lib.k1ui.main.TrainScreen(r: Recording)[source]

Bases: object

__init__(r: Recording)[source]

Creates a screen training system that will train a small neural network to recognize different screens using a small amount of feedback from the user. Overview of how it’s supposed to work:

Setting up:

r = k1ui.Recording(await k1ui.record(30)) # record everything for 30 seconds, and creates a recording out of it
ts = k1ui.TrainScreen(r) # creates the TrainScreen object
r # run this in a cell to display the recording, including StreamTrack
ts.addRule("home", "settings", "home") # add expected screen transition dynamics (home -> settings -> home)

Training with user’s feedback:

ts.registerFrames({"home": [100, 590, 4000, 4503], "settings": [1200, 2438]}) # label some frames of the recording. Network will train for ~6 seconds
next(ts) # display 20 images that confuses the network the most
ts.register({"home": [2, 6], "settings": [1, 16]}) # label some frames from the last line. Notice the frame numbers are much smaller and are <20
next(ts); ts.register({}); next(ts); ts.register({}) # repeat the last 2 lines for a few times (3-5 times is probably good enough for ~7 screens)

Evaluating the performance:

ts.graphs() # displays 2 graphs: network's prediction graph and the actual rule graph. Best way to judge performance
ts.l.Accuracy.plot() # actual accuracy metric while training. Network could have bad accuracy here while still able to construct a perfect graph, so don't rely much on this

Using the model:

ts.predict(torch.randn(2, 3, 192, 192) | k1ui.distNet()) # returns list of ints. Can use ts.idx2Name dict to convert to screen names

Saving the model:

ts | aS(dill.dumps) | file("ts.pth")

Warning

This won’t actually save the associated recording, because recordings are very heavy objects (several GB). It is expected that you manually manage the lifecycle of the recording.

data: List[Tuple[int, str]]

Core dataset of TrainScreen. Essentially just a list of (frameId, screen name)

train(restart=True)[source]

Trains the network for a while (300 epochs/6 seconds). Will be called automatically when you register new frames with the system

Parameters

restart – whether to restart the small network or not

trainParams(joinAlpha: Optional[float] = None, epochs: Optional[int] = None)[source]

Sets training parameters.

Parameters
  • joinAlpha – (default 0) alpha used in joinStreamsRandom component for each screen categories. Read more at joinStreamsRandom

  • epochs – (default 300) number of epochs for each training session
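
A sketch (the values here are arbitrary):

ts.trainParams(epochs=600)    # train longer on each session
ts.trainParams(joinAlpha=0.3) # hypothetical alpha, see joinStreamsRandom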

property frames: ndarray

Grabs the frames from the first StreamTrack of the Recording

property feats: List[ndarray]

Gets the feature array of shape (N, 10) by passing the frames through distNet(). This returns a list of arrays rather than one giant stacked array, for memory performance

register(d)[source]

Tells the object which of the images previously displayed by TrainScreen.__next__() are associated with which screen name. Example:

next(ts) # displays the images out to a notebook cell
ts.register({"home": [3, 4, 7], "settings": [5, 19, 2], "monkeys": [15, 11], "guns": []})

This will also quickly (around 6 seconds) train a small neural network on all available frames based on the new information you provided.

See also: registerFrames()

registerFrames(data: Dict[str, List[int]])[source]

Tells the object which frames should have which labels. Example:

ts.registerFrames({"home": [328, 609], "settings": [12029], "monkeys": [1238]})

This differs from register() in that the frame id here is the absolute frame index in the recording, while in register(), it’s the frame displayed by TrainScreen.__next__().

addRule(*screenNames: List[str]) TrainScreen[source]

Adds a screen transition rule. Let’s say that the transition dynamic looks like this:

home <---> settings <---> account
              ^
              |
              v
          shortcuts

You can represent it like this:

ts.addRule("home", "settings", "account", "settings", "home")
ts.addRule("settings", "shortcuts", "settings")
transitionScreens(obeyRule: bool = <object object>) List[Tuple[int, str]][source]

Gets the list of screens (a list of (frameId, screen name) tuples) that the network deems to be transitions between screen states.

Parameters

obeyRule – if not specified, don’t filter. If True, returns only screens that are part of the specified rules; if False, returns only screens that aren’t
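
A sketch:

ts.transitionScreens()              # all predicted transition screens
ts.transitionScreens(obeyRule=True) # only transitions allowed by rules added via addRule()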

newEvent(sess: WsSession, event: dict)[source]
predict(feats: Tensor) List[int][source]

Using the built-in network, tries to predict the screen name for a bunch of features of shape (N, 10). Example:

r = ...; ts = k1ui.TrainScreen(r); next(ts)
ts.register({"bg": [9, 10, 11, 12, 17, 19], "docs": [5, 6, 7, 8, 0, 1, 4], "jupyter": [2, 3]})
# returns list of 2 integers
ts.predict(torch.randn(2, 3, 192, 192) | aS(k1ui.distNet()))
transitionGraph() graphviz.graphs.Digraph[source]

Gets a screen transition graph of the entire recording. See also: graphs()

ruleGraph() graphviz.graphs.Digraph[source]

Gets a screen transition graph based on the specified rules. Rules are added using addRule(). See also: graphs()

graphs() Carousel[source]

Combines both graphs from transitionGraph() and ruleGraph()

labeledData() Carousel[source]

Visualizes labeled data

correctRatio()[source]

Ratio between the number of screens that are in a valid transition and those that aren’t. Just a quick metric to see how well the network is doing; the higher the number, the better.