k1lib.k1ui module

k1ui is another project, written in Java, that aims to record and manipulate the screen, keyboard and mouse. That project's interface on its own is clunky, so this module provides a Python interface to ease its use.

Not quite fully developed yet though, because I’m lazy.

k1lib.k1ui.main.get(path)[source]

Sends a get request to the Java server. Example:

k1ui.get("mouse/200/300") # move mouse to (200, 300)
class k1lib.k1ui.main.WsSession(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]

Bases: object

__init__(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]

Creates a websocket connection with the server, with some callback functions

The callback functions (both are async, by the way) will be passed this WsSession object as the first argument. You can use its ws attribute to send messages like this:

# this will send a signal to the server to close the session
sess.ws.send(json.dumps({"type": "close"}))
# this will send a signal to the server requesting the current screenshot. Result will be deposited into eventCb
sess.ws.send(json.dumps({"type": "screenshot"}))
# this will execute a single event
sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyTyped", "javaKeyCode": 0, ...}}))

Complete, minimal example:

events = []
async def eventCb(sess, event): events.append(event)
async def mainThreadCb(sess):
    sess.stream(300) # starts a stream with output screen width of 300px
    await asyncio.sleep(2)
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyPressed", "javaKeyCode": 65, "timestamp": 0}}))
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyReleased", "javaKeyCode": 65, "timestamp": 0}}))
    await asyncio.sleep(10); sess.close()
await k1ui.WsSession(eventCb, mainThreadCb).run()

This code communicates with the server continuously for 12 seconds, capturing all events in the meantime and saving them into the events list. It starts a UDP stream to capture screenshots continuously; after 2 seconds, it sends 2 events to the server, trying to type the letter “A”. Finally, it waits for another 10 seconds and then terminates the connection.

This interface is quite low-level, and is the basis for all other functionality, such as record() and execute() below.

Parameters
  • eventCb – (async) will be called whenever there’s a new event

  • mainThreadCb – (async) will be called after setting up everything

  • streamWidth – specifies the width of the UDP stream, in pixels

stream(width)[source]

Starts a stream with a particular output width. The lower the width, the higher the fps and vice versa

async run()[source]

Connects to the Java server, sets things up and runs mainThreadCb

close()[source]

Closes the connection with the Java server

async execute(events)[source]

Executes a series of events
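
A rough sketch, to be called inside an async callback such as mainThreadCb, where sess is the WsSession passed in (key code 65 is taken from the earlier example and assumed to mean the letter “A”):

# press and release the "A" key through an existing session
await sess.execute([
    {"type": "keyPressed",  "javaKeyCode": 65, "timestamp": 0},
    {"type": "keyReleased", "javaKeyCode": 65, "timestamp": 0},
])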

k1lib.k1ui.main.selectArea(x, y, w, h)[source]

Selects an area of the screen to focus on
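
A hypothetical sketch (the coordinates are made up, assuming a 1920x1080 screen):

k1ui.selectArea(0, 0, 960, 540) # focus on the top-left quarter of the screen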

async k1lib.k1ui.main.record(t=None, keyCode=None, streamWidth=300, f=<k1lib.cli.utils.iden object>)[source]

Records activities. Examples:

events = await k1ui.record(t=5) # records for 5 seconds
events = await k1ui.record(keyCode=5) # records until "Escape" is pressed
events = await k1ui.record() # records until interrupt signal is sent to the process

Note: these examples only work in Jupyter notebooks, where await can be used at the top level. For regular Python processes, check out the official asyncio docs (https://docs.python.org/3/library/asyncio-task.html)

Parameters
  • t – record duration

  • keyCode – key to stop the recording

  • streamWidth – whether to open the UDP stream and capture screenshots at this width or not

  • f – extra event post processing function

async k1lib.k1ui.main.execute(events: List[dict])[source]

Executes some events
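
For instance, this can replay events captured earlier by record() (a sketch, assuming a notebook where await works at the top level):

events = await k1ui.record(t=5) # record activities for 5 seconds
await k1ui.execute(events)      # then replay them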

class k1lib.k1ui.main.Recording(events)[source]

Bases: object

property duration
addTracks(*tracks) Recording[source]

Adds tracks to the Recording

removeTracks(*tracks) Recording[source]

Removes tracks from the Recording
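
A sketch of moving a track from one Recording to another (hypothetical, assuming r1 and r2 are existing Recordings):

track = r1.sel1(klass=k1ui.WordTrack) # grab the first WordTrack
r1.removeTracks(track)                # take it out of one recording
r2.addTracks(track)                   # and add it to the other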

zoom(t1=None, t2=None)[source]

Zooms into a particular time range. If either bounds are not specified, they will default to the start and end of all events.
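
A sketch with made-up time bounds:

r.zoom(2, 8) # zoom into seconds 2..8 of the recording
r.zoom(t2=8) # t1 defaults to the start of all events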

Parameters

t1 – time values are relative to the recording’s start time

sel(t1=None, t2=None, klass=None) List[Track][source]

Selects a subset of tracks using several filters.

For selecting time, assuming we have a track that looks like this (x, y are t1, t2):

# |-1--|   |-2-|
#    |---3---|
#  x     y

Then, tracks 1 and 3 are selected. Time values are relative to recording’s start time
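
A sketch with hypothetical time bounds:

r.sel(t1=2, t2=10)                       # tracks overlapping seconds 2..10
r.sel(klass=k1ui.CharTrack)              # all CharTracks
r.sel(t1=2, t2=10, klass=k1ui.CharTrack) # combine both filters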

Parameters
  • t1 – choose tracks that happen after this time

  • t2 – choose tracks that happen before this time

  • klass – choose specific track class

sel1(**kwargs) List[Track][source]

Like sel(), but this time gets the first element only.

time0() List[float][source]

Start and end recording times. Start time is zero

timeUnix() List[float][source]

Start and end recording times. Both are absolute unix times

events() List[dict][source]

Reconstructs events from the Recording’s internal data. The events are lossy though:

events = ... # events recorded
r = k1ui.Recording(events)
assert r.events() != events # this is the lossy part. Don't expect the produced events match exactly with each other
copy() Recording[source]

Creates a clone of this recording

addTime(t: float, duration: float) Recording

Inserts a specific duration into a specific point in time. More clearly, this transforms this:

# |-1--|   |-2-|
#    |---3---|
#         ^ insert duration=3 here

Into this:

# |-1--|      |-2-|
#    |---3------|

Tracks that partly overlap with the range will have their start/end times modified, and potentially some of the Track’s internal data deleted:

  • Tracks whose only start and end times are modified: Char, Word, Click, Wheel

  • Tracks whose internal data are also modified: Contour, Stream
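
A sketch with made-up numbers:

r.addTime(3, 2) # insert 2 seconds of empty time at the 3-second mark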

Parameters
  • t – where to insert the duration, relative to Recording’s start time

  • duration – how long (in seconds) to insert

formWords() Recording

Tries to merge nearby CharTracks that look like the user is typing something, if they make sense together. Assuming the user types “a”, then “b”, then “c”, this should be able to detect the intent that the user is trying to type “abc”, and replace the 3 CharTracks with a WordTrack. Example:

# example recording, run in notebook cell to see interactive interface
r = k1ui.Recording.sample(); r
# run in another notebook cell and compare difference
r.formWords()
refine(enabled: List[int] = [1, 1, 0]) Recording

Perform sensible default operations to refine the Recording. This currently includes:

    1. Splitting ContourTracks into multiple smaller tracks using click events

    2. Forming words from nearby CharTracks

    3. Removing open-close CharTracks. Basically, CharTracks that don’t have a begin or end time
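
A sketch:

r = k1ui.Recording.sample()
r.refine()                  # default: split contours by clicks and form words
r.refine(enabled=[1, 1, 1]) # also drop open-close CharTracks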

Parameters

enabled – list of integers, one per feature above: 1 to turn that feature on, 0 to turn it off

removeTime(t1: float, t2: float) Recording

Deletes time from t1 to t2 (relative to Recording’s start time). All tracks lying completely inside this range will be deleted. More clearly, it transforms this:

# |-1--|  |-2-|   |-3-|
#    |---4---|  |-5-|
#        ^       ^ delete between these carets

Into this:

# |-1--|   |-3-|
#    |-4-||5-|

Tracks that partly overlap with the range will have their start/end times modified, and potentially some of the Track’s internal data deleted:

  • Tracks whose only start and end times are modified: Char, Word, Click, Wheel

  • Tracks whose internal data are also modified: Contour, Stream
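
A sketch with made-up time bounds:

r.removeTime(2, 5) # delete seconds 2..5; tracks lying completely inside that range are dropped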

static sample() Recording

Creates a Recording from sampleEvents()

static sampleEvents() List[dict]

Grabs the built-in example events. The result is really long, so beware: it can crash your notebook if you try to display it.

class k1lib.k1ui.main.Track(startTime, endTime)[source]

Bases: object

__init__(startTime, endTime)[source]

Time values are absolute unix time.

time0() List[float][source]

Start and end track times. Times are relative to track’s start time

time0Rec() List[float][source]

Start and end track times. Times are relative to recording’s start time

timeUnix() List[float][source]

Start and end track times. Times are absolute unix times

concurrent() List[Track][source]

Grabs all tracks that are concurrent with this track
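
A sketch, assuming r is an existing Recording:

track = r.sel1(klass=k1ui.ClickTrack) # grab some track
track.concurrent()                    # e.g. the ContourTrack and StreamTrack active while the click happened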

events() List[dict][source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right
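
A sketch with made-up offsets:

track.move(-2)  # shift the whole track 2 seconds earlier
track.move(0.5) # shift it 0.5 seconds later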

nextTrack() Track

Grabs the next track (ordered by start time) in the recording

class k1lib.k1ui.main.CharTrack(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]

Bases: Track

__init__(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]

Representing 1 key pressed and released.

Parameters
  • keyText – text to display to user, like “Enter”

  • keyCode – event’s “javaKeyCode”

  • mods – list of 3 booleans, whether ctrl, shift or alt is pressed

static parse(events) List[CharTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

class k1lib.k1ui.main.WordTrack(text, times: List[float])[source]

Bases: Track

__init__(text, times: List[float])[source]

Representing normal text input. This is not created from events directly. Rather, it’s created from scanning over CharTracks and merging them together

events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.ContourTrack(coords)[source]

Bases: Track

__init__(coords)[source]

Representing mouse trajectory (“mouseMoved” event).

Parameters

coords – numpy array with shape (#events, [x, y, unix time])

static parse(events) List[ContourTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

movePoint(x, y, start=True)

Moves the contour’s start/end point to another location, smoothly scaling all intermediate points along the way.

Parameters

start – if True, move the start point, else move the end point
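
A sketch with made-up coordinates, mirroring the split() example below:

r = k1ui.Recording.sample()
ct = r.sel1(klass=k1ui.ContourTrack)
ct.movePoint(500, 300)              # drag the contour's start point to (500, 300)
ct.movePoint(800, 600, start=False) # drag its end point to (800, 600)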

split(times: List[float])

Splits this contour track by multiple timestamps relative to recording’s start time. Example:

r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).split([5])
splitClick(clickTracks: Optional[List[ClickTrack]] = None)

Splits this contour track by click events. Essentially, the click events chop this contour into multiple segments. Example:

r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).splitClick()
Parameters

clickTracks – if not specified, use all ClickTracks from the recording

class k1lib.k1ui.main.ClickTrack(coords: ndarray, times: List[float])[source]

Bases: Track

__init__(coords: ndarray, times: List[float])[source]

Representing a mouse pressed and released event

static parse(events) List[ClickTrack][source]
isClick(threshold=1)[source]

Whether this ClickTrack represents a single click.

Parameters

threshold – if the Manhattan distance between start and end is less than this amount, then declare it a single click
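
A sketch:

r = k1ui.Recording.sample()
ct = r.sel1(klass=k1ui.ClickTrack)
ct.isClick()            # True if the mouse barely moved between press and release
ct.isClick(threshold=5) # allow a bit more mouse travel and still call it a click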

events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.WheelTrack(coords: ndarray, times: List[float])[source]

Bases: Track

__init__(coords: ndarray, times: List[float])[source]

Representing a mouse wheel moved event

static parse(events) List[WheelTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

class k1lib.k1ui.main.StreamTrack(frames: ndarray, times: ndarray)[source]

Bases: Track

__init__(frames: ndarray, times: ndarray)[source]

Representing screenshots from the UDP stream

static parse(events) List[StreamTrack][source]
events()[source]

Reconstructs events from the Track’s internal data, to be implemented by subclasses.

copy()[source]

Creates a clone of this Track, to be implemented by subclasses

move(deltaTime)[source]

Moves the entire track left or right, to be implemented by subclasses.

Parameters

deltaTime – if negative, move left by this number of seconds, else move right

k1lib.k1ui.main.distNet() Module[source]

Grabs a pretrained network that might be useful in distinguishing between screens. Example:

net = k1ui.distNet()
net(torch.randn(16, 3, 192, 192)) # returns tensor of shape (16, 10)
class k1lib.k1ui.main.TrainScreen(r: Recording)[source]

Bases: object

__init__(r: Recording)[source]

Creates a screen training system that will train a small neural network to recognize different screens using a small amount of feedback from the user. Overview of how it’s supposed to work:

Setting up:

r = k1ui.Recording(await k1ui.record(30)) # record everything for 30 seconds, and creates a recording out of it
ts = k1ui.TrainScreen(r) # creates the TrainScreen object
r # run this in a cell to display the recording, including StreamTrack
ts.addRule("home", "settings", "home") # add expected screen transition dynamics (home -> settings -> home)

Training with user’s feedback:

ts.registerFrames({"home": [100, 590, 4000, 4503], "settings": [1200, 2438]}) # label some frames of the recording. Network will train for ~6 seconds
next(ts) # display 20 images that confuses the network the most
ts.register({"home": [2, 6], "settings": [1, 16]}) # label some frames from the last line. Notice the frame numbers are much smaller and are <20
next(ts); ts.register({}); next(ts); ts.register({}) # repeat the last 2 lines for a few times (3-5 times is probably good enough for ~7 screens)

Evaluating the performance:

ts.graphs() # displays 2 graphs: network's prediction graph and the actual rule graph. Best way to judge performance
ts.l.Accuracy.plot() # actual accuracy metric while training. Network could have bad accuracy here while still able to construct a perfect graph, so don't rely much on this

Using the model:

ts.predict(torch.randn(2, 3, 192, 192) | k1ui.distNet()) # returns list of ints. Can use ts.idx2Name dict to convert to screen names

Saving the model:

ts | aS(dill.dumps) | file("ts.pth")

Warning

This won’t actually save the associated recording, because recordings are very heavy objects (several GB). It is expected that you manually manage the lifecycle of the recording.

data: List[Tuple[int, str]]

Core dataset of TrainScreen. Essentially just a list of (frameId, screen name)

train(restart=True)[source]

Trains the network for a while (300 epochs/6 seconds). Will be called automatically when you register new frames with the system

Parameters

restart – whether to restart the small network or not

trainParams(joinAlpha: Optional[float] = None, epochs: Optional[int] = None)[source]

Sets training parameters.

Parameters
  • joinAlpha – (default 0) alpha used in joinStreamsRandom component for each screen categories. Read more at joinStreamsRandom

  • epochs – (default 300) number of epochs for each training session
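
A sketch (the values here are arbitrary):

ts.trainParams(epochs=600)    # train longer on each session
ts.trainParams(joinAlpha=0.3) # hypothetical alpha, see joinStreamsRandom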

property frames: ndarray

Grabs the frames from the first StreamTrack of the Recording

property feats: List[ndarray]

Gets the feature array of shape (N, 10) by passing the frames through distNet(). This returns a list of arrays rather than one giant stacked array, for memory performance

register(d)[source]

Tells the object which of the images previously displayed by TrainScreen.__next__() are associated with which screen name. Example:

next(ts) # displays the images out to a notebook cell
ts.register({"home": [3, 4, 7], "settings": [5, 19, 2], "monkeys": [15, 11], "guns": []})

This will also quickly (around 6 seconds) train a small neural network on all available frames based on the new information you provided.

See also: registerFrames()

registerFrames(data: Dict[str, List[int]])[source]

Tells the object which frames should have which labels. Example:

ts.registerFrames({"home": [328, 609], "settings": [12029], "monkeys": [1238]})

This differs from register() in that the frame id here is the absolute frame index in the recording, while in register(), it’s the frame displayed by TrainScreen.__next__().

addRule(*screenNames: List[str]) TrainScreen[source]

Adds a screen transition rule. Let’s say that the transition dynamic looks like this:

home <---> settings <---> account
              ^
              |
              v
          shortcuts

You can represent it like this:

ts.addRule("home", "settings", "account", "settings", "home")
ts.addRule("settings", "shortcuts", "settings")
transitionScreens(obeyRule: bool = <object object>) List[Tuple[int, str]][source]

Gets the list of screens (a list of (frameId, screen name) tuples) that the network deems to be transitions between screen states.

Parameters

obeyRule – if not specified, don’t filter. If True, returns only screens that are part of the specified rules; if False, returns only screens that aren’t
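
A sketch:

ts.transitionScreens()              # all predicted transition screens
ts.transitionScreens(obeyRule=True) # only transitions allowed by rules added via addRule()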

newEvent(sess: WsSession, event: dict)[source]
predict(feats: Tensor) List[int][source]

Using the built-in network, tries to predict the screen name for a bunch of features of shape (N, 10). Example:

r = ...; ts = k1ui.TrainScreen(r); next(ts)
ts.register({"bg": [9, 10, 11, 12, 17, 19], "docs": [5, 6, 7, 8, 0, 1, 4], "jupyter": [2, 3]})
# returns list of 2 integers
ts.predict(torch.randn(2, 3, 192, 192) | aS(k1ui.distNet()))
transitionGraph() graphviz.graphs.Digraph[source]

Gets a screen transition graph of the entire recording. See also: graphs()

ruleGraph() graphviz.graphs.Digraph[source]

Gets a screen transition graph based on the specified rules. Rules are added using addRule(). See also: graphs()

graphs() Carousel[source]

Combines both graphs from transitionGraph() and ruleGraph()

labeledData() Carousel[source]

Visualizes labeled data

correctRatio()[source]

Ratio between the number of screens that are in a valid transition and those that aren’t. Just a quick metric to see how well the network is doing; the higher the number, the better.