k1lib.k1ui module
k1ui is a separate project, written in Java, that records and manipulates the screen, keyboard and mouse. Its interface is clunky on its own, so this module provides a Python interface to make it easier to use.
Not quite developed yet though, because I’m lazy.
- k1lib.k1ui.main.get(path)[source]
Sends a get request to the Java server. Example:
k1ui.get("mouse/200/300") # move mouse to (200, 300)
- class k1lib.k1ui.main.WsSession(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]
Bases:
object
- __init__(eventCb: Callable[[WsSession, dict], None], mainThreadCb: Callable[[WsSession], None])[source]
Creates a websocket connection with the server, with some callback functions
The callback functions (most of them async) will be passed the WsSession object as their first argument. You can use it to send messages like this:
# this will send a signal to the server to close the session
sess.ws.send(json.dumps({"type": "close"}))
# this will send a signal to the server requesting the current screenshot. Result will be deposited into eventCb
sess.ws.send(json.dumps({"type": "screenshot"}))
# this will execute a single event
sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyTyped", "javaKeyCode": 0, ...}}))
Complete, minimum example:
events = []
async def eventCb(sess, event): events.append(event)
async def mainThreadCb(sess):
    sess.stream(300) # starts a stream with output screen width of 300px
    await asyncio.sleep(2)
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyPressed",  "javaKeyCode": 65, "timestamp": 0}}))
    await sess.ws.send(json.dumps({"type": "execute", "event": {"type": "keyReleased", "javaKeyCode": 65, "timestamp": 0}}))
    await asyncio.sleep(10); sess.close()
await k1ui.WsSession(eventCb, mainThreadCb).run()
This code communicates with the server continuously for 12 seconds, capturing all events in the meantime and saving them into the events list. It starts up a UDP stream to capture screenshots continuously and, after 2 seconds, sends 2 events to the server to type the letter “A”. Finally, it waits for another 10 seconds and then terminates the connection. This interface is quite low-level, and is the basis for all other functionality in this module.
- Parameters
eventCb – (async) will be called whenever there’s a new event
mainThreadCb – (async) will be called after setting up everything
streamWidth – specifies the width of the UDP stream, in pixels
- async k1lib.k1ui.main.record(t=None, keyCode=None, streamWidth=300, f=<k1lib.cli.utils.iden object>)[source]
Records activities. Examples:
events = await k1ui.record(t=5)       # records for 5 seconds
events = await k1ui.record(keyCode=5) # records until "Escape" is pressed
events = await k1ui.record()          # records until interrupt signal is sent to the process
Note: these examples only work in Jupyter notebooks. For regular Python processes, check out the official Python docs (https://docs.python.org/3/library/asyncio-task.html), or see the sketch after the parameter list below.
- Parameters
t – record duration
keyCode – key to stop the recording
streamWidth – whether to open the UDP stream and capture screenshots (at this width, in pixels) or not
f – extra event post-processing function
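Since record() is a coroutine, a regular Python process (outside a notebook) has to drive it with an event loop itself. A minimal sketch, assuming the import path below matches your setup:
import asyncio
from k1lib import k1ui # assumed import path; adjust to however k1ui is exposed in your environment

async def main():
    # record for 5 seconds, with the default 300px UDP screenshot stream
    return await k1ui.record(t=5)

events = asyncio.run(main()) # notebooks already run an event loop, so there you'd just `await k1ui.record(t=5)`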
- class k1lib.k1ui.main.Recording(events)[source]
Bases:
object
- property duration
- zoom(t1=None, t2=None)[source]
Zooms into a particular time range. If either bound is not specified, it defaults to the start or end of all events.
- Parameters
t1 – time values are relative to the recording’s start time
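A usage sketch with illustrative time values, assuming k1ui is imported as in the examples above:
r = k1ui.Recording.sample() # bundled sample recording
r.zoom(t1=2)                # zoom from 2 seconds after the start until the end
r.zoom(t1=2, t2=8)          # zoom into the 2s-8s window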
- sel(t1=None, t2=None, klass=None) List[Track] [source]
Selects a subset of tracks using several filters.
For selecting by time, assume we have tracks that look like this (x and y are t1 and t2):
#    |-1--|      |-2-|
#  |---3---|
#   x     y
Then tracks 1 and 3 are selected. Time values are relative to the recording’s start time.
- Parameters
t1 – choose tracks that happen after this time
t2 – choose tracks that happen before this time
klass – choose specific track class
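A usage sketch combining the filters (CharTrack is documented further down; time values are illustrative):
r = k1ui.Recording.sample()
tracks = r.sel(t1=1, t2=5)                       # tracks overlapping the 1s-5s window
chars  = r.sel(klass=k1ui.CharTrack)             # only keyboard tracks
both   = r.sel(t1=1, t2=5, klass=k1ui.CharTrack) # both filters at once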
- events() List[dict] [source]
Reconstructs events from the Recording’s internal data. The events are lossy though:
events = ... # events recorded
r = k1ui.Recording(events)
assert r.events() != events # this is the lossy part. Don't expect the produced events to match exactly with each other
- addTime(t: float, duration: float) Recording
Inserts a specific duration into a specific point in time. More clearly, this transforms this:
# |-1--|  |-2-|
#  |---3---|
#       ^ insert duration=3 here
Into this:
# |-1--|     |-2-|
#  |---3------|
Tracks that partly overlap with the range will have their start/end times modified, and some of the Track’s internal data may be deleted:
Tracks whose only start and end times are modified: Char, Word, Click, Wheel
Tracks whose internal data are also modified: Contour, Stream
- Parameters
t – where to insert the duration, relative to Recording’s start time
duration – how long (in seconds) to insert
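A usage sketch with illustrative values (reassigning r works whether or not addTime() also modifies the Recording in place):
r = k1ui.Recording.sample()
r = r.addTime(t=3, duration=2) # insert 2 seconds of empty time at the 3-second mark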
- formWords() Recording
Tries to merge nearby CharTracks that look like the user is typing something, if they make sense together. Say the user types “a”, then “b”, then “c”: this should detect the intent to type “abc” and replace the 3 CharTracks with a single WordTrack. Example:
# example recording, run in notebook cell to see interactive interface
r = k1ui.Recording.sample(); r
# run in another notebook cell and compare difference
r.formWords()
- refine(enabled: List[int] = [1, 1, 0]) Recording
Perform sensible default operations to refine the Recording. This currently includes:
- Splitting ContourTracks into multiple smaller tracks using click events
- Forming words from nearby CharTracks
- Removing open-close CharTracks, i.e. CharTracks that don’t have a begin or end time
- Parameters
enabled – list of integers controlling the features above, in order: 1 to turn a feature on, 0 to turn it off
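A usage sketch:
r = k1ui.Recording.sample()
r = r.refine()                    # defaults: split contours at clicks, form words
# r = r.refine(enabled=[1, 1, 1]) # same, plus removing CharTracks that miss a begin or end time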
- removeTime(t1: float, t2: float) Recording
Deletes time from t1 to t2 (relative to Recording’s start time). All tracks lying completely inside this range will be deleted. More clearly, it transforms this:
# |-1--|  |-2-|    |-3-|
#    |---4---|  |-5-|
#        ^        ^ delete between these carets
Into this:
# |-1--|  |-3-|
#    |-4-||5-|
Tracks that partly overlap with the range will have their start/end times modified, and some of the Track’s internal data may be deleted:
Tracks whose only start and end times are modified: Char, Word, Click, Wheel
Tracks whose internal data are also modified: Contour, Stream
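A usage sketch with illustrative values:
r = k1ui.Recording.sample()
r = r.removeTime(2, 5) # drop everything between the 2s and 5s marks and close the gap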
- static sample() Recording
Creates a Recording from sampleEvents()
- class k1lib.k1ui.main.Track(startTime, endTime)[source]
Bases:
object
- time0Rec() List[float] [source]
Start and end track times. Times are relative to recording’s start time
- events() List[dict] [source]
Reconstructs events from the Track’s internal data, to be implemented by subclasses.
- class k1lib.k1ui.main.CharTrack(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]
Bases:
Track
- __init__(keyText: str, keyCode: int, mods: List[bool], times: List[float])[source]
Representing 1 key pressed and released.
- Parameters
keyText – text to display to user, like “Enter”
keyCode – event’s “javaKeyCode”
mods – list of 3 booleans, whether ctrl, shift or alt is pressed
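A sketch of inspecting a recording’s CharTracks, assuming the constructor arguments above are stored as attributes of the same names:
r = k1ui.Recording.sample()
for ct in r.sel(klass=k1ui.CharTrack):
    # e.g. "A", 65, [False, False, False], [1.2, 1.35]
    print(ct.keyText, ct.keyCode, ct.mods, ct.time0Rec())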
- class k1lib.k1ui.main.WordTrack(text, times: List[float])[source]
Bases:
Track
- __init__(text, times: List[float])[source]
Representing normal text input. This is not created from events directly. Rather, it’s created from scanning over CharTracks and merging them together
- class k1lib.k1ui.main.ContourTrack(coords)[source]
Bases:
Track
- __init__(coords)[source]
Representing mouse trajectory (“mouseMoved” event).
- Parameters
coords – numpy array with shape (#events, [x, y, unix time])
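A sketch constructing a ContourTrack by hand from the documented coords layout (normally you’d get one from a Recording or ContourTrack.parse() instead):
import time
import numpy as np

now = time.time()
coords = np.array([ # shape (#events, [x, y, unix time])
    [100, 200, now],
    [150, 230, now + 0.05],
    [210, 260, now + 0.10],
])
track = k1ui.ContourTrack(coords)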
- static parse(events) List[ContourTrack] [source]
- events()[source]
Reconstructs events from the Track’s internal data, to be implemented by subclasses.
- move(deltaTime)[source]
Moves the entire track left or right, to be implemented by subclasses.
- Parameters
deltaTime – if negative, move left by this number of seconds, else move right
- movePoint(x, y, start=True)
Moves the contour’s start/end point to another location, smoothly scaling all intermediary points along the way.
- Parameters
start – if True, move the start point, else move the end point
- split(times: List[float])
Splits this contour track by multiple timestamps relative to recording’s start time. Example:
r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).split([5])
- splitClick(clickTracks: Optional[List[ClickTrack]] = None)
Splits this contour track by click events. Essentially, the click events chops this contour into multiple segments. Example:
r = k1ui.Recording.sample()
r.sel1(klass=k1ui.ContourTrack).splitClick()
- Parameters
clickTracks – if not specified, use all ClickTracks from the recording
- class k1lib.k1ui.main.ClickTrack(coords: ndarray, times: List[float])[source]
Bases:
Track
- __init__(coords: ndarray, times: List[float])[source]
Representing a mouse pressed and released event
- static parse(events) List[ClickTrack] [source]
- isClick(threshold=1)[source]
Whether this ClickTrack represents a single click.
- Parameters
threshold – if Manhattan distance between start and end is less than this amount, then declare it a single click
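A sketch separating single clicks from drags, assuming k1ui is imported as in the examples above:
r = k1ui.Recording.sample()
clicks = r.sel(klass=k1ui.ClickTrack)
singleClicks = [c for c in clicks if c.isClick()]      # press and release within 1px (Manhattan distance)
drags        = [c for c in clicks if not c.isClick(5)] # moved more than 5px between press and release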
- class k1lib.k1ui.main.WheelTrack(coords: ndarray, times: List[float])[source]
Bases:
Track
- static parse(events) List[WheelTrack] [source]
- class k1lib.k1ui.main.StreamTrack(frames: ndarray, times: ndarray)[source]
Bases:
Track
- static parse(events) List[StreamTrack] [source]
- k1lib.k1ui.main.distNet() Module [source]
Grabs a pretrained network that might be useful in distinguishing between screens. Example:
net = k1ui.distNet()
net(torch.randn(16, 3, 192, 192)) # returns tensor of shape (16, 10)
- class k1lib.k1ui.main.TrainScreen(r: Recording)[source]
Bases:
object
- __init__(r: Recording)[source]
Creates a screen training system that will train a small neural network to recognize different screens using a small amount of feedback from the user. Overview of how it’s supposed to be used:
Setting up:
r = k1ui.Recording(await k1ui.record(30)) # record everything for 30 seconds and create a recording out of it
ts = k1ui.TrainScreen(r)                  # create the TrainScreen object
r                                         # run this in a cell to display the recording, including StreamTrack
ts.addRule("home", "settings", "home")    # add expected screen transition dynamics (home -> settings -> home)
Training with user’s feedback:
ts.registerFrames({"home": [100, 590, 4000, 4503], "settings": [1200, 2438]}) # label some frames of the recording. Network will train for ~6 seconds
next(ts)                                             # display 20 images that confuse the network the most
ts.register({"home": [2, 6], "settings": [1, 16]})   # label some frames from the last line. Notice the frame numbers are much smaller and are <20
next(ts); ts.register({}); next(ts); ts.register({}) # repeat the last 2 lines a few times (3-5 times is probably enough for ~7 screens)
Evaluating the performance:
ts.graphs()          # displays 2 graphs: the network's prediction graph and the actual rule graph. Best way to judge performance
ts.l.Accuracy.plot() # actual accuracy metric while training. The network could have bad accuracy here while still being able to construct a perfect graph, so don't rely much on this
Using the model:
ts.predict(torch.randn(2, 3, 192, 192) | k1ui.distNet()) # returns list of ints. Can use ts.idx2Name dict to convert to screen names
Saving the model:
ts | aS(dill.dumps) | file("ts.pth")
Warning
This won’t actually save the associated recording, because recordings are very heavy objects (several GB). It is expected that you manually manage the lifecycle of the recording.
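A minimal save/load sketch based on the snippet above; reattaching or re-recording a Recording after loading is left to you, since it isn’t serialized:
import dill
from k1lib.cli import aS, file # cli helpers used in the snippet above; adjust the import if your version exposes them elsewhere

ts | aS(dill.dumps) | file("ts.pth") # save the trained TrainScreen (the Recording is not included)

# later, possibly in another process:
with open("ts.pth", "rb") as f:
    ts2 = dill.loads(f.read())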
- data: List[Tuple[int, str]]
Core dataset of TrainScreen. Essentially just a list of (frameId, screen name)
- train(restart=True)[source]
Trains the network for a while (300 epochs, ~6 seconds). Will be called automatically when you register new frames with the system
- Parameters
restart – whether to restart the small network or not
- trainParams(joinAlpha: Optional[float] = None, epochs: Optional[int] = None)[source]
Sets training parameters.
- Parameters
joinAlpha – (default 0) alpha used in the joinStreamsRandom component for each screen category. Read more at joinStreamsRandom
epochs – (default 300) number of epochs for each training session
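A usage sketch with illustrative values:
ts.trainParams(epochs=600)    # train for longer in each session
ts.trainParams(joinAlpha=0.1) # soften the per-screen sampling (see joinStreamsRandom)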
- property frames: ndarray
Grab the frames from the first StreamTrack of the Recording
- property feats: List[ndarray]
Gets the feature array of shape (N, 10) by passing the frames through distNet(). This returns a list of arrays, not a giant stacked array, for memory performance
- register(d)[source]
Tells the object which of the images previously displayed by TrainScreen.__next__() are associated with which screen name. Example:
next(ts) # displays the images out to a notebook cell
ts.register({"home": [3, 4, 7], "settings": [5, 19, 2], "monkeys": [15, 11], "guns": []})
This will also quickly (around 6 seconds) train a small neural network on all available frames based on the new information you provided.
See also: registerFrames()
- registerFrames(data: Dict[str, List[int]])[source]
Tells the object which frames should have which labels. Example:
ts.registerFrames({"home": [328, 609], "settings": [12029], "monkeys": [1238]})
This differs from register() in that the frame id here is the absolute frame index in the recording, while in register(), it’s the frame displayed by TrainScreen.__next__().
- addRule(*screenNames: List[str]) TrainScreen [source]
Adds a screen transition rule. Let’s say that the transition dynamic looks like this:
home <---> settings <---> account
               ^
               |
               v
           shortcuts
You can represent it like this:
ts.addRule("home", "settings", "account", "settings", "home")
ts.addRule("settings", "shortcuts", "settings")
- transitionScreens(obeyRule: bool = <object object>) List[Tuple[int, str]] [source]
Gets the list of screens (a list of (frameId, screen name) tuples) that the network deems to be transitions between screen states.
- Parameters
obeyRule – if not specified, don’t filter. If True, return only screens that are part of the specified rules; if False, return only screens that aren’t
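A usage sketch:
ts.transitionScreens()               # all detected transitions, unfiltered
ts.transitionScreens(obeyRule=True)  # only transitions allowed by the rules added via addRule()
ts.transitionScreens(obeyRule=False) # only transitions that violate those rules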
- predict(feats: Tensor) List[int] [source]
Using the built-in network, tries to predict the screen name for a bunch of features of shape (N, 10). Example:
r = ...; ts = k1ui.TrainScreen(r); next(ts)
ts.register({"bg": [9, 10, 11, 12, 17, 19], "docs": [5, 6, 7, 8, 0, 1, 4], "jupyter": [2, 3]})
# returns list of 2 integers
ts.predict(torch.randn(2, 3, 192, 192) | aS(k1ui.distNet()))
- transitionGraph() graphviz.graphs.Digraph [source]
Gets a screen transition graph of the entire recording. See also: graphs()
- ruleGraph() graphviz.graphs.Digraph [source]
Gets a screen transition graph based on the specified rules. Rules are added using addRule(). See also: graphs()
- graphs() Carousel [source]
Combines both graphs from transitionGraph() and ruleGraph()