Technology behind Koharu
I have been working on a project called Koharu for almost a year, and over the same period I have been learning Rust from scratch. The technology behind Koharu spans a wide range of fields, from desktop applications to networking, from machine learning to text rendering. As the project has grown, I have gained experience in each of them, and I would like to share some of that technology in this post.
Meet Tauri
Tauri has grown more popular in recent years as the RIIR (Rewrite It In Rust) movement continues. In my view, Tauri will never replace Electron: writing the backend and frontend in the same language is a big advantage for Electron, since it makes communication between the two much easier. However, Tauri is built on top of the Rust ecosystem, which makes it far more flexible and lower-level than Electron.
The ease of using JavaScript/TypeScript for GUI development is still a big advantage, though. Reusing existing web packages makes it much easier to build the UI and manage its state, and the UI won't be a bottleneck for the application; even Microsoft has adopted web technologies for its desktop applications, e.g. React Native for Windows.
The hard part of desktop application development is sharing state between the frontend and the backend. Tauri uses the WebView's IPC mechanism for this. Looking at the source code, Tauri injects a global variable window.__TAURI_INTERNALS__ into the WebView; its invoke script serializes the arguments and sends them to the WebView layer (Wry), which handles the request and routes it to the handler generated by #[tauri::command].1
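For context, this is roughly what a command looks like on the Rust side using Tauri's standard API. It is a minimal sketch; the greet command is a made-up example, not code from Koharu:

use tauri::Builder;

// A backend function exposed to the frontend. The frontend calls it via
// invoke("greet", { name: "..." }) and receives the returned string.
#[tauri::command]
fn greet(name: String) -> String {
    format!("Hello, {name}!")
}

fn main() {
    Builder::default()
        // Register the command so the IPC layer can route requests to it.
        .invoke_handler(tauri::generate_handler![greet])
        .run(tauri::generate_context!())
        .expect("error while running the Tauri application");
}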
The IPC implementation is quite hacky; it works, but not much more than that. That's why I created a more reliable RPC layer for Koharu.
WebSocket-based RPC
There is already a WebSocket-based RPC library for Tauri, tauri-awesome-rpc. However, it uses JSON-RPC, which is not as efficient as a binary protocol, so I decided to build my own WebSocket-based RPC for Koharu.
The message format is as follows:
use serde::Serialize;

#[derive(Debug, Serialize)]
#[serde(tag = "type")]
pub enum OutgoingMessage {
    /// A request from this side, identified by `id` so the response can be matched.
    #[serde(rename = "req")]
    Request {
        id: u32,
        method: String,
        params: Option<rmpv::Value>,
    },
    /// A response to a previously received request with the same `id`.
    #[serde(rename = "res")]
    Response {
        id: u32,
        result: Option<rmpv::Value>,
        error: Option<String>,
    },
    /// A one-way notification; no response is expected.
    #[serde(rename = "ntf")]
    Notification { method: String, params: rmpv::Value },
}
It uses MessagePack for serialization, which is more compact than JSON and has good TypeScript support. The protocol also supports streaming, which is useful for sending notifications.
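As a rough sketch (not the exact code in Koharu), encoding a notification with the rmp_serde and rmpv crates looks like this; the "progress" method name and the helper function are illustrative assumptions, and OutgoingMessage is the enum defined above:

fn encode_notification() -> Result<Vec<u8>, rmp_serde::encode::Error> {
    let msg = OutgoingMessage::Notification {
        method: "progress".to_string(),
        params: rmpv::Value::Nil, // placeholder payload
    };
    // `to_vec_named` keeps field names in the encoded map, which makes the
    // payload straightforward to decode on the TypeScript side.
    rmp_serde::to_vec_named(&msg)
}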
I didn't want to rebuild an RPC stack from scratch, so I use Tower for service dispatch and Axum for the WebSocket server, with enum dispatch to route each request to the corresponding handler. This reduces the boilerplate significantly.
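For illustration, a minimal Axum WebSocket endpoint that receives binary frames and hands them to a dispatcher could look like the sketch below. This is not Koharu's actual server: the "/rpc" path and the dispatch placeholder (which stands in for the Tower/enum-dispatch layer) are assumptions.

use axum::{
    extract::ws::{Message, WebSocket, WebSocketUpgrade},
    response::IntoResponse,
    routing::get,
    Router,
};

async fn ws_handler(ws: WebSocketUpgrade) -> impl IntoResponse {
    ws.on_upgrade(handle_socket)
}

async fn handle_socket(mut socket: WebSocket) {
    while let Some(Ok(msg)) = socket.recv().await {
        if let Message::Binary(bytes) = msg {
            // Decode the MessagePack frame, route it to the matching handler,
            // and send back the encoded response.
            let reply: Vec<u8> = dispatch(&bytes).await;
            let _ = socket.send(Message::Binary(reply.into())).await;
        }
    }
}

// Placeholder for the enum-dispatched service that routes requests.
async fn dispatch(_frame: &[u8]) -> Vec<u8> {
    Vec::new()
}

fn app() -> Router {
    Router::new().route("/rpc", get(ws_handler))
}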
The benefit of a separate RPC system is that it decouples us from Tauri's IPC. Tauri's IPC is tied to the WebView, so it cannot be used from a plain browser. With our own RPC layer, the same protocol works in both the desktop application and the web application, which is what enables Koharu's headless mode.
Slint
I tried Slint, a declarative UI toolkit for Rust, for Koharu's GUI. However, its async support was not good enough for me: I need async/await throughout the GUI, and Slint does not support it. So I settled on Tauri instead.
I still feel the future of GUI is in Rust, but the ecosystem is not mature enough yet. The most promising foundational work I've seen is GPUI from Zed; I hope to see it grow.
Meet candle
candle is a PyTorch-like ML framework in Rust, developed by the maintainer of the popular tch-rs bindings. It is easy to use and efficient.
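To give a feel for the PyTorch-like API, here is a tiny sketch using the candle_core crate (an illustration, not code from Koharu):

use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Fall back to the CPU when no CUDA device is available.
    let device = Device::cuda_if_available(0)?;
    let a = Tensor::new(&[[1f32, 2.], [3., 4.]], &device)?;
    // Matrix multiplication, just like `a @ a` in PyTorch.
    let b = a.matmul(&a)?;
    println!("{b}");
    Ok(())
}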
We have re-implemented several ML models in candle, such as YOLOv5 and manga-ocr. In the beginning we converted the PyTorch models to ONNX and ran them with ort, but the ONNX format turned out not to be flexible enough for us, so we decided to re-implement the models in candle directly. This was a lot of work, but it was worth it.
Worth mentioning: codex saved us a lot of time. We used it to generate the boilerplate for the models and then optimized the code by hand. Thank you, little AI agent.
FFT
candle does not have built-in FFT support, and we need FFT for the Big LaMa model. So I added FFT support to cudarc, the upstream CUDA dependency of candle. With this optimization, Big LaMa runs on consumer GPUs at very high speed.2
LLM
We say AI, but we mean LLMs. LLMs are very large, and running them on consumer GPUs is hard, so we use quantization to shrink the models. Currently we support four architectures: Llama, Qwen, Lfm, and Hunyuan.
In our testing, an LLM fine-tuned for light-novel-style text generation performs much better than the base model: it is more context-aware and produces more coherent text given the surrounding conversation.
Dynamic Loading
While candle is good, it is not enough on its own: we also need dynamic loading of the CUDA runtime, i.e. the ability to load the CUDA dylibs at runtime. This lets us ship a smaller binary and download the CUDA runtime on demand.
So we created koharu-runtime, a crate that downloads and manages the CUDA runtime. It fetches the CUDA runtime packages from PyPI and uses the RECORD file inside each zip archive to verify the integrity of every dylib, and it provides a simple API to load and use the dylibs.
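koharu-runtime's own API aside, loading a dylib at runtime in Rust generally looks like this sketch with the libloading crate; the library path and symbol are only an illustration of the technique, not Koharu's actual code:

use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Path to a downloaded CUDA runtime library; platform-specific in practice.
    let path = "libcudart.so.12";

    // Loading a foreign library and calling into it is inherently unsafe.
    unsafe {
        let lib = Library::new(path)?;
        // cudaRuntimeGetVersion(int*) returns a cudaError_t status code.
        let get_version: Symbol<unsafe extern "C" fn(*mut i32) -> i32> =
            lib.get(b"cudaRuntimeGetVersion")?;
        let mut version = 0;
        let status = get_version(&mut version);
        println!("status = {status}, runtime version = {version}");
    }
    Ok(())
}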
Future
Future work is to improve the pre-processing workflow. The ML models are SOTA, but the pre-processing is not, and making it more accurate will directly improve the user experience.
By the way, Koharu is open source. Contributions are welcome!