From: Viacheslav Dubeyko
To: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org
Cc: slava@dubeiko.com, Viacheslav Dubeyko
Subject: [RFC] ML infrastructure in Linux kernel
Date: Wed, 5 Jun 2024 14:02:19 +0300
Message-Id: <20240605110219.7356-1-slava@dubeyko.com>

Hello,

I would like to initiate a discussion about a unified infrastructure for ML workloads and user-space drivers.

[PROBLEM STATEMENT]

The last several years have revealed two important trends: (1) moving kernel-space functionality into user-space drivers (for example, SPDK, DPDK, ublk); (2) a significant number of efforts to use ML models in real-life applications (for example, tuning kernel parameters, predicting storage device failures, detecting fail-slow drives, and so on). Both trends are highly relevant to the evolution of the Linux kernel. On the one hand, user-space drivers are a way to decrease latency and improve the performance of operations. On the other hand, bypassing the Linux kernel introduces security and efficiency risks, potential synchronization issues between user-space threads, and a break with the architectural paradigm of the Linux kernel. Directly implementing ML approaches in kernel space is very hard, inefficient, and problematic, because floating-point operations are practically unavailable in the Linux kernel (kernel code may use the FPU only inside explicitly guarded regions, such as kernel_fpu_begin()/kernel_fpu_end() on x86) and because ML algorithms are computationally power-hungry (especially during the training phase). It is therefore reasonable to argue that the Linux kernel needs to introduce a unified infrastructure for both ML approaches and user-space drivers.

[WHY DO WE NEED ML IN THE LINUX KERNEL?]

Do we really need an ML infrastructure in the Linux kernel? First of all, it is easy to imagine many down-to-earth applications of ML algorithms that automate routine operations when working with the Linux kernel. Moreover, an ML subsystem could potentially be used for automated research and statistics gathering across a whole fleet of running Linux kernels. It could also help with writing documentation, tuning kernel parameters on the fly, recompiling the kernel, and even automatically reporting bugs and crashes. In short, an ML subsystem could extend the capabilities of the Linux kernel. The main question is: how?

[POTENTIAL INFRASTRUCTURE VISION]

Technically speaking, both cases (user-space drivers and the ML subsystem) require user-space functionality that can be considered a user-space extension of Linux kernel functionality. Such an approach resembles a microkernel architecture: minimal functionality on the kernel side, the main functionality on the user-space side, and a mandatory minimization of the number of context switches between kernel space and user space. The key responsibilities of the kernel-side agent (or subsystem) are accounting for user-space extensions, synchronizing their access to shared resources or metadata on the kernel side, and gathering statistics and sharing them through sysfs or a specialized log file (similar to syslog).

For example, ML user-space extensions can use such specialized log file(s) as input for ML algorithms that analyze the collected data and statistics. In other words, the main ML logic is executed by extension(s) on the user-space side. This logic can, for example, produce “recommendations” that are shared with an ML agent on the kernel side. The kernel-space ML agent can then check the shared “recommendations” and apply the valid ones by means of kernel tuning, recompilation, a “hot” restart, and so on.

A user-space driver requires essentially the same architecture: a simple kernel-space agent/subsystem plus user-space extension(s). The main functionality lives on the user-space side, while the kernel-space side only accounts for the user-space extensions, allocates the necessary resources, synchronizes access to shared resources, and gathers statistics. Such an approach implies registering a specialized driver class that can represent an ML agent or a user-space driver on the kernel side. It would then be possible to use a modprobe-like model to create an instance of an ML agent or a user-space driver.

Finally, we end up with a kernel-space agent that is connected to a user-space extension. The key point here is that the user-space extension can communicate directly with a hardware device, while the kernel-space side accounts for the activity of the user-space extension and allocates resources. One could define a unified architecture for the kernel-side agent that is specialized by the logic of each user-space extension, but the logic of the kernel-space agent should be as minimal, simple, and unified as possible. Technically speaking, the logic of the kernel-space agent could be defined by an eBPF program, and an eBPF arena (or other memory shared between kernel space and user space) could be used for interaction between the kernel-space agent and the user-space extension. Such interaction could be implemented, for example, through submission and completion queues.

In summary, the described architecture could provide an ML infrastructure in the Linux kernel and unify the architecture of user-space drivers. Any opinions on this? How feasible is such a vision?

Thanks,
Slava.