* [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
@ 2026-02-06 19:38 Viacheslav Dubeyko
2026-02-06 23:28 ` Hillf Danton
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-06 19:38 UTC (permalink / raw)
To: lsf-pc
Cc: Viacheslav Dubeyko, linux-mm, Pavan Rallabhandi, linux-fsdevel,
linux-kernel, bpf
Hello,
Machine Learning (ML) is an approach of learning from data, finding
patterns, and making predictions without developers implementing
explicit algorithms. The number of ML application areas grows every
day. Generally speaking, ML can introduce self-evolving and
self-learning capabilities into the Linux kernel. There are already
research works and industry efforts that employ ML approaches for
configuration and optimization of the Linux kernel. However,
introducing ML approaches into the Linux kernel is neither simple nor
straightforward. There are multiple problems and unanswered questions
on this road. First of all, any ML model requires floating-point
operations to run, but the FPU cannot be used freely in kernel space.
Also, an ML model requires a training phase that can cause significant
performance degradation of the Linux kernel. Even the inference phase
could be problematic from the performance point of view on the kernel
side. Using ML approaches in the Linux kernel is an inevitable step.
But how can we use ML approaches in the Linux kernel? Which
infrastructure do we need to adopt ML models in the Linux kernel?
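
To make the FPU constraint concrete: on x86 the kernel can execute
short floating-point sections only between kernel_fpu_begin() and
kernel_fpu_end(), which disable preemption, and the file containing
such code needs FP-friendly compiler flags. A minimal sketch (the
dot product itself is purely illustrative, not part of any proposed
API):

#include <asm/fpu/api.h>  /* kernel_fpu_begin()/kernel_fpu_end(), x86 */

/*
 * Illustrative only: the "legal" way to do a tiny FP computation in
 * kernel space. The section cannot sleep and must stay short, which
 * is why running a whole ML model this way is impractical. The object
 * file would also need per-file CFLAGS that re-enable SSE/FP codegen.
 */
static void dot_product(const float *a, const float *b, int n, float *out)
{
        int i;

        kernel_fpu_begin();     /* saves user FPU state, disables preemption */
        *out = 0.0f;
        for (i = 0; i < n; i++)
                *out += a[i] * b[i];
        kernel_fpu_end();       /* restores FPU state */
}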

What is the goal of using ML models in the Linux kernel? The main goal
is to employ ML models for elaborating the logic of a particular kernel
subsystem based on processed data, and/or for deriving an efficient
subsystem configuration based on the subsystem's internal state. This
requires: (1) collecting data for training, (2) executing the ML model
training phase, (3) testing the trained ML model, (4) using the ML
model for the inference phase. The ML model inference can be used for
recommending a Linux kernel subsystem configuration and/or for
injecting synthesized subsystem logic into kernel space (for example,
eBPF logic).
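
As a sketch of how a kernel-side proxy could track this workflow,
every identifier below is hypothetical and invented for illustration:

#include <linux/types.h>

/* Hypothetical bookkeeping for the four phases above */
enum ml_model_phase {
        ML_PHASE_DATA_COLLECTION,       /* (1) gather training samples */
        ML_PHASE_TRAINING,              /* (2) user space trains the model */
        ML_PHASE_TESTING,               /* (3) validate the trained model */
        ML_PHASE_INFERENCE,             /* (4) act on model recommendations */
};

struct ml_model_state {
        enum ml_model_phase phase;
        u64 samples_published;          /* data sets shared with user space */
        u64 recommendations_applied;    /* inference results acted upon */
};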

How can ML infrastructure be designed in the Linux kernel? We need to
introduce into the Linux kernel a special ML library that implements a
generalized interface for interaction between an ML model's thread in
user space and a kernel subsystem. Such an interface requires the means
to: (1) create/initialize/destroy an ML model proxy in a kernel
subsystem, (2) start/stop the ML model proxy, (3) get/preprocess/publish
data sets from kernel space, (4) receive/preprocess/apply ML model
recommendation(s) from user space, (5) execute the synthesized
logic/recommendations in kernel space, (6) estimate the efficiency of
the synthesized logic/recommendations, (7) execute error
back-propagation with the goal of correcting the ML model on the
user-space side. A rough sketch of such an interface follows.
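
One possible shape for that interface is an ops table that a subsystem
fills in. This is only a sketch under the assumptions above; all names
are hypothetical and the actual API proposed in [5] may differ:

#include <linux/types.h>

struct ml_model_proxy;  /* opaque, owned by the hypothetical ML library */

struct ml_model_proxy_ops {
        /* (1) lifecycle of the proxy inside the subsystem */
        int  (*init)(struct ml_model_proxy *proxy);
        void (*destroy)(struct ml_model_proxy *proxy);
        /* (2) started/stopped on request of the user-space ML thread */
        int  (*start)(struct ml_model_proxy *proxy);
        void (*stop)(struct ml_model_proxy *proxy);
        /* (3) export a preprocessed data set to user space */
        int  (*publish_dataset)(struct ml_model_proxy *proxy,
                                const void *data, size_t len);
        /* (4) + (5) accept and execute a recommendation */
        int  (*apply_recommendation)(struct ml_model_proxy *proxy,
                                     const void *rec, size_t len);
        /* (6) score the last applied recommendation vs. the baseline */
        int  (*estimate_efficiency)(struct ml_model_proxy *proxy);
        /* (7) push the error signal back for model correction */
        int  (*send_feedback)(struct ml_model_proxy *proxy,
                              const void *feedback, size_t len);
};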

The create and initialize logic can be executed by the kernel subsystem
during module load or Linux kernel start (conversely, module unload or
kernel shutdown destroys the ML model proxy). The ML model thread in
user space will be able to re-initialize the proxy and to execute its
start/stop logic on the kernel side. First of all, the ML model needs
to be trained on data from kernel space. The data can be requested by
the ML model from user space, or the data can be published by the ML
model proxy from kernel space. The sysfs interface can be used to
orchestrate this interaction. As a result, the ML model in user space
should be capable of extracting data set(s) from kernel space through
sysfs, FUSE, or a character device. The extracted data can be stored in
persistent storage and, finally, the ML model can be trained in user
space by accessing these data.
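
For example, a subsystem could wire the proxy into its module lifecycle
roughly like this. Again a sketch: ml_model_proxy_create() and
ml_model_proxy_destroy() are hypothetical library entry points, not an
existing kernel API:

#include <linux/module.h>
#include <linux/err.h>

/* callbacks as sketched above */
static const struct ml_model_proxy_ops gc_proxy_ops = {
        /* .init = ..., .publish_dataset = ..., etc. */
};

static struct ml_model_proxy *gc_proxy;

static int __init lfs_gc_init(void)
{
        /*
         * Register the subsystem with the hypothetical ML library;
         * "lfs_gc" would become visible to user space, e.g. via sysfs.
         */
        gc_proxy = ml_model_proxy_create("lfs_gc", &gc_proxy_ops);
        if (IS_ERR(gc_proxy))
                return PTR_ERR(gc_proxy);
        return 0;
}

static void __exit lfs_gc_exit(void)
{
        ml_model_proxy_destroy(gc_proxy);
}

module_init(lfs_gc_init);
module_exit(lfs_gc_exit);
MODULE_LICENSE("GPL");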

A continuous learning model can be adopted during the training phase.
It implies that the kernel subsystem can receive ML model
recommendations even during training. The ML model proxy on the kernel
side can estimate the current kernel subsystem state, try to apply the
ML model recommendations, and estimate the efficiency of the applied
recommendations. Generally speaking, the ML model proxy on the kernel
side can consider several modes of interaction with ML model
recommendations: (1) emergency mode, (2) learning mode,
(3) collaboration mode, (4) recommendation mode. The emergency mode is
the mode in which the kernel subsystem is in a critical state and must
work as efficiently as possible, without the capability of involving
the ML model recommendations (for example, when the ML model
recommendations are completely inadequate or the load is very high).
The learning mode implies that the kernel subsystem can try to apply
the ML model recommendations for some operations, with the goal of
estimating the maturity of the ML model. The ML model proxy can also
degrade the mode back to the learning state if the ML model
recommendations become inefficient. The collaboration mode uses ML
recommendations for about 50% of operations, with the goal of bringing
the ML model to a mature state. And, finally, the ML model proxy can
switch the kernel subsystem into recommendation mode if the ML model is
mature enough and the efficiency of applying the ML recommendations is
higher than that of the human-made algorithms.
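
These modes could map onto a simple kernel-side policy. The sketch
below is hypothetical and only illustrates promoting or demoting the
mode by comparing measured efficiency scores:

enum ml_proxy_mode {
        ML_MODE_EMERGENCY,      /* ML ignored: human-made algorithm only */
        ML_MODE_LEARNING,       /* try recommendations on a few operations */
        ML_MODE_COLLABORATION,  /* ~50% of operations use the model */
        ML_MODE_RECOMMENDATION, /* mature model drives the decisions */
};

/*
 * Illustrative policy only: promote the mode while ML-driven operations
 * score at least as well as the baseline algorithm, demote otherwise,
 * and drop to emergency mode under critical load.
 */
static enum ml_proxy_mode ml_proxy_next_mode(enum ml_proxy_mode cur,
                                             int ml_score,
                                             int baseline_score,
                                             bool critical_load)
{
        if (critical_load)
                return ML_MODE_EMERGENCY;
        if (ml_score < baseline_score)
                return cur > ML_MODE_LEARNING ? cur - 1 : ML_MODE_LEARNING;
        return cur < ML_MODE_RECOMMENDATION ? cur + 1 : cur;
}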

The back-propagation approach can be used to correct the ML model by
sharing efficiency-estimation feedback from the kernel side to the
user-space side.
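
The feedback path could be as simple as a fixed-size record pushed over
the same kernel/user-space channel, which the user-space trainer treats
as the error signal for the next training round. The layout below is an
assumption for illustration:

#include <linux/types.h>

/* Hypothetical feedback record shared with the user-space trainer */
struct ml_model_feedback {
        __u64 recommendation_id;        /* which recommendation is scored */
        __s32 efficiency_delta;         /* gain/loss vs. baseline, in % */
        __u32 mode;                     /* enum ml_proxy_mode at apply time */
        __u64 timestamp_ns;             /* when the estimation was taken */
};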

I would like to discuss the approach of an ML library in the Linux
kernel that can provide a way to run/employ ML models in the Linux
kernel.
Thanks,
Slava.
[REFERENCES]
[1] https://lore.kernel.org/linux-fsdevel/20240605110219.7356-1-slava@dubeyko.com/
[2] https://www.youtube.com/watch?v=E7q0SKeniXU
[3] https://github.com/kernel-ml-lib/ml-lib
[4] https://github.com/kernel-ml-lib/ml-lib-linux
[5] https://lore.kernel.org/linux-fsdevel/20260206191136.2609767-1-slava@dubeyko.com/T/#t
^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-06 19:38 [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel Viacheslav Dubeyko
@ 2026-02-06 23:28 ` Hillf Danton
  2026-02-09 10:03 ` Chris Li
  2026-02-09 10:25 ` Barry Song
  2 siblings, 0 replies; 17+ messages in thread
From: Hillf Danton @ 2026-02-06 23:28 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: lsf-pc, Viacheslav Dubeyko, linux-mm, Pavan Rallabhandi,
      linux-fsdevel, linux-kernel, bpf

On Fri, 6 Feb 2026 19:38:28 +0000 Viacheslav Dubeyko wrote:
> Hello,
>
> [...]
> Using ML approaches in the Linux kernel is an inevitable step.
> But how can we use ML approaches in the Linux kernel? Which
> infrastructure do we need to adopt ML models in the Linux kernel?
>
Given the short list (EEVDF, slab, ext4, the IP stack, the USB bus and
KVM), ML is not needed before the second half of 2027, because it
wastes minutes to make either the liver or the pancreas intelligent. By
intelligent I mean the liver can edit a ppt in Russian. Perhaps the
cerebellum is an exception. Can you build a bot to fix syzbot reports
before 2028?

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-06 19:38 [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel Viacheslav Dubeyko
  2026-02-06 23:28 ` Hillf Danton
@ 2026-02-09 10:03 ` Chris Li
  2026-02-09 22:28   ` Viacheslav Dubeyko
  2026-02-09 10:25 ` Barry Song
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Li @ 2026-02-09 10:03 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: lsf-pc, Viacheslav Dubeyko, linux-mm, Pavan Rallabhandi,
      linux-fsdevel, linux-kernel, bpf, Chris Mason

On Fri, Feb 6, 2026 at 11:38 AM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
>
> [...]
> Using ML approaches in the Linux kernel is an inevitable step.
> But how can we use ML approaches in the Linux kernel? Which
> infrastructure do we need to adopt ML models in the Linux kernel?

I think there are two different things; I think you want the latter,
but I am not sure:

1) Using an ML model to help kernel development: code reviews,
generating patches from descriptions, etc. For example, Chris Mason has
a kernel review repo on github and he is sharing his review findings on
the mailing list:
https://github.com/masoncl/review-prompts/tree/main
It is kernel development related, but the ML agent code is running in
user space. The actual ML computation might run on GPUs/TPUs. That does
not seem to be what you have in mind.

2) Running the ML model computation in kernel space.
Can you clarify if this is what you have in mind? You mention kernel
FPU usage for the ML model. It is only relevant if you need to run the
FP in kernel CPU instructions. Most ML computations do not run in CPU
instructions; they run on GPUs/TPUs. Why not keep the ML program
(PyTorch/agents) in user space and pass the data to the GPU/TPU driver
to run? There will be some kernel infrastructure like VFIO/IOMMU
involved with the GPU/TPU driver. For the most part the kernel is just
facilitating the data passing to/from the GPU/TPU driver and then to
the GPU/TPU hardware. The ML hardware is doing the heavy lifting.

> What is the goal of using ML models in the Linux kernel? [...] This
> requires: (1) collecting data for training, (2) executing the ML model
> training phase, (3) testing the trained ML model, (4) using the ML
> model for the inference phase.

As far as I can tell, a lot of those don't need to be the kernel's
business. It is more of a GPU/TPU driver user-space interface thing; it
might be easier to allow each driver its own kernel/user-space API and
then expose a common user-space library API. Are you trying to define
something like Nvidia CUDA at the kernel level?

> The ML model inference can be used for recommending a Linux kernel
> subsystem configuration and/or for injecting synthesized subsystem
> logic into kernel space (for example, eBPF logic).

That again sounds very much like a user-space issue, i.e. the above
case 1).

> How can ML infrastructure be designed in the Linux kernel? [...]

Unfortunately a lot of those will be tied to the internal
implementation of the GPU/TPU. The model needs to be compiled into
GPU/TPU machine instructions. So forcing a common interface will be
hard because the lower interface requirements might be very different.
Maybe having some common user-space library or ML description language
is better than forcing a kernel interface.

> The create and initialize logic can be executed by the kernel
> subsystem during module load or Linux kernel start [...] The extracted
> data can be stored in persistent storage and, finally, the ML model
> can be trained in user space by accessing these data.

Currently a lot of this is happening in the GPU/TPU drivers and
user-space libraries. One challenging aspect is that the hardware
interface is very different between GPUs/TPUs, and it might be
challenging to expose common interfaces.

> A continuous learning model can be adopted during the training phase.
> [...]

That sounds like user-space interaction again. Not sure it is for
kernel space.

Chris

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-09 10:03 ` Chris Li
@ 2026-02-09 22:28   ` Viacheslav Dubeyko
  2026-02-10 13:47     ` [Lsf-pc] " Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-09 22:28 UTC (permalink / raw)
  To: chrisl
  Cc: clm, linux-mm, Pavan Rallabhandi, linux-fsdevel, linux-kernel,
      lsf-pc, bpf

On Mon, 2026-02-09 at 02:03 -0800, Chris Li wrote:
> [...]
> 2) Running the ML model computation in kernel space.
> Can you clarify if this is what you have in mind? [...] Why not keep
> the ML program (PyTorch/agents) in user space and pass the data to the
> GPU/TPU driver to run? [...] The ML hardware is doing the heavy
> lifting.

The idea is to have the ML model running in user space while the kernel
subsystem interacts with it. As the next step, I am considering two
real-life use-cases: (1) the GC subsystem of an LFS file system, (2) an
ML-based DAMON approach. So, for example, GC can be represented by an
ML model in user space. GC can request data (segment states) from
kernel space, and the ML model in user space can do training and/or
inference. As a result, the ML model in user space can select victim
segments and instruct the kernel-space logic to move valid data from
the victim segment(s) into clean/current one(s).

> As far as I can tell, a lot of those don't need to be the kernel's
> business. [...] Are you trying to define something like Nvidia CUDA at
> the kernel level?
> [...]
> That sounds like user-space interaction again. Not sure it is for
> kernel space.

Thanks a lot for sharing all your thoughts. :) I think I need to point
out that the ML model runs in user space and the kernel subsystem
interacts with the ML model in user space. :) This is the main idea.
The goal of the ML library is to implement a generalized
interface/functionality that gives any kernel subsystem the capability
to be extended by an ML model in user space. And I believe that we can
provide this in a generic way. You can check the patchset [1] to see
the vision of a potential implementation of the idea.

Thanks,
Slava.

[1] https://lore.kernel.org/linux-fsdevel/20260206191136.2609767-1-slava@dubeyko.com/T/#t

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-09 22:28 ` Viacheslav Dubeyko
@ 2026-02-10 13:47 ` Jan Kara
  2026-02-10 14:20 ` Chris Mason
  2026-02-10 21:02 ` Viacheslav Dubeyko
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Kara @ 2026-02-10 13:47 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: chrisl, clm, linux-mm, Pavan Rallabhandi, linux-fsdevel,
      linux-kernel, lsf-pc, bpf

On Mon 09-02-26 22:28:59, Viacheslav Dubeyko via Lsf-pc wrote:
> On Mon, 2026-02-09 at 02:03 -0800, Chris Li wrote:
> > [...]
>
> The idea is to have the ML model running in user space while the
> kernel subsystem interacts with it. As the next step, I am considering
> two real-life use-cases: (1) the GC subsystem of an LFS file system,
> (2) an ML-based DAMON approach. So, for example, GC can be represented
> by an ML model in user space. GC can request data (segment states)
> from kernel space, and the ML model in user space can do training
> and/or inference. As a result, the ML model in user space can select
> victim segments and instruct the kernel-space logic to move valid data
> from the victim segment(s) into clean/current one(s).

To be honest I'm skeptical about how generic this can be. Essentially
you're describing a generic interface to offload arbitrary kernel
decisions to userspace. ML is a userspace business here and not really
relevant for the concept AFAICT. And we already have several ways of
the kernel asking userspace to do something for it, and unless it is
very restricted and well defined it is rather painful, prone to
deadlocks, security issues etc.

So by all means, if you want to do GC decisions for your filesystem in
userspace by ML, be my guest. It does make some sense, although I'd be
wary of issues where we need to write back dirty pages to free memory,
which may now depend on your userspace helper making a decision, which
may need the memory to make the decision... But I don't see why you
need all the ML fluff around it when it seems like just another way to
call a userspace helper, and why some of the existing methods would not
suffice.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10 13:47 ` [Lsf-pc] " Jan Kara
@ 2026-02-10 14:20 ` Chris Mason
  2026-02-10 22:36 ` Viacheslav Dubeyko
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Mason @ 2026-02-10 14:20 UTC (permalink / raw)
  To: Jan Kara, Viacheslav Dubeyko
  Cc: chrisl, linux-mm, Pavan Rallabhandi, linux-fsdevel, linux-kernel,
      lsf-pc, bpf

On 2/10/26 8:47 AM, Jan Kara wrote:
> On Mon 09-02-26 22:28:59, Viacheslav Dubeyko via Lsf-pc wrote:
>> [...]
>> The idea is to have the ML model running in user space while the
>> kernel subsystem interacts with it. [...]
>
> To be honest I'm skeptical about how generic this can be. [...] But I
> don't see why you need all the ML fluff around it when it seems like
> just another way to call a userspace helper, and why some of the
> existing methods would not suffice.

Looking through the description (not the code, apologies), it really
feels like we're reinventing BPF here:

- introspection into what the kernel is currently doing
- a communications channel with applications
- a mechanism to override specific kernel functionality
- fancy applications arbitrating decisions.

My feedback during Plumbers and also today is that you can get 99% of
what you're looking for with some BPF code. It may or may not be
perfect for your needs, but it's a much faster path to generating
community and collaboration around the goals. After that, it's a lot
easier to justify larger changes in the kernel.

If this becomes an LSF/MM topic, my bar for discussion would be:

- extensive data collected about some kernel component (DAMON,
  scheduling, etc)
- a working proof of concept that improved on decisions made in the
  kernel
- discussion of changes needed to improve or enable the proof of
  concept

In other words, I don't think we need a list of ways ML might be used.
I think we need specific examples of a way that ML was used and why
it's better than what the kernel is already doing.

-chris

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10 14:20 ` Chris Mason
@ 2026-02-10 22:36 ` Viacheslav Dubeyko
  2026-02-11  1:30 ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-10 22:36 UTC (permalink / raw)
  To: jack, clm
  Cc: bpf, linux-mm, chrisl, Pavan Rallabhandi, linux-kernel,
      linux-fsdevel, lsf-pc

On Tue, 2026-02-10 at 09:20 -0500, Chris Mason wrote:
> [...]
> Looking through the description (not the code, apologies), it really
> feels like we're reinventing BPF here:
>
> - introspection into what the kernel is currently doing
> - a communications channel with applications
> - a mechanism to override specific kernel functionality
> - fancy applications arbitrating decisions.
>
> My feedback during Plumbers and also today is that you can get 99% of
> what you're looking for with some BPF code.

I see your point. And I can agree with you that eBPF could be used as a
communication channel. I am not trying to invent a new communication
channel. My point here is that the ML library should be the unified
means of extending a kernel subsystem by ML model(s) in user space. So,
eBPF could be one of (or maybe the only) possible communication
mechanisms. The ML library should provide the unified framework and
workflow for easily adding and using user-space ML model(s) in kernel
subsystems.

> It may or may not be perfect for your needs, but it's a much faster
> path to generating community and collaboration around the goals. After
> that, it's a lot easier to justify larger changes in the kernel.

Yeah, makes sense. My current patchset is exploring the API that the ML
library should provide. And eBPF could be the communication channel
between the ML model in user space and the kernel subsystem.

> If this becomes an LSF/MM topic, my bar for discussion would be:
>
> - extensive data collected about some kernel component (DAMON,
>   scheduling, etc)

Exactly; an ML-based DAMON approach using the ML library is my next
implementation/exploration step.

> - a working proof of concept that improved on decisions made in the
>   kernel

Also, I am considering the GC of an LFS file system as low-hanging
fruit for checking the ML library approach. Especially because, for
example, NILFS2 has GC as a user-space process and it requires the
elaboration of an efficient GC policy. So, it could be a potential
proof of concept for the whole idea. Ideally, several use-cases should
benefit from the idea.

> - discussion of changes needed to improve or enable the proof of
>   concept

Makes sense. This is why I've shared the patchset with an initial
vision of the ML library API. The goal is to hear all possible critique
and to check the capability of the idea (and me) to survive. :)

> In other words, I don't think we need a list of ways ML might be used.
> I think we need specific examples of a way that ML was used and why
> it's better than what the kernel is already doing.

Yes, as the next step, I am going to explore: (1) the GC of an LFS file
system use-case, (2) an ML-based DAMON approach. I hope to have enough
time to implement it before May and to share some numbers/results.

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10 22:36 ` Viacheslav Dubeyko
@ 2026-02-11  1:30 ` SeongJae Park
  2026-02-11 20:29 ` Viacheslav Dubeyko
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-02-11 1:30 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: SeongJae Park, jack, clm, bpf, linux-mm, chrisl,
      Pavan Rallabhandi, linux-kernel, linux-fsdevel, lsf-pc

On Tue, 10 Feb 2026 22:36:35 +0000 Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:

> Exactly; an ML-based DAMON approach using the ML library is my next
> implementation/exploration step.

Glad to hear this. If you find any question or need help with DAMON
while doing this, please feel free to reach out. I will be more than
happy to help :)

Thanks,
SJ

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-11  1:30 ` SeongJae Park
@ 2026-02-11 20:29 ` Viacheslav Dubeyko
  0 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-11 20:29 UTC (permalink / raw)
  To: sj
  Cc: jack, linux-fsdevel, linux-mm, linux-kernel, lsf-pc, chrisl, bpf,
      Pavan Rallabhandi, clm

On Tue, 2026-02-10 at 17:30 -0800, SeongJae Park wrote:
> On Tue, 10 Feb 2026 22:36:35 +0000 Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
>
> > Exactly; an ML-based DAMON approach using the ML library is my next
> > implementation/exploration step.
>
> Glad to hear this. If you find any question or need help with DAMON
> while doing this, please feel free to reach out. I will be more than
> happy to help :)

Sounds good! Let me start my implementation efforts and I'll share my
questions. :)

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10 13:47 ` [Lsf-pc] " Jan Kara
  2026-02-10 14:20 ` Chris Mason
@ 2026-02-10 21:02 ` Viacheslav Dubeyko
  2026-02-11  9:55 ` Jan Kara
  1 sibling, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-10 21:02 UTC (permalink / raw)
  To: jack
  Cc: linux-mm, linux-fsdevel, linux-kernel, lsf-pc, chrisl, bpf,
      Pavan Rallabhandi, clm

On Tue, 2026-02-10 at 14:47 +0100, Jan Kara wrote:
> [...]
> To be honest I'm skeptical about how generic this can be. Essentially
> you're describing a generic interface to offload arbitrary kernel
> decisions to userspace. ML is a userspace business here and not really
> relevant for the concept AFAICT. And we already have several ways of
> the kernel asking userspace to do something for it, and unless it is
> very restricted and well defined it is rather painful, prone to
> deadlocks, security issues etc.

Skepticism is a normal reaction. :) So, nothing is wrong with being
skeptical.

I believe it can be pretty generic from the data-flow point of view.
Probably, different kernel subsystems could require different ways of
interaction with user space. However, if we are talking about data flow
but NOT execution flow, then it could be generic enough. And if it can
be generic, then we can suggest a generic way of extending any kernel
subsystem with ML support.

I don't think we need to consider the ML library approach as "the
kernel asking userspace to do something". Rather, it needs to be
considered as "the kernel shares data with user space and user space
recommends something to the kernel". So, the user-space agent (ML
model) can request data from kernel space, or the kernel subsystem can
notify the user-space agent that data is available. And it's up to the
kernel subsystem implementation which data could be shared with user
space. So, the ML model can be trained in user space and then share
recommendations (or eBPF code, for example) with kernel space. Finally,
it's up to the kernel subsystem how and when to apply these
recommendations on the kernel side.

> So by all means, if you want to do GC decisions for your filesystem in
> userspace by ML, be my guest. It does make some sense, although I'd be
> wary of issues where we need to write back dirty pages to free memory,
> which may now depend on your userspace helper making a decision, which
> may need the memory to make the decision... But I don't see why you
> need all the ML fluff around it when it seems like just another way to
> call a userspace helper, and why some of the existing methods would
> not suffice.

OK. I see. :) You understood GC as a subsystem that helps the kernel
memory subsystem to manage the writeback of dirty memory pages. :) It's
a potential direction and I like your suggestion. :) But I meant
something different, because I am considering the GC subsystem of an
LFS file system. So, if we are using a Copy-On-Write (COW) policy, then
after update operations we have segments or erase blocks with a mixture
of valid and invalid logical blocks. And we need a GC subsystem to
clean old segments by moving valid logical blocks from exhausted
segments into clean/current ones. The problem here is to find an
efficient algorithm for selecting victim segments with the smallest
number of valid blocks, with the goal of decreasing write
amplification. So, the file system needs to share the metadata details
(segment states, for example), the ML model can share its
recommendations, and the kernel code of the file system can finally
move the valid blocks in the background.

I don't want to say that ML is a miracle that can solve all our
problems. And it cannot work efficiently for all possible problems. But
it can help us to solve some complicated issues, and it makes sense to
elaborate a generic framework for ML adoption in the Linux kernel.

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10 21:02 ` Viacheslav Dubeyko
@ 2026-02-11  9:55 ` Jan Kara
  2026-02-12  0:53 ` Viacheslav Dubeyko
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2026-02-11 9:55 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: jack, linux-mm, linux-fsdevel, linux-kernel, lsf-pc, chrisl, bpf,
      Pavan Rallabhandi, clm

On Tue 10-02-26 21:02:12, Viacheslav Dubeyko wrote:
> On Tue, 2026-02-10 at 14:47 +0100, Jan Kara wrote:
> > [...]
>
> I don't think we need to consider the ML library approach as "the
> kernel asking userspace to do something". Rather, it needs to be
> considered as "the kernel shares data with user space and user space
> recommends something to the kernel". [...] Finally, it's up to the
> kernel subsystem how and when to apply these recommendations on the
> kernel side.

I guess I have to see some examples. Because so far it sounds so
generic that I'm failing to see the value in this :)

> [...] But I meant something different, because I am considering the GC
> subsystem of an LFS file system. [...] So, the file system needs to
> share the metadata details (segment states, for example), the ML model
> can share its recommendations, and the kernel code of the file system
> can finally move the valid blocks in the background.

No, I actually meant the LFS file system GC as you talk about it. But I
was just too terse about my concerns: As you said, an LFS with COW
needs to select a new position to write each block. When there is no
free block available, it has to select a partially used erase block
(some logical blocks in it became invalid) to reuse. And for this
selection you want to use ML, AFAIU. Hence we have a dependency chain:
folio writeback -> COW block allocation -> GC to make some blocks free
-> ML decision. And now you have to be really careful so that the "ML
decision" doesn't even indirectly depend on folio writeback to
complete. And bear in mind that e.g. if the code making the "ML
decision" dirties some mmapped file pages, it *will* block waiting for
page writeback to complete to get the system below the dirty-page
limit. This is the kind of deadlock I'm talking about that is hard to
avoid when offloading kernel decisions to userspace (and yes, I've seen
these kinds of deadlocks in practice, in various shapes and forms, with
various methods, when the kernel depended on userspace to make forward
progress).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-11  9:55               ` Jan Kara
@ 2026-02-12  0:53                 ` Viacheslav Dubeyko
  2026-02-12 11:02                   ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-12  0:53 UTC (permalink / raw)
  To: jack
  Cc: linux-mm, linux-fsdevel, linux-kernel, lsf-pc, chrisl, bpf, clm,
	Pavan Rallabhandi

On Wed, 2026-02-11 at 10:55 +0100, Jan Kara wrote:
> On Tue 10-02-26 21:02:12, Viacheslav Dubeyko wrote:
> > On Tue, 2026-02-10 at 14:47 +0100, Jan Kara wrote:
> > > [...]
> > [...]
> 
> I guess I'll have to see some examples, because so far it sounds so
> generic that I'm failing to see the value in it :)

I completely see your point. And I am not going to push anything
abstract. I am going to implement the ML-based approach for several
real-life use-cases. So, I will either have something real or I will
fail. :)

> > > [...]
> > [...]
> 
> No, I actually meant the LFS file system GC as you talk about it. But
> I was just too terse about my concerns: As you said, an LFS with COW
> needs to select a new position to write each block. When there is no
> free block available, it has to select a partially used erase block
> (some logical blocks in it became invalid) to reuse.

I assume you mean F2FS here, because I cannot imagine how an LFS file
system (like NILFS2) could do something like this. In an LFS file
system, you add logs into the current segment(s). Even if some logical
blocks in that segment have been invalidated, you keep adding logs at
the head/tail of the current segment until it is completely exhausted.
Then a completely clean/free segment needs to be allocated to become
current and receive the logs. So GC has to take completely exhausted
segments for cleaning. In a pure COW file system, you cannot write
anything into such a segment until it has been completely invalidated
and "erased"/cleaned. So, GC moves valid blocks from completely
exhausted segments into the current one(s). It is the responsibility of
GC to guarantee that the file system does not run out of free physical
space while it still has free logical blocks. And if we do run out of
free physical space, then operations stop because GC failed to keep
enough clean segments.

> And for this selection you want to use ML, AFAIU. Hence we have a
> dependency: folio writeback -> COW block allocation -> GC to make some
> block free -> ML decision.

Usually, GC works in the background. The ML model in user-space gets
segment state metadata from the file system. Then it selects one or
several segments and recommends that the file system move valid blocks,
giving the selected segment ID(s) plus a maximal number of valid blocks
per single operation. A background process of the file system checks
that these logical blocks of the exhausted segment are still valid and
initiates the operation of moving them into the current segment by
adding another log.

Finally, we have two flows: (1) regular file system operations: folio
writeback -> COW block allocation -> add log into current segment;
(2) GC operations: ML GC decision -> recommendation to move valid
blocks of a segment -> check that each logical block is still valid ->
read block content (skip the logical block if we have the folio in page
cache) -> add log into current segment -> update metadata.

Thanks,
Slava.

> And now you have to be really careful so that the "ML decision"
> doesn't even indirectly depend on folio writeback completing. And bear
> in mind that, e.g., if the code making the "ML decision" dirties some
> mmapped file pages, it *will* block waiting for page writeback to
> complete to get the system below the dirty-page limit. This is the
> kind of deadlock I'm talking about that is hard to avoid when
> offloading kernel decisions to userspace (and yes, I've seen these
> kinds of deadlocks in practice, in various shapes and forms, with
> various methods, whenever the kernel depended on userspace to make
> forward progress).
> 
> 								Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread
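To illustrate flow (2), here is a minimal userspace C sketch of the
kernel-side background step that consumes a single ML recommendation
(victim segment ID plus a cap on valid blocks moved per operation). The
helpers and the block-validity array are stand-ins invented for this
sketch; a real file system would consult its block mapping metadata and
the page cache instead.

/* gcc -O2 -o gc_flow gc_flow.c */
#include <stdio.h>

#define BLOCKS_PER_SEG 8

/*
 * Stand-in for the victim segment's block-validity map; a real file
 * system would consult its block mapping metadata instead.
 */
static int block_valid[BLOCKS_PER_SEG] = { 1, 0, 1, 1, 0, 0, 1, 0 };
static unsigned int current_seg_fill;

static int is_still_valid(unsigned int blk)
{
	return block_valid[blk];
}

/* Stub: read block content (skipped if the folio sits in page cache). */
static void read_block(unsigned int blk)
{
	printf("  read block %u from victim segment\n", blk);
}

/* Stub: append a log entry to the current segment, update metadata. */
static void add_log_to_current(unsigned int blk)
{
	current_seg_fill++;
	printf("  append block %u as log entry %u in current segment\n",
	       blk, current_seg_fill);
}

/*
 * Consume one recommendation of the form (victim segment ID, maximal
 * number of valid blocks per single operation), re-checking validity
 * before every move, as in flow (2) above.
 */
static void gc_move_valid_blocks(unsigned int victim_seg,
				 unsigned int max_blocks)
{
	unsigned int blk, moved = 0;

	printf("GC: recommendation for segment %u, cap %u blocks\n",
	       victim_seg, max_blocks);
	for (blk = 0; blk < BLOCKS_PER_SEG && moved < max_blocks; blk++) {
		if (!is_still_valid(blk))
			continue;
		read_block(blk);
		add_log_to_current(blk);
		block_valid[blk] = 0;	/* the block now lives elsewhere */
		moved++;
	}
	printf("GC: moved %u valid blocks\n", moved);
}

int main(void)
{
	gc_move_valid_blocks(42, 3);
	return 0;
}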
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-12  0:53                 ` Viacheslav Dubeyko
@ 2026-02-12 11:02                   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2026-02-12 11:02 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: jack, linux-mm, linux-fsdevel, linux-kernel, lsf-pc, chrisl, bpf,
	clm, Pavan Rallabhandi

On Thu 12-02-26 00:53:37, Viacheslav Dubeyko wrote:
> On Wed, 2026-02-11 at 10:55 +0100, Jan Kara wrote:
> > [...]
> >
> > I guess I'll have to see some examples, because so far it sounds so
> > generic that I'm failing to see the value in it :)
> 
> I completely see your point. And I am not going to push anything
> abstract. I am going to implement the ML-based approach for several
> real-life use-cases. So, I will either have something real or I will
> fail. :)

OK, good then :)

> > [...]
> >
> > No, I actually meant the LFS file system GC as you talk about it.
> > [...]
> 
> I assume you mean F2FS here, because I cannot imagine how an LFS file
> system (like NILFS2) could do something like this. [...] It is the
> responsibility of GC to guarantee that the file system does not run
> out of free physical space while it still has free logical blocks. And
> if we do run out of free physical space, then operations stop because
> GC failed to keep enough clean segments.

Well, the details of different filesystem designs differ, but they all
share the common feature that on an aged filesystem GC needs to do work
for you to be able to write as much as you are supposed to be able to
write.

> > And for this selection you want to use ML, AFAIU. Hence we have a
> > dependency: folio writeback -> COW block allocation -> GC to make
> > some block free -> ML decision.
> 
> Usually, GC works in the background. The ML model in user-space gets
> segment state metadata from the file system. Then it selects one or
> several segments and recommends that the file system move valid
> blocks, giving the selected segment ID(s) plus a maximal number of
> valid blocks per single operation. A background process of the file
> system checks that these logical blocks of the exhausted segment are
> still valid and initiates the operation of moving them into the
> current segment by adding another log.

Sure, background operation is the easy case. I'm speaking about the
situation where the filesystem is under such write pressure that GC
cannot keep up and all write activity is basically blocked waiting for
GC to make forward progress. Again, the details differ between
filesystems, but they all have the property that the speed of GC is one
of the limiting factors for writes once the filesystem is aged enough
and the write pressure is large enough. The point I'm trying to get
across is that under such pressure, consulting userspace for GC
decisions is likely to cause deadlocks. So you will have to have some
in-kernel fallbacks to avoid such deadlocks, plus logic for triggering
these fallbacks to guarantee forward progress of GC, which all gets
kind of hairy.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread
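A minimal C sketch of the fallback shape described above, assuming a
hypothetical non-blocking query for an already-queued ML recommendation:
the write path never waits for userspace, and under write pressure the
in-kernel greedy policy is used directly, so GC forward progress cannot
depend on the userspace agent. All function names and the pressure
signal are invented for illustration.

/* gcc -O2 -o gc_fallback gc_fallback.c */
#include <stdio.h>

#define INVALID_SEG ~0u

/* Stub: check for an already-queued userspace (ML) recommendation. */
static unsigned int ml_recommendation_nonblocking(void)
{
	return INVALID_SEG;	/* simulate: the agent has not answered */
}

/* Stub: deterministic in-kernel greedy policy, always available. */
static unsigned int greedy_pick_victim(void)
{
	return 7;	/* hypothetical segment ID */
}

/*
 * The caller never sleeps waiting for userspace: an ML answer is used
 * only if one is already queued, and under write pressure the in-kernel
 * policy is used directly, so GC forward progress never depends on the
 * userspace agent.
 */
static unsigned int select_victim(int under_write_pressure)
{
	unsigned int seg = INVALID_SEG;

	if (!under_write_pressure)
		seg = ml_recommendation_nonblocking();
	if (seg == INVALID_SEG)
		seg = greedy_pick_victim();	/* guaranteed fallback */
	return seg;
}

int main(void)
{
	printf("relaxed:  victim %u\n", select_victim(0));
	printf("pressure: victim %u\n", select_victim(1));
	return 0;
}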
* Re: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-06 19:38 [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel Viacheslav Dubeyko
  2026-02-06 23:28 ` Hillf Danton
  2026-02-09 10:03 ` Chris Li
@ 2026-02-09 10:25 ` Barry Song
  2026-02-09 22:07   ` Viacheslav Dubeyko
  2 siblings, 1 reply; 17+ messages in thread
From: Barry Song @ 2026-02-09 10:25 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: lsf-pc, Viacheslav Dubeyko, linux-mm, Pavan Rallabhandi,
	linux-fsdevel, linux-kernel, bpf

On Sat, Feb 7, 2026 at 3:40 AM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> Hello,
>
[...]
>
> A continuous learning model can be adopted during the training phase.
> It implies that a kernel subsystem can receive ML model recommendations
> even during the training phase. The ML model proxy on the kernel side
> can estimate the current kernel subsystem state, try to apply the ML
> model recommendations, and estimate the efficiency of the applied
> recommendations. Generally speaking, the ML model proxy on the kernel
> side can consider several modes of interaction with ML model
> recommendations: (1) emergency mode, (2) learning mode,
> (3) collaboration mode, (4) recommendation mode. Emergency mode is for
> when the kernel subsystem is in a critical state and must work as
> efficiently as possible, without the capability of involving ML model
> recommendations (for example, when the ML model recommendations are
> completely inadequate or the load is very high). Learning mode implies
> that the kernel subsystem can try to apply the ML model recommendations
> for some operations, with the goal of estimating the maturity of the ML
> model. The ML model proxy can also degrade the mode back to the
> learning state if the ML model recommendations become inefficient.
> Collaboration mode has the goal of using ML recommendations for 50% of
> operations, so as to bring the ML model to a mature state. And,
> finally, the ML model proxy can switch the kernel subsystem into
> recommendation mode if the ML model is mature enough and the efficiency
> of applying the ML recommendations is higher than that of the
> human-made algorithms.

Hi Slava,

Do we have any concrete examples where an ML-based proxy, together with
its userspace ML agent, has demonstrated measurable performance
improvements over well-designed, human-crafted kernel algorithms?

Such examples could be in scheduling, filesystem I/O, or memory
reclamation and readahead. I think having a real, data-backed example
would be much more helpful for this discussion than reasoning about an
abstract framework without a concrete use case.

Thanks,
Barry

^ permalink raw reply	[flat|nested] 17+ messages in thread
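As an illustration of the four proxy modes quoted above, here is a
small, self-contained C sketch of one possible mode-transition policy:
demote when ML recommendations underperform the human-made baseline,
promote when they outperform it, and drop straight to emergency mode
under critical load. The transition rules and thresholds are invented
for illustration; the quoted proposal does not prescribe them.

/* gcc -O2 -o proxy_modes proxy_modes.c */
#include <stdio.h>

/* The four interaction modes described in the quoted proposal. */
enum proxy_mode {
	ML_MODE_EMERGENCY,	/* ignore ML, human-made algorithm only */
	ML_MODE_LEARNING,	/* try ML on a small share of operations */
	ML_MODE_COLLABORATION,	/* ~50% ML, ~50% human-made algorithm */
	ML_MODE_RECOMMENDATION,	/* ML drives the decisions */
};

/*
 * One possible transition policy: demote by one step when ML
 * recommendations underperform the baseline, promote by one step when
 * they outperform it, and jump straight to emergency mode whenever the
 * subsystem reports critical load.
 */
static enum proxy_mode next_mode(enum proxy_mode cur, double ml_eff,
				 double baseline_eff, int critical_load)
{
	if (critical_load)
		return ML_MODE_EMERGENCY;
	if (ml_eff < baseline_eff)
		return cur > ML_MODE_LEARNING ? cur - 1 : ML_MODE_LEARNING;
	return cur < ML_MODE_RECOMMENDATION ? cur + 1 : cur;
}

int main(void)
{
	enum proxy_mode m = ML_MODE_LEARNING;

	m = next_mode(m, 1.2, 1.0, 0);	/* ML better: promote */
	printf("mode %d (expect %d, collaboration)\n", m, ML_MODE_COLLABORATION);
	m = next_mode(m, 0.8, 1.0, 0);	/* ML worse: demote */
	printf("mode %d (expect %d, learning)\n", m, ML_MODE_LEARNING);
	m = next_mode(m, 1.5, 1.0, 1);	/* critical load wins */
	printf("mode %d (expect %d, emergency)\n", m, ML_MODE_EMERGENCY);
	return 0;
}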
* RE: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-09 10:25 ` Barry Song
@ 2026-02-09 22:07   ` Viacheslav Dubeyko
  2026-02-10  3:06     ` Barry Song
  0 siblings, 1 reply; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-09 22:07 UTC (permalink / raw)
  To: 21cnbao
  Cc: linux-mm, Pavan Rallabhandi, linux-fsdevel, linux-kernel, lsf-pc, bpf

Hi Barry,

On Mon, 2026-02-09 at 18:25 +0800, Barry Song wrote:
> On Sat, Feb 7, 2026 at 3:40 AM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> >
> > [...]
>
> Do we have any concrete examples where an ML-based proxy, together
> with its userspace ML agent, has demonstrated measurable performance
> improvements over well-designed, human-crafted kernel algorithms?
>
> Such examples could be in scheduling, filesystem I/O, or memory
> reclamation and readahead. I think having a real, data-backed example
> would be much more helpful for this discussion than reasoning about an
> abstract framework without a concrete use case.

This patchset [1] is the first step: it declares the ML library API
with the goal of discussing it. As the next step, I am considering
using the ML library API to implement two real-life use-cases: (1) the
GC subsystem of LFS file systems (NILFS2, F2FS, SSDFS), (2) an ML-based
DAMON approach. I see multiple potential real-life use-cases for the ML
library, but let me start with these two, and then we will be able to
extend the approach to other use-cases. The goal of this talk is to
hear the opinion of the community and to elaborate a proper vision of
the ML library architecture.

Thanks,
Slava.

[1] https://lore.kernel.org/linux-fsdevel/20260206191136.2609767-1-slava@dubeyko.com/T/#t

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-09 22:07   ` Viacheslav Dubeyko
@ 2026-02-10  3:06     ` Barry Song
  2026-02-10 19:57       ` Viacheslav Dubeyko
  0 siblings, 1 reply; 17+ messages in thread
From: Barry Song @ 2026-02-10  3:06 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: linux-mm, Pavan Rallabhandi, linux-fsdevel, linux-kernel, lsf-pc, bpf

On Tue, Feb 10, 2026 at 6:07 AM Viacheslav Dubeyko
<Slava.Dubeyko@ibm.com> wrote:
>
> Hi Barry,
>
> On Mon, 2026-02-09 at 18:25 +0800, Barry Song wrote:
> > [...]
> >
> > Do we have any concrete examples where an ML-based proxy, together
> > with its userspace ML agent, has demonstrated measurable performance
> > improvements over well-designed, human-crafted kernel algorithms?
> > [...]
>
> This patchset [1] is the first step: it declares the ML library API
> with the goal of discussing it. As the next step, I am considering
> using the ML library API to implement two real-life use-cases: (1) the
> GC subsystem of LFS file systems (NILFS2, F2FS, SSDFS), (2) an
> ML-based DAMON approach. [...]

I'm very interested in your real-world use case.

If you have any early-stage prototype code that demonstrates the full
flow from user space to kernel space (including both the kernel ML
proxy and the user-space ML agent, for example for filesystem garbage
collection), I'd be glad to take a look if you're able to share it.

Thanks,
Barry

^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel
  2026-02-10  3:06     ` Barry Song
@ 2026-02-10 19:57       ` Viacheslav Dubeyko
  0 siblings, 0 replies; 17+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-10 19:57 UTC (permalink / raw)
  To: 21cnbao
  Cc: linux-mm, Pavan Rallabhandi, linux-fsdevel, linux-kernel, lsf-pc, bpf

On Tue, 2026-02-10 at 11:06 +0800, Barry Song wrote:
> On Tue, Feb 10, 2026 at 6:07 AM Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
> >
> > [...]
>
> I'm very interested in your real-world use case.
>
> If you have any early-stage prototype code that demonstrates the full
> flow from user space to kernel space (including both the kernel ML
> proxy and the user-space ML agent, for example for filesystem garbage
> collection), I'd be glad to take a look if you're able to share it.

I am going to extend the early-stage prototype code [1] for these
real-life use-cases. [2] is a Linux kernel tree with the ML library
integrated, and [3] is the patchset of this early-stage prototype code
that I shared recently. It will be great to hear your opinion. :)

Thanks,
Slava.

[1] https://github.com/kernel-ml-lib/ml-lib
[2] https://github.com/kernel-ml-lib/ml-lib-linux
[3] https://lore.kernel.org/linux-fsdevel/20260206191136.2609767-1-slava@dubeyko.com/T/#t

^ permalink raw reply	[flat|nested] 17+ messages in thread
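For a feel of the user-space side of such a flow, here is a tiny,
self-contained C sketch of an agent that reads per-segment state from a
sysfs-style file and writes one recommendation back. The paths and the
line format are invented for this sketch and are not the interface of
the ml-lib prototype linked above; likewise, the "inference" is a
min-scan placeholder where a trained model would actually run.

/* gcc -O2 -o ml_agent ml_agent.c */
#include <stdio.h>
#include <inttypes.h>

/*
 * Hypothetical sysfs-style attribute paths; they are NOT the interface
 * of the ml-lib prototype linked above.
 */
#define STATE_PATH "/sys/fs/hypothetical_lfs/segments_state"
#define RECO_PATH  "/sys/fs/hypothetical_lfs/gc_recommendation"

int main(void)
{
	uint32_t id, valid, best_id = 0, best_valid = UINT32_MAX;
	FILE *in, *out;

	in = fopen(STATE_PATH, "r");
	if (!in) {
		perror("open segment state");
		return 1;
	}
	/* Assumed format: one "segment_id:valid_blocks" pair per line. */
	while (fscanf(in, "%" SCNu32 ":%" SCNu32, &id, &valid) == 2) {
		/* Min-scan placeholder; a trained model would run here. */
		if (valid < best_valid) {
			best_valid = valid;
			best_id = id;
		}
	}
	fclose(in);

	out = fopen(RECO_PATH, "w");
	if (!out) {
		perror("open recommendation");
		return 1;
	}
	/* Recommendation: victim segment plus per-operation block cap. */
	fprintf(out, "%" PRIu32 " 64\n", best_id);
	fclose(out);
	return 0;
}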
end of thread, other threads:[~2026-02-12 11:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-06 19:38 [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel Viacheslav Dubeyko
2026-02-06 23:28 ` Hillf Danton
2026-02-09 10:03 ` Chris Li
2026-02-09 22:28   ` Viacheslav Dubeyko
2026-02-10 13:47     ` [Lsf-pc] " Jan Kara
2026-02-10 14:20       ` Chris Mason
2026-02-10 22:36         ` Viacheslav Dubeyko
2026-02-11  1:30           ` SeongJae Park
2026-02-11 20:29             ` Viacheslav Dubeyko
2026-02-10 21:02       ` Viacheslav Dubeyko
2026-02-11  9:55         ` Jan Kara
2026-02-12  0:53           ` Viacheslav Dubeyko
2026-02-12 11:02             ` Jan Kara
2026-02-09 10:25 ` Barry Song
2026-02-09 22:07   ` Viacheslav Dubeyko
2026-02-10  3:06     ` Barry Song
2026-02-10 19:57       ` Viacheslav Dubeyko