* Re: [PATCH 1/3] soc cache: L3 cache driver for HiSilicon SoC
       [not found] ` <20260204134020.00002393@huawei.com>
@ 2026-02-04 13:44   ` Jonathan Cameron
  2026-02-05  2:20     ` SeongJae Park
  0 siblings, 1 reply; 2+ messages in thread
From: Jonathan Cameron @ 2026-02-04 13:44 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Yushan Wang, alexandre.belloni, arnd, fustini, krzk,
      linus.walleij, will, linux-arm-kernel, linux-kernel, fanghao11,
      linuxarm, liuyonglong, prime.zeng, wangzhou1, xuwei5,
      SeongJae Park, linux-mm

Fixed linux-mm address that got added a few emails back.

On Wed, 4 Feb 2026 13:40:20 +0000
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Wed, 4 Feb 2026 01:10:01 +0100
> Linus Walleij <linusw@kernel.org> wrote:
>
> > Hi Yushan,
> >
> > thanks for your patch!
> >
> > On Tue, Feb 3, 2026 at 5:18 PM Yushan Wang <wangyushan12@huawei.com> wrote:
> > >
> > > The driver will create a file of `/dev/hisi_l3c` on init; mmap
> > > operations to it will allocate a memory region that is guaranteed to be
> > > placed in L3 cache.
> > >
> > > The driver also provides unmap() to deallocate the locked memory.
> > >
> > > The driver also provides an ioctl interface for user to get cache lock
> > > information, such as lock restrictions and locked sizes.
> > >
> > > Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
> >
> > The commit message does not say *why* you are doing this.
> >
> > > +config HISI_SOC_L3C
> > > +	bool "HiSilicon L3 Cache device driver"
> > > +	depends on ACPI
> > > +	depends on ARM64 || COMPILE_TEST
> > > +	help
> > > +	  This driver provides the functions to lock L3 cache entries from
> > > +	  being evicted for better performance.
> >
> > Here is the reason though.
> >
> > Things like this need to be CC'd to linux-mm@vger.kernel.org.
> >
> > I don't see why userspace would be so well informed as to make decisions
> > about what should be locked in the L3 cache and what should not.
> >
> > I see the memory hierarchy as any other hardware: a resource that is
> > allocated and arbitrated by the kernel.
> >
> > The MM subsystem knows which memory is most cache hot,
> > especially when you use DAMON DAMOS, which has the sole
> > purpose of executing actions like that. Here is a good YouTube talk:
> > https://www.youtube.com/watch?v=xKJO4kLTHOI

> Hi Linus,
>
> This typically isn't about cache hotness. If it were, the data would
> be in the cache without this. It's about ensuring that something which
> would otherwise be unlikely to be there is in the cache.
>
> Normally that's a latency-critical region. In general the kernel
> has no chance of figuring out what those are ahead of time; only
> userspace can know (based on profiling etc.), and that is per workload.
> The first hit matters in these use cases and it's not something
> the prefetchers can help with.
>
> The only thing we could do if this were in the kernel would be to
> have userspace pass some hints and then let the kernel actually
> kick off the process. That just boils down to using a different
> interface to do what this driver is doing (and that's the conversation
> this series is trying to get going). It's a finite resource,
> and you absolutely need userspace to be able to tell whether it
> got what it asked for or not.
>
> DAMON might be useful for that pre-analysis, but it can't do
> anything for the infrequent, extremely latency-sensitive accesses.
> Normally this is fleet-wide stuff based on intensive benchmarking
> of a few nodes. Same sort of approach as the original warehouse-scale
> computing paper on tuning zswap capacity across a fleet.
> It's an extreme form of profile-guided optimization (and not
> currently automatic, I think?). If we are putting code in this
> locked region, the program has been carefully recompiled / linked
> to group the critical parts so that we can use the minimum number
> of these locked regions. Data is a little simpler.
>
> It's kind of similar to resctrl, but at a sub-process granularity.
>
> >
> > Shouldn't the MM subsystem be in charge of determining, locking
> > down and freeing up hot regions in L3 cache?
> >
> > This looks more like userspace is going to determine that, but
> > how exactly? By running DAMON? Then it's better to keep the
> > whole mechanism in the kernel where it belongs and let the
> > MM subsystem adapt locked L3 cache to the usage patterns.
>
> I haven't yet come up with any plausible scheme by which the MM
> subsystem could do this.
>
> I think what we need here, Yushan, is more detail on end-to-end
> use cases for this. Some examples etc. as clearer motivation.
>
> Jonathan
>
> >
> > Yours,
> > Linus Walleij

^ permalink raw reply [flat|nested] 2+ messages in thread
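As background for the discussion above, the interface being debated reduces to a short
userspace sequence: open the device node, query the lock restrictions via ioctl(),
mmap() to obtain a region locked into L3, and munmap() to release it. The minimal
sketch below illustrates only that flow; the ioctl command name, request number and
info structure are not spelled out in this thread and are hypothetical placeholders.

/*
 * Sketch of a userspace consumer of the proposed /dev/hisi_l3c interface.
 * L3C_IOC_GET_LOCK_INFO and struct l3c_lock_info are NOT from the patch;
 * they are made-up placeholders standing in for whatever the driver defines.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

struct l3c_lock_info {
	unsigned long max_lock_size;	/* hypothetical: lock size limit */
	unsigned long locked_size;	/* hypothetical: bytes already locked */
};

#define L3C_IOC_GET_LOCK_INFO	_IOR('L', 0, struct l3c_lock_info)

int main(void)
{
	struct l3c_lock_info info;
	size_t len = 2 * 1024 * 1024;	/* region we want resident in L3 */
	void *buf;
	int fd;

	fd = open("/dev/hisi_l3c", O_RDWR);
	if (fd < 0) {
		perror("open /dev/hisi_l3c");
		return EXIT_FAILURE;
	}

	/* Ask the driver about lock restrictions before committing. */
	if (ioctl(fd, L3C_IOC_GET_LOCK_INFO, &info) == 0)
		printf("max lockable: %lu, already locked: %lu\n",
		       info.max_lock_size, info.locked_size);

	/* Per the commit message, mmap() on the device returns locked memory. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return EXIT_FAILURE;
	}

	/* Latency-critical data (or relocated code) would live here. */
	memset(buf, 0, len);

	/* munmap() hands the locked cache capacity back. */
	munmap(buf, len);
	close(fd);
	return EXIT_SUCCESS;
}

Because the lock capacity is a finite, shared resource, the query step matters as much
as the mapping itself: userspace has to be able to see whether it actually got what it
asked for, which is the point Jonathan makes above.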
* Re: [PATCH 1/3] soc cache: L3 cache driver for HiSilicon SoC
  2026-02-04 13:44 ` [PATCH 1/3] soc cache: L3 cache driver for HiSilicon SoC Jonathan Cameron
@ 2026-02-05  2:20   ` SeongJae Park
  0 siblings, 0 replies; 2+ messages in thread
From: SeongJae Park @ 2026-02-05  2:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: SeongJae Park, Linus Walleij, Yushan Wang, alexandre.belloni,
      arnd, fustini, krzk, linus.walleij, will, linux-arm-kernel,
      linux-kernel, fanghao11, linuxarm, liuyonglong, prime.zeng,
      wangzhou1, xuwei5, linux-mm

On Wed, 4 Feb 2026 13:44:47 +0000 Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

>
> Fixed linux-mm address that got added a few emails back.
>
> On Wed, 4 Feb 2026 13:40:20 +0000
> Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
>
> > On Wed, 4 Feb 2026 01:10:01 +0100
> > Linus Walleij <linusw@kernel.org> wrote:
> >
> > > Hi Yushan,
> > >
> > > thanks for your patch!
> > >
> > > On Tue, Feb 3, 2026 at 5:18 PM Yushan Wang <wangyushan12@huawei.com> wrote:
> > > >
> > > > The driver will create a file of `/dev/hisi_l3c` on init; mmap
> > > > operations to it will allocate a memory region that is guaranteed to be
> > > > placed in L3 cache.
> > > >
> > > > The driver also provides unmap() to deallocate the locked memory.
> > > >
> > > > The driver also provides an ioctl interface for user to get cache lock
> > > > information, such as lock restrictions and locked sizes.
> > > >
> > > > Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
> > >
> > > The commit message does not say *why* you are doing this.
> > >
> > > > +config HISI_SOC_L3C
> > > > +	bool "HiSilicon L3 Cache device driver"
> > > > +	depends on ACPI
> > > > +	depends on ARM64 || COMPILE_TEST
> > > > +	help
> > > > +	  This driver provides the functions to lock L3 cache entries from
> > > > +	  being evicted for better performance.
> > >
> > > Here is the reason though.
> > >
> > > Things like this need to be CC'd to linux-mm@vger.kernel.org.
> > >
> > > I don't see why userspace would be so well informed as to make decisions
> > > about what should be locked in the L3 cache and what should not.
> > >
> > > I see the memory hierarchy as any other hardware: a resource that is
> > > allocated and arbitrated by the kernel.
> > >
> > > The MM subsystem knows which memory is most cache hot,
> > > especially when you use DAMON DAMOS, which has the sole
> > > purpose of executing actions like that. Here is a good YouTube talk:
> > > https://www.youtube.com/watch?v=xKJO4kLTHOI

Thank you for Cc-ing me, Linus.

> > Hi Linus,
> >
> > This typically isn't about cache hotness. If it were, the data would
> > be in the cache without this. It's about ensuring that something which
> > would otherwise be unlikely to be there is in the cache.
> >
> > Normally that's a latency-critical region. In general the kernel
> > has no chance of figuring out what those are ahead of time; only
> > userspace can know (based on profiling etc.), and that is per workload.
> > The first hit matters in these use cases and it's not something
> > the prefetchers can help with.
> >
> > The only thing we could do if this were in the kernel would be to
> > have userspace pass some hints and then let the kernel actually
> > kick off the process. That just boils down to using a different
> > interface to do what this driver is doing (and that's the conversation
> > this series is trying to get going). It's a finite resource,
> > and you absolutely need userspace to be able to tell whether it
> > got what it asked for or not.

And thank you for clarifying, Jonathan.
> >
> > DAMON might be useful for that pre-analysis, but it can't do
> > anything for the infrequent, extremely latency-sensitive accesses.

I also have no good idea for how DAMON could help in this scenario.

If I had to offer a brainstorming idea off the top of my humble head,
though: maybe we could ask DAMON to monitor the address ranges that are
assumed to hold the latency-sensitive data, and further ask DAMOS to find
sub-regions of that area that are getting colder than desired and generate
accesses to the cache lines of those sub-regions, so that they stay in the
cache in "most cases".

It is just a brainstorming idea off the top of my head and probably won't
work for your case, since...

It won't work if there is no good way to know or guarantee the address
ranges of the latency-sensitive data.

It won't work for the extremely latency-sensitive case, as DAMON is only
best effort.

It won't work with DAMON of today, because DAMOS doesn't support that kind
of cache-granularity access-generation action.

So it sounds like not a good idea. Nonetheless, if you have any questions
about DAMON in the future, please feel free to reach out :)


Thanks,
SJ

[...]

^ permalink raw reply [flat|nested] 2+ messages in thread
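To make the access-generation half of that brainstorm concrete: something similar could
be approximated from userspace today with a low-priority thread that periodically touches
one byte per cache line of the region believed to be latency sensitive. The sketch below
assumes a 64-byte cache line and a caller-identified region; it is only an illustration
of the idea SeongJae describes, not an existing DAMON/DAMOS feature, and it is best
effort, which is exactly the guarantee gap the proposed driver is meant to close.

/*
 * Illustration only: a userspace "warmer" that periodically touches one
 * byte per cache line of a region assumed to hold latency-sensitive data,
 * approximating the access-generation action brainstormed above.
 */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define CACHE_LINE_BYTES 64	/* assumption: 64-byte cache lines */

static void touch_region(volatile const uint8_t *base, size_t len)
{
	/* One read per cache line pulls (or keeps) the line in cache. */
	for (size_t off = 0; off < len; off += CACHE_LINE_BYTES)
		(void)base[off];
}

/* Run from a low-priority thread on the region(s) identified by profiling. */
void warm_loop(const void *region, size_t len, unsigned int interval_us)
{
	for (;;) {
		touch_region(region, len);
		usleep(interval_us);
	}
}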
Thread overview: 2+ messages
[not found] <20260203161843.649417-1-wangyushan12@huawei.com>
[not found] ` <20260203161843.649417-2-wangyushan12@huawei.com>
[not found] ` <CAD++jLn+TDfu-aQ+Kfm=unYp4Zjg=endP3GGzZcpuYFR=s1K1g@mail.gmail.com>
[not found] ` <20260204134020.00002393@huawei.com>
2026-02-04 13:44 ` [PATCH 1/3] soc cache: L3 cache driver for HiSilicon SoC Jonathan Cameron
2026-02-05 2:20 ` SeongJae Park