* [PATCH 0/4] Track exported dma-buffers with memcg
@ 2023-01-09 21:38 T.J. Mercier
  2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier
  2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
  0 siblings, 2 replies; 11+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Carlos Llamas,
	Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris
  Cc: daniel.vetter, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux

Based on discussions at LPC, this series adds a memory.stat counter for
exported dmabufs. This counter allows us to continue tracking
system-wide total exported buffer sizes which there is no longer any
way to get without DMABUF_SYSFS_STATS, and adds a new capability to
track per-cgroup exported buffer sizes. The total (root counter) is
helpful for accounting in-kernel dmabuf use (by comparing with the sum
of child nodes or with the sum of sizes of mapped buffers or FD
references in procfs) in addition to helping identify driver memory
leaks when in-kernel use continually increases over time. With
per-application cgroups, the per-cgroup counter allows us to quickly
see how much dma-buf memory an application has caused to be allocated.
This avoids the need to read through all of procfs which can be a
lengthy process, and causes the charge to "stick" to the allocating
process/cgroup as long as the buffer is alive, regardless of how the
buffer is shared (unless the charge is transferred).

The first patch adds the counter to memcg. The next two patches allow
the charge for a buffer to be transferred across cgroups which is
necessary because of the way most dmabufs are allocated from a central
process on Android. The fourth patch adds a SELinux hook to binder in
order to control who is allowed to transfer buffer charges.

[1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/

Hridya Valsaraju (1):
  binder: Add flags to relinquish ownership of fds

T.J. Mercier (3):
  memcg: Track exported dma-buffers
  dmabuf: Add cgroup charge transfer function
  security: binder: Add transfer_charge SElinux hook

 Documentation/admin-guide/cgroup-v2.rst |  5 +++
 drivers/android/binder.c                | 36 +++++++++++++++--
 drivers/dma-buf/dma-buf.c               | 54 +++++++++++++++++++++++--
 include/linux/dma-buf.h                 |  5 +++
 include/linux/lsm_hook_defs.h           |  2 +
 include/linux/lsm_hooks.h               |  6 +++
 include/linux/memcontrol.h              |  7 ++++
 include/linux/security.h                |  2 +
 include/uapi/linux/android/binder.h     | 23 +++++++++--
 mm/memcontrol.c                         |  4 ++
 security/security.c                     |  6 +++
 security/selinux/hooks.c                |  9 +++++
 security/selinux/include/classmap.h     |  2 +-
 13 files changed, 149 insertions(+), 12 deletions(-)

base-commit: b7bfaa761d760e72a969d116517eaa12e404c262
-- 
2.39.0.314.g84b9a713c41-goog

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH 1/4] memcg: Track exported dma-buffers
  2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier
@ 2023-01-09 21:38 ` T.J. Mercier
  2023-01-10  8:58   ` Michal Hocko
  2023-01-10  0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt
  1 sibling, 1 reply; 11+ messages in thread
From: T.J. Mercier @ 2023-01-09 21:38 UTC (permalink / raw)
  To: tjmercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton
  Cc: daniel.vetter, android-mm, jstultz, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm

When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 drivers/dma-buf/dma-buf.c               | 5 +++++
 include/linux/dma-buf.h                 | 3 +++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 4 ++++
 5 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used for storing in-kernel data
 		structures.
 
+	  dmabuf (npn)
+		Amount of memory used for exported DMA buffers allocated by the cgroup.
+		Stays with the allocating cgroup regardless of how the buffer is shared.
+
 	  workingset_refault_anon
 		Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..ac45dd101c4d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,8 @@ static void dma_buf_release(struct dentry *dentry)
 	 */
 	BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, -dmabuf->size);
+	mem_cgroup_put(dmabuf->memcg);
 	dma_buf_stats_teardown(dmabuf);
 	dmabuf->ops->release(dmabuf);
 
@@ -673,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	if (ret)
 		goto err_dmabuf;
 
+	dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, dmabuf->size);
+
 	file->private_data = dmabuf;
 	file->f_path.dentry->d_fsdata = dmabuf;
 	dmabuf->file = file;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/dma-fence.h>
 #include <linux/wait.h>
+#include <linux/memcontrol.h>
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+	/* The cgroup to which this buffer is currently attributed */
+	struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..1c1da2da20a6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_DMABUF,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..680189bec7e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1502,6 +1502,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "unevictable", NR_UNEVICTABLE },
 	{ "slab_reclaimable", NR_SLAB_RECLAIMABLE_B },
 	{ "slab_unreclaimable", NR_SLAB_UNRECLAIMABLE_B },
+	{ "dmabuf", MEMCG_DMABUF },
 
 	/* The memory events */
 	{ "workingset_refault_anon", WORKINGSET_REFAULT_ANON },
@@ -1519,6 +1520,7 @@ static int memcg_page_state_unit(int item)
 	switch (item) {
 	case MEMCG_PERCPU_B:
 	case MEMCG_ZSWAP_B:
+	case MEMCG_DMABUF:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 	case WORKINGSET_REFAULT_ANON:
@@ -4042,6 +4044,7 @@ static const unsigned int memcg1_stats[] = {
 	WORKINGSET_REFAULT_ANON,
 	WORKINGSET_REFAULT_FILE,
 	MEMCG_SWAP,
+	MEMCG_DMABUF,
 };
 
 static const char *const memcg1_stat_names[] = {
@@ -4057,6 +4060,7 @@ static const char *const memcg1_stat_names[] = {
 	"workingset_refault_anon",
 	"workingset_refault_file",
 	"swap",
+	"dmabuf",
 };
 
 /* Universal VM events cgroup1 shows, original sort order */
-- 
2.39.0.314.g84b9a713c41-goog

^ permalink raw reply	[flat|nested] 11+ messages in thread
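[Editorial sketch, not part of the posted patch.] The accounting in patch 1 can be modelled in plain userspace C to make its semantics easier to see. Everything below is a hypothetical stand-in: struct toy_cgroup, struct toy_dmabuf, toy_export() and toy_release() are invented names, not kernel types or functions. The model assumes only what the diff shows: the counter is raised by the buffer size at export, lowered by the same amount at release, and nothing else is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_cgroup {
	const char *name;
	long dmabuf_bytes;	/* models the proposed "dmabuf" memory.stat entry */
	long charged_bytes;	/* models the overall memcg charge; never updated here */
};

struct toy_dmabuf {
	size_t size;
	struct toy_cgroup *memcg;	/* cgroup the buffer is attributed to */
};

/* Models the export path: attribute the buffer to the exporter's cgroup. */
static void toy_export(struct toy_dmabuf *buf, struct toy_cgroup *exporter,
		       size_t size)
{
	buf->size = size;
	buf->memcg = exporter;
	exporter->dmabuf_bytes += (long)size;
}

/* Models the release path: drop the attribution when the buffer goes away. */
static void toy_release(struct toy_dmabuf *buf)
{
	assert(buf->memcg && buf->memcg->dmabuf_bytes >= (long)buf->size);
	buf->memcg->dmabuf_bytes -= (long)buf->size;
	buf->memcg = NULL;
}
```

In this toy model charged_bytes stays at zero no matter how many buffers are exported, which is one way to picture a statistic that moves without any corresponding charge.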
* Re: [PATCH 1/4] memcg: Track exported dma-buffers 2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier @ 2023-01-10 8:58 ` Michal Hocko 2023-01-10 19:08 ` T.J. Mercier 0 siblings, 1 reply; 11+ messages in thread From: Michal Hocko @ 2023-01-10 8:58 UTC (permalink / raw) To: T.J. Mercier Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Sumit Semwal, Christian König, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, daniel.vetter, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm On Mon 09-01-23 21:38:04, T.J. Mercier wrote: > When a buffer is exported to userspace, use memcg to attribute the > buffer to the allocating cgroup until all buffer references are > released. > > Unlike the dmabuf sysfs stats implementation, this memcg accounting > avoids contention over the kernfs_rwsem incurred when creating or > removing nodes. I am not familiar with dmabuf infrastructure so please bear with me. AFAIU this patch adds a dmabuf specific counter to find out the amount of dmabuf memory used. But I do not see any actual charging implemented for that memory. I have looked at two random users of dma_buf_export cma_heap_allocate and it allocates pages to back the dmabuf (AFAIU) by cma_alloc which doesn't account to memcg, system_heap_allocate uses alloc_largest_available which relies on order_flags which doesn't seem to ever use __GFP_ACCOUNT. This would mean that the counter doesn't represent any actual memory reflected in the overall memory consumption of a memcg. I believe this is rather unexpected and confusing behavior. While some counters overlap and their sum would exceed the charged memory we do not have any that doesn't correspond to any memory (at least not for non-root memcgs). -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] memcg: Track exported dma-buffers 2023-01-10 8:58 ` Michal Hocko @ 2023-01-10 19:08 ` T.J. Mercier 0 siblings, 0 replies; 11+ messages in thread From: T.J. Mercier @ 2023-01-10 19:08 UTC (permalink / raw) To: Michal Hocko Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Sumit Semwal, Christian König, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, daniel.vetter, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm On Tue, Jan 10, 2023 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote: > > On Mon 09-01-23 21:38:04, T.J. Mercier wrote: > > When a buffer is exported to userspace, use memcg to attribute the > > buffer to the allocating cgroup until all buffer references are > > released. > > > > Unlike the dmabuf sysfs stats implementation, this memcg accounting > > avoids contention over the kernfs_rwsem incurred when creating or > > removing nodes. > > I am not familiar with dmabuf infrastructure so please bear with me. > AFAIU this patch adds a dmabuf specific counter to find out the amount > of dmabuf memory used. But I do not see any actual charging implemented > for that memory. > > I have looked at two random users of dma_buf_export cma_heap_allocate > and it allocates pages to back the dmabuf (AFAIU) by cma_alloc > which doesn't account to memcg, system_heap_allocate uses > alloc_largest_available which relies on order_flags which doesn't seem > to ever use __GFP_ACCOUNT. > > This would mean that the counter doesn't represent any actual memory > reflected in the overall memory consumption of a memcg. I believe this > is rather unexpected and confusing behavior. While some counters > overlap and their sum would exceed the charged memory we do not have any > that doesn't correspond to any memory (at least not for non-root memcgs). > > -- > Michal Hocko > SUSE Labs Thank you, that behavior is not intentional. 
I'm not looking at the overall memcg charge yet otherwise I would have noticed this. I think I understand what's needed for the charging part, but Shakeel mentioned some additional work for "reclaim, OOM and charge context and failure cases" on the cover letter which I need to look into. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-09 21:38 [PATCH 0/4] Track exported dma-buffers with memcg T.J. Mercier 2023-01-09 21:38 ` [PATCH 1/4] memcg: Track exported dma-buffers T.J. Mercier @ 2023-01-10 0:18 ` Shakeel Butt 2023-01-11 22:56 ` Daniel Vetter 1 sibling, 1 reply; 11+ messages in thread From: Shakeel Butt @ 2023-01-10 0:18 UTC (permalink / raw) To: T.J. Mercier Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, daniel.vetter, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux Hi T.J., On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote: > > Based on discussions at LPC, this series adds a memory.stat counter for > exported dmabufs. This counter allows us to continue tracking > system-wide total exported buffer sizes which there is no longer any > way to get without DMABUF_SYSFS_STATS, and adds a new capability to > track per-cgroup exported buffer sizes. The total (root counter) is > helpful for accounting in-kernel dmabuf use (by comparing with the sum > of child nodes or with the sum of sizes of mapped buffers or FD > references in procfs) in addition to helping identify driver memory > leaks when in-kernel use continually increases over time. With > per-application cgroups, the per-cgroup counter allows us to quickly > see how much dma-buf memory an application has caused to be allocated. 
> This avoids the need to read through all of procfs which can be a > lengthy process, and causes the charge to "stick" to the allocating > process/cgroup as long as the buffer is alive, regardless of how the > buffer is shared (unless the charge is transferred). > > The first patch adds the counter to memcg. The next two patches allow > the charge for a buffer to be transferred across cgroups which is > necessary because of the way most dmabufs are allocated from a central > process on Android. The fourth patch adds a SELinux hook to binder in > order to control who is allowed to transfer buffer charges. > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/ > I am a bit confused by the term "charge" used in this patch series. From the patches, it seems like only a memcg stat is added and nothing is charged to the memcg. This leads me to the question: Why add this stat in memcg if the underlying memory is not charged to the memcg and if we don't really want to limit the usage? I see two ways forward: 1. Instead of memcg, use bpf-rstat [1] infra to implement the per-cgroup stat for dmabuf. (You may need an additional hook for the stat transfer). 2. Charge the actual memory to the memcg. Since the size of dmabuf is immutable across its lifetime, you will not need to do accounting at page level and instead use something similar to the network memory accounting interface/mechanism (or even more simple). However you would need to handle the reclaim, OOM and charge context and failure cases. However if you are not looking to limit the usage of dmabuf then this option is an overkill. Please let me know if I misunderstood something. [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/ thanks, Shakeel ^ permalink raw reply [flat|nested] 11+ messages in thread
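[Editorial sketch, not part of the original message.] Option 2 above, charging the whole buffer once by its fixed size rather than page by page, can be illustrated with a small userspace C model. All names here (struct toy_memcg, toy_charge_dmabuf(), toy_uncharge_dmabuf()) are hypothetical and do not correspond to any kernel interface. The model also assumes a plain hard limit with no reclaim or OOM handling, which are exactly the cases the message says a real implementation would have to address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a memcg with a hard limit. */
struct toy_memcg {
	size_t limit;		/* models a hard limit on charged memory */
	size_t usage;		/* models the memory currently charged */
	size_t dmabuf_bytes;	/* models the per-cgroup dmabuf statistic */
};

/*
 * Charge a whole buffer in one step. The size is fixed for the buffer's
 * lifetime, so no page-level accounting is modelled. Returns 0 on success
 * or -ENOMEM when the charge would exceed the limit (the failure case).
 */
static int toy_charge_dmabuf(struct toy_memcg *cg, size_t size)
{
	/* usage never exceeds limit, so the subtraction cannot wrap */
	if (size > cg->limit - cg->usage)
		return -ENOMEM;
	cg->usage += size;
	cg->dmabuf_bytes += size;
	return 0;
}

/* Undo a successful charge when the buffer is released. */
static void toy_uncharge_dmabuf(struct toy_memcg *cg, size_t size)
{
	assert(cg->usage >= size && cg->dmabuf_bytes >= size);
	cg->usage -= size;
	cg->dmabuf_bytes -= size;
}
```

Unlike a stat-only counter, the charge here can fail, so the caller on the export path would need somewhere to report the error.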
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-10 0:18 ` [PATCH 0/4] Track exported dma-buffers with memcg Shakeel Butt @ 2023-01-11 22:56 ` Daniel Vetter 2023-01-12 0:49 ` T.J. Mercier 2023-01-12 7:56 ` Shakeel Butt 0 siblings, 2 replies; 11+ messages in thread From: Daniel Vetter @ 2023-01-11 22:56 UTC (permalink / raw) To: Shakeel Butt Cc: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, daniel.vetter, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote: > Hi T.J., > > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote: > > > > Based on discussions at LPC, this series adds a memory.stat counter for > > exported dmabufs. This counter allows us to continue tracking > > system-wide total exported buffer sizes which there is no longer any > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to > > track per-cgroup exported buffer sizes. The total (root counter) is > > helpful for accounting in-kernel dmabuf use (by comparing with the sum > > of child nodes or with the sum of sizes of mapped buffers or FD > > references in procfs) in addition to helping identify driver memory > > leaks when in-kernel use continually increases over time. With > > per-application cgroups, the per-cgroup counter allows us to quickly > > see how much dma-buf memory an application has caused to be allocated. 
> > This avoids the need to read through all of procfs which can be a > > lengthy process, and causes the charge to "stick" to the allocating > > process/cgroup as long as the buffer is alive, regardless of how the > > buffer is shared (unless the charge is transferred). > > > > The first patch adds the counter to memcg. The next two patches allow > > the charge for a buffer to be transferred across cgroups which is > > necessary because of the way most dmabufs are allocated from a central > > process on Android. The fourth patch adds a SELinux hook to binder in > > order to control who is allowed to transfer buffer charges. > > > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/ > > > > I am a bit confused by the term "charge" used in this patch series. > From the patches, it seems like only a memcg stat is added and nothing > is charged to the memcg. > > This leads me to the question: Why add this stat in memcg if the > underlying memory is not charged to the memcg and if we don't really > want to limit the usage? > > I see two ways forward: > > 1. Instead of memcg, use bpf-rstat [1] infra to implement the > per-cgroup stat for dmabuf. (You may need an additional hook for the > stat transfer). > > 2. Charge the actual memory to the memcg. Since the size of dmabuf is > immutable across its lifetime, you will not need to do accounting at > page level and instead use something similar to the network memory > accounting interface/mechanism (or even more simple). However you > would need to handle the reclaim, OOM and charge context and failure > cases. However if you are not looking to limit the usage of dmabuf > then this option is an overkill. I think eventually, at least for other "account gpu stuff in cgroups" use case we do want to actually charge the memory. The problem is a bit that with gpu allocations reclaim is essentially "we pass the error to userspace and they get to sort the mess out". 
There are some exceptions (some gpu drivers to have shrinkers) would we need to make sure these shrinkers are tied into the cgroup stuff before we could enable charging for them? Also note that at least from the gpu driver side this is all a huge endeavour, so if we can split up the steps as much as possible (and get something interim useable that doesn't break stuff ofc), that is practically need to make headway here. TJ has been trying out various approaches for quite some time now already :-/ -Daniel > Please let me know if I misunderstood something. > > [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/ > > thanks, > Shakeel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-11 22:56 ` Daniel Vetter @ 2023-01-12 0:49 ` T.J. Mercier 2023-01-12 8:13 ` Shakeel Butt 2023-01-12 7:56 ` Shakeel Butt 1 sibling, 1 reply; 11+ messages in thread From: T.J. Mercier @ 2023-01-12 0:49 UTC (permalink / raw) To: Shakeel Butt, T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux Cc: daniel.vetter On Wed, Jan 11, 2023 at 2:56 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote: > > Hi T.J., > > > > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@google.com> wrote: > > > > > > Based on discussions at LPC, this series adds a memory.stat counter for > > > exported dmabufs. This counter allows us to continue tracking > > > system-wide total exported buffer sizes which there is no longer any > > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to > > > track per-cgroup exported buffer sizes. The total (root counter) is > > > helpful for accounting in-kernel dmabuf use (by comparing with the sum > > > of child nodes or with the sum of sizes of mapped buffers or FD > > > references in procfs) in addition to helping identify driver memory > > > leaks when in-kernel use continually increases over time. With > > > per-application cgroups, the per-cgroup counter allows us to quickly > > > see how much dma-buf memory an application has caused to be allocated. 
> > > This avoids the need to read through all of procfs which can be a > > > lengthy process, and causes the charge to "stick" to the allocating > > > process/cgroup as long as the buffer is alive, regardless of how the > > > buffer is shared (unless the charge is transferred). > > > > > > The first patch adds the counter to memcg. The next two patches allow > > > the charge for a buffer to be transferred across cgroups which is > > > necessary because of the way most dmabufs are allocated from a central > > > process on Android. The fourth patch adds a SELinux hook to binder in > > > order to control who is allowed to transfer buffer charges. > > > > > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@amd.com/ > > > > > > > I am a bit confused by the term "charge" used in this patch series. > > From the patches, it seems like only a memcg stat is added and nothing > > is charged to the memcg. > > > > This leads me to the question: Why add this stat in memcg if the > > underlying memory is not charged to the memcg and if we don't really > > want to limit the usage? > > > > I see two ways forward: > > > > 1. Instead of memcg, use bpf-rstat [1] infra to implement the > > per-cgroup stat for dmabuf. (You may need an additional hook for the > > stat transfer). > > > > 2. Charge the actual memory to the memcg. Since the size of dmabuf is > > immutable across its lifetime, you will not need to do accounting at > > page level and instead use something similar to the network memory > > accounting interface/mechanism (or even more simple). However you > > would need to handle the reclaim, OOM and charge context and failure > > cases. However if you are not looking to limit the usage of dmabuf > > then this option is an overkill. > > I think eventually, at least for other "account gpu stuff in cgroups" use > case we do want to actually charge the memory. > Yes, I've been looking at this today. 
> The problem is a bit that with gpu allocations reclaim is essentially "we > pass the error to userspace and they get to sort the mess out". There are > some exceptions (some gpu drivers to have shrinkers) would we need to make > sure these shrinkers are tied into the cgroup stuff before we could enable > charging for them? > I'm also not sure that we can depend on the dmabuf being backed at export time 100% of the time? (They are for dmabuf heaps.) If not, that'd make calling the existing memcg folio based functions a bit difficult. > Also note that at least from the gpu driver side this is all a huge > endeavour, so if we can split up the steps as much as possible (and get > something interim useable that doesn't break stuff ofc), that is > practically need to make headway here. TJ has been trying out various > approaches for quite some time now already :-/ > -Daniel > > > Please let me know if I misunderstood something. > > > > [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@google.com/ > > > > thanks, > > Shakeel > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-12 0:49 ` T.J. Mercier @ 2023-01-12 8:13 ` Shakeel Butt 2023-01-12 8:17 ` Christian König 0 siblings, 1 reply; 11+ messages in thread From: Shakeel Butt @ 2023-01-12 8:13 UTC (permalink / raw) To: T.J. Mercier Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux, daniel.vetter On Wed, Jan 11, 2023 at 04:49:36PM -0800, T.J. Mercier wrote: > [...] > > The problem is a bit that with gpu allocations reclaim is essentially "we > > pass the error to userspace and they get to sort the mess out". There are > > some exceptions (some gpu drivers to have shrinkers) would we need to make > > sure these shrinkers are tied into the cgroup stuff before we could enable > > charging for them? > > > I'm also not sure that we can depend on the dmabuf being backed at > export time 100% of the time? (They are for dmabuf heaps.) If not, > that'd make calling the existing memcg folio based functions a bit > difficult. > Where does the actual memory get allocated? I see the first patch is updating the stat in dma_buf_export() and dma_buf_release(). Does the memory get allocated and freed in those code paths? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-12 8:13 ` Shakeel Butt @ 2023-01-12 8:17 ` Christian König 0 siblings, 0 replies; 11+ messages in thread From: Christian König @ 2023-01-12 8:17 UTC (permalink / raw) To: Shakeel Butt, T.J. Mercier Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux, daniel.vetter Am 12.01.23 um 09:13 schrieb Shakeel Butt: > On Wed, Jan 11, 2023 at 04:49:36PM -0800, T.J. Mercier wrote: > [...] >>> The problem is a bit that with gpu allocations reclaim is essentially "we >>> pass the error to userspace and they get to sort the mess out". There are >>> some exceptions (some gpu drivers to have shrinkers) would we need to make >>> sure these shrinkers are tied into the cgroup stuff before we could enable >>> charging for them? >>> >> I'm also not sure that we can depend on the dmabuf being backed at >> export time 100% of the time? (They are for dmabuf heaps.) If not, >> that'd make calling the existing memcg folio based functions a bit >> difficult. >> > Where does the actual memory get allocated? I see the first patch is > updating the stat in dma_buf_export() and dma_buf_release(). Does the > memory get allocated and freed in those code paths? Nope, dma_buf_export() just makes the memory available to others. The driver which calls dma_buf_export() is the one allocating the memory. Regards, Christian. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-11 22:56 ` Daniel Vetter 2023-01-12 0:49 ` T.J. Mercier @ 2023-01-12 7:56 ` Shakeel Butt 2023-01-12 10:25 ` Michal Hocko 1 sibling, 1 reply; 11+ messages in thread From: Shakeel Butt @ 2023-01-12 7:56 UTC (permalink / raw) To: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux On Wed, Jan 11, 2023 at 11:56:45PM +0100, Daniel Vetter wrote: > [...] > I think eventually, at least for other "account gpu stuff in cgroups" use > case we do want to actually charge the memory. > > The problem is a bit that with gpu allocations reclaim is essentially "we > pass the error to userspace and they get to sort the mess out". There are > some exceptions (some gpu drivers to have shrinkers) would we need to make > sure these shrinkers are tied into the cgroup stuff before we could enable > charging for them? > No, there is no requirement to have shrinkers or making such memory reclaimable before charging it. Though existing shrinkers and the possible future shrinkers would need to be converted into memcg aware shrinkers. Though there will be a need to update user expectations that if they use memcgs with hard limits, they may start seeing memcg OOMs after the charging of dmabuf. > Also note that at least from the gpu driver side this is all a huge > endeavour, so if we can split up the steps as much as possible (and get > something interim useable that doesn't break stuff ofc), that is > practically need to make headway here. This sounds reasonable to me. 
* Re: [PATCH 0/4] Track exported dma-buffers with memcg 2023-01-12 7:56 ` Shakeel Butt @ 2023-01-12 10:25 ` Michal Hocko 0 siblings, 0 replies; 11+ messages in thread From: Michal Hocko @ 2023-01-12 10:25 UTC (permalink / raw) To: Shakeel Butt Cc: T.J. Mercier, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Sumit Semwal, Christian König, Roman Gushchin, Muchun Song, Andrew Morton, Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris, android-mm, jstultz, cgroups, linux-doc, linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module, selinux On Thu 12-01-23 07:56:31, Shakeel Butt wrote: > On Wed, Jan 11, 2023 at 11:56:45PM +0100, Daniel Vetter wrote: > > > [...] > > I think eventually, at least for other "account gpu stuff in cgroups" use > > case we do want to actually charge the memory. > > > > The problem is a bit that with gpu allocations reclaim is essentially "we > > pass the error to userspace and they get to sort the mess out". There are > > some exceptions (some gpu drivers to have shrinkers) would we need to make > > sure these shrinkers are tied into the cgroup stuff before we could enable > > charging for them? > > > > No, there is no requirement to have shrinkers or making such memory > reclaimable before charging it. Though existing shrinkers and the > possible future shrinkers would need to be converted into memcg aware > shrinkers. > > Though there will be a need to update user expectations that if they > use memcgs with hard limits, they may start seeing memcg OOMs after the > charging of dmabuf. Agreed. This wouldn't be the first in kernel memory charged memory that is not directly reclaimable. With a dedicated counter an excessive dmabuf usage would be visible in the oom report because we do print memcg stats. 
It is definitely preferable to have a shrinker mechanism but if that is to be done in a follow up step then this is acceptable. But leaving out charging from early on sounds like a bad choice to me. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 11+ messages in thread