From: Jason Gunthorpe <jgg@nvidia.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Yosry Ahmed <yosryahmed@google.com>,
Alistair Popple <apopple@nvidia.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, jhubbard@nvidia.com,
tjmercier@google.com, hannes@cmpxchg.org, surenb@google.com,
mkoutny@suse.com, daniel@ffwll.ch,
"Daniel P . Berrange" <berrange@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
Zefan Li <lizefan.x@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 14/19] mm: Introduce a cgroup for pinned memory
Date: Thu, 16 Feb 2023 08:45:38 -0400
Message-ID: <Y+4lcq4Fge27TQIn@nvidia.com>
In-Reply-To: <Y+3jcw9vo4ml5p0M@dhcp22.suse.cz>
On Thu, Feb 16, 2023 at 09:04:03AM +0100, Michal Hocko wrote:
> > In most cases the ownership traces back to a file descriptor. When the
> > file is closed, the pin goes away.
>
> This assumes a specific use of {un}pin_user_page*, right? IIUC the
> cgroup charging is meant to be used from vm_account, but that doesn't
> really say anything about the lifetime or the ownership. Maybe this is
> just a matter of a documentation update...

Yes, documentation.

> > > The interface itself doesn't talk about
> > > anything like that, and so it seems perfectly fine to unpin from a
> > > completely different context than pinning.
> >
> > Yes, conceivably the close of the FD can be in a totally different
> > process with a different cgroup.
>
> Wouldn't you get unbalanced charges then? How can an admin recover from
> that situation?

No, the approach in this patch series captures the cgroup that was
charged and stores it in the FD until the uncharge.

This is the same as we do today for rlimit. The user/process that is
charged is captured, and the uncharge always applies to the user/process
that was charged, not the user/process that happens to be associated
with the uncharging context.

cgroup just adds another option, so the charge can be held by a user, a
process or a cgroup.

It is conceptually similar to how each struct page has the memcg that
its allocation was charged to; we just record this in the FD, not the
page.

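A minimal sketch of the capture-and-record idea, written as plain userspace C rather than kernel code. All names here (struct pins_cgroup, struct pin_fd, pin_fd_init(), pin_fd_charge(), pin_fd_release()) are hypothetical and are not the API from this series. The sketch assumes the owning cgroup is captured when the FD's accounting is set up, and it ignores locking and cgroup reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pins cgroup: only a usage counter. */
struct pins_cgroup {
	long pinned;                     /* pages currently charged here */
};

/* Hypothetical per-FD accounting state (e.g. in a driver's file private data). */
struct pin_fd {
	struct pins_cgroup *charged_cg;  /* owner recorded when the FD is set up */
	long npages;                     /* pages charged through this FD */
};

/* Capture the cgroup of the task setting up the FD. */
static void pin_fd_init(struct pin_fd *fd, struct pins_cgroup *current_cg)
{
	fd->charged_cg = current_cg;
	fd->npages = 0;
}

/* Charge the recorded cgroup, not whichever task happens to be running. */
static void pin_fd_charge(struct pin_fd *fd, long npages)
{
	assert(fd->charged_cg != NULL);
	fd->charged_cg->pinned += npages;
	fd->npages += npages;
}

/*
 * Called on close. There is deliberately no "current cgroup" argument:
 * the uncharge goes to the recorded cgroup even if the FD was passed to,
 * and closed by, a process in a different cgroup.
 */
static void pin_fd_release(struct pin_fd *fd)
{
	if (!fd->charged_cg)
		return;                  /* already released: nothing to give back */
	fd->charged_cg->pinned -= fd->npages;
	fd->npages = 0;
	fd->charged_cg = NULL;
}
```

Usage: call pin_fd_init(&fd, cg_of_opener) when the FD is set up, pin_fd_charge() for each pin, and pin_fd_release() on close, whichever process ends up doing the close. A real implementation would presumably also need to hold a reference on the recorded cgroup so it cannot go away while the FD still holds the charge.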
> > > Another thing that is not really clear to me is how the limit is
> > > actually going to be used in practice. As there is no concept of
> > > reclaim for pins, I can imagine that it would be quite easy to
> > > reach the hard limit and essentially DoS any further use of pins.
> >
> > Yes, that is the purpose. It is to sandbox pin users to put some limit
> > on the effect they have on the full machine.
> >
> > It replaces the rlimit mess that was doing the same thing.
>
> Arguably rlimit at least has a concept of the owner, AFAICS. I do realize
> this is not really great wrt high-level resource control though.

rlimit uses either the user or the process as the "owner". In this
model we view a cgroup as the "owner". The lifetime logic is all the
same: you figure out the owner (cgroup/user/process) when the charge
is made and record it, and when the uncharge comes, the recorded owner
is uncharged.

It never allows an unbalanced charge/uncharge, because that would be a
security problem even for the rlimit cases today.

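The same record-the-owner logic, sketched generically so that user, process and cgroup owners all go through one path, and with a hard limit that fails the request instead of reclaiming, since pinned pages cannot be reclaimed. This is again a hypothetical userspace illustration: the names, the -ENOMEM return and the flat per-owner limit are assumptions, not what the series implements; a real hierarchical pins cgroup would presumably also have to check every ancestor's limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical owner: an rlimit-style user/process, or a pins cgroup. */
enum owner_kind { OWNER_USER, OWNER_PROCESS, OWNER_CGROUP };

struct owner {
	enum owner_kind kind;  /* informational only; the logic below ignores it */
	long charged;          /* pages currently charged to this owner */
	long limit;            /* hard limit in pages */
};

/* A charge record kept by whatever holds the pin (e.g. an FD). */
struct charge {
	struct owner *owner;   /* recorded when the charge is made */
	long npages;
};

/*
 * Charge npages to the owner the caller resolved at charge time.
 * There is nothing to reclaim for pinned pages, so going over the
 * limit simply fails the request and records nothing.
 */
static int charge_try(struct charge *c, struct owner *owner, long npages)
{
	if (owner->charged + npages > owner->limit)
		return -ENOMEM;
	owner->charged += npages;
	c->owner = owner;
	c->npages = npages;
	return 0;
}

/* Uncharge exactly what was recorded; the uncharging context plays no part. */
static void charge_release(struct charge *c)
{
	if (!c->owner)
		return;        /* nothing recorded, so nothing to give back */
	c->owner->charged -= c->npages;
	c->owner = NULL;
	c->npages = 0;
}
```

Because charge_release() only consults the record, and clears it, a repeated or foreign release cannot push any owner's count out of balance; the kind of owner never enters into it.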
Jason
Thread overview: 71+ messages
2023-02-06 7:47 [PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and " Alistair Popple
2023-02-06 7:47 ` [PATCH 01/19] mm: Introduce vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 02/19] drivers/vhost: Convert to use vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 03/19] drivers/vdpa: Convert vdpa to use the new vm_structure Alistair Popple
2023-02-06 7:47 ` [PATCH 04/19] infiniband/umem: Convert to use vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 05/19] RMDA/siw: " Alistair Popple
2023-02-12 17:32 ` Bernard Metzler
2023-02-06 7:47 ` [PATCH 06/19] RDMA/usnic: convert " Alistair Popple
2023-02-06 7:47 ` [PATCH 07/19] vfio/type1: Charge pinned pages to pinned_vm instead of locked_vm Alistair Popple
2023-02-06 7:47 ` [PATCH 08/19] vfio/spapr_tce: Convert accounting to pinned_vm Alistair Popple
2023-02-06 7:47 ` [PATCH 09/19] io_uring: convert to use vm_account Alistair Popple
2023-02-06 15:29 ` Jens Axboe
2023-02-07 1:03 ` Alistair Popple
2023-02-07 14:28 ` Jens Axboe
2023-02-07 14:55 ` Jason Gunthorpe
2023-02-07 17:05 ` Jens Axboe
2023-02-13 11:30 ` Alistair Popple
2023-02-06 7:47 ` [PATCH 10/19] net: skb: Switch to using vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 11/19] xdp: convert to use vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 12/19] kvm/book3s_64_vio: Convert account_locked_vm() to vm_account_pinned() Alistair Popple
2023-02-06 7:47 ` [PATCH 13/19] fpga: dfl: afu: convert to use vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 14/19] mm: Introduce a cgroup for pinned memory Alistair Popple
2023-02-06 21:01 ` Yosry Ahmed
2023-02-06 21:14 ` Tejun Heo
2023-02-06 22:32 ` Yosry Ahmed
2023-02-06 22:36 ` Tejun Heo
2023-02-06 22:39 ` Yosry Ahmed
2023-02-06 23:25 ` Tejun Heo
2023-02-06 23:34 ` Yosry Ahmed
2023-02-06 23:40 ` Jason Gunthorpe
2023-02-07 0:32 ` Tejun Heo
2023-02-07 12:19 ` Jason Gunthorpe
2023-02-15 19:00 ` Michal Hocko
2023-02-15 19:07 ` Jason Gunthorpe
2023-02-16 8:04 ` Michal Hocko
2023-02-16 12:45 ` Jason Gunthorpe [this message]
2023-02-21 16:51 ` Tejun Heo
2023-02-21 17:25 ` Jason Gunthorpe
2023-02-21 17:29 ` Tejun Heo
2023-02-21 17:51 ` Jason Gunthorpe
2023-02-21 18:07 ` Tejun Heo
2023-02-21 19:26 ` Jason Gunthorpe
2023-02-21 19:45 ` Tejun Heo
2023-02-21 19:49 ` Tejun Heo
2023-02-21 19:57 ` Jason Gunthorpe
2023-02-22 11:38 ` Alistair Popple
2023-02-22 12:57 ` Jason Gunthorpe
2023-02-22 22:59 ` Alistair Popple
2023-02-23 0:05 ` Christoph Hellwig
2023-02-23 0:35 ` Alistair Popple
2023-02-23 1:53 ` Jason Gunthorpe
2023-02-23 9:12 ` Daniel P. Berrangé
2023-02-23 17:31 ` Jason Gunthorpe
2023-02-23 17:18 ` T.J. Mercier
2023-02-23 17:28 ` Jason Gunthorpe
2023-02-23 18:03 ` Yosry Ahmed
2023-02-23 18:10 ` Jason Gunthorpe
2023-02-23 18:14 ` Yosry Ahmed
2023-02-23 18:15 ` Tejun Heo
2023-02-23 18:17 ` Jason Gunthorpe
2023-02-23 18:22 ` Tejun Heo
2023-02-07 1:00 ` Waiman Long
2023-02-07 1:03 ` Tejun Heo
2023-02-07 1:50 ` Alistair Popple
2023-02-06 7:47 ` [PATCH 15/19] mm/util: Extend vm_account to charge pages against the pin cgroup Alistair Popple
2023-02-06 7:47 ` [PATCH 16/19] mm/util: Refactor account_locked_vm Alistair Popple
2023-02-06 7:47 ` [PATCH 17/19] mm: Convert mmap and mlock to use account_locked_vm Alistair Popple
2023-02-06 7:47 ` [PATCH 18/19] mm/mmap: Charge locked memory to pins cgroup Alistair Popple
2023-02-06 21:12 ` Yosry Ahmed
2023-02-06 7:47 ` [PATCH 19/19] selftests/vm: Add pins-cgroup selftest for mlock/mmap Alistair Popple
2023-02-16 11:01 ` [PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory David Hildenbrand