Date: Tue, 21 Feb 2023 09:45:15 -1000
From: Tejun Heo
To: Jason Gunthorpe
Cc: Michal Hocko, Yosry Ahmed, Alistair Popple, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, jhubbard@nvidia.com,
 tjmercier@google.com, hannes@cmpxchg.org, surenb@google.com,
 mkoutny@suse.com, daniel@ffwll.ch, "Daniel P. Berrange", Alex Williamson,
 Zefan Li, Andrew Morton
Subject: Re: [PATCH 14/19] mm: Introduce a cgroup for pinned memory

Hello,

On Tue, Feb 21, 2023 at 03:26:33PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 21, 2023 at 08:07:13AM -1000, Tejun Heo wrote:
> > > AFAIK there are few real use cases to establish a pin on MAP_SHARED
> > > mappings outside your cgroup. However, it is possible, the APIs allow
> > > it, and for security sandbox purposes we can't allow a process inside
> > > a cgroup to trigger a charge on a different cgroup. That breaks the
> > > sandbox goal.
> >
> > It seems broken anyway. Please consider the following scenario:
>
> Yes, this is broken like this already today - memcg doesn't work
> entirely perfectly for MAP_SHARED scenarios, IMHO.

It is far from perfect but the existing behavior isn't that broken. E.g. in
the same scenario, without pinning, even if the larger cgroup keeps using
the same page, the smaller cgroup should be able to evict the pages as they
are not pinned and the cgroup is under heavy reclaim pressure. The larger
cgroup will refault them back in and end up owning those pages.

memcg can't capture the case of the same pages being actively shared by
multiple cgroups concurrently (I think those cases should be handled by
pushing them to the common parent as discussed elsewhere but that's a
separate topic) but it can converge when page usage transfers across cgroups
if needed. Disassociating ownership and pinning will break that in an
irreversible way.

> > > > for whatever reason is determining the pinning ownership or should the page
> > > > ownership be attributed the same way too? If they indeed need to differ,
> > > > that probably would need pretty strong justifications.
> > >
> > > It is inherent to how pin_user_pages() works. It is an API that
> > > establishes pins on existing pages. There is nothing about it that says
> > > who the page's memcg owner is.
> > >
> > > I don't think we can do anything about this without breaking things.
> >
> > That's a discrepancy in an internal interface and we don't wanna codify
> > something like that into the userspace interface. Semantically, it seems
> > like if pin_user_pages() wants to charge pinning to the cgroup associated
> > with an fd (or whatever), it should also claim the ownership of the pages
> > themselves.
>
> Multiple cgroups can pin the same page, so it is not as simple as just
> transferring ownership, we need multi-ownership and to really fix the
> memcg limitations with MAP_SHARED without an API impact.
>
> You are right that pinning is really just a special case of
> allocation, but there is a reason the memcg was left with weak support
> for MAP_SHARED and changing that may be more than just hard but an
> infeasible trade off.
>
> At least I don't have a good idea how to even approach building a
> reasonable data structure that can track the number of
> charges per-cgroup per page. :\

As I wrote above, I don't think the problem here is the case of pages being
shared by multiple cgroups concurrently. We can leave that problem for
another thread.

However, if we want to support accounting and control of pinned memory, we
really shouldn't introduce a fundamental discrepancy like the owner and
pinner disagreeing with each other. At least conceptually, the solution is
rather straightforward - whoever pins a page should also claim the ownership
of it.

Thanks.

--
tejun
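
For illustration only, the convergence argument can be sketched as a toy
userspace model. This is not kernel code and does not reflect how memcg is
implemented; the Page class, the helper functions, and the cgroup names
"small" and "large" are all hypothetical. It assumes a page is charged to
whichever cgroup faults it in, and that reclaim in a cgroup can only evict
unpinned pages charged to that cgroup.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Page:
    # Hypothetical model of one shared page, not a kernel structure.
    owner: Optional[str] = None                     # cgroup charged for the page
    pinners: Set[str] = field(default_factory=set)  # cgroups holding a pin
    resident: bool = False


def touch(page: Page, cgroup: str) -> None:
    # A fault on a non-resident page charges it to the faulting cgroup.
    if not page.resident:
        page.resident = True
        page.owner = cgroup


def reclaim(page: Page, cgroup: str) -> bool:
    # Reclaim in `cgroup` can only evict pages charged to it that are unpinned.
    if page.resident and page.owner == cgroup and not page.pinners:
        page.resident = False
        page.owner = None
        return True
    return False


def pin(page: Page, cgroup: str, claim_ownership: bool) -> None:
    # Pin the page; optionally move the charge to the pinning cgroup.
    touch(page, cgroup)
    page.pinners.add(cgroup)
    if claim_ownership:
        page.owner = cgroup


def run(pinned: bool, claim_ownership: bool = False) -> Optional[str]:
    page = Page()
    touch(page, "small")        # the smaller cgroup faults the page in first
    if pinned:
        pin(page, "large", claim_ownership)
    else:
        touch(page, "large")    # already resident, so the charge does not move
    reclaim(page, "small")      # the smaller cgroup is under reclaim pressure
    touch(page, "large")        # the larger cgroup keeps using the page
    return page.owner


if __name__ == "__main__":
    print("no pin:               ", run(pinned=False))
    print("pin, owner unchanged: ", run(pinned=True, claim_ownership=False))
    print("pin claims ownership: ", run(pinned=True, claim_ownership=True))
```

In this model the unpinned case ends with "large" owning the page, the
pin-without-ownership case leaves the charge stuck on "small", and the
pin-claims-ownership case ends with "large" again. The model deliberately
leaves out the concurrent multi-cgroup pinning case raised in the quoted
text.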