From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98658C636D7 for ; Tue, 21 Feb 2023 19:26:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 325666B0072; Tue, 21 Feb 2023 14:26:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2D4C26B0073; Tue, 21 Feb 2023 14:26:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 14F276B0074; Tue, 21 Feb 2023 14:26:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 076396B0072 for ; Tue, 21 Feb 2023 14:26:47 -0500 (EST) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id BA76DA07A5 for ; Tue, 21 Feb 2023 19:26:46 +0000 (UTC) X-FDA: 80492281212.14.EB6591C Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2043.outbound.protection.outlook.com [40.107.223.43]) by imf10.hostedemail.com (Postfix) with ESMTP id 3A1ECC000E for ; Tue, 21 Feb 2023 19:26:42 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=pCO+Q6um; dmarc=pass (policy=reject) header.from=nvidia.com; spf=pass (imf10.hostedemail.com: domain of jgg@nvidia.com designates 40.107.223.43 as permitted sender) smtp.mailfrom=jgg@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1677007603; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qJoetxTueRdwFyG7muRq2iwGc/+2jwQdIe1eeQ2yk2w=; b=3zuIUvt6zPqfqba9YozB3srwDk4zJErGNecM1u9//EIZSq9TWgqbaX6NX9jtUmfI2S+ag/ jpa702AAOB6xi/O3g/r1p6hMa93T1pZpYqab3gvQ7/ivFxH8ZzxxjejBq8AdYeYO2O0XLv dhF9Dvctb/SNnWj3wyZQ+V6Yyc089p4= ARC-Authentication-Results: i=2; imf10.hostedemail.com; dkim=pass header.d=Nvidia.com header.s=selector2 header.b=pCO+Q6um; dmarc=pass (policy=reject) header.from=nvidia.com; spf=pass (imf10.hostedemail.com: domain of jgg@nvidia.com designates 40.107.223.43 as permitted sender) smtp.mailfrom=jgg@nvidia.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1677007603; a=rsa-sha256; cv=pass; b=JNFdc6cJyvziUfhMaSYbEgX+M74WKxpbESo37qY0vDTXlrGXNW5lpGE2ctzQGmOtDdaxFO oldt6gNJo2tlT7KP4VBG+32FRh7bsRrjmuUFkfv58gduKPRIZ7sWjsyqWz7nMSK7GEiWNe lp6m2/hje0Pfc3SD9AWkJrL3je5ML7A= ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=cBZPn5OjTfGO8eKjVUOdemhc0tCg5k6EREvyruDv0gL2mI4E1Mrgz4Q8zY3WL3iKMAHKUWH5xJYontvJpzYJomaH6/Sr3Pagcqn+qNyCLBnm6SXZGBEVdZKSIIXWMHmvKjl461v+WQNNkYlzsJHGFEWjVLnJfdqm6Bz5pUjskPvxSpeSoeQRltgaDbD+c3FaIpn6byKK4u+N+i2dT9Zlkx1qR/Caq8+5jObAJYgMal6zedqJElD0jlxGaQkIkVzIVxqfYgN/jc2LUSPcy88ju3YKDxJLY8lkbviqzUXOKqzgkwwFu68lgKh3Pv+WhNky9q9LrMHBRTt+A3WR4e1lOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qJoetxTueRdwFyG7muRq2iwGc/+2jwQdIe1eeQ2yk2w=; b=Hvt6t6K0Cl/vQ8/hlQEIJy14i2p8EoFVgsc0Q4OewX0n3RbDk2RwyXQg4TAB6wuNApAzUSV2sgSZkFTDZBEjm+93/iXNi9MxwxC+4K0hA49L5yWKFqFqkoovpE3aDvI2+oCTd6viKOGlefUhfiziq697xUNPSQ9tyDehhnhXduHWaSxWB6g6tRgo1q4/7c6tCot4AKzG/uGpMrtuT4HKkQr96dpErzaaoJjMI63CqcPWlj3A1kzSVQs1a8iMNL/zogaDWBu7j7LEjbHvBRQN4QO6zwzoZ5SlrLqK3dMxSRsP4Cgt48ZBc1UTVC8dPgYfjnU+XNXxUw+1MAzc2oOsKQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=qJoetxTueRdwFyG7muRq2iwGc/+2jwQdIe1eeQ2yk2w=; b=pCO+Q6umjo/+QS+sxMVcUAonGFKpTnHbjq5/jueZF1XWNYdx/k4T62YUpNARyMyjZy5KAQyGHlIsYpnZHAMD4sJZ8m9fwzu3OLUexUcsN4ua3zRAzzHdiCG/UXjfSz+r5Sb8pYt057W2S1uH63WknLc2yjE+drxFV1x9ZyT/qPlzIZb90Cbw8cYEFInLYGLzbady1RgW+woipbPzCracqGCaNJUElag8o2u2rF+XqljScpdXbQ7lvDkyLBJ0ifJiGm9v9xdnpWGZUejmREFEdGXcOpekVlQGeCwAiLby/coavhLN1tCfFoSL3CxLtrtoSQ0Qb35l+a/Q5ojDr/CvmQ== Received: from LV2PR12MB5869.namprd12.prod.outlook.com (2603:10b6:408:176::16) by DS0PR12MB7725.namprd12.prod.outlook.com (2603:10b6:8:136::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6111.21; Tue, 21 Feb 2023 19:26:40 +0000 Received: from LV2PR12MB5869.namprd12.prod.outlook.com ([fe80::3cb3:2fce:5c8f:82ee]) by LV2PR12MB5869.namprd12.prod.outlook.com ([fe80::3cb3:2fce:5c8f:82ee%4]) with mapi id 15.20.6111.021; Tue, 21 Feb 2023 19:26:39 +0000 Date: Tue, 21 Feb 2023 15:26:33 -0400 From: Jason Gunthorpe To: Tejun Heo Cc: Michal Hocko , Yosry Ahmed , Alistair Popple , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, jhubbard@nvidia.com, tjmercier@google.com, hannes@cmpxchg.org, surenb@google.com, mkoutny@suse.com, daniel@ffwll.ch, "Daniel P . Berrange" , Alex Williamson , Zefan Li , Andrew Morton Subject: Re: [PATCH 14/19] mm: Introduce a cgroup for pinned memory Message-ID: References: Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: BL1PR13CA0233.namprd13.prod.outlook.com (2603:10b6:208:2bf::28) To LV2PR12MB5869.namprd12.prod.outlook.com (2603:10b6:408:176::16) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: LV2PR12MB5869:EE_|DS0PR12MB7725:EE_ X-MS-Office365-Filtering-Correlation-Id: 30550aa4-da80-4f47-bc16-08db14418e3f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: GLLr87YI+SBC1hzkcZdat6+s0q0VTyiEOC5BT8OjQiJuNKC8OS2Y0A9Gaki0cuTfk2J0SlANU3vvmT10tJSo3vEm1sFD/eAtXjv90JUHkVG8jifeCUeIhUsK0HeTep54K7JoedjlvkEWxPuBoA/N2C1t/9oFx9bOmcEh2JvB043PRdeRjhXeyDzmWpjdKelI8kUquOv/Bc7salXG2tEbtcRgFlCnlEs9QNbiW20z1Xzb+YZwWVPaQ1hC9gw11piaMb0mHB7rAFIrI93g7uOAIp0V3txbM7vFLuq9TmOyPu5Of9wq/2P4OEDev0LyH82Wv6vj9Y5scrhLHjBx/xS7AtXch4mA4ypVS+an38bKIJoMK7MyVKKnLfgda1uGZObiw+zKq2RqD3Ls1uO9HrWVG/Oc92a+7s1HeBej1XxjOecPVsgd39MPzwVyNzawLY+Gw2njbmRsoGH7Rne1tErRrSq8x1pf8hh8RTltKwR5OpZbJlQVzuQWCXYhnh5ruQOqg2FLFXsVqi2QNePxuP/BPSWo1uxNtKz3Ahf+bzrYSrFw3VttU2iQ5WbTLVT8AFgbb/lJ6b18nHMshEPlvymA3xEDzgB0QtvXJAirHxD3djub4dz2sK/8rrv3VWh/e0cO1q3I1H03OEtEFzUA9+F7hQ== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:LV2PR12MB5869.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230025)(4636009)(376002)(396003)(136003)(39860400002)(366004)(346002)(451199018)(8936002)(7416002)(5660300002)(83380400001)(41300700001)(2906002)(38100700002)(54906003)(6666004)(2616005)(66476007)(66556008)(66946007)(8676002)(6916009)(4326008)(6486002)(478600001)(186003)(86362001)(6512007)(36756003)(6506007)(26005)(316002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?V1wM8VrubHxKqHUQYvjXFfvTUnnmpjgBy35HRhaVru0yNqT1g8j6IH0ghOl/?= =?us-ascii?Q?oM937/JnMPdBYCnSBeRRpXO8Hsq4rXhy7Z0D2J4rtlFe9ynNdwztLb3wGRnw?= =?us-ascii?Q?Ad+GBlT8k35RQhAmNKLVzkjILeQG1nx3dY4f1zLGivfWUibPYOq4/gBRDXue?= =?us-ascii?Q?iXnY3TdMwwqZhQkMW2Myre+btpOo0yKQ7JEgZtmz7HUU9bKTtjohJ8weNzxg?= =?us-ascii?Q?FhoVdVmxtNoH4dDKCuropBZrGbyYiXOnLx40RNKO1t8RpuvEz27Ee4mBSz5+?= =?us-ascii?Q?hyjoNuLqAb1Xeh0JAiffKEG+R1f5HZa1szLGVSZb1xzJT3X6qkkzW8DWvj9i?= =?us-ascii?Q?xejdgqAIcFOhdb0LPr4VnuoG2xiduTJvvhbw2WIIec/wUqMApC0DeuTaTe1N?= =?us-ascii?Q?NDyWe9s07Sp3O4XMG2etoAv0h/s8eYDngBhIqQT9CYAM29JhmVLzjszfFvEX?= =?us-ascii?Q?KURLzBYY8MfFt6CTIFmADRj+vZ4dPbWdxKIRNpDSTnU5zQWNht2Dt8yDeqe7?= =?us-ascii?Q?Z2If+8qVlv9Mn93tgSzrnkyCOaD1NsJWzLD73j7d5TItom5PY0qcjcOgR3qU?= =?us-ascii?Q?cw4PZ/22jJNOIMFKucu2rBXOrZ8NKBP2qNarLtIvWFDfNnXkLh8ZUzc9hdMP?= =?us-ascii?Q?OaQ2Aw+SiUYOlE1tq1T2Y31B9iWP1mH1gbo2z6PDFkvZYEu0tscgbzkM2qAf?= =?us-ascii?Q?bAXZIY5IETiqJoxHpO3b6/Gvi/5yD0l+uD5V+kPj/2Cd6vuH8OV2ee1LLFE7?= =?us-ascii?Q?mgo77fBpFkiUxdKTQfOnYBZ+w9PZNnuEgJP5XsHdFqB1VkDIQrd18pJRAypN?= =?us-ascii?Q?eLUTQ3LnKfqFLeGG9GVqnFXy+2uuI/lOpPRiiexA7jkOgSb8COTbdMRsi5/y?= =?us-ascii?Q?HZku1P6LKcui9FMTx6ApMWa0uZhpCzgMALoBK+8SAZddMsbbKlMTB7NFIcQZ?= =?us-ascii?Q?0LiVtZm+Fv3Y6/+rb5kwshRu8fW6r9mHu74P8GiNEPUpjZYpmVE/QRb9C3xx?= =?us-ascii?Q?AaBsuF2jnJLtGD538RlPJMApZnBXqb9b6ITXsDDIOoMHjJkWup0C33bUfSdL?= =?us-ascii?Q?2NQuewddcegnv4vfF7MyZdWwXzZA88KWhlQC6zSXM05zdbePqODfIIiG0Mvf?= =?us-ascii?Q?woALinUsmNVhpbP5b0oxOEMT/zDsKgJiu1vgEEVu9vaRUs2W1UQz/R7WQySx?= =?us-ascii?Q?GooIt9lUg7FOPUNskpFpJHF0fNfSsSa14DmwUBIq/irlWEHsiFJLS3eTHxFP?= =?us-ascii?Q?IZzy+3kqkFkMK4ZfD6F2mjvo48/d2Yn7kyrSw99n1KP9zhRrBLndtMVW82GM?= =?us-ascii?Q?AvfDdxFWFkAArNF1r2e/IStq+Y9MsoeoYiqg3ZPG2J9sb1natCH/4hCxVA85?= =?us-ascii?Q?nG5S9APS5zEv+uF0YySWa9HwLhulSdzWE3pbtysfXsILboS0WiuEzw1H1ELG?= =?us-ascii?Q?jqgEI9kvNKh2omri6N9dpuXQTZFC9bgIRpWp0ptDnI3Y1Vo3VUEBGiFRxTIv?= =?us-ascii?Q?70Y8Jgz5D0zgry9+iOqSaRmVBHMhjDYGMSkjYoZ75DJuhVB2INKS6F+OcrED?= =?us-ascii?Q?xUVrerEzD9smz4uhMOSxSdh7dXk2fcI5+4Z718lE?= X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-Network-Message-Id: 30550aa4-da80-4f47-bc16-08db14418e3f X-MS-Exchange-CrossTenant-AuthSource: LV2PR12MB5869.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 Feb 2023 19:26:39.6903 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: RH3uxQ3UR/Gi1fj071+f/4LMBRbGcmGKqji0EB3mhXvfJLj7FxcZc52OU51qbluT X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR12MB7725 X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 3A1ECC000E X-Stat-Signature: ecffys5sdjoy1as9xntz14sn8934x9ga X-HE-Tag: 1677007602-483124 X-HE-Meta: U2FsdGVkX18fHMrCZ+tRlYXtyhUjQid2Ey7ilfD6Nqyy1kkrBCyirkjgtEqPXuQ58jr0FiUe1nWo3qAFjkV1YlM+GmwKhdhRdBYWuSnnOWCWn3tP0+PPvP9kHfbkmTvb544wE9lJoDUeW+/t2Jk+bZvrw2WReoXTvEtWnUOgos5d+q8FNoXIKUmhe8rVovhs+ycYDCAtVFut80U55ySS9zFgRuf+3m0qNjZ8tBvpD6jQTiJKVj+QeIJ1T+zW74XKeHs/ZEnMA7REhtjGI9DeA4zVLaO2Tl3ojgeMIh2HWFddCYOjoBvnOc55KlGawm9FRwxJdsVS50fT3F+lqGbSdeXnSCA4KMdQfeA8QQqgNPXOb4j5ZfyHe6ctp7gofLyLBj3Zpb5ufiXfcsTgG+hAq9lIbYpeckHqNPzP/ZL/C/LWb+x9NHkSSLQY689BrPml9V6TPDvcpPG5Z+EQ1t32GdKlUpQq+VxeJxSihujRZg4BmVH1yUuJxVs7xR0tZlIWi4CM5mgJJBkNJioel2tHkPidJAFpaZQ4uJ5+Oc5q+/i5x5l+Yp7Z1t8hHZHeci7rs/mSu2iQFuB6gKrDaDb+r1izi1xwRg8vKijFzHh1Su6Z9CxDau0gtxoDf5+INIe/TFOTsB4xiJ+7r/4RjQONJ4HbZJrq75xluwTN+WBTi7OHwwqgn/Afx/IjQtnbMH+wFTebKm6Pto1nYdOgcekYTg4GpQoDCHrWHZQ3oMginlcxIWPziM5VpaDrp1ve4QR7D8VHoHq8fNJCtHgA5NjF8xvKh0qzWbF3TE/56Amlief9A93Q1ZQAE/FQvKxrJA1vfGLxlkjQUAIC7VTM+ekQ4iVy5hVZEtjs/Cso1xxmYFhdJ80S+URittZSmi+C76sjPWxox5cZoIzKHZw3E/yfZXa+tQFxNE97uFpmm8KSNTcP42bkQ64Uthm/aIC2yFLs00efv+OmWPd8VVMsXyK VRptLkMq /7t70XQ2rbyc18+EcA1Ffk5m8G9UQ4beffUBIZypj6iF1f0v/Khaf9wZRL1UDv3PCgXkMVT6SET8FR3rYUkFud75nvQRNTdFL4XsWcVfaozd5bDS3brTSjMp87fiePb4rs9VoHzn+36+E5wlfigL2YnFbybsVmL+ZAqdGepUg9/ZuPpBNTDQUX9NBIpKhWpFJcH5IV5KSFIA3ef5Fg+umTZooGKUyh5yYBFnQQ+XeZ7q8YGA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Feb 21, 2023 at 08:07:13AM -1000, Tejun Heo wrote: > > AFAIK there are few real use cases to establish a pin on MAP_SHARED > > mappings outside your cgroup. However, it is possible, the APIs allow > > it, and for security sandbox purposes we can't allow a process inside > > a cgroup to triger a charge on a different cgroup. That breaks the > > sandbox goal. > > It seems broken anyway. Please consider the following scenario: Yes, this is broken like this already today - memcg doesn't work entirely perfectly for MAP_SHARED scenarios, IMHO. > > > for whatever reason is determining the pinning ownership or should the page > > > ownership be attributed the same way too? If they indeed need to differ, > > > that probably would need pretty strong justifications. > > > > It is inherent to how pin_user_pages() works. It is an API that > > establishs pins on existing pages. There is nothing about it that says > > who the page's memcg owner is. > > > > I don't think we can do anything about this without breaking things. > > That's a discrepancy in an internal interface and we don't wanna codify > something like that into userspace interface. Semantially, it seems like if > pin_user_pages() wanna charge pinning to the cgroup associated with an fd > (or whatever), it should also claim the ownership of the pages > themselves. Multiple cgroup can pin the same page, so it is not as simple as just transfering ownership, we need multi-ownership and to really fix the memcg limitations with MAP_SHARED without an API impact. You are right that pinning is really just a special case of allocation, but there is a reason the memcg was left with weak support for MAP_SHARED and changing that may be more than just hard but an infeasible trade off.. At least I don't have a good idea how to even approach building a reasonable datstructure that can track the number of charges per-cgroup per page. :\ Jason