From: Yafang Shao
Date: Wed, 25 Dec 2024 10:23:53 +0800
Subject: Re: [RFC PATCH 0/2] memcg: add nomlock to avoid folios beling mlocked in a memcg
To: Michal Hocko
Cc: hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org
References: <20241215073415.88961-1-laoar.shao@gmail.com>

On Sun, Dec 22, 2024 at 10:34 AM Yafang Shao wrote:
>
> On Sat, Dec 21, 2024 at 3:21 PM Michal Hocko wrote:
> >
> > On Fri 20-12-24 19:52:16, Yafang Shao wrote:
> > > On Fri, Dec 20, 2024 at 6:23 PM Michal Hocko wrote:
> > > >
> > > > On Sun 15-12-24 15:34:13, Yafang Shao wrote:
> > > > > Implementation Options
> > > > > ----------------------
> > > > >
> > > > > - Solution A: Allow file caches on the unevictable list to become
> > > > >   reclaimable.
> > > > >   This approach would require significant refactoring of the page reclaim
> > > > >   logic.
> > > > >
> > > > > - Solution B: Prevent file caches from being moved to the unevictable list
> > > > >   during mlock and ignore the VM_LOCKED flag during page reclaim.
> > > > >   This is a more straightforward solution and is the one we have chosen.
> > > > >   If the file caches are reclaimed from the download-proxy's memcg and
> > > > >   subsequently accessed by tasks in the application’s memcg, a filemap
> > > > >   fault will occur. A new file cache will be faulted in, charged to the
> > > > >   application’s memcg, and locked there.
> > > >
> > > > Both options are silently breaking userspace because a non failing mlock
> > > > doesn't give guarantees it is supposed to AFAICS.
> > >
> > > It does not bypass the mlock mechanism; rather, it defers the actual
> > > locking operation to the page fault path. Could you clarify what you
> > > mean by "a non-failing mlock"? From what I can see, mlock can indeed
> > > fail if there isn’t sufficient memory available. With this change, we
> > > are simply shifting the potential failure point to the page fault path
> > > instead.
> >
> > Your change will cause mlocked pages (as mlock syscall returns success)
> > to be reclaimable later on. That breaks the basic mlock contract.
>
> AFAICS, the mlock() behavior was originally designed with only a
> single root memory cgroup in mind. In other words, when mlock() was
> introduced, all locked pages were confined to the same memcg.
>
> However, this changed with the introduction of memcg support. Now,
> mlock() can lock pages that belong to a different memcg than the
> current task. This behavior is not explicitly defined in the mlock()
> documentation, which could lead to confusion.
>
> To clarify, I propose updating the mlock() documentation as follows:
>
> When memcg is enabled, the page being locked might reside in a
> different memcg than the current task. In such cases, the page might
> be reclaimed if mlock() is not permitted in its original memcg. If the
> locked page is reclaimed, it could be faulted back into the current
> task's memcg and then locked again.
>
> Additionally, encountering a single page fault during this process
> should be acceptable to most users. If your application cannot
> tolerate even a single page fault, you likely wouldn’t enable memcg in
> the first place.
>

If you insist on not allowing a single page fault, there is an
alternative approach, though it would require significantly more
complex handling.

- Option C: Reparent the mlocked page to a common ancestor

  Consider the following hierarchy:

        A
       / \
      B   C

  If B is mlocking a page in C, we can reparent that mlocked page to A,
  essentially making A the new parent for the mlocked page.

          A
         / \
        B   C
       / \   \
      D   E   F

  In this example, if D is mlocking a page in F, we will reparent the
  mlocked page to A.

- Benefits:
  No user-visible cgroup file setting: This approach avoids introducing
  or modifying cgroup settings that could be visible or configurable
  by users.

- Downsides:
  Increased complexity: This option requires significantly more work
  in terms of managing the reparenting process.

-- 
Regards
Yafang