From: Yafang Shao
Date: Mon, 6 Jan 2025 21:59:07 +0800
Subject: Re: [RFC PATCH 0/2] memcg: add nomlock to avoid folios beling mlocked in a memcg
To: Michal Hocko
Cc: hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org
References: <20241215073415.88961-1-laoar.shao@gmail.com>

On Mon, Jan 6, 2025 at 8:28 PM Michal Hocko wrote:
>
> On Sun 22-12-24 10:34:12, Yafang Shao wrote:
> > On Sat, Dec 21, 2024 at 3:21 PM Michal Hocko wrote:
> > >
> > > On Fri 20-12-24 19:52:16, Yafang Shao wrote:
> > > > On Fri, Dec 20, 2024 at 6:23 PM Michal Hocko wrote:
> > > > >
> > > > > On Sun 15-12-24 15:34:13, Yafang Shao wrote:
> > > > > > Implementation Options
> > > > > > ----------------------
> > > > > >
> > > > > > - Solution A: Allow file caches on the unevictable list to become
> > > > > >   reclaimable.
> > > > > >   This approach would require significant refactoring of the page reclaim
> > > > > >   logic.
> > > > > >
> > > > > > - Solution B: Prevent file caches from being moved to the unevictable list
> > > > > >   during mlock and ignore the VM_LOCKED flag during page reclaim.
> > > > > >   This is a more straightforward solution and is the one we have chosen.
> > > > > >   If the file caches are reclaimed from the download-proxy's memcg and
> > > > > >   subsequently accessed by tasks in the application’s memcg, a filemap
> > > > > >   fault will occur. A new file cache will be faulted in, charged to the
> > > > > >   application’s memcg, and locked there.
> > > > >
> > > > > Both options are silently breaking userspace because a non failing mlock
> > > > > doesn't give guarantees it is supposed to AFAICS.
> > > >
> > > > It does not bypass the mlock mechanism; rather, it defers the actual
> > > > locking operation to the page fault path. Could you clarify what you
> > > > mean by "a non-failing mlock"? From what I can see, mlock can indeed
> > > > fail if there isn’t sufficient memory available. With this change, we
> > > > are simply shifting the potential failure point to the page fault path
> > > > instead.
> > >
> > > Your change will cause mlocked pages (as mlock syscall returns success)
> > > to be reclaimable later on. That breaks the basic mlock contract.
> >
> > AFAICS, the mlock() behavior was originally designed with only a
> > single root memory cgroup in mind. In other words, when mlock() was
> > introduced, all locked pages were confined to the same memcg.
>
> yes and this is the case to any other syscalls that might have an impact
> on the memory consumption. This is by design.
> Memory cgroup controller
> aims to provide a completely transparent resource control without any
> modifications to applications. This is the case for all other cgroup
> controllers. If memcg (or other controller) affects a specific syscall
> behavior then this has to be communicated explicitly to the caller.
>
> The purpose of mlock syscall is to _guarantee_ memory to be resident
> (never swapped out). There might be additional constraints to prevent
> mlock from succeeding - e.g. rlimit or if memcg aims to control amount
> of the mlocked memory but those failures need to be explicitly
> communicated via syscall failure.

Returning an error code like EBUSY to userspace is straightforward
when attempting to mlock a page that is charged to a different memcg.

>
> > However, this changed with the introduction of memcg support. Now,
> > mlock() can lock pages that belong to a different memcg than the
> > current task. This behavior is not explicitly defined in the mlock()
> > documentation, which could lead to confusion.
>
> This is more of a problem of the cgroup configurations where different
> resource domains are sharing resources. This is not much different when
> other resources (e.g. shmem) are shared across unrelated cgroups.

However, we have yet to address even a single one of these issues or
reach a consensus on a solution, correct?

>
> > To clarify, I propose updating the mlock() documentation as follows:
>
> This is not really possible because you are effectively breaking an
> existing userspace.

This behavior is neither mandatory nor the default. You are not
obligated to use it if you prefer not to.

-- 
Regards
Yafang
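
For illustration only, a minimal userspace sketch of what "communicated
explicitly via syscall failure" could look like from the caller's side.
The EAGAIN, ENOMEM and EPERM cases follow the errors documented in
mlock(2). The EBUSY case is purely hypothetical: it is the error code
floated in this thread and is not among the errors documented in
mlock(2). The enum and both function names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Possible caller reactions to an mlock() result (names are illustrative). */
enum mlock_action {
    MLOCK_OK,            /* range is resident and locked */
    MLOCK_RETRY,         /* EAGAIN: some or all of the range could not be locked */
    MLOCK_CHECK_LIMITS,  /* ENOMEM/EPERM: usually RLIMIT_MEMLOCK or privileges;
                            ENOMEM can also mean part of the range is unmapped */
    MLOCK_FOREIGN_MEMCG, /* HYPOTHETICAL: page charged to a different memcg */
    MLOCK_FATAL          /* anything else, e.g. EINVAL */
};

/* Map an errno value from a failed mlock() to a caller action. */
enum mlock_action classify_mlock_errno(int err)
{
    switch (err) {
    case 0:
        return MLOCK_OK;
    case EAGAIN:
        return MLOCK_RETRY;
    case ENOMEM:
    case EPERM:
        return MLOCK_CHECK_LIMITS;
    case EBUSY:
        /* Assumption: mlock() does not return EBUSY today. */
        return MLOCK_FOREIGN_MEMCG;
    default:
        return MLOCK_FATAL;
    }
}

/* Lock a range and report the failure explicitly instead of ignoring it. */
enum mlock_action try_mlock(const void *addr, size_t len)
{
    int err;

    assert(addr != NULL);
    if (mlock(addr, len) == 0)
        return MLOCK_OK;

    err = errno; /* save before fprintf() can clobber it */
    fprintf(stderr, "mlock: %s\n", strerror(err));
    return classify_mlock_errno(err);
}
```

With an explicit error like this, the caller learns at mlock() time that
the residency guarantee was not granted, rather than finding out later
through reclaim and a page fault.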