Date: Mon, 6 Jan 2025 13:28:24 +0100
From: Michal Hocko
To: Yafang Shao
Cc: hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 0/2] memcg: add nomlock to avoid folios beling mlocked in a memcg
References: <20241215073415.88961-1-laoar.shao@gmail.com>

On Sun 22-12-24 10:34:12, Yafang Shao wrote:
> On Sat, Dec 21, 2024 at 3:21 PM Michal Hocko wrote:
> >
> > On Fri 20-12-24 19:52:16, Yafang Shao wrote:
> > > On Fri, Dec 20, 2024 at 6:23 PM Michal Hocko wrote:
> > > >
> > > > On Sun 15-12-24 15:34:13, Yafang Shao wrote:
> > > > > Implementation Options
> > > > > ----------------------
> > > > >
> > > > > - Solution A: Allow file caches on the unevictable list to become
> > > > >   reclaimable.
> > > > >   This approach would require significant refactoring of the page reclaim
> > > > >   logic.
> > > > >
> > > > > - Solution B: Prevent file caches from being moved to the unevictable list
> > > > >   during mlock and ignore the VM_LOCKED flag during page reclaim.
> > > > >   This is a more straightforward solution and is the one we have chosen.
> > > > >   If the file caches are reclaimed from the download-proxy's memcg and
> > > > >   subsequently accessed by tasks in the application’s memcg, a filemap
> > > > >   fault will occur. A new file cache will be faulted in, charged to the
> > > > >   application’s memcg, and locked there.
> > > >
> > > > Both options are silently breaking userspace because a non failing mlock
> > > > doesn't give guarantees it is supposed to AFAICS.
> > >
> > > It does not bypass the mlock mechanism; rather, it defers the actual
> > > locking operation to the page fault path. Could you clarify what you
> > > mean by "a non-failing mlock"? From what I can see, mlock can indeed
> > > fail if there isn’t sufficient memory available. With this change, we
> > > are simply shifting the potential failure point to the page fault path
> > > instead.
> >
> > Your change will cause mlocked pages (as mlock syscall returns success)
> > to be reclaimable later on. That breaks the basic mlock contract.
>
> AFAICS, the mlock() behavior was originally designed with only a
> single root memory cgroup in mind. In other words, when mlock() was
> introduced, all locked pages were confined to the same memcg.

Yes, and this is the case for any other syscall that might have an
impact on memory consumption. This is by design. The memory cgroup
controller aims to provide completely transparent resource control
without any modifications to applications. This is the case for all
other cgroup controllers. If memcg (or another controller) affects the
behavior of a specific syscall then this has to be communicated
explicitly to the caller.

The purpose of the mlock syscall is to _guarantee_ that memory stays
resident (is never swapped out). There might be additional constraints
that prevent mlock from succeeding - e.g. rlimit, or if memcg aims to
control the amount of mlocked memory - but those failures need to be
explicitly communicated via syscall failure.

> However, this changed with the introduction of memcg support. Now,
> mlock() can lock pages that belong to a different memcg than the
> current task. This behavior is not explicitly defined in the mlock()
> documentation, which could lead to confusion.

This is more of a problem of cgroup configurations where different
resource domains are sharing resources. This is not much different from
other resources (e.g. shmem) being shared across unrelated cgroups.

> To clarify, I propose updating the mlock() documentation as follows:

This is not really possible because you are effectively breaking
existing userspace.
-- 
Michal Hocko
SUSE Labs