Message-ID: <4e72ebcd6c12f0641c8c9040bbcdd7fc4cd54287.camel@surriel.com>
Subject: Re: [PATCH] memcg: allow exiting tasks to write back data to swap
From: Rik van Riel
To: Yosry Ahmed
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Nhat Pham
Date: Wed, 11 Dec 2024 12:49:24 -0500
References: <20241211105336.380cb545@fangorn> <768a404c6f951e09c4bfc93c84ee1553aa139068.camel@surriel.com> <6bc895883abca3522c9efc0c56189741194581e5.camel@surriel.com>

On Wed, 2024-12-11 at 09:30 -0800, Yosry Ahmed wrote:
> On Wed, Dec 11, 2024 at 9:20 AM Rik van Riel wrote:
> >
> > On Wed, 2024-12-11 at 09:00 -0800, Yosry Ahmed wrote:
> > > On Wed, Dec 11, 2024 at 8:34 AM Rik van Riel wrote:
> > > > >
> > > > If it is a kernel directed memcg OOM kill, that is
> > > > true.
> > > >
> > > > However, if the exit comes from somewhere else,
> > > > like a userspace oomd kill, we might not hit that
> > > > code path.
> > >
> > > Why do we treat dying tasks differently based on the source of
> > > the kill?
> > >
> > Are you saying we should fail allocations for
> > every dying task, and add a check for PF_EXITING
> > in here?
>
> I am asking, not really suggesting anything :)
>
> Does it matter from the kernel perspective if the task is dying due
> to a kernel OOM kill or a userspace SIGKILL?
>

Currently, it does. I'm not sure it should, but
currently it does :/

We are dealing with two conflicting demands here.

On the one hand, we want the exit code to be able
to access things like futex memory, so it can
properly clean up everything the program left behind.

On the other hand, we don't want the exiting
program to drive up cgroup memory use, especially
not with memory that won't be reclaimed by the
exit.

My patch is an attempt to satisfy both of these
demands, in situations where we currently exhibit
a rather pathological behavior (glacially slow
exit).

-- 
All Rights Reversed.