From: Yosry Ahmed
Date: Wed, 11 Dec 2024 09:30:24 -0800
Subject: Re: [PATCH] memcg: allow exiting tasks to write back data to swap
To: Rik van Riel
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Nhat Pham
On Wed, Dec 11, 2024 at 9:20 AM Rik van Riel wrote:
>
> On Wed, 2024-12-11 at 09:00 -0800, Yosry Ahmed wrote:
> > On Wed, Dec 11, 2024 at 8:34 AM Rik van Riel wrote:
> > >
> > > On Wed, 2024-12-11 at 08:26 -0800, Yosry Ahmed wrote:
> > > > On Wed, Dec 11, 2024 at 7:54 AM Rik van Riel wrote:
> > > > >
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -5371,6 +5371,15 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> > > > >  	if (!zswap_is_enabled())
> > > > >  		return true;
> > > > >
> > > > > +	/*
> > > > > +	 * Always allow exiting tasks to push data to swap. A process in
> > > > > +	 * the middle of exit cannot get OOM killed, but may need to push
> > > > > +	 * uncompressible data to swap in order to get the cgroup memory
> > > > > +	 * use below the limit, and make progress with the exit.
> > > > > +	 */
> > > > > +	if ((current->flags & PF_EXITING) && memcg == mem_cgroup_from_task(current))
> > > > > +		return true;
> > > > > +
> > > >
> > > > I have a few questions:
> > > > (a) If the task is being OOM killed it should be able to charge memory
> > > > beyond memory.max, so why do we need to get the usage down below the
> > > > limit?
> > > >
> > > If it is a kernel directed memcg OOM kill, that is
> > > true.
> > >
> > > However, if the exit comes from somewhere else,
> > > like a userspace oomd kill, we might not hit that
> > > code path.
> >
> > Why do we treat dying tasks differently based on the source of the kill?
> >
> Are you saying we should fail allocations for
> every dying task, and add a check for PF_EXITING
> in here?

I am asking, not really suggesting anything :)

Does it matter from the kernel perspective if the task is dying due to
a kernel OOM kill or a userspace SIGKILL?

>
>         if (unlikely(task_in_memcg_oom(current)))
>                 goto nomem;
>
> > > However, we don't know until the attempted zswap write
> > > whether the memory is compressible, and whether doing
> > > a bunch of zswap writes will help us bring our memcg
> > > down below its memory.max limit.
> >
> > If we are at memory.max (or memory.zswap.max), we can't compress pages
> > into zswap anyway, regardless of their compressibility.
> >
> Wait, this is news to me.
>
> This seems like something we should fix, rather
> than live with, since compressing the data to
> a smaller size could bring us below memory.max.
>
> Is this "cannot compress when at memory.max"
> behavior intentional, or just a side effect of
> how things happen to be?
>
> Won't the allocations made from zswap_store
> ignore the memory.max limit because PF_MEMALLOC
> is set?

My bad, obj_cgroup_may_zswap() only checks the zswap limit, not
memory.max. Please ignore this.

The scenario I described where we scan the LRUs needlessly is if the
*zswap limit* is hit, and writeback is disabled. I am guessing this is
not the case you're running into.

So yeah my only outstanding question is the one above about handling
userspace OOM kills differently. Thanks for bearing with me.