References: <20221005173713.1308832-1-yosryahmed@google.com>
From: Yu Zhao
Date: Wed, 5 Oct 2022 23:10:37 -0600
Subject: Re: [PATCH v2] mm/vmscan: check references from all memcgs for swapbacked memory
To: Johannes Weiner
Cc: Yosry Ahmed, Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Greg Thelen, David Rientjes, Cgroups, Linux-MM

On Wed, Oct 5, 2022 at 10:19 PM Johannes Weiner wrote:
>
> On Wed, Oct 05, 2022 at 03:13:38PM -0600, Yu Zhao wrote:
> > On Wed, Oct 5, 2022 at 3:02 PM Yosry Ahmed wrote:
> > >
> > > On Wed, Oct 5, 2022 at 1:48 PM Yu Zhao wrote:
> > > >
> > > > On Wed, Oct 5, 2022 at 11:37 AM Yosry Ahmed wrote:
> > > > >
> > > > > During page/folio reclaim, we check if a folio is referenced using
> > > > > folio_referenced() to avoid reclaiming folios that have been recently
> > > > > accessed (hot memory). The rationale is that this memory is likely to be
> > > > > accessed soon, and hence reclaiming it will cause a refault.
> > > > >
> > > > > For memcg reclaim, we currently only check accesses to the folio from
> > > > > processes in the subtree of the target memcg.
> > > > > This behavior was
> > > > > originally introduced by commit bed7161a519a ("Memory controller: make
> > > > > page_referenced() cgroup aware") a long time ago. Back then, refaulted
> > > > > pages would get charged to the memcg of the process that was faulting them
> > > > > in. It made sense to only consider accesses coming from processes in the
> > > > > subtree of target_mem_cgroup. If a page was charged to memcg A but only
> > > > > being accessed by a sibling memcg B, we would reclaim it if memcg A is
> > > > > the reclaim target. memcg B can then fault it back in and get charged
> > > > > for it appropriately.
> > > > >
> > > > > Today, this behavior still makes sense for file pages. However, unlike
> > > > > file pages, when swapbacked pages are refaulted they are charged to the
> > > > > memcg that was originally charged for them during swapping out. Which
> > > > > means that if a swapbacked page is charged to memcg A but only used by
> > > > > memcg B, and we reclaim it from memcg A, it would simply be faulted back
> > > > > in and charged again to memcg A once memcg B accesses it. In that sense,
> > > > > accesses from all memcgs matter equally when considering if a swapbacked
> > > > > page/folio is a viable reclaim target.
> > > > >
> > > > > Modify folio_referenced() to always consider accesses from all memcgs if
> > > > > the folio is swapbacked.
> > > >
> > > > It seems to me this change can potentially increase the number of
> > > > zombie memcgs. Any risk assessment done on this?
> > >
> > > Do you mind elaborating the case(s) where this could happen? Is this
> > > the cgroup v1 case in mem_cgroup_swapout() where we are reclaiming
> > > from a zombie memcg and swapping out would let us move the charge to
> > > the parent?
> >
> > The scenario is quite straightforward: for a page charged to memcg A
> > and also actively used by memcg B, if we don't ignore the access from
> > memcg B, we won't be able to reclaim it after memcg A is deleted.
>
> This patch changes the behavior of limit-induced reclaim. There is no
> limit reclaim on A after it's been deleted. And parental/global
> reclaim has always recognized outside references.

We use memory.reclaim to scrape memcgs right before rmdir so that they
are unlikely to stick around. Otherwise our job scheduler would see
less available memory and become less eager to increase load. This in
turn reduces the chance of global reclaim, and deleted memcgs would
stick around even longer.
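
For readers unfamiliar with that sequence, here is a rough sketch of
"scrape, then rmdir". It is an illustration only, not the tooling
referred to above. It assumes cgroup v2, where memory.current reports
the bytes charged to a memcg and writing a byte count to memory.reclaim
asks the kernel to reclaim that much from it. The function name
scrape_and_remove and its rmdir parameter are hypothetical; the
parameter exists only so the sketch can be exercised outside cgroupfs.

```python
import errno
import os


def scrape_and_remove(cgroup_dir, rmdir=os.rmdir):
    """Proactively reclaim a memcg's charged memory, then remove the memcg.

    Hypothetical helper: assumes `cgroup_dir` is a cgroup v2 directory
    with no remaining processes or child cgroups.
    """
    # memory.current holds the number of bytes currently charged.
    with open(os.path.join(cgroup_dir, "memory.current")) as f:
        charged = int(f.read().strip())

    if charged > 0:
        try:
            # Ask the kernel to reclaim everything that is still charged.
            with open(os.path.join(cgroup_dir, "memory.reclaim"), "w") as f:
                f.write(str(charged))
        except OSError as e:
            # EAGAIN means less than the requested amount was reclaimed
            # (for example, pages still referenced elsewhere). Treat it as
            # best effort and fall through to rmdir.
            if e.errno != errno.EAGAIN:
                raise

    # Whatever could not be reclaimed stays charged to the deleted memcg,
    # which is what keeps it around as a zombie.
    rmdir(cgroup_dir)
    return charged
```

The EAGAIN branch is where the concern above shows up: if references
from other memcgs keep swapbacked folios from being reclaimed, more of
these writes come up short and more memory stays charged after rmdir.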