Message-ID: <14d50ddd-507e-46e7-1a32-72466dec2a40@redhat.com>
Date: Thu, 6 Apr 2023 19:49:59 +0200
Subject: Re: [PATCH v5 1/2] mm: vmscan: ignore non-LRU-based reclaim in
 memcg reclaim
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Yosry Ahmed
Cc: Andrew Morton, Alexander Viro, "Darrick J. Wong", Christoph Lameter,
 David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Matthew Wilcox (Oracle)",
 Miaohe Lin, Johannes Weiner, Peter Xu, NeilBrown, Shakeel Butt,
 Michal Hocko, Yu Zhao, Dave Chinner, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-mm@kvack.org, stable@vger.kernel.org
References: <20230405185427.1246289-1-yosryahmed@google.com>
 <20230405185427.1246289-2-yosryahmed@google.com>

On 06.04.23 16:07, Yosry Ahmed wrote:
> Thanks for taking a look, David!
> 
> On Thu, Apr 6, 2023 at 3:31 AM David Hildenbrand wrote:
>>
>> On 05.04.23 20:54, Yosry Ahmed wrote:
>>> We keep track of different types of reclaimed pages through
>>> reclaim_state->reclaimed_slab, and we add them to the reported number
>>> of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
>>> reclaim, we have no clue if those pages are charged to the memcg under
>>> reclaim.
>>>
>>> Slab pages are shared by different memcgs, so a freed slab page may have
>>> only been partially charged to the memcg under reclaim. The same goes for
>>> clean file pages from pruned inodes (on highmem systems) or xfs buffer
>>> pages, there is no simple way to currently link them to the memcg under
>>> reclaim.
>>>
>>> Stop reporting those freed pages as reclaimed pages during memcg reclaim.
>>> This should make the return value of writing to memory.reclaim more
>>> accurate, and may help reduce unnecessary reclaim retries during memcg
>>> charging. Writing to memory.reclaim on the root memcg is considered as
>>> cgroup_reclaim(), but for this case we want to include any freed pages,
>>> so use the global_reclaim() check instead of !cgroup_reclaim().
>>>
>>> Generally, this should make the return value of
>>> try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g.
>>> freed a slab page that was mostly charged to the memcg under reclaim),
>>> the return value of try_to_free_mem_cgroup_pages() can be underestimated,
>>> but this should be fine. The freed pages will be uncharged anyway, and we
>>
>> Can't we end up in extreme situations where
>> try_to_free_mem_cgroup_pages() returns close to 0, although a huge
>> amount of memory for that cgroup was freed up?
>>
>> Can you expand on why "this should be fine"?
>>
>> I suspect that overestimation might be worse than underestimation. (see
>> my comment proposal below)
> 
> In such extreme scenarios, even though try_to_free_mem_cgroup_pages()
> would return an underestimated value, the freed memory for the cgroup
> will still be uncharged. try_charge() (and most callers of
> try_to_free_mem_cgroup_pages()) call it in a retry loop, so even if
> try_to_free_mem_cgroup_pages() returns an underestimated value,
> charging will succeed the next time around.
> 
> The only case where this might be a problem is if it happens in the
> final retry, but I guess we need to be *really* unlucky for this
> extreme scenario to happen. One could argue that if we reach such a
> situation the cgroup will probably OOM soon anyway.
> 
>>
>>> can charge the memcg the next time around as we usually do memcg reclaim
>>> in a retry loop.
>>>
>>> The next patch performs some cleanups around reclaim_state and adds an
>>> elaborate comment explaining this to the code. This patch is kept
>>> minimal for easy backporting.
>>>
>>> Signed-off-by: Yosry Ahmed
>>> Cc: stable@vger.kernel.org
>>
>> Fixes: ?
>>
>> Otherwise it's hard to judge how far to backport this.
> 
> It's hard to judge. The issue has been there for a while, but
> memory.reclaim just made it more user-visible. I think we can
> attribute it to per-object slab accounting, because before that any
> freed slab pages in cgroup reclaim would be entirely charged to that
> cgroup.
> 
> Although, in all fairness, other types of freed pages that use
> reclaim_state->reclaimed_slab and cannot be attributed to the cgroup
> under reclaim have been there before that. I guess slab is the most
> significant among them, though, so for the purposes of backporting I
> guess:
> 
> Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages")
> 
>>
>>> ---
>>>
>>> global_reclaim(sc) does not exist in kernels before 6.3. It can be
>>> replaced with:
>>> !cgroup_reclaim(sc) || mem_cgroup_is_root(sc->target_mem_cgroup)
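One way to carry that out in a stable backport is to add the 6.3-style
helper locally rather than open-coding the check at both call sites. A
minimal sketch, built only from the replacement expression quoted above
(the local definition itself is not part of this patch):

static bool global_reclaim(struct scan_control *sc)
{
	/* Stand-in for the 6.3 helper: root-memcg reclaim counts as global. */
	return !cgroup_reclaim(sc) ||
	       mem_cgroup_is_root(sc->target_mem_cgroup);
}

With such a helper in place, the hunks below would not need further
editing (modulo context drift in older trees).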
>>>
>>> ---
>>>   mm/vmscan.c | 8 +++++---
>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 9c1c5e8b24b8f..c82bd89f90364 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -5346,8 +5346,10 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
>>>   	vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
>>>   		   sc->nr_reclaimed - reclaimed);
>>>   
>>> -	sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
>>> -	current->reclaim_state->reclaimed_slab = 0;
>>
>> Worth adding a comment like
>>
>> /*
>>  * Slab pages cannot universally be linked to a single memcg. So only
>>  * account them as reclaimed during global reclaim. Note that we might
>>  * underestimate the amount of memory reclaimed (but won't overestimate
>>  * it).
>>  */
>>
>> but ...
>>
>>> +	if (global_reclaim(sc)) {
>>> +		sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
>>> +		current->reclaim_state->reclaimed_slab = 0;
>>> +	}
>>>   
>>>   	return success ? MEMCG_LRU_YOUNG : 0;
>>>   }
>>> @@ -6472,7 +6474,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>>>   
>>>   	shrink_node_memcgs(pgdat, sc);
>>>   
>>
>> ... do we want to factor the add+clear into a simple helper such that we
>> can have the above comment there?
>>
>> static void cond_account_reclaimed_slab(struct reclaim_state *reclaim_state,
>> 					struct scan_control *sc)
>> {
>> 	/*
>> 	 * Slab pages cannot universally be linked to a single memcg. So
>> 	 * only account them as reclaimed during global reclaim. Note
>> 	 * that we might underestimate the amount of memory reclaimed
>> 	 * (but won't overestimate it).
>> 	 */
>> 	if (global_reclaim(sc)) {
>> 		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
>> 		reclaim_state->reclaimed_slab = 0;
>> 	}
>> }
>>
>> Yes, effectively a couple LOC more, but still straightforward for a
>> stable backport.
> 
> The next patch in the series performs some refactoring and cleanups,
> among which we add a helper called flush_reclaim_state() that does
> exactly that and contains a sizable comment. I left this outside of
> this patch in v5 to make the effective change as small as possible for
> backporting. Looks like it can be confusing without the comment, though.
> 
> How about I pull this part to this patch as well for v6?

As long as it's a helper similar to what I proposed, I think that makes a
lot of sense (and doesn't particularly bloat this patch).

-- 
Thanks,

David / dhildenb
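Pulling the helper into this patch for v6, as discussed above, could look
roughly like the sketch below. It follows the cond_account_reclaimed_slab()
proposal and borrows the flush_reclaim_state() name mentioned for the next
patch in the series; the exact name, comment, and placement in the final
version may differ:

static void flush_reclaim_state(struct scan_control *sc)
{
	/*
	 * Pages freed outside the LRU (slab, pruned inodes, xfs buffers)
	 * cannot universally be linked to a single memcg, so only count
	 * them as reclaimed during global reclaim. This may underestimate
	 * the amount of memory reclaimed, but never overestimates it.
	 */
	if (current->reclaim_state && global_reclaim(sc)) {
		sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
		current->reclaim_state->reclaimed_slab = 0;
	}
}

shrink_one() and shrink_node() would then call flush_reclaim_state(sc)
instead of open-coding the add-and-clear shown in the hunks above.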