From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58F79C77B6E for ; Thu, 6 Apr 2023 10:31:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AEC2D6B0075; Thu, 6 Apr 2023 06:31:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A9CC06B0078; Thu, 6 Apr 2023 06:31:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 98C1A6B007B; Thu, 6 Apr 2023 06:31:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 8BC8C6B0075 for ; Thu, 6 Apr 2023 06:31:05 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 64C76161273 for ; Thu, 6 Apr 2023 10:31:05 +0000 (UTC) X-FDA: 80650598490.02.98EA724 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf13.hostedemail.com (Postfix) with ESMTP id 2A71E20025 for ; Thu, 6 Apr 2023 10:31:01 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=cyfPao4T; spf=pass (imf13.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680777062; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=DSkVqfN4fVdY5K7jYUBTl5KelyFFapb4DkfrIbimWkQ=; b=kiepHyG2hjswGXam+l9zi9oS4birLJkio+O0ertePvbXWqy2ebAyg9IXZaE8kaMqZWz1UX /nUTbfc6CuZz0JH6THFoo3XI11JQ4ZYj45Y4OGKm5nZwML9YLzic5W1g+rvwtQnXvUkkTO UpR+yn2bJy4e9iwqbdDNjUfSDg0omR0= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=cyfPao4T; spf=pass (imf13.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680777062; a=rsa-sha256; cv=none; b=YsXRBA67RVZG92Jk/a2yJM46ft/tI4F7WgmHIT/+y4bAVR0yMSlp/K+1FmllS/ZW1ZwHP/ dYtuJugd0rZqZ55AyLQ4QAV9m+DAqJ8EhOzlmVw/wotEUjRrh90ME/sOWJQleXPeid8Gpb 81RT8logezTF4Ktk8RbNGfQlrrFyWwc= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680777061; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DSkVqfN4fVdY5K7jYUBTl5KelyFFapb4DkfrIbimWkQ=; b=cyfPao4TlFJAFRjpI0/wUZXG/XTxj9VrPUTicUeIvz08RjyQzBrwm8JtPjmA5MFshOdyIY TBLasHxiqS2p+hBA/MZdkghmlmK9qXDvWI9maZx8d+f2YUqM70XHy/LOsmgiCYePKQh4E7 nFH0CR6I2vnlhhe1Vl9omEX7bozXkF0= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-528-SI7IpJIINciCvvyvPDXf4A-1; Thu, 06 Apr 2023 06:31:00 -0400 X-MC-Unique: SI7IpJIINciCvvyvPDXf4A-1 Received: by mail-wm1-f69.google.com with SMTP id m27-20020a05600c3b1b00b003ee502f1b16so18106601wms.9 for ; Thu, 06 Apr 2023 03:31:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680777059; x=1683369059; h=content-transfer-encoding:in-reply-to:subject:organization:from :content-language:references:cc:to:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=DSkVqfN4fVdY5K7jYUBTl5KelyFFapb4DkfrIbimWkQ=; b=MenPrpPnDbDUXYyUFpmZ6uREV6dsZJwz+mvZX6zEeRx94Il+MkV6EO3F0WMSpeLQVK KERhDgxFgYQC/uMmbyaWo/QQA813higF1TVbLWEkaQn8ZoUrdgM3R1BsyukgPDqPy2Mq XjnOHGvYIEg/ucjmuYwR4Vd1I38GiUA1tJINwNUg6zgiB91JlyF+MvQZBsFcu2E/Ifer OgpXYwjc5lzkJbmInUxWOu/giDTaVlpdbEpoClJfcGLrH4bM3RHcpOco/XasLJQckaRR fIh30lm6/f0/hRfWh/dNxd5s7gmC8i6sKF+cFbeWC+GMS7AQJG3gZyGQRDGj3qS4lAx6 QPsA== X-Gm-Message-State: AAQBX9dWjo8tGxWTFSTVOG4CNLivOrTYhj7x75sT3P0srW9JMXZndTv1 OEjNbj6cdHo7qTeqw6apTGkqiDQNuSdM7sJKvp92sh7axFT1j0o7qopm2IBFG+78KfdOlqNqXE6 KdZ6hZ/MikFY= X-Received: by 2002:adf:ce8b:0:b0:2db:43ed:1baa with SMTP id r11-20020adfce8b000000b002db43ed1baamr6300032wrn.24.1680777059259; Thu, 06 Apr 2023 03:30:59 -0700 (PDT) X-Google-Smtp-Source: AKy350bKzMGZCBY5+zcpx2zh++U0Vc78+mn9MAu7M75vPKKNUIUrRAHnNEOxAL/yv6as0YHwbBEW9w== X-Received: by 2002:adf:ce8b:0:b0:2db:43ed:1baa with SMTP id r11-20020adfce8b000000b002db43ed1baamr6299994wrn.24.1680777058780; Thu, 06 Apr 2023 03:30:58 -0700 (PDT) Received: from ?IPV6:2a09:80c0:192:0:5dac:bf3d:c41:c3e7? ([2a09:80c0:192:0:5dac:bf3d:c41:c3e7]) by smtp.gmail.com with ESMTPSA id r10-20020adfe68a000000b002c7b229b1basm1423645wrm.15.2023.04.06.03.30.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 06 Apr 2023 03:30:58 -0700 (PDT) Message-ID: Date: Thu, 6 Apr 2023 12:30:56 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1 To: Yosry Ahmed , Andrew Morton , Alexander Viro , "Darrick J. Wong" , Christoph Lameter , David Rientjes , Joonsoo Kim , Vlastimil Babka , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Matthew Wilcox (Oracle)" , Miaohe Lin , Johannes Weiner , Peter Xu , NeilBrown , Shakeel Butt , Michal Hocko , Yu Zhao , Dave Chinner Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org References: <20230405185427.1246289-1-yosryahmed@google.com> <20230405185427.1246289-2-yosryahmed@google.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v5 1/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim In-Reply-To: <20230405185427.1246289-2-yosryahmed@google.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: mmmohfwfi33ww6z5pgan733bo9f9xc7o X-Rspam-User: X-Rspamd-Queue-Id: 2A71E20025 X-Rspamd-Server: rspam06 X-HE-Tag: 1680777061-256067 X-HE-Meta: U2FsdGVkX1/MpI9i84cPUNnMBkCIwbkXt0nRAu/ofnSzm8iVY5GrszvC0Y3P4JubaEids03CUUN1nVgX+agWydkb9DHsVt9WKxQ8pr8ssPKll//NdP1jUgr97YeFGkP7VEBXzIpv7E8z4ln+CfoQ0aHMnKmUXbs87j713nTtvuCyAVmwx2QJrkQ5hTrgkpQF0rmCemQ6QREvjrtsLpKkxbq8SFoMO0SuAWJoIO5uKsdyYioz897dfRiFMRKzyqUC4GrgUK/sfKaueHCf69EmpSRny7g+bIFjCHJFBjixhPU5D25Bt97jYC+gpPQalvSqmz3nb9MBeJv5DKMsbVFANihfZW+JSL1Yncbvts6nfv2xN5ywCnTisM1il5HvhhLpSP67oYLVkKUvf4fK3KIvpFeBJ6ZVGaGu6FBD1G6qt6hnvBg0h/9ROVWqexptnkqnGoKOmnBo3Nejo8SFc1CX+GCa9T9vwncorWBua3MtHTu3ks/tzesaHhLbcet4+/kuveYFKOwCG+gWweIazcyoxlGYD48vyEJ9ee+GMKZuQmrx4ESgbYl44C0p0OQC7zyejaw1nO3AuoP23ZDMINilNs9tjSjjnXtPD0Suicfca9z8H5LtxMPlwmX+Gq1XVCqzQpbBVSKx6hbKyCZePpl4HDQXg9K0T1+Zxqfyk7hLjiCBEl521yIJRvRpLg1YxTuOar4th5KoaYsCZVuaRHz1DjaEv7ckwyybPvn+4neMI0od03g6jmWzTHGxiautBpZCGQWluGljBT6sahJu+raoMAWAuyozWr8bTqzQpDIDiv+fZU0hEkC1A0Sw8Yvbx/JSjtRv9nFtnPqeVGBV6lrAHgTvtxu9+xqtUFgdMO/nIGYEoZY7ZGs5ZCpZyAwhnAQrmnjPSvCFsnQRJolTEbGRQlTcBWmZPjIvjGvmumjbxDFX3/de3HtYbuoyEeI7hgq9N9PNlfVflaYGDJzLRph 5oGiZ54T K9oud+6TvGW+l9D30vGvXZm33wm/Ym7fh12zQulKRckXjfRTkH902aejdWQJHsMGG8ak5r6+ui+YZt2ehcFpQ8Ru/udSVjWjbDxQFn20ZnwVv3hupez/7VuEdmPuMtW0WaqgL18SgwYMKdT863bEJ4M+NMR3iP8OOI/RxSO2SLTPMckShfzx6MDBdykUEoVp+8HXmtiWBdzezUkXC91S4PdB4VSesNQvCmGhEnT0vpO7qaO1LgWp7RJQpCX4j0CEE9JZT7a6D+MEF+da73wTJDa1FJ5cfdjX56L/3QsVlcpCV1wHV9aZ44tyJj7FiadcMPI51Xf+/SJPEykYs6HLSrK/nPMJyPI2aV4o5ezXJjc/p8tBNsNr9yVgxTB6AaCS7TPNddai+1Ji6MTN4UStmK+lgx1tzUr2jLUILaoK817AD9/X16VQhXsWkbZRlvBLVWyC7FzemOQkUQ+nSUTgDxX8hqyH8+qIuYue6dC52tuk9hMFGXxERky9BVRjKbjfGHxN9d8p5UtTsUljfqjtZeMzX5xTL+3oNYS7hENi6qmOrP7tmyaBGWxbvIAbsb0CP6Wt7opjzoddJvSk= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 05.04.23 20:54, Yosry Ahmed wrote: > We keep track of different types of reclaimed pages through > reclaim_state->reclaimed_slab, and we add them to the reported number > of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg > reclaim, we have no clue if those pages are charged to the memcg under > reclaim. > > Slab pages are shared by different memcgs, so a freed slab page may have > only been partially charged to the memcg under reclaim. The same goes for > clean file pages from pruned inodes (on highmem systems) or xfs buffer > pages, there is no simple way to currently link them to the memcg under > reclaim. > > Stop reporting those freed pages as reclaimed pages during memcg reclaim. > This should make the return value of writing to memory.reclaim, and may > help reduce unnecessary reclaim retries during memcg charging. Writing to > memory.reclaim on the root memcg is considered as cgroup_reclaim(), but > for this case we want to include any freed pages, so use the > global_reclaim() check instead of !cgroup_reclaim(). > > Generally, this should make the return value of > try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g. > freed a slab page that was mostly charged to the memcg under reclaim), > the return value of try_to_free_mem_cgroup_pages() can be underestimated, > but this should be fine. The freed pages will be uncharged anyway, and we Can't we end up in extreme situations where try_to_free_mem_cgroup_pages() returns close to 0 although a huge amount of memory for that cgroup was freed up. Can you extend on why "this should be fine" ? I suspect that overestimation might be worse than underestimation. (see my comment proposal below) > can charge the memcg the next time around as we usually do memcg reclaim > in a retry loop. > > The next patch performs some cleanups around reclaim_state and adds an > elaborate comment explaining this to the code. This patch is kept > minimal for easy backporting. > > Signed-off-by: Yosry Ahmed > Cc: stable@vger.kernel.org Fixes: ? Otherwise it's hard to judge how far to backport this. > --- > > global_reclaim(sc) does not exist in kernels before 6.3. It can be > replaced with: > !cgroup_reclaim(sc) || mem_cgroup_is_root(sc->target_mem_cgroup) > > --- > mm/vmscan.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 9c1c5e8b24b8f..c82bd89f90364 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -5346,8 +5346,10 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc) > vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned, > sc->nr_reclaimed - reclaimed); > > - sc->nr_reclaimed += current->reclaim_state->reclaimed_slab; > - current->reclaim_state->reclaimed_slab = 0; Worth adding a comment like /* * Slab pages cannot universally be linked to a single memcg. So only * account them as reclaimed during global reclaim. Note that we might * underestimate the amount of memory reclaimed (but won't overestimate * it). */ but ... > + if (global_reclaim(sc)) { > + sc->nr_reclaimed += current->reclaim_state->reclaimed_slab; > + current->reclaim_state->reclaimed_slab = 0; > + } > > return success ? MEMCG_LRU_YOUNG : 0; > } > @@ -6472,7 +6474,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) > > shrink_node_memcgs(pgdat, sc); > ... do we want to factor the add+clear into a simple helper such that we can have above comment there? static void cond_account_reclaimed_slab(reclaim_state, sc) { /* * Slab pages cannot universally be linked to a single memcg. So * only account them as reclaimed during global reclaim. Note * that we might underestimate the amount of memory reclaimed * (but won't overestimate it). */ if (global_reclaim(sc)) { sc->nr_reclaimed += reclaim_state->reclaimed_slab; reclaim_state->reclaimed_slab = 0; } } Yes, effective a couple LOC more, but still straight-forward for a stable backport > - if (reclaim_state) { > + if (reclaim_state && global_reclaim(sc)) { > sc->nr_reclaimed += reclaim_state->reclaimed_slab; > reclaim_state->reclaimed_slab = 0; > } -- Thanks, David / dhildenb