Date: Mon, 11 Apr 2022 19:16:27 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Yu Zhao <yuzhao@google.com>
Cc: Stephen Rothwell, linux-mm@kvack.org, Andi Kleen, Aneesh Kumar,
 Barry Song <21cnbao@gmail.com>, Catalin Marinas, Dave Hansen,
 Hillf Danton, Jens Axboe, Jesse Barnes, Johannes Weiner,
 Jonathan Corbet, Linus Torvalds, Matthew Wilcox, Mel Gorman,
 Michael Larabel, Michal Hocko, Mike Rapoport, Rik van Riel,
 Vlastimil Babka, Will Deacon, Ying Huang,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, page-reclaim@google.com, x86@kernel.org,
 Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko,
 Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr,
 "Holger Hoffstätte", Konstantin Kharlamov, Shuang Zhai, Sofia Trinh,
 Vaibhav Jain
Subject: Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch
Message-Id: <20220411191627.629f21de83cd0a520ef4a142@linux-foundation.org>
In-Reply-To: <20220407031525.2368067-11-yuzhao@google.com>
References: <20220407031525.2368067-1-yuzhao@google.com>
 <20220407031525.2368067-11-yuzhao@google.com>

On Wed, 6 Apr 2022 21:15:22 -0600 Yu Zhao <yuzhao@google.com> wrote:

> Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> can be disabled include:
>   0x0001: the multi-gen LRU core
>   0x0002: walking page tables, when arch_has_hw_pte_young() returns
>           true
>   0x0004: clearing the accessed bit in non-leaf PMD entries, when
>           CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
>   [yYnN]: apply to all the components above
> E.g.,
>   echo y >/sys/kernel/mm/lru_gen/enabled
>   cat /sys/kernel/mm/lru_gen/enabled
>   0x0007
>   echo 5 >/sys/kernel/mm/lru_gen/enabled
>   cat /sys/kernel/mm/lru_gen/enabled
>   0x0005

I'm shocked that this actually works.

How does it work?  Are existing pages & folios drained over time, or
synchronously?  Do supporting structures remain allocated, available
for reenablement?

Why is it thought necessary to have this?  Is it expected to be
permanent?

> NB: the page table walks happen on the scale of seconds under heavy
> memory pressure, in which case the mmap_lock contention is a lesser
> concern, compared with the LRU lock contention and the I/O congestion.
> So far the only well-known case of the mmap_lock contention happens on
> Android, due to Scudo [1] which allocates several thousand VMAs for
> merely a few hundred MBs. The SPF and the Maple Tree also have
> provided their own assessments [2][3]. However, if walking page tables
> does worsen the mmap_lock contention, the kill switch can be used to
> disable it. In this case the multi-gen LRU will suffer a minor
> performance degradation, as shown previously.
>
> Clearing the accessed bit in non-leaf PMD entries can also be
> disabled, since this behavior was not tested on x86 varieties other
> than Intel and AMD.
>
> ...
>
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -432,6 +432,18 @@ static inline void cgroup_put(struct cgroup *cgrp)
>  	css_put(&cgrp->self);
>  }
>  
> +extern struct mutex cgroup_mutex;
> +
> +static inline void cgroup_lock(void)
> +{
> +	mutex_lock(&cgroup_mutex);
> +}
> +
> +static inline void cgroup_unlock(void)
> +{
> +	mutex_unlock(&cgroup_mutex);
> +}

It's a tad rude to export cgroup_mutex like this without (apparently)
informing its owner (Tejun).

And if we're going to wrap its operations via helper functions then

- presumably all cgroup_mutex operations should be wrapped, and

- existing open-coded operations on this mutex should be converted
  (see the sketch further down).

> 
> ...
> 
> +static bool drain_evictable(struct lruvec *lruvec)
> +{
> +	int gen, type, zone;
> +	int remaining = MAX_LRU_BATCH;
> +
> +	for_each_gen_type_zone(gen, type, zone) {
> +		struct list_head *head = &lruvec->lrugen.lists[gen][type][zone];
> +
> +		while (!list_empty(head)) {
> +			bool success;
> +			struct folio *folio = lru_to_folio(head);
> +
> +			VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
> +			VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
> +			VM_BUG_ON_FOLIO(folio_is_file_lru(folio) != type, folio);
> +			VM_BUG_ON_FOLIO(folio_zonenum(folio) != zone, folio);

So many new BUG_ONs to upset Linus :(

> +			success = lru_gen_del_folio(lruvec, folio, false);
> +			VM_BUG_ON(!success);
> +			lruvec_add_folio(lruvec, folio);
> +
> +			if (!--remaining)
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> 
> ...
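(Returning to the cgroup_mutex wrappers above: the conversion of
existing open-coded users ought to be mechanical.  An illustrative
diff only, with cgroup_do_something() as a made-up stand-in for
whatever a real call site does; each real site would of course need
individual auditing:

	-	mutex_lock(&cgroup_mutex);
	-	ret = cgroup_do_something(cgrp);
	-	mutex_unlock(&cgroup_mutex);
	+	cgroup_lock();
	+	ret = cgroup_do_something(cgrp);
	+	cgroup_unlock();

)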
> 
> +static ssize_t store_enable(struct kobject *kobj, struct kobj_attribute *attr,
> +			    const char *buf, size_t len)
> +{
> +	int i;
> +	unsigned int caps;
> +
> +	if (tolower(*buf) == 'n')
> +		caps = 0;
> +	else if (tolower(*buf) == 'y')
> +		caps = -1;
> +	else if (kstrtouint(buf, 0, &caps))
> +		return -EINVAL;

See kstrtobool() (sketch at the end of this mail).

> +	for (i = 0; i < NR_LRU_GEN_CAPS; i++) {
> +		bool enable = caps & BIT(i);
> +
> +		if (i == LRU_GEN_CORE)
> +			lru_gen_change_state(enable);
> +		else if (enable)
> +			static_branch_enable(&lru_gen_caps[i]);
> +		else
> +			static_branch_disable(&lru_gen_caps[i]);
> +	}
> +
> +	return len;
> +}
> 
> ...
> 
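Re the kstrtobool() note above: an untested sketch of what
store_enable()'s parsing could become.  One wrinkle: kstrtobool()
also accepts "0" and "1", so the numeric parse has to run first if
"echo 1" is to keep meaning bit 0 rather than all capabilities:

	unsigned int caps;

	/* Try an explicit bitmask first, then fall back to y/n/on/off. */
	if (kstrtouint(buf, 0, &caps)) {
		bool enable;

		if (kstrtobool(buf, &enable))
			return -EINVAL;
		caps = enable ? -1 : 0;
	}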