From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Fri, 26 May 2023 14:38:38 +0800
Subject: Re: [PATCH] mm: deduct the number of pages reclaimed by madvise from workingset
To: Johannes Weiner
Cc: "zhaoyang.huang", Andrew Morton, Suren Baghdasaryan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ke.wang@unisoc.com
In-Reply-To: <20230525135407.GA31865@cmpxchg.org>
References: <1684919574-28368-1-git-send-email-zhaoyang.huang@unisoc.com> <20230525135407.GA31865@cmpxchg.org>
On Thu, May 25, 2023 at 9:54 PM Johannes Weiner wrote:
>
> On Wed, May 24, 2023 at 05:12:54PM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang
> >
> > The pages reclaimed by madvise_pageout are made inactive and dropped from
> > the LRU forcefully, which leads subsequent refault pages to show a larger
> > refault distance than they should. This can hurt the accuracy of thrashing
> > detection when madvise_pageout is used as a common way of reclaiming
> > memory, as Android does now.
>
> This alludes to, but doesn't explain, a real world usecase.

We observe more blocking I/O (wait_on_page_bit_common) during app start on the
latest Android version, where userspace memory reclaiming has changed from
in-kernel PPR to madvise_pageout. We believe this could be related to the
resulting inaccuracy of the workingset accounting.

> Yes, madvise_pageout() will record non-resident entries today. This
> means refault and thrash detection is on for user-driven reclaim.
>
> So why is that undesirable?

To raise an extreme scenario: a page at the tail of the LRU could accumulate a
given refault distance without any in-kernel reclaiming having happened, be
wrongly deemed inactive, and get less protection.

> Today we measure and report the cost of reclaim and memory pressure
> for physical memory shortages, cgroup limits, and user-driven cgroup
> reclaim. Why should we not do the same for madv_pageout()? If the
> userspace code that drives pageout has a bug and the result is extreme
> thrashing, wouldn't you want to know that?

Actually, the pages evicted from the active LRU by madv_cold/madv_pageout are
not marked as WORKINGSET, which bypasses the thrashing accounting when they
fault back in and get stuck on I/O. I think they should be treated in the same
way in terms of SetPageWorkingset and the lruvec's non-resident aging. Please
refer to my previous patch "mm: mark folio as workingset in lru_deactivate_fn"
(index 70e2063..4d1c14f).

>
> Please explain the idea here better.
> >
> > Signed-off-by: Zhaoyang Huang
> > ---
> >  include/linux/swap.h | 2 +-
> >  mm/madvise.c         | 4 ++--
> >  mm/vmscan.c          | 8 +++++++-
> >  3 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 2787b84..0312142 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -428,7 +428,7 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
> >  extern int vm_swappiness;
> >  long remove_mapping(struct address_space *mapping, struct folio *folio);
> >
> > -extern unsigned long reclaim_pages(struct list_head *page_list);
> > +extern unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *page_list);
> >  #ifdef CONFIG_NUMA
> >  extern int node_reclaim_mode;
> >  extern int sysctl_min_unmapped_ratio;
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index b6ea204..61c8d7b 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -420,7 +420,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  huge_unlock:
> >  	spin_unlock(ptl);
> >  	if (pageout)
> > -		reclaim_pages(&page_list);
> > +		reclaim_pages(mm, &page_list);
> >  	return 0;
> >  }
> >
> > @@ -516,7 +516,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  	arch_leave_lazy_mmu_mode();
> >  	pte_unmap_unlock(orig_pte, ptl);
> >  	if (pageout)
> > -		reclaim_pages(&page_list);
> > +		reclaim_pages(mm, &page_list);
> >  	cond_resched();
> >
> >  	return 0;
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 20facec..048c10b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2741,12 +2741,14 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> >  	return nr_reclaimed;
> >  }
> >
> > -unsigned long reclaim_pages(struct list_head *folio_list)
> > +unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *folio_list)
> >  {
> >  	int nid;
> >  	unsigned int nr_reclaimed = 0;
> >  	LIST_HEAD(node_folio_list);
> >  	unsigned int noreclaim_flag;
> > +	struct lruvec *lruvec;
> > +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> >
> >  	if (list_empty(folio_list))
> >  		return nr_reclaimed;
> > @@ -2764,10 +2766,14 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> >  		}
> >
> >  		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +		lruvec = &memcg->nodeinfo[nid]->lruvec;
> > +		workingset_age_nonresident(lruvec, -nr_reclaimed);
> >  		nid = folio_nid(lru_to_folio(folio_list));
> >  	} while (!list_empty(folio_list));
> >
> >  	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +	lruvec = &memcg->nodeinfo[nid]->lruvec;
> > +	workingset_age_nonresident(lruvec, -nr_reclaimed);
>
> The task might have moved cgroups in between, who knows what kind of
> artifacts it will introduce if you wind back the wrong clock.
>
> If there are reclaim passes that shouldn't participate in non-resident
> tracking, that should be plumbed through the stack to __remove_mapping
> (which already has that bool reclaimed param to not record entries).