From: Barry Song <21cnbao@gmail.com>
Date: Sat, 24 Feb 2024 11:20:36 +1300
Subject: Re: [PATCH RFC] mm: madvise: pageout: ignore references rather than clearing young
To: Minchan Kim
Cc: sj@kernel.org, akpm@linux-foundation.org, damon@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.com, hannes@cmpxchg.org, Barry Song
References: <20240223041550.77157-1-21cnbao@gmail.com>
On Sat, Feb 24, 2024 at 11:09 AM Minchan Kim wrote:
>
> Hi Barry,
>
> On Fri, Feb 23, 2024 at 05:15:50PM +1300, Barry Song wrote:
> > From: Barry Song
> >
> > While doing MADV_PAGEOUT, the current code will clear PTE young
> > so that vmscan won't read young flags to allow the reclamation
> > of madvised folios to go ahead.
>
> Isn't it good to accelerate reclaiming? vmscan checks whether the
> page was accessed recently by the young bit from pte and if it is,
> it doesn't reclaim the page. Since we have cleared the young bit
> in pte in madvise_pageout, vmscan is likely to reclaim the page
> since it wouldn't see the referenced_ptes from folio_check_references.

Right, but the proposal is asking vmscan to skip folio_check_references
if this is a PAGEOUT, so we remove both pte_clear_young and the rmap walk
of folio_check_references.

>
> Could you clarify if I miss something here?

I guess you missed that we are skipping folio_check_references now. We
remove both, which makes MADV_PAGEOUT 6% faster.

> >
> > It seems we can do it by directly ignoring references, thus we
> > can remove tlb flush in madvise and rmap overhead in vmscan.
> >
> > Regarding the side effect, in the original code, if a parallel
> > thread runs side by side to access the madvised memory with the
> > thread doing madvise, folios will get a chance to be re-activated
> > by vmscan. But with the patch, they will still be reclaimed. But
> > this behaviour doing PAGEOUT and doing access at the same time is
> > quite silly like DoS. So probably, we don't need to care.
> >
> > A microbench as below has shown 6% decrement on the latency of
> > MADV_PAGEOUT,
> >
> > #define PGSIZE 4096
> > main()
> > {
> >         int i;
> > #define SIZE 512*1024*1024
> >         volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >
> >         for (i = 0; i < SIZE/sizeof(long); i += PGSIZE / sizeof(long))
> >                 p[i] = 0x11;
> >
> >         madvise(p, SIZE, MADV_PAGEOUT);
> > }
> >
> >     w/o patch                    w/ patch
> > root@10:~# time ./a.out      root@10:~# time ./a.out
> > real    0m49.634s            real    0m46.334s
> > user    0m0.637s             user    0m0.648s
> > sys     0m47.434s            sys     0m44.265s
> >
> > Signed-off-by: Barry Song
> > ---
> >  mm/damon/paddr.c |  2 +-
> >  mm/internal.h    |  2 +-
> >  mm/madvise.c     |  8 ++++----
> >  mm/vmscan.c      | 12 +++++++-----
> >  4 files changed, 13 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 081e2a325778..5e6dc312072c 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -249,7 +249,7 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> >  put_folio:
> >                 folio_put(folio);
> >         }
> > -       applied = reclaim_pages(&folio_list);
> > +       applied = reclaim_pages(&folio_list, false);
> >         cond_resched();
> >         return applied * PAGE_SIZE;
> >  }
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 93e229112045..36c11ea41f47 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -868,7 +868,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
> >          unsigned long, unsigned long);
> >
> >  extern void set_pageblock_order(void);
> > -unsigned long reclaim_pages(struct list_head *folio_list);
> > +unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references);
> >  unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> >                                             struct list_head *folio_list);
> >  /* The ALLOC_WMARK bits are used as an index to zone->watermark */
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index abde3edb04f0..44a498c94158 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -386,7 +386,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >                         return 0;
> >                 }
> >
> > -               if (pmd_young(orig_pmd)) {
> > +               if (!pageout && pmd_young(orig_pmd)) {
> >                         pmdp_invalidate(vma, addr, pmd);
> >                         orig_pmd = pmd_mkold(orig_pmd);
> >
> > @@ -410,7 +410,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  huge_unlock:
> >                 spin_unlock(ptl);
> >                 if (pageout)
> > -                       reclaim_pages(&folio_list);
> > +                       reclaim_pages(&folio_list, true);
> >                 return 0;
> >         }
> >
> > @@ -490,7 +490,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >
> >                 VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
> >
> > -               if (pte_young(ptent)) {
> > +               if (!pageout && pte_young(ptent)) {
> >                         ptent = ptep_get_and_clear_full(mm, addr, pte,
> >                                                         tlb->fullmm);
> >                         ptent = pte_mkold(ptent);
> > @@ -524,7 +524,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >                 pte_unmap_unlock(start_pte, ptl);
> >         }
> >         if (pageout)
> > -               reclaim_pages(&folio_list);
> > +               reclaim_pages(&folio_list, true);
> >         cond_resched();
> >
> >         return 0;
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 402c290fbf5a..ba2f37f46a73 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2102,7 +2102,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
> >  }
> >
> >  static unsigned int reclaim_folio_list(struct list_head *folio_list,
> > -                                      struct pglist_data *pgdat)
> > +                                      struct pglist_data *pgdat,
> > +                                      bool ignore_references)
> >  {
> >         struct reclaim_stat dummy_stat;
> >         unsigned int nr_reclaimed;
> > @@ -2115,7 +2116,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> >                 .no_demotion = 1,
> >         };
> >
> > -       nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, false);
> > +       nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, ignore_references);
> >         while (!list_empty(folio_list)) {
> >                 folio = lru_to_folio(folio_list);
> >                 list_del(&folio->lru);
> > @@ -2125,7 +2126,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> >         return nr_reclaimed;
> >  }
> >
> > -unsigned long reclaim_pages(struct list_head *folio_list)
> > +unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references)
> >  {
> >         int nid;
> >         unsigned int nr_reclaimed = 0;
> > @@ -2147,11 +2148,12 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> >                         continue;
> >                 }
> >
> > -               nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +               nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid),
> > +                                                  ignore_references);
> >                 nid = folio_nid(lru_to_folio(folio_list));
> >         } while (!list_empty(folio_list));
> >
> > -       nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +       nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid), ignore_references);
> >
> >         memalloc_noreclaim_restore(noreclaim_flag);
> >
> > --
> > 2.34.1
> >

Thanks
Barry
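
For anyone who wants to reproduce the numbers, here is a self-contained
sketch of the microbenchmark quoted above. It is not the exact program
from the changelog: the helper name pageout_bench, the PAGEOUT_BENCH_MAIN
build switch, the sysconf() page-size lookup and the clock_gettime()
timing are all assumptions added for illustration. It times only the
madvise() call, whereas the figures above come from time(1) over the
whole process, so the absolute values will differ. Anonymous memory
presumably also needs swap configured before anything is actually
reclaimed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux UAPI value; needs Linux 5.4 or later */
#endif

/*
 * Hypothetical helper: touch one byte per page of an anonymous mapping,
 * then ask the kernel to page the whole range out.
 * Returns 0 and stores the madvise() latency in *secs, or returns -1
 * with errno set by the failing call.
 */
int pageout_bench(size_t size, double *secs)
{
	size_t pgsize = (size_t)sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	volatile char *p;
	void *map;
	int ret, saved_errno;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	p = map;

	/* Fault in every page so each one is mapped and recently accessed. */
	for (size_t off = 0; off < size; off += pgsize)
		p[off] = 0x11;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = madvise(map, size, MADV_PAGEOUT);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Preserve the madvise() errno across the cleanup call. */
	saved_errno = errno;
	munmap(map, size);
	errno = saved_errno;

	if (ret)
		return -1;

	*secs = (double)(t1.tv_sec - t0.tv_sec) +
		(double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return 0;
}

#ifdef PAGEOUT_BENCH_MAIN
int main(void)
{
	double secs;

	/* 512 MiB, the same size as in the changelog. */
	if (pageout_bench(512UL << 20, &secs)) {
		perror("pageout_bench");
		return 1;
	}
	printf("MADV_PAGEOUT took %.3f s\n", secs);
	return 0;
}
#endif
```

Building with -DPAGEOUT_BENCH_MAIN gives a standalone binary. Running it
on kernels with and without the patch should show whether the difference
in madvise() latency is in the same range as the 6% reported above.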