From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Tue, 2 Jan 2024 13:36:55 +0800
Subject: Re: reply: reply: [RFC PATCH 1/1] mm: mark folio accessed in minor fault
To: Yu Zhao
Cc: Zhaoyang Huang, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve Kang
References: <20231220102948.1963798-1-zhaoyang.huang@unisoc.com> <1703226522421.22653@unisoc.com> <1703238070775.29652@unisoc.com>
I updated the patch against v5.15 as below [1]. It calls mark_page_accessed() on the mapped page in filemap_map_pages() and filemap_fault(), so that non-single-use pages (those served by a minor rather than a major fault) are promoted earlier than the first LRU scan would promote them today.
The patch was verified on a 2 GB RAM Android system with a test script that loops the four steps in [2] 30 times and counts the trace events mm_filemap_add_to_page_cache and mm_filemap_delete_from_page_cache, which can be read as a page cache retention rate (or, equivalently, a thrashing rate). In the results [3], the RFC saves roughly 10% of the add_to_page_cache events in each phase and improves app start time by about 5%.

[1]
diff --git a/mm/filemap.c b/mm/filemap.c
index 279380c..308e415 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3124,6 +3125,7 @@
 			filemap_invalidate_lock_shared(mapping);
 			mapping_locked = true;
 		}
+		mark_page_accessed(page);
 	} else {
 		/* No page in the page cache at all */
 		count_vm_event(PGMAJFAULT);
@@ -3147,7 +3149,7 @@
 			goto out_retry;
 		filemap_invalidate_unlock_shared(mapping);
 		return VM_FAULT_OOM;
 		}
 	}
 	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
@@ -3388,8 +3390,10 @@
 	/* We're about to handle the fault */
 	if (vmf->address == addr)
 		ret = VM_FAULT_NOPAGE;
+	if (page_mapcount(page))
+		mark_page_accessed(page);
 	do_set_pte(vmf, page, addr);
 	/* no need to invalidate: a not-present page won't be cached */
 	update_mmu_cache(vma, addr, vmf->pte);

[2]
1. start an APP
2. malloc and mlock 512MB pages
3. restart the APP
4. kill the APP

[3]
                                   v515                   RFC
                                   add_to  delete_from    add_to  delete_from
1. start an APP                     41290    88279         32235    79305
2. malloc and mlock 512MB pages    342103   374396        304310   339048
3. restart the APP                  46552   162456         42279   176368
4.
kill the APP

On Sat, Dec 23, 2023 at 10:41 AM Yu Zhao wrote:
>
> On Fri, Dec 22, 2023 at 2:41 AM Zhaoyang Huang wrote:
> >
> > On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao wrote:
> > >
> > > On Thu, Dec 21, 2023 at 11:29 PM Zhaoyang Huang wrote:
> > > >
> > > > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang wrote:
> > > > >
> > > > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang wrote:
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > > > > From: Zhaoyang Huang
> > > > > > > > > > > >
> > > > > > > > > > > > An inactive mapped folio is promoted to active only when it is
> > > > > > > > > > > > scanned in shrink_inactive_list, while a VFS folio is promoted
> > > > > > > > > > > > immediately when it is accessed. This has two effects:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. NR_ACTIVE_FILE is not as accurate as expected.
> > > > > > > > > > > > 2. Reclaim efficiency is reduced by inactive folios that should
> > > > > > > > > > > > have been activated as early as shrink_active_list.
> > > > > > > > > > > >
> > > > > > > > > > > > I would like to suggest marking the folio accessed in the minor
> > > > > > > > > > > > fault path to address this.
> > > > > > > > > > >
> > > > > > > > > > > This isn't going to be as effective as you imagine.
> > > > > > > > > > > Almost all file
> > > > > > > > > > > faults are handled through filemap_map_pages(). So I must ask, what
> > > > > > > > > > > testing have you done with this patch?
> > > > > > > > > > >
> > > > > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > > > > workloads?
> > > > > > > > > > Thanks for the heads-up, I am out of date on the readahead mechanism. My goal
> > > > > > > > >
> > > > > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > > > > years ago in 2014 by commit f1820361f83d
> > > > > > > > >
> > > > > > > > > > is to have mapped file pages behave like other pages, which can be
> > > > > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > > > > and provide benchmark data in a new patch set.
> > > > > > > > >
> > > > > > > > > Understood. I don't know the history of this, so I'm not sure if the
> > > > > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > > > > I suspect it's entirely unintentional.
> > > > > > > >
> > > > > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > > > > minor faults for single-use folios
> > > > > >
> > > > > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > > > > Please find the following chart for the mapped page state machine
> > > > > transitions.
> > > >
> > > > > I'm not sure what you are asking me to look at -- is the following
> > > > > trying to illustrate something related to my question above?
> > > >
> > > > Sorry for my mistake in the table generation; resending it. I am trying to
> > > > present how the RFC performs in a page's state transitions:
> > > >
> > > > 1. The RFC behaves the same as the mainline in (1) and (2).
> > > > 2. VM_EXEC mapped pages are activated earlier than in mainline, which
> > > >    helps improve scan efficiency in (3) and (4).
> > > > 3. Non-VM_EXEC mapped pages are dropped on the 3rd scan, as VFS pages are.
> > > >
> > > > (1)
> > > >            1st access   shrink_active_list   1st scan (shrink_folio_list)   2nd scan (shrink_folio_list')
> > > > mainline   INA/UNR      NA                   INA/REF                        DROP
> > > > RFC        INA/UNR      NA                   INA/REF                        DROP
> > > >
> > > I don't think this is the case -- with this RFC, *readahead* folios,
> > > which are added into the page cache as INA/UNR, become PG_referenced upon
> > > the initial fault (first access), i.e., INA/REF. The first scan will
> > > actually activate them, i.e., they become ACT/UNR, because they have
> > > both PG_referenced and the A-bit.
> > No, sorry for the confusion. This RFC actually aims at minor faults on
> > already-faulted pages (with one PTE set up). As for readahead
> > pages, can we solve that by adding one criterion as below, which unifies
> > all kinds of mapped pages under the RFC?
> Again this is still wrong -- how do you know the other process mapping
> this folio isn't also streaming the file?
>
> It'd be best to take a step back and think through my original
> question: what prevents a specific *access pattern* from triggering
> minor faults? The simple answer is that you can't.

I agree with that and am getting more puzzled. The RFC's goal is that the
more minor faults a page takes, the sooner it gets promoted, as VFS pages
are.

It's intentional.
For the active/inactive LRU, all folios start
inactive. The first scan of a folio transfers the A-bit (if it's set
during the initial fault) to PG_referenced;
[The RFC behaves the same as above.]
the second scan of this
folio, if the A-bit is set again, moves it to the active list.
[The RFC is NOT against this; it just lets minor faults promote the page in advance.]
This way single-use folios, i.e., folios mapped for file streaming, can be
reclaimed quickly, since they are "demoted" rather than "promoted" on
the second scan. This RFC would regress memory streaming workloads.

> > > @@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> > >  	 */
> > >  	folio = filemap_get_folio(mapping, index);
> > >  	if (likely(!IS_ERR(folio))) {
> > > +		/*
> > > +		 * try to promote the inactive folio here when it is
> > > +		 * accessed via a minor fault
> > > +		 */
> > > +		if (folio_mapcount(folio))
> > > +			folio_mark_accessed(folio);
> > >  		/*
> > >  		 * We found the page, so try async readahead before waiting for
> > >  		 * the lock.
> >
> > Please find below the state machine table of the updated RFC, where the RFC
> > behaves the same as mainline or improves scan efficiency by promoting the
> > page in shrink_active_list.
> >
> > (1)
> >            1st access   shrink_active_list   1st scan (shrink_folio_list)   2nd scan (shrink_folio_list')
> > mainline   INA/UNR      NA                   INA/REF                        DROP
> > RFC        INA/UNR      NA                   INA/REF                        DROP
> > RA         INA/UNR      NA                   INA/REF                        DROP
> >
> > (2)
> >            1st access   2nd access   shrink_active_list   1st scan (shrink_folio_list)
> > mainline   INA/UNR      INA/UNR      NA                   ACT/REF
> > RFC        INA/UNR      INA/REF      NA                   ACT/REF
> > RA         INA/UNR      INA/REF      NA                   ACT/REF
> >
> > (3)
> >                     1st access   1st scan (shrink_folio_list)   2nd access   2nd scan (shrink_active_list)   3rd scan (shrink_folio_list)
> > mainline            INA/UNR      INA/REF                        INA/REF      NA                              ACT/REF
> > RFC (VM_EXEC)       INA/UNR      INA/REF                        ACT/REF      ACT/REF                         NA
> > RFC (non-VM_EXEC)   INA/UNR      INA/REF                        ACT/REF      INA/REF                         DROP
> > RA                  INA/UNR      INA/REF                        INA/REF      NA                              ACT/REF
> >
> > (4)
> >                     1st access   2nd access   3rd access   1st scan (shrink_active_list)   2nd scan (shrink_folio_list)
> > mainline            INA/UNR      INA/UNR      INA/UNR      NA                              ACT/REF
> > RFC (VM_EXEC)       INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
> > RFC (non-VM_EXEC)   INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
> > RA                  INA/UNR      INA/REF      ACT/REF      ACT/REF                         NA
> >
> > > So it doesn't behave the same way the mainline does for the first case
> > > you listed. (I didn't look at the rest of the cases.)