From: Barry Song
Date: Tue, 27 Dec 2022 14:15:00 +1300
Subject: Re: [PATCH v3 04/14] mm/rmap: Break COW PTE in rmap walking
To: Chih-En Lin
Cc: Andrew Morton, Qi Zheng, David Hildenbrand, Matthew Wilcox,
 Christophe Leroy, John Hubbard, Nadav Amit, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Steven Rostedt, Masami Hiramatsu, Peter Zijlstra,
 Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Namhyung Kim, Yang Shi, Peter Xu, Zach O'Keefe,
 Liam R. Howlett, Alex Sierra, Xianting Tian, Colin Cross,
 Suren Baghdasaryan, Pasha Tatashin, Suleiman Souhlal, Brian Geffon,
 Yu Zhao, Tong Tiangen, Liu Shixin, Li kunyu, Anshuman Khandual,
 Vlastimil Babka, Hugh Dickins, Minchan Kim, Miaohe Lin, Gautam Menghani,
 Catalin Marinas, Mark Brown, Will Deacon, Eric W. Biederman,
 Thomas Gleixner, Sebastian Andrzej Siewior, Andy Lutomirski, Fenghua Yu,
 Barret Rhoden, Davidlohr Bueso, Jason A. Donenfeld, Dinglan Peng,
 Pedro Fonseca, Jim Huang, Huichun Feng

On Mon, Dec 26, 2022 at 11:56 PM Chih-En Lin wrote:
>
> On Mon, Dec 26, 2022 at 10:40:49PM +1300, Barry Song wrote:
> > On Tue, Dec 20, 2022 at 8:25 PM Chih-En Lin wrote:
> > >
> > > Some of the features (unmap, migrate, device exclusive, mkclean, etc)
> > > might modify the pte entry via rmap.
> > > Add a new page vma mapped walk flag, PVMW_BREAK_COW_PTE, to indicate
> > > the rmap walking to break COW PTE.
> > >
> > > Signed-off-by: Chih-En Lin
> > > ---
> > >  include/linux/rmap.h |  2 ++
> > >  mm/migrate.c         |  3 ++-
> > >  mm/page_vma_mapped.c |  2 ++
> > >  mm/rmap.c            | 12 +++++++-----
> > >  mm/vmscan.c          |  7 ++++++-
> > >  5 files changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index bd3504d11b155..d0f07e5519736 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -368,6 +368,8 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > >  #define PVMW_SYNC		(1 << 0)
> > >  /* Look for migration entries rather than present PTEs */
> > >  #define PVMW_MIGRATION		(1 << 1)
> > > +/* Break COW-ed PTE during walking */
> > > +#define PVMW_BREAK_COW_PTE	(1 << 2)
> > >
> > >  struct page_vma_mapped_walk {
> > >  	unsigned long pfn;
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index dff333593a8ae..a4be7e04c9b09 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -174,7 +174,8 @@ void putback_movable_pages(struct list_head *l)
> > >  static bool remove_migration_pte(struct folio *folio,
> > >  		struct vm_area_struct *vma, unsigned long addr, void *old)
> > >  {
> > > -	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
> > > +	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr,
> > > +			      PVMW_SYNC | PVMW_MIGRATION | PVMW_BREAK_COW_PTE);
> > >
> > >  	while (page_vma_mapped_walk(&pvmw)) {
> > >  		rmap_t rmap_flags = RMAP_NONE;
> > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > index 93e13fc17d3cb..5dfc9236dc505 100644
> > > --- a/mm/page_vma_mapped.c
> > > +++ b/mm/page_vma_mapped.c
> > > @@ -251,6 +251,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > >  			step_forward(pvmw, PMD_SIZE);
> > >  			continue;
> > >  		}
> > > +		if (pvmw->flags & PVMW_BREAK_COW_PTE)
> > > +			break_cow_pte(vma, pvmw->pmd, pvmw->address);
> > >  		if (!map_pte(pvmw))
> > >  			goto next_pte;
> > >  this_pte:
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 2ec925e5fa6a9..b1b7dcbd498be 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -807,7 +807,8 @@ static bool folio_referenced_one(struct folio *folio,
> > >  		struct vm_area_struct *vma, unsigned long address, void *arg)
> > >  {
> > >  	struct folio_referenced_arg *pra = arg;
> > > -	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> > > +	/* it will clear the entry, so we should break COW PTE. */
> > > +	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_BREAK_COW_PTE);
> >
> > What do you mean by breaking COW PTE? In the memory reclamation case,
> > we are only checking and clearing the page referenced bit in the PTE.
> > Do we really need to break COW?
>
> Since we might clear the page referenced bit, it will modify the
> write-protected shared page table (COW-ed PTE). We should duplicate it.
>
> Actually, I didn't break COW at first because the table is only modified
> conditionally, and only to clear the referenced bit.
> So, if clearing the page referenced bit is fine for the COW-ed PTE table
> and breaking COW PTE is unnecessary here, we can remove it.

If a page is mapped by 100 processes through a shared PTE table and any
one of those 100 processes accesses the page, the reference bit gets set
in that single shared PTE. If the table is duplicated instead, we will
have to scan 100 PTEs to figure out whether the page has been accessed
and should be kept in the LRU.

I don't see a fundamental need to duplicate the PTE table just to clear
the reference bit. Keeping the PTE table shared saves a lot of memory
reclamation cost on CPUs that have hardware reference bits in the PTE.

>
> Thanks,
> Chih-En Lin

Thanks
Barry
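To make the cost argument above concrete, here is a toy C model. It is a sketch, not kernel code: struct toy_pte, toy_referenced(), toy_scan_cost() and NR_MAPPERS are hypothetical names, and the model ignores locking, TLB flushing and COW PTE refcounting entirely. It assumes a page mapped by 100 processes, of which only one touches it, and counts how many PTEs a folio_referenced()-style pass has to test-and-clear when the PTE table is still COW-shared versus after COW PTE has been broken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MAPPERS 100	/* hypothetical: number of processes mapping the page */

/* Toy PTE, NOT the kernel's pte_t: it only models the reference bit. */
struct toy_pte {
	bool young;	/* set on access by the "MMU", cleared by reclaim */
};

/*
 * One aging pass loosely modelled on folio_referenced(): visit every
 * distinct PTE that maps the page, test-and-clear its reference bit,
 * and return how many PTEs had to be visited.
 */
static size_t toy_referenced(struct toy_pte *ptes[], size_t nr, bool *referenced)
{
	*referenced = false;
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i]->young) {
			ptes[i]->young = false;	/* start a fresh aging window */
			*referenced = true;
		}
	}
	return nr;	/* each distinct PTE is visited exactly once */
}

/*
 * Process `toucher` accesses a page mapped by NR_MAPPERS processes.
 *   shared == true:  the PTE table is still COW-shared, so every process
 *                    resolves to the same PTE and the same reference bit.
 *   shared == false: COW PTE was broken, so each process owns a private copy.
 * Returns how many PTEs reclaim had to scan; *referenced tells whether the
 * access was detected.
 */
static size_t toy_scan_cost(bool shared, size_t toucher, bool *referenced)
{
	struct toy_pte shared_pte = { .young = false };
	struct toy_pte private_ptes[NR_MAPPERS] = { { .young = false } };
	struct toy_pte *mapped_by[NR_MAPPERS];	/* PTE used by each process */
	size_t nr_distinct = shared ? 1 : NR_MAPPERS;

	assert(toucher < NR_MAPPERS);

	for (size_t p = 0; p < NR_MAPPERS; p++)
		mapped_by[p] = shared ? &shared_pte : &private_ptes[p];

	/* The hardware sets the bit in whichever PTE this process maps. */
	mapped_by[toucher]->young = true;

	/*
	 * In the shared case mapped_by[0] is the one shared PTE, so scanning
	 * the first nr_distinct entries visits each distinct PTE exactly once.
	 */
	return toy_referenced(mapped_by, nr_distinct, referenced);
}
```

Calling toy_scan_cost(true, 42, &ref) and toy_scan_cost(false, 42, &ref) both report the access, but the shared table needs one PTE visit while the private copies need 100, which is the saving that breaking COW PTE in folio_referenced_one() would give up.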