From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 290FCC433EF for ; Wed, 20 Oct 2021 05:48:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B19366137C for ; Wed, 20 Oct 2021 05:48:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B19366137C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7A003900002; Wed, 20 Oct 2021 01:47:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 700116B0074; Wed, 20 Oct 2021 01:47:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 52F3A900002; Wed, 20 Oct 2021 01:47:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0248.hostedemail.com [216.40.44.248]) by kanga.kvack.org (Postfix) with ESMTP id 33E466B0074 for ; Wed, 20 Oct 2021 01:47:58 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E75BE20BFC for ; Wed, 20 Oct 2021 05:47:57 +0000 (UTC) X-FDA: 78715734594.14.ED05238 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf04.hostedemail.com (Postfix) with ESMTP id 381A050000B4 for ; Wed, 20 Oct 2021 05:47:53 +0000 (UTC) IronPort-Data: =?us-ascii?q?A9a23=3AEOAS+K1vFuavyONW+PbD5bhwkn2cJEfYwER7XOP?= =?us-ascii?q?LsXnJhzhz1mMCmGUbUGnSPqmDNmf8f4sgaom+pxwDsJTdztU2QQE+nZ1PZygU8?= =?us-ascii?q?JKaX7x1DatR0xu6d5SFFAQ+hyknQoGowPscEzmM+X9BDpC79SMljPnSHuKlYAL?= =?us-ascii?q?5EnsZqTFMGX5JZS1Ly7ZRbr5A2bBVMivV0T/Ai5S31GyNh1aYBlkpB5er83uDi?= =?us-ascii?q?hhdVAQw5TTSbdgT1LPXeuJ84Jg3fcldJFOgKmVY83LTegrN8F251juxExYFAdX?= =?us-ascii?q?jnKv5c1ERX/jZOg3mZnh+AvDk20Yd4HdplPtT2Pk0MC+7jx2Tgtl308QLu5qrV?= =?us-ascii?q?S8nI6/NhP8AFRJfFkmSOIUfoeGefCLg4Jz7I0ruNiGEL+9VJE0/I4wU0uhtBmR?= =?us-ascii?q?J7/YZNHYGaRXrr+K9wJq6TOd2j8guJcWtO5kQ0llsxDefD7A5QJTHQqzP/vdZ2?= =?us-ascii?q?is9goZFGvO2T8Ybdj1pYzzDbgdJN1NRD4gx9M+sh3/iY3hdrXqWu6M84C7U1gM?= =?us-ascii?q?Z+L7zPNvQf/SORN5JhQCcp2Tb7yL1Dw9yHN6WzzfD+XKxrujVlCj/VcQZE7jQ3?= =?us-ascii?q?vprhkCDg2IIBBAIWF+Tv/a0kAi9VshZJkhS/TAhxYA29Uq2Xpz+Uge+rXqsoBE?= =?us-ascii?q?RQZxTHvc85QXLzbDbiy6dB24ZXntRZscOqsA7X3op20WPktevAiZg2IB541r1G?= =?us-ascii?q?qy89Gv0YHZKazRZI3JscOfM2PG7yKlbs/4FZowL/HaJs+DI?= IronPort-HdrOrdr: =?us-ascii?q?A9a23=3ARHDas65JxaMIjv9QSQPXwCzXdLJyesId70hD?= =?us-ascii?q?6qhwISY6TiX+rbHWoB17726TtN9/YgBDpTntAsm9qDbnhPlICOoqTNOftWvdyQ?= =?us-ascii?q?iVxehZhOOIqVDd8m/Fl9K1vp0NT0ERMrLN5BRB/KPHCReDYqsd6ejC4Ka1nv3f?= =?us-ascii?q?0nsoaQlrbptr5wB/Bh3zKDwMeCB2QYo+CIGH5tdK4x6peXEsZMy9AXUfG8fZod?= =?us-ascii?q?mjruOdXTc2Qw4g9BKVjS6lrJrzEx2j1B8YVD9VhZcOmFK16zDE2g=3D=3D?= X-IronPort-AV: E=Sophos;i="5.87,166,1631548800"; d="scan'208";a="116152798" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 20 Oct 2021 13:47:55 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 52BD24D100E9; Wed, 20 Oct 2021 13:47:51 +0800 (CST) Received: from G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.80) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.23; Wed, 20 Oct 2021 13:47:51 +0800 Received: from [10.167.216.64] (10.167.216.64) by G08CNEXCHPEKD07.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.23 via Frontend Transport; Wed, 20 Oct 2021 13:47:50 +0800 Subject: Re: [PATCH v7 6/8] mm: Introduce mf_dax_kill_procs() for fsdax case To: "Darrick J. Wong" CC: , , , , , , , , References: <20210924130959.2695749-1-ruansy.fnst@fujitsu.com> <20210924130959.2695749-7-ruansy.fnst@fujitsu.com> <20211014193241.GK24307@magnolia> From: Shiyang Ruan Message-ID: <25f86782-ff1f-db4d-d5da-fd1e5bee45f6@fujitsu.com> Date: Wed, 20 Oct 2021 13:47:50 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <20211014193241.GK24307@magnolia> Content-Type: text/plain; charset="UTF-8"; format=flowed X-yoursite-MailScanner-ID: 52BD24D100E9.A140D X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@fujitsu.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 381A050000B4 X-Stat-Signature: fawtxww9kh5rjkmtdphtrqqp9cogfztt Authentication-Results: imf04.hostedemail.com; dkim=none; spf=none (imf04.hostedemail.com: domain of ruansy.fnst@fujitsu.com has no SPF policy when checking 183.91.158.132) smtp.mailfrom=ruansy.fnst@fujitsu.com; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=fujitsu.com (policy=none) X-HE-Tag: 1634708873-828165 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: =E5=9C=A8 2021/10/15 3:32, Darrick J. Wong =E5=86=99=E9=81=93: > On Fri, Sep 24, 2021 at 09:09:57PM +0800, Shiyang Ruan wrote: >> This function is called at the end of RMAP routine, i.e. filesystem >> recovery function, to collect and kill processes using a shared page o= f >> DAX file. The difference between mf_generic_kill_procs() is, >> it accepts file's mapping,offset instead of struct page. Because >> different file's mappings and offsets may share the same page in fsdax >> mode. So, it is called when filesystem RMAP results are found. >> >> Signed-off-by: Shiyang Ruan >> --- >> fs/dax.c | 10 ------ >> include/linux/dax.h | 9 +++++ >> include/linux/mm.h | 2 ++ >> mm/memory-failure.c | 83 ++++++++++++++++++++++++++++++++++++++++---= -- >> 4 files changed, 86 insertions(+), 18 deletions(-) >> >> diff --git a/fs/dax.c b/fs/dax.c >> index 509b65e60478..2536c105ec7f 100644 >> --- a/fs/dax.c >> +++ b/fs/dax.c >> @@ -852,16 +852,6 @@ static void *dax_insert_entry(struct xa_state *xa= s, >> return entry; >> } >> =20 >> -static inline >> -unsigned long pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma= ) >> -{ >> - unsigned long address; >> - >> - address =3D vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); >> - VM_BUG_ON_VMA(address < vma->vm_start || address >=3D vma->vm_end, v= ma); >> - return address; >> -} >> - >> /* Walk all mappings of a given index of a file and writeprotect the= m */ >> static void dax_entry_mkclean(struct address_space *mapping, pgoff_t= index, >> unsigned long pfn) >> diff --git a/include/linux/dax.h b/include/linux/dax.h >> index 65411bee4312..3d90becbd160 100644 >> --- a/include/linux/dax.h >> +++ b/include/linux/dax.h >> @@ -258,6 +258,15 @@ static inline bool dax_mapping(struct address_spa= ce *mapping) >> { >> return mapping->host && IS_DAX(mapping->host); >> } >> +static inline unsigned long pgoff_address(pgoff_t pgoff, >> + struct vm_area_struct *vma) >> +{ >> + unsigned long address; >> + >> + address =3D vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); >> + VM_BUG_ON_VMA(address < vma->vm_start || address >=3D vma->vm_end, v= ma); >> + return address; >> +} >> =20 >> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES >> void hmem_register_device(int target_nid, struct resource *r); >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index 73a52aba448f..d06af0051e53 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -3114,6 +3114,8 @@ enum mf_flags { >> MF_MUST_KILL =3D 1 << 2, >> MF_SOFT_OFFLINE =3D 1 << 3, >> }; >> +extern int mf_dax_kill_procs(struct address_space *mapping, pgoff_t i= ndex, >> + size_t size, int flags); >> extern int memory_failure(unsigned long pfn, int flags); >> extern void memory_failure_queue(unsigned long pfn, int flags); >> extern void memory_failure_queue_kick(int cpu); >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 85eab206b68f..a9d0d487d205 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -302,10 +302,9 @@ void shake_page(struct page *p) >> } >> EXPORT_SYMBOL_GPL(shake_page); >> =20 >> -static unsigned long dev_pagemap_mapping_shift(struct page *page, >> +static unsigned long dev_pagemap_mapping_shift(unsigned long address, >> struct vm_area_struct *vma) >> { >> - unsigned long address =3D vma_address(page, vma); >> pgd_t *pgd; >> p4d_t *p4d; >> pud_t *pud; >> @@ -345,7 +344,7 @@ static unsigned long dev_pagemap_mapping_shift(str= uct page *page, >> * Schedule a process for later kill. >> * Uses GFP_ATOMIC allocations to avoid potential recursions in the = VM. >> */ >> -static void add_to_kill(struct task_struct *tsk, struct page *p, >> +static void add_to_kill(struct task_struct *tsk, struct page *p, pgof= f_t pgoff, >=20 > Hm, so I guess you're passing the page and the pgoff now because > page->index is meaningless for shared dax pages? Ok. Yes, it is for that case. >=20 >> struct vm_area_struct *vma, >> struct list_head *to_kill) >> { >> @@ -358,9 +357,15 @@ static void add_to_kill(struct task_struct *tsk, = struct page *p, >> } >> =20 >> tk->addr =3D page_address_in_vma(p, vma); >> - if (is_zone_device_page(p)) >> - tk->size_shift =3D dev_pagemap_mapping_shift(p, vma); >> - else >> + if (is_zone_device_page(p)) { >> + /* >> + * Since page->mapping is no more used for fsdax, we should >> + * calculate the address in a fsdax way. >> + */ >> + if (p->pgmap->type =3D=3D MEMORY_DEVICE_FS_DAX) >> + tk->addr =3D pgoff_address(pgoff, vma); >> + tk->size_shift =3D dev_pagemap_mapping_shift(tk->addr, vma); >> + } else >> tk->size_shift =3D page_shift(compound_head(p)); >> =20 >> /* >> @@ -508,7 +513,7 @@ static void collect_procs_anon(struct page *page, = struct list_head *to_kill, >> if (!page_mapped_in_vma(page, vma)) >> continue; >> if (vma->vm_mm =3D=3D t->mm) >> - add_to_kill(t, page, vma, to_kill); >> + add_to_kill(t, page, 0, vma, to_kill); >> } >> } >> read_unlock(&tasklist_lock); >> @@ -544,7 +549,32 @@ static void collect_procs_file(struct page *page,= struct list_head *to_kill, >> * to be informed of all such data corruptions. >> */ >> if (vma->vm_mm =3D=3D t->mm) >> - add_to_kill(t, page, vma, to_kill); >> + add_to_kill(t, page, 0, vma, to_kill); >> + } >> + } >> + read_unlock(&tasklist_lock); >> + i_mmap_unlock_read(mapping); >> +} >> + >> +/* >> + * Collect processes when the error hit a fsdax page. >> + */ >> +static void collect_procs_fsdax(struct page *page, struct address_spa= ce *mapping, >> + pgoff_t pgoff, struct list_head *to_kill) >> +{ >> + struct vm_area_struct *vma; >> + struct task_struct *tsk; >> + >> + i_mmap_lock_read(mapping); >> + read_lock(&tasklist_lock); >> + for_each_process(tsk) { >> + struct task_struct *t =3D task_early_kill(tsk, true); >> + >> + if (!t) >> + continue; >> + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) { >> + if (vma->vm_mm =3D=3D t->mm) >> + add_to_kill(t, page, pgoff, vma, to_kill); >> } >> } >> read_unlock(&tasklist_lock); >> @@ -1503,6 +1533,43 @@ static int mf_generic_kill_procs(unsigned long = long pfn, int flags, >> return 0; >> } >> =20 >> +/** >> + * mf_dax_kill_procs - Collect and kill processes who are using this = file range >> + * @mapping: the file in use >> + * @index: start offset of the range >> + * @size: length of the range >=20 > It feels odd that one argument is in units of pgoff_t but the other is > in bytes. The index is page aligned but @size may not be. I will explain it in=20 detail in the comments. >=20 >> + * @flags: memory failure flags >> + */ >> +int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, >> + size_t size, int flags) >> +{ >> + LIST_HEAD(to_kill); >> + dax_entry_t cookie; >> + struct page *page; >> + size_t end =3D (index << PAGE_SHIFT) + size; >> + >> + flags |=3D MF_ACTION_REQUIRED | MF_MUST_KILL; >=20 > Hm. What flags will we be passing to the xfs_dax_notify_failure_fn? > Does XFS itself have to care about what's in the flags values, or is it > really just a magic cookie to be passed from the mm layer into the fs > and back to mf_dax_kill_procs? >=20 Just to pass the flag from mm layer to mf_dax_kill_procs(). No one=20 inside this RMAP progress will care about or change it. As you=20 mentioned in the next patch, I think this should be named with a "mf_"=20 prefix to make it easier to understand. -- Thanks, Ruan. > --D >=20 >> + >> + for (; (index << PAGE_SHIFT) < end; index++) { >> + page =3D NULL; >> + cookie =3D dax_lock_mapping_entry(mapping, index, &page); >> + if (!cookie) >> + return -EBUSY; >> + if (!page) >> + goto unlock; >> + >> + SetPageHWPoison(page); >> + >> + collect_procs_fsdax(page, mapping, index, &to_kill); >> + unmap_and_kill(&to_kill, page_to_pfn(page), mapping, >> + index, flags); >> +unlock: >> + dax_unlock_mapping_entry(mapping, index, cookie); >> + } >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(mf_dax_kill_procs); >> + >> static int memory_failure_hugetlb(unsigned long pfn, int flags) >> { >> struct page *p =3D pfn_to_page(pfn); >> --=20 >> 2.33.0 >> >> >>