From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5229C433EF for ; Tue, 7 Sep 2021 18:08:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4F13261101 for ; Tue, 7 Sep 2021 18:08:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4F13261101 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 8D1026B0071; Tue, 7 Sep 2021 14:08:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8816C6B0072; Tue, 7 Sep 2021 14:08:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 772E46B0073; Tue, 7 Sep 2021 14:08:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0058.hostedemail.com [216.40.44.58]) by kanga.kvack.org (Postfix) with ESMTP id 66E756B0071 for ; Tue, 7 Sep 2021 14:08:40 -0400 (EDT) Received: from smtpin34.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 25FF928DB9 for ; Tue, 7 Sep 2021 18:08:40 +0000 (UTC) X-FDA: 78561562800.34.16D1BBB Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf03.hostedemail.com (Postfix) with ESMTP id D89A230000A4 for ; Tue, 7 Sep 2021 18:08:39 +0000 (UTC) Received: by mail-ej1-f42.google.com with SMTP id n27so21372453eja.5 for ; Tue, 07 Sep 2021 11:08:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=fwYwMhWXPXrUKzZlg4iOJNECw6OhBovM0C6OA+I1Nf4=; b=Ar2rG8os2SXQq8tDsrsyfDda/XU0xpNXUr6bXykC4UJRCkpI015j2cKhiism3tRcn3 sNaduVX1CdEIF97zltIJ8Aruor+W7WCfnj8DcKXQmDLrV6KB+2dvfBiU8qNguzooSlwM roIaCWwawtJPsyQdZRHhbZg6yu8m+kfMFrxMKn7eTdGZdhw30i57oqHMKg938mrKmbJE ZGIVEXa7B2NyVlcnDSqp1J6eROQEjtTP5chm4rTFPxHXN7Nx0zotzdLM4iHIce+zrpai uJQYDoNcpAGUGZrgTcxp6+zoohVpG8PKns/En05qojao/2nW5f1hijp9ooVRNg4SliXi E+4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=fwYwMhWXPXrUKzZlg4iOJNECw6OhBovM0C6OA+I1Nf4=; b=RgNa32JEBUs52DN4Jt0Fu4LzHBBjNjkxioz1u2EnZQc+ews50SeSWFsTymyB+hWoLN m3WYXa9gfroW2pIytN4rjQoDy2KTtzpO6MbQu8xAd/7ITrhxy5F28jVZpv/k5AndpFTF SBx4PNyaAZZvu4s16wa1PJXHvBWD9NqI0M49MGLZGrVoWUSWxJej2fBGou/SPoCsest3 v4kYmsC9jUp8LbRbMLk1sSZ1Z0h1X+U7mjHvXgcB/mPCaTj/q2Gh+nTNyiADM6YKLdaS cmow+wBYL2IccEPK88SpLOi2GpvyPs62E6kqzNmP1+P47X9abQfdcElVPoaoSnqMNF43 R2Kg== X-Gm-Message-State: AOAM533XKGF2CXP/HylW3RvshQSrrn9kFTE1B73JW6yyov0RhO0ue2wC 8n/0GdUVZeLkZqtnnBHXbFdEzPxkwHqno8UMKd0= X-Google-Smtp-Source: ABdhPJyRB/8Qcg1XHc8bco/HGGlNy6Y6b52MmUsiORjCkUArHMsOHc06C6knZY14j1u50jWb2Wqt2PMccM/cZUvRMfQ= X-Received: by 2002:a17:906:3854:: with SMTP id w20mr19530536ejc.537.1631038118451; Tue, 07 Sep 2021 11:08:38 -0700 (PDT) MIME-Version: 1.0 References: <20210906121200.57905-1-rongwei.wang@linux.alibaba.com> <20210906121200.57905-2-rongwei.wang@linux.alibaba.com> In-Reply-To: <20210906121200.57905-2-rongwei.wang@linux.alibaba.com> From: Yang Shi Date: Tue, 7 Sep 2021 11:08:26 -0700 Message-ID: Subject: Re: [PATCH 1/2] mm, thp: check page mapping when truncating page cache To: Rongwei Wang Cc: Linux MM , Linux Kernel Mailing List , Andrew Morton , cfijalkovich@google.com, song@kernel.org, william.kucharski@oracle.com, Hugh Dickins , Matthew Wilcox Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Ar2rG8os; spf=pass (imf03.hostedemail.com: domain of shy828301@gmail.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: D89A230000A4 X-Stat-Signature: 6y16nfep3eytrus1qkti1ffimi1xqffg X-HE-Tag: 1631038119-16015 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Sep 6, 2021 at 5:12 AM Rongwei Wang wrote: > > Transparent huge page has supported read-only non-shmem files. The file- > backed THP is collapsed by khugepaged and truncated when written (for > shared libraries). > > However, there is race in two possible places. > > 1) multiple writers truncate the same page cache concurrently; > 2) collapse_file rolls back when writer truncates the page cache; > > In both cases, subpage(s) of file THP can be revealed by find_get_entry > in truncate_inode_pages_range, which will trigger PageTail BUG_ON in > truncate_inode_page, as follows. > > [40326.247034] page:000000009e420ff2 refcount:1 mapcount:0 mapping:000000= 0000000000 index:0x7ff pfn:0x50c3ff > [40326.247041] head:0000000075ff816d order:9 compound_mapcount:0 compound= _pincount:0 > [40326.247046] flags: 0x37fffe0000010815(locked|uptodate|lru|arch_1|head) > [40326.247051] raw: 37fffe0000000000 fffffe0013108001 dead000000000122 de= ad000000000400 > [40326.247053] raw: 0000000000000001 0000000000000000 00000000ffffffff 00= 00000000000000 > [40326.247055] head: 37fffe0000010815 fffffe001066bd48 ffff000404183c20 0= 000000000000000 > [40326.247057] head: 0000000000000600 0000000000000000 00000001ffffffff f= fff000c0345a000 > [40326.247058] page dumped because: VM_BUG_ON_PAGE(PageTail(page)) > [40326.247077] ------------[ cut here ]------------ > [40326.247080] kernel BUG at mm/truncate.c:213! > [40326.280581] Internal error: Oops - BUG: 0 [#1] SMP > [40326.281077] Modules linked in: xfs(E) libcrc32c(E) rfkill(E) aes_ce_bl= k(E) crypto_simd(E) cryptd(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) = sha1_ce(E) uio_pdrv_genirq(E) uio(E) nfsd(E) vfat(E) fat(E) auth_rpcgss(E) = nfs_acl(E) lockd(E) grace(E) sunrpc(E) sch_fq_codel(E) ip_tables(E) ext4(E)= mbcache(E) jbd2(E) virtio_net(E) net_failover(E) virtio_blk(E) failover(E)= sha2_ce(E) sha256_arm64(E) virtio_mmio(E) virtio_pci(E) virtio_ring(E) vir= tio(E) > [40326.285130] CPU: 14 PID: 11394 Comm: check_madvise_d Kdump: loaded Tai= nted: G W E 5.10.46-hugetext+ #55 > [40326.286202] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0= 02/06/2015 > [40326.286968] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=3D--) > [40326.287584] pc : truncate_inode_page+0x64/0x70 > [40326.288040] lr : truncate_inode_page+0x64/0x70 > [40326.288498] sp : ffff80001b60b900 > [40326.288837] x29: ffff80001b60b900 x28: 00000000000007ff > [40326.289377] x27: ffff80001b60b9a0 x26: 0000000000000000 > [40326.289943] x25: 000000000000000f x24: ffff80001b60b9a0 > [40326.290485] x23: ffff80001b60ba18 x22: ffff0001e0999ea8 > [40326.291027] x21: ffff0000c21db300 x20: ffffffffffffffff > [40326.291566] x19: fffffe001310ffc0 x18: 0000000000000020 > [40326.292106] x17: 0000000000000000 x16: 0000000000000000 > [40326.292655] x15: ffff0000c21db960 x14: 3030306666666620 > [40326.293197] x13: 6666666666666666 x12: 3130303030303030 > [40326.293746] x11: ffff8000117b69b8 x10: 00000000ffff8000 > [40326.294313] x9 : ffff80001012690c x8 : 0000000000000000 > [40326.294851] x7 : ffff8000114f69b8 x6 : 0000000000017ffd > [40326.295392] x5 : ffff0007fffbcbc8 x4 : ffff80001b60b5c0 > [40326.295942] x3 : 0000000000000001 x2 : 0000000000000000 > [40326.296497] x1 : 0000000000000000 x0 : 0000000000000000 > [40326.297047] Call trace: > [40326.297304] truncate_inode_page+0x64/0x70 > [40326.297724] truncate_inode_pages_range+0x550/0x7e4 > [40326.298251] truncate_pagecache+0x58/0x80 > [40326.298662] do_dentry_open+0x1e4/0x3c0 > [40326.299052] vfs_open+0x38/0x44 > [40326.299377] do_open+0x1f0/0x310 > [40326.299709] path_openat+0x114/0x1dc > [40326.300077] do_filp_open+0x84/0x134 > [40326.300444] do_sys_openat2+0xbc/0x164 > [40326.300825] __arm64_sys_openat+0x74/0xc0 > [40326.301236] el0_svc_common.constprop.0+0x88/0x220 > [40326.301723] do_el0_svc+0x30/0xa0 > [40326.302089] el0_svc+0x20/0x30 > [40326.302404] el0_sync_handler+0x1a4/0x1b0 > [40326.302814] el0_sync+0x180/0x1c0 > [40326.303157] Code: aa0103e0 900061e1 910ec021 9400d300 (d4210000) > [40326.303775] ---[ end trace f70cdb42cb7c2d42 ]--- > [40326.304244] Kernel panic - not syncing: Oops - BUG: Fatal exception > > This checks the page mapping and retries when subpage of file THP is > found, in truncate_inode_pages_range. > > Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-= backed THPs") > Signed-off-by: Xu Yu > Signed-off-by: Rongwei Wang > --- > mm/filemap.c | 7 ++++++- > mm/truncate.c | 17 ++++++++++++++++- > 2 files changed, 22 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index dae481293..a3af2ec 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2093,7 +2093,6 @@ unsigned find_lock_entries(struct address_space *ma= pping, pgoff_t start, > if (!xa_is_value(page)) { > if (page->index < start) > goto put; > - VM_BUG_ON_PAGE(page->index !=3D xas.xa_index, pag= e); > if (page->index + thp_nr_pages(page) - 1 > end) > goto put; > if (!trylock_page(page)) > @@ -2102,6 +2101,12 @@ unsigned find_lock_entries(struct address_space *m= apping, pgoff_t start, > goto unlock; > VM_BUG_ON_PAGE(!thp_contains(page, xas.xa_index), > page); > + /* > + * We can find and get head page of file THP with > + * non-head index. The head page should have alre= ady > + * be truncated with page->mapping reset to NULL. > + */ > + VM_BUG_ON_PAGE(page->index !=3D xas.xa_index, pag= e); > } > indices[pvec->nr] =3D xas.xa_index; > if (!pagevec_add(pvec, page)) > diff --git a/mm/truncate.c b/mm/truncate.c > index 714eaf1..8c59c00 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -319,7 +319,8 @@ void truncate_inode_pages_range(struct address_space = *mapping, > index =3D start; > while (index < end && find_lock_entries(mapping, index, end - 1, > &pvec, indices)) { > - index =3D indices[pagevec_count(&pvec) - 1] + 1; > + index =3D indices[pagevec_count(&pvec) - 1] + > + thp_nr_pages(pvec.pages[pagevec_count(&pvec) - 1]= ); I don't quite get what this change is doing for. IIUC find_lock_entries() already handles index advance correctly. If truncate range is partial THP, it will be handled in the second pass. AFAICT, the problem is why the THP is not split if it will get partially truncated. Did I miss something? > truncate_exceptional_pvec_entries(mapping, &pvec, indices= ); > for (i =3D 0; i < pagevec_count(&pvec); i++) > truncate_cleanup_page(pvec.pages[i]); > @@ -391,6 +392,20 @@ void truncate_inode_pages_range(struct address_space= *mapping, > if (xa_is_value(page)) > continue; > > + /* > + * Already truncated? We can find and get subpage > + * of file THP, of which the head page is truncat= ed. > + * > + * In addition, another race will be avoided, whe= re > + * collapse_file rolls back when writer truncates= the > + * page cache. > + */ > + if (page_mapping(page) !=3D mapping) { > + /* Restart to make sure all gone */ > + index =3D start - 1; > + continue; > + } > + > lock_page(page); > WARN_ON(page_to_index(page) !=3D index); > wait_on_page_writeback(page); > -- > 1.8.3.1 > >