From: Yang Shi
Date: Wed, 9 Jun 2021 09:57:58 -0700
Subject: Re: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting (fwd)
To: Hugh Dickins
Cc: Linux MM, Linux Kernel Mailing List
On Tue, Jun 8, 2021 at 9:12 PM Hugh Dickins wrote:
>
>
> ---------- Forwarded message ----------
> Date: Tue, 8 Jun 2021 21:10:19 -0700 (PDT)
> From: Hugh Dickins
> To: Andrew Morton
> Cc: Hugh Dickins,
>     Kirill A. Shutemov,
>     Yang Shi, Wang Yugui,
>     Matthew Wilcox,
>     Naoya Horiguchi,
>     Alistair Popple, Ralph Campbell,
>     Zi Yan, Miaohe Lin,
>     Minchan Kim, Jue Wang,
>     Peter Xu, Jan Kara,
>     Shakeel Butt, Oscar Salvador
> Subject: [PATCH v2 03/10] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
>
> Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
> (!unmap_success): with dump_page() showing mapcount:1, but then its
> raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
>
> And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
> it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
> and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
> all indicative of some mapcount difficulty in development here perhaps.
> But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
>
> I believe the problem is that once a racing unmap has cleared pte or pmd,
> try_to_unmap_one() may skip taking the page table lock, and emerge from
> try_to_unmap() before the racing task has reached decrementing mapcount.
>
> Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
> follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC
> to the options, and passing that from unmap_page().
>
> When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
> for both: the slight overhead added should rarely matter, except perhaps
> if splitting sparsely-populated multiply-mapped shmem. Once confident
> that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
>
> Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
> Signed-off-by: Hugh Dickins
> Cc:
> ---
> v2: moved TTU_SYNC definition up, to avoid conflict with other patchset
>     use TTU_SYNC even when non-debug, per Peter Xu and Yang Shi
>     expanded PVMW_SYNC's spin_unlock(pmd_lock()), per Kirill and Peter

Reviewed-by: Yang Shi

>
>  include/linux/rmap.h |  1 +
>  mm/huge_memory.c     |  2 +-
>  mm/page_vma_mapped.c | 11 +++++++++++
>  mm/rmap.c            | 17 ++++++++++++++++-
>  4 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index def5c62c93b3..8d04e7deedc6 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -91,6 +91,7 @@ enum ttu_flags {
>
>  	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
>  	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
> +	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
>  	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
>  	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
>  					 * and caller guarantees they will
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5885c5f5836f..84ab735139dc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>
>  static void unmap_page(struct page *page)
>  {
> -	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> +	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
>  				   TTU_RMAP_LOCKED |
>  				   TTU_SPLIT_HUGE_PMD;
>  	bool unmap_success;
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 2cf01d933f13..5b559967410e 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -212,6 +212,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  				pvmw->ptl = NULL;
>  			}
>  		} else if (!pmd_present(pmde)) {
> +			/*
> +			 * If PVMW_SYNC, take and drop THP pmd lock so that we
> +			 * cannot return prematurely, while zap_huge_pmd() has
> +			 * cleared *pmd but not decremented compound_mapcount().
> +			 */
> +			if ((pvmw->flags & PVMW_SYNC) &&
> +			    PageTransCompound(pvmw->page)) {
> +				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
> +
> +				spin_unlock(ptl);
> +			}
>  			return false;
>  		}
>  		if (!map_pte(pvmw))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 693a610e181d..07811b4ae793 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	struct mmu_notifier_range range;
>  	enum ttu_flags flags = (enum ttu_flags)(long)arg;
>
> +	/*
> +	 * When racing against e.g. zap_pte_range() on another cpu,
> +	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +	 * try_to_unmap() may return false when it is about to become true,
> +	 * if page table locking is skipped: use TTU_SYNC to wait for that.
> +	 */
> +	if (flags & TTU_SYNC)
> +		pvmw.flags = PVMW_SYNC;
> +
>  	/* munlock has nothing to gain from examining un-locked vmas */
>  	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
>  		return true;
> @@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
>  	else
>  		rmap_walk(page, &rwc);
>
> -	return !page_mapcount(page) ? true : false;
> +	/*
> +	 * When racing against e.g. zap_pte_range() on another cpu,
> +	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> +	 * try_to_unmap() may return false when it is about to become true,
> +	 * if page table locking is skipped: use TTU_SYNC to wait for that.
> +	 */
> +	return !page_mapcount(page);
>  }
>
>  /**
> --
> 2.26.2