From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92B86C61DB3 for ; Thu, 12 Jan 2023 18:13:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C4AF8E0002; Thu, 12 Jan 2023 13:13:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 04D6C8E0001; Thu, 12 Jan 2023 13:13:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E09C18E0002; Thu, 12 Jan 2023 13:13:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id CC2C48E0001 for ; Thu, 12 Jan 2023 13:13:07 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A85D6160515 for ; Thu, 12 Jan 2023 18:13:07 +0000 (UTC) X-FDA: 80346943614.10.03BEBBF Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) by imf03.hostedemail.com (Postfix) with ESMTP id 0660C20026 for ; Thu, 12 Jan 2023 18:13:05 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=eJUyHk+8; spf=pass (imf03.hostedemail.com: domain of shy828301@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673547186; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JBDVrMoISlo1HEfdwCQV/IaZzgua+JrO2ZykJTOMdcM=; b=rtLVLXob8RR4CvrIqjDbYdFBSBCN0rdmWMzxcVCOzJninfpu0UZvXAWUCdth7Ht2DC7PZh /peylqJwOU4QcVDpeF7jJFJZumdoID/ChSwxBoRaITUmARkgsPNytGXqniDICMDTGlSgYv sIo81q6PEFj+6ljBvehF/78IBpmG/qc= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=eJUyHk+8; spf=pass (imf03.hostedemail.com: domain of shy828301@gmail.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673547186; a=rsa-sha256; cv=none; b=A6QLj+GlCxaRt0b4KGWKWzRKqo9sovP6hTEpIlK2fP6kQ2LgRAxEfPtUWC3xz0jsLtw88Y Edw7tHaFcrCY12px9q9MTm3TuJnCVKc2mRhRYk6Yz9t1vGGe6o5tPnXkEXB3z5MyCUIZWg nmrG0pDtnBXtU5H5/DpIVifo5orVjek= Received: by mail-pj1-f47.google.com with SMTP id n12so19955325pjp.1 for ; Thu, 12 Jan 2023 10:13:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=JBDVrMoISlo1HEfdwCQV/IaZzgua+JrO2ZykJTOMdcM=; b=eJUyHk+83XxjV3sOkg318zGPf2cHgnhvXDXP6uu1NUcVnLbonMbbTFux3aPhF4JN0M DhEPlQnU/GzsoeA6YBSL6h7r7nUZseta/WGmtjlJmLSrfjzAJosYoCGBPmI1KT/iDKOe h7k9nLnSu0kG53HcAzJD9RrmsYIT4TrGpFPrQ/CuyWv3SPFMZGPgS4J8J5U97dNeH8z8 90abuKc8MIJepsSlCPrjJkzVyQLwrTZZ/gnw14V05/98BC8TM5w41PS+ZoCKsGrHKB6L /NFx7dVPRzYUqiKDNLA+1LPtc3+7eS9nqDaDHMk9g41T01j+iFR0qL76/3rgIwqaMA48 DefA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=JBDVrMoISlo1HEfdwCQV/IaZzgua+JrO2ZykJTOMdcM=; b=q578nAHzqzNWdEkxGjqGYc3QJnNgXBnijwiHmuw7A8sv6dKEoXLmQxmr1CXJQ/UDDA Zbl7pzCYJloTOqKv0y+xdWZLtw7sgVnoFIIkeBuc3cWsGVC9MKIvvpIV8SF9loMNIVJ4 C7fFvLA/T3000ezCzlVRQsl/SAZqM24OB183OtgBpclXNCK0+bzhz34gYGBSJQ3QX6vV 3+XdkKf5Etpj1QY0A+hW9j0SKvXSUkm19/YbT37UYqRez43ODBeGWc6tjeVRxbKYcbu8 BQONAUGl4J9tF8xYIxFqocttmtffnMsJ7gMoQM9hsyMVdeCkEF67ii9zLdQbZ8LdGiiw ZWMA== X-Gm-Message-State: AFqh2krVKdtm1BkoBvGI/ja+13oylQlcAxJlYGQYmMiLwKGPZtQMRCN8 M0PZlYbuXndGxgOeJTtHWJ9KdzxhjEek3s+Qh7M= X-Google-Smtp-Source: AMrXdXtyygixd47OWmdTEy9qvnkH7mUhjQ25yIu3OL+enGZHEBQF1F2wV/fy7F2ALnWpqKSGRgnFS+XBitLg1Voku8I= X-Received: by 2002:a17:902:8e85:b0:192:ba7a:2be4 with SMTP id bg5-20020a1709028e8500b00192ba7a2be4mr4267212plb.27.1673547184675; Thu, 12 Jan 2023 10:13:04 -0800 (PST) MIME-Version: 1.0 References: <20230111133351.807024-1-jannh@google.com> <20230112085649.gvriasb2t5xwmxkm@box.shutemov.name> In-Reply-To: <20230112085649.gvriasb2t5xwmxkm@box.shutemov.name> From: Yang Shi Date: Thu, 12 Jan 2023 10:12:53 -0800 Message-ID: Subject: Re: [PATCH] mm/khugepaged: Fix ->anon_vma race To: "Kirill A. Shutemov" Cc: Jann Horn , Andrew Morton , linux-mm@kvack.org, "Kirill A. Shutemov" , "Zach O'Keefe" , linux-kernel@vger.kernel.org, David Hildenbrand Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 0660C20026 X-Stat-Signature: 6fsnfz6s78m4sqzts9tkhxsgpgtcq1y5 X-Rspam-User: X-HE-Tag: 1673547185-648262 X-HE-Meta: U2FsdGVkX19/tWnlLOzGqHkr58WVddBzC42EmW2g02CVVUVW7DyLERz10yv3a92QUxgdU1KEaa0K6IvCTp4XEEvQFpaKgUx6Q/XgMsZO5lXziclDfTzffhEM9/56v83dOX7NS2tQkYbpEZG/E3LGjyG3EXK3l5PMUcSNVViakVsXXxUp5b1QXBJAeo9MXvEX6SBy7Nnw7tFGTgakMACx8z6AgNwjgstjLBVKZqpZy+xR/Y66+cTOpvLqVmROxlz8MH0lySIhdSXlY8MUcX+G3IFeUtC9J/RBmHq9V8vBQOADfk16OQx/A3UsT71FWWZVBhD6TlAxrcKwGWWDuwcqAp2UkKqA3CueO0R7WzPT9qP4u42FHDae+pJBuYK4WJgfygQJ+o5k6Z0Y+E8reN5wh2GKWhWqCb4BatQotRk6xol3Ulpue1dkrZTKwocwBGm1faQFp6oaZaIJyTWbroVAQEccaKPPSZX29H6Hh/5sutiLKr1HSP2F2KrpjdxN4sA6EgF8awMefpUDkmW+uvq47AozHIHv8eIKMvPeOsTo/Z65pmf2i/vP+usJZRfSZy1V33eOYOtpsTMG//wQnS5kj+6qxfJNU5NZCryA7uRIK5DO/0QkD/SOjiVRBE1u0/1G+3dmSKeNxDFXNJEbXystsxWx21OfQlKrgZ6aGI9b3r4VmF9zGY9tSECvpWhk/57AVjQ+cEg7dO04LwOcHSfpJRTHWfunIm49eaqshYRTeMigZYdiwU03tZIrprN+uuf8qTWYOVZpNLPYLerugPeGUqNd+lEdtVyjGAKc837uliC+j7rgna/56VqQBeXbogoZEykCZdcW+I1eR77jvFT3BJpQsBW2KVJDylXtdqvZL6mQl400OjCct+DmHbBWdo2eOTsCfxxYGIEvGVv7iW0xfoJAN6L5eeqaOP19hqELOCXnjqgklpRLeeEAOXcgEDgrglVzmYr84ug6XbQXVEA MEyOkeQy 7hvXVIumKKRTNZZHJPNzi+4v/l2d5NEfpGgxTlHlt6H2TitcWjk73zQSdhYop8ipYzyzPp9uHShKKpJN56LX5lyJq7vtQO6B5fElS X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Jan 12, 2023 at 12:56 AM Kirill A. Shutemov wrote: > > On Wed, Jan 11, 2023 at 02:33:51PM +0100, Jann Horn wrote: > > If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires > > it to be locked. retract_page_tables() bails out if an ->anon_vma is > > attached, but does this check before holding the mmap lock (as the comment > > above the check explains). > > > > If we racily merge an existing ->anon_vma (shared with a child process) > > from a neighboring VMA, subsequent rmap traversals on pages belonging to > > the child will be able to see the page tables that we are concurrently > > removing while assuming that nothing else can access them. > > > > Repeat the ->anon_vma check once we hold the mmap lock to ensure that there > > really is no concurrent page table access. > > > > Reported-by: Zach O'Keefe > > Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") > > Cc: stable@vger.kernel.org > > Signed-off-by: Jann Horn > > --- > > zokeefe@ pointed out to me that the current code (after my last round of patches) > > can hit a lockdep assert by racing, and after staring at it a bit I've > > convinced myself that this is a real, preexisting bug. > > (I haven't written a reproducer for it though. One way to hit it might be > > something along the lines of: > > > > - set up a process A with a private-file-mapping VMA V1 > > - let A fork() to create process B, thereby copying V1 in A to V1' in B > > - let B extend the end of V1' > > - let B put some anon pages into the extended part of V1' > > At this point V1' gets it's own ->anon_vma, not connected to V1, right? This is what I got confused too. > > > - let A map a new private-file-mapping VMA V2 directly behind V1, without > > an anon_vma > > [race begins here] > > - in A's thread 1: begin retract_page_tables() on V2, run through first > > ->anon_vma check > > - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it > > merges the anon_vma of V1 (which implies V1 and V2 must be mapping the > > same file at compatible offsets) > > - in B: trigger rmap traversal on anon page in V1' > > I don't follow the race. rmap on V1' will not reach V1. > > > mm/khugepaged.c | 14 +++++++++++++- > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 5cb401aa2b9d..0bfed37f3a3b 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > > * has higher cost too. It would also probably require locking > > * the anon_vma. > > */ > > - if (vma->anon_vma) { > > + if (READ_ONCE(vma->anon_vma)) { > > result = SCAN_PAGE_ANON; > > goto next; > > } > > This makes perfect sense. At least for readability. But I think > false-negative should not lead to bad results. > > > @@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > > result = SCAN_PTE_MAPPED_HUGEPAGE; > > if ((cc->is_khugepaged || is_target) && > > mmap_write_trylock(mm)) { > > + /* > > + * Re-check whether we have an ->anon_vma, because > > + * collapse_and_free_pmd() requires that either no > > + * ->anon_vma exists or the anon_vma is locked. > > + * We already checked ->anon_vma above, but that check > > + * is racy because ->anon_vma can be populated under the > > + * mmap lock in read mode. > > + */ > > + if (vma->anon_vma) { > > + result = SCAN_PAGE_ANON; > > + goto unlock_next; > > + } > > This is totally wrong direction. Or I don't understand the race. > > At this point we already paid nearly all price of of pagetable retraction. > I don't see any correctness reason to stop here, except for the assert. Isn't it possible that collapse_and_free_pmd() clear the pmd which may point to a PTE which maps the COW'ed anon page if this race happens? > > I think lockdep assert in collapse_and_free_pmd() is wrong and has to be > dropped. > > > /* > > * When a vma is registered with uffd-wp, we can't > > * recycle the pmd pgtable because there can be pte > > > > base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25 > > -- > > 2.39.0.314.g84b9a713c41-goog > > > > -- > Kiryl Shutsemau / Kirill A. Shutemov