From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 133CEC5479D for ; Thu, 12 Jan 2023 01:06:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 353838E0002; Wed, 11 Jan 2023 20:06:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 303628E0001; Wed, 11 Jan 2023 20:06:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1CB018E0002; Wed, 11 Jan 2023 20:06:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 0AA578E0001 for ; Wed, 11 Jan 2023 20:06:37 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id CA7CF16098E for ; Thu, 12 Jan 2023 01:06:36 +0000 (UTC) X-FDA: 80344356792.10.2E026A2 Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) by imf29.hostedemail.com (Postfix) with ESMTP id 2913F120002 for ; Thu, 12 Jan 2023 01:06:34 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=phQ25BvA; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf29.hostedemail.com: domain of shy828301@gmail.com designates 209.85.215.179 as permitted sender) smtp.mailfrom=shy828301@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673485595; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/EuDV1NUKfsL4GEXfqEv23SAelvBXb3ssMRhZva4bUQ=; b=M+ZfcebcLG88LbV2e3LV7xrV4ds/DMx/ptQR6G5vps/rGz963M+m5ZqpFdh6Zp578zd5Ne VKI9WEaqjxk1H1LJ7qUzW09ismLKlRBmJh/9tiTZMEatpI34NaXWlW0rI8PSslPVXSNe7y 8/MuMgY5Qa9Bw31dUJe2+Dzi0lyc5YI= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=phQ25BvA; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf29.hostedemail.com: domain of shy828301@gmail.com designates 209.85.215.179 as permitted sender) smtp.mailfrom=shy828301@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673485595; a=rsa-sha256; cv=none; b=JQFrhjsj8YtWohJ9qVvbTuilOiLWKqvH7kgFlkQysEFJhqOs0/ZYQX7EgYTEdL6OxSqTJD UMHST8LjdfXSFSYbPlCy2PC2ZdHW9XsO2UPNqkG5eAeOlymWKmaKDjPOl1Pk29vp44vl1d jzSbwAsXfgO/5dU3Rsg7j2Duri0T+ew= Received: by mail-pg1-f179.google.com with SMTP id 141so11769636pgc.0 for ; Wed, 11 Jan 2023 17:06:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/EuDV1NUKfsL4GEXfqEv23SAelvBXb3ssMRhZva4bUQ=; b=phQ25BvAFBeuQtOnMo8GA7sj/hVjEu8xpCWDWKmFFqeBQM8Xh4dJoGrq5+uKBCWCJO BPi2JeNAdu2C6jvyR+5J482j9kU07ya8TyJ39H3HKLYGOKQN91hRSBH6Z/ze/uxQSDu8 r9NufpcOhux4lXG9cMwgnT4dvxJaQ4q/IuNLB2F0b8j6ETQSHeItrrw8D0YL1rBWAIsM NuTTTom8ZzIPz+bT/Oqqn6to8PXgheDNNjs3N4UEce5j6Mg0dYjjB40ux0QC7nBLT6mk RgEIXpi4JOb0k+EeOc5m+kiD7uzydsibHlY41kNerbJKYOvPFBH2kiGNSUpUtxpIj9Ve Sg0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/EuDV1NUKfsL4GEXfqEv23SAelvBXb3ssMRhZva4bUQ=; b=qcO6g1WbD3i2oRp3OPTaK/OwBJxhX8UtSTqE5wxv+lEWqmpTPhZjdr/gooq1A63eId aB5wjbE0vZQNBGWdtE0SU2A+9672mZiBijyz2JxREcjgxJZ+1J1bT9290aXiI+hZ861Y eyn+09KY9lCyQfZbwIo82nZkYJK015GfpwUmGyU5x3IQ/7TXrTnwPzv5ENmHCdhnXmri QhRlYqccU7tXHRak5PtoLw4eDxlr//ucG8fEsvtpwgmTVn4ga/818WiWefDJar44SdOM m3qnq6iJsBKSuA3L0vFB2+dVqAkeN1uVK0EMTSZCkVMabkvdsFY9nFEQL1IwIKW2AUjj JL4w== X-Gm-Message-State: AFqh2kpznsQY2Hr2WlbiINNsX+UHz83f3p6ICq9oh1Esz3+mGKEiegQn +TxY3gNnqRgpWwNZflfUw7LYKhzDkbLwMsOOdUA= X-Google-Smtp-Source: AMrXdXsU07RBuYlPZNvoWjqBoRCrA9nV+WED0cM8rX2iBjZUsBQkQbWNvRSY67DvAcXfRB/IByFw4z/YadO1L84xkmk= X-Received: by 2002:a63:1b64:0:b0:481:5b48:9ea with SMTP id b36-20020a631b64000000b004815b4809eamr4988303pgm.237.1673485594007; Wed, 11 Jan 2023 17:06:34 -0800 (PST) MIME-Version: 1.0 References: <20230111133351.807024-1-jannh@google.com> In-Reply-To: <20230111133351.807024-1-jannh@google.com> From: Yang Shi Date: Wed, 11 Jan 2023 17:06:22 -0800 Message-ID: Subject: Re: [PATCH] mm/khugepaged: Fix ->anon_vma race To: Jann Horn Cc: Andrew Morton , linux-mm@kvack.org, "Kirill A. Shutemov" , "Zach O'Keefe" , linux-kernel@vger.kernel.org, David Hildenbrand Content-Type: text/plain; charset="UTF-8" X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 2913F120002 X-Stat-Signature: ebcajcr8medbmsekntwuncxpm3eh1o6z X-HE-Tag: 1673485594-334488 X-HE-Meta: U2FsdGVkX18Rrgt5756JQ77de+1CxWCNziIUV7pg/HGdgig5YzZAqlWMpDsCTXnRlLYbcANTilNkHaoUdQ10q6jziPno6nuQDlwO3Du5gogDFjd1mkNGwDNZpXqbmfXVlFfYSS7v0OabevpDJGKCLsMgd3nvK+uyIG8HWl+ujfYXGVtlr3fucnyE+2iqEClfOIM6Yazp0SQ5O++4OI4/WqTutJB/hOjZdwRPaQhuWISRAwIFlz/EmWWffIz6BeeTcQ+CfAK5gDdz/MQdyloUw8335ZBR71K/J+IT+X/V9SR1h56yir2/+60eNuJTHy1ftY+Np9SxdTFnP7jJyw3Zcks+yj046w6x1+pVqM0CGZt1PPll2sWsF81EYN8lePZZnFCZ+NS8SsuqArMSIz1lc7cOTmsIPm8vFwSBR4TUOXo09BW6YeKqBiLe9eIIFmLTgDx3IyV7wujrDZvNwzK37y5xhmio9SCJEfAUFxOg236ZcdX8cih0YTlOae6dt/r8io6KLc0yh8AjlIjUb09mL4ppvFXs6FhpfLQ84rCUi5ZNesneWP8KqDjcMs3xvJS9pBbyrgnIZAPJuWo/inrqumVnPh55DnTPbFQhz+t5UvB+oHC74NXRIk6XLDqBqUNsHCYgFhMOrnHPSERo+nJXz02vrUeiJv9QoyyCoDgnxX+xdMpYhZYU17MSpnRavuaxsyAo241BsBuzHjUGBkPm+vjBOc09Kiulhad7lmj5pUngNQ2H+3GL0Vj3NHt5MUyuMNtslnFH4ZAUIS9T95olLtM9al9b79GBf11cWxjHAOTFuol/HjwINis0cCVEfPmktXIir4O5tYXfGYXRx7kiDDQcAez0QO7Jekq3NcCWtOJMVFSV4QVLSz1o7T5sSoz+3P9j8N6p3GV4l0EqBq1vupIiV+/XgKVRprSIl+12W56kBOw+DMdSPrGBvKn1S7G9ZKCOFp4Q/PrC2TZ+8oO 14mNg7eX Hw5r+05PxYIQTizl02e0TpGzl9hsFjcRU/ium/OiyGdEiQMTyEKUdsDIsZ3gT+0P+f3y8OhGQo/h5vnJ1xJNxq3thCAUOHobZlCjr X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 11, 2023 at 5:33 AM Jann Horn wrote: > > If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires > it to be locked. retract_page_tables() bails out if an ->anon_vma is > attached, but does this check before holding the mmap lock (as the comment > above the check explains). > > If we racily merge an existing ->anon_vma (shared with a child process) > from a neighboring VMA, subsequent rmap traversals on pages belonging to > the child will be able to see the page tables that we are concurrently > removing while assuming that nothing else can access them. > > Repeat the ->anon_vma check once we hold the mmap lock to ensure that there > really is no concurrent page table access. > > Reported-by: Zach O'Keefe > Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") > Cc: stable@vger.kernel.org > Signed-off-by: Jann Horn > --- > zokeefe@ pointed out to me that the current code (after my last round of patches) > can hit a lockdep assert by racing, and after staring at it a bit I've I'm supposed the lockdep is the one in collapse_and_free_pmd(). It is better to have the splat included in the commit log. > convinced myself that this is a real, preexisting bug. > (I haven't written a reproducer for it though. One way to hit it might be > something along the lines of: > > - set up a process A with a private-file-mapping VMA V1 > - let A fork() to create process B, thereby copying V1 in A to V1' in B > - let B extend the end of V1' > - let B put some anon pages into the extended part of V1' I don't quite get why we need this step. A cow fault on A's V1 isn't enough to have anon_vma for V1? This should not prevent V1 and V2 from sharing anon_vma. Did I miss something? > - let A map a new private-file-mapping VMA V2 directly behind V1, without > an anon_vma > [race begins here] > - in A's thread 1: begin retract_page_tables() on V2, run through first > ->anon_vma check > - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it > merges the anon_vma of V1 (which implies V1 and V2 must be mapping the > same file at compatible offsets) > - in B: trigger rmap traversal on anon page in V1' > > mm/khugepaged.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 5cb401aa2b9d..0bfed37f3a3b 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > * has higher cost too. It would also probably require locking > * the anon_vma. > */ > - if (vma->anon_vma) { > + if (READ_ONCE(vma->anon_vma)) { > result = SCAN_PAGE_ANON; > goto next; > } > @@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > result = SCAN_PTE_MAPPED_HUGEPAGE; > if ((cc->is_khugepaged || is_target) && > mmap_write_trylock(mm)) { > + /* > + * Re-check whether we have an ->anon_vma, because > + * collapse_and_free_pmd() requires that either no > + * ->anon_vma exists or the anon_vma is locked. > + * We already checked ->anon_vma above, but that check > + * is racy because ->anon_vma can be populated under the > + * mmap lock in read mode. > + */ > + if (vma->anon_vma) { > + result = SCAN_PAGE_ANON; > + goto unlock_next; > + } > /* > * When a vma is registered with uffd-wp, we can't > * recycle the pmd pgtable because there can be pte > > base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25 > -- > 2.39.0.314.g84b9a713c41-goog >