From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09660C52D7B for ; Wed, 14 Aug 2024 02:25:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 94D3E6B0085; Tue, 13 Aug 2024 22:25:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8FE5E6B0088; Tue, 13 Aug 2024 22:25:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7C5636B0089; Tue, 13 Aug 2024 22:25:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 5D4CF6B0085 for ; Tue, 13 Aug 2024 22:25:56 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 106B380BB5 for ; Wed, 14 Aug 2024 02:25:56 +0000 (UTC) X-FDA: 82449260712.07.CB83A20 Received: from mail-il1-f171.google.com (mail-il1-f171.google.com [209.85.166.171]) by imf30.hostedemail.com (Postfix) with ESMTP id 4F27680005 for ; Wed, 14 Aug 2024 02:25:53 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=gdgVuhm7; spf=pass (imf30.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.171 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723602342; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=h6LNGe/J5ub4oSc4wEcWX9/lvXmZbxXATcRN21QG/c4=; b=VlDMG/DFQgqJpXqS0YdPmGPbZEl0/fw+JZzQey4J2RCU3s5jsVuEFimG8XgjK5Xq4KxKPU y2kxuk8VMTu8i9Lq7plwSY6iANxtTSA8Xc4ErHB0XUuFuVsmJjUQHviTZ5UlrTcSUwbz7s r608TPHAqn3ezh7No/Pa5HbgFq7cSwE= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=gdgVuhm7; spf=pass (imf30.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.171 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723602342; a=rsa-sha256; cv=none; b=R+4akcniZ25PDlUfwrmPZ7EK8lTKzEQPXbqBMPLQeRlUFzWqqQHL0SadB5sj7KVyk+su/8 LJWmgA3Jy9s/2QzDpq3PUoK/L1uUYyHKuuVuNhT5/6ZzzXS19Dp5/pN2wrDjZlf5EaSOxX Y779d1UwgE57Imx8xTqKwM0MMYls2AQ= Received: by mail-il1-f171.google.com with SMTP id e9e14a558f8ab-39b3c36d429so3186635ab.1 for ; Tue, 13 Aug 2024 19:25:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1723602352; x=1724207152; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=h6LNGe/J5ub4oSc4wEcWX9/lvXmZbxXATcRN21QG/c4=; b=gdgVuhm7+ChpHKaxwVJ0eIw1hR47ky9aZyjHDx4tDIyGEyZGzFe9f23ZdmGOSfmJe5 tBb4lvVraM+2rkTiSVHAuQBkJgxSGlZs8eI4Y4CMSvP3KDereMBJQfH8wkVaPENda2RN F0UfFj896Cxow06DaynfLzEP32HjKy0iyXkOX712xr1pYgbpLYumv/kr5VUCCregyc/7 R9963xDLREa/tbxaBKc8Ci+BRt24zKKkA/b6clVPZOEbnSPXnIkQP6nTGr/MuQWI094J Slt2f0cNpWmf9ZDPWotNcl80Mw7t7hswLJpc4lxgvdNJFNSCF3PT0hfEfOlxj2HEe7m+ yCBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723602352; x=1724207152; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=h6LNGe/J5ub4oSc4wEcWX9/lvXmZbxXATcRN21QG/c4=; b=MOMTXl7fl7tgS4id2PQ5z3XTA+GZuXAMiA+6DDvxTGIJ+kaWVjiiXMcPZNNfF99JLl Brft1GWb3hxSl2gFCRZmtBmHN0A4BMWA8U0UgGo9WWQdFwkcISCCFlSV1l9dvZOzVm0l lb46T5BEq57mQ9iUedNU3tfsVRTttMk56QKqM7so35Ysf0Tif8h3+KMi+8bAu0fCRpK2 Qf/JSKw1MLMotXINyvD+IjsoI54daXFV94yzPoiRkBB65krTflbOJuDcYFEwwnaSgoPf wl44LJbnxeI2o+fOqRQ9OJ1j3kv03nZi0E2mL7C5BmcDujvLhC3aR4MyvqElFVFe521F 4dnQ== X-Forwarded-Encrypted: i=1; AJvYcCXSr/8p68dGDF6BoMOOQ1P8JxvR1ZbBgf66DvhqFrgSeFX8kyzKuDamX8pKT7Dg262r9fn5V/82I34WoUyHRCjmvLo= X-Gm-Message-State: AOJu0YwAHtjbA2CA369xAY20qbxiw2qZadrUMRTHLe/L36rXkK/jFvQl VfLufEoS4G7CF55VwGx975R8ArIDXctSbp0ujJfQ1GzNw3A22yjwT/XABwgGj/s= X-Google-Smtp-Source: AGHT+IEn4p4u+nmkKxqQbRWPDC6r2DEt5JWPBdmKEfZqfN+TfyQpUSmW5aIcJHJ7gF6pbgyQMbYzzg== X-Received: by 2002:a05:6e02:1d81:b0:39c:2cf0:42f2 with SMTP id e9e14a558f8ab-39d124f2f2dmr10090755ab.4.1723602351912; Tue, 13 Aug 2024 19:25:51 -0700 (PDT) Received: from [10.4.217.215] ([139.177.225.242]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-7c6979ebfaesm2154264a12.24.2024.08.13.19.25.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 13 Aug 2024 19:25:51 -0700 (PDT) Message-ID: <382ae7b5-9ccd-4202-a02d-be5d453f7c43@bytedance.com> Date: Wed, 14 Aug 2024 10:25:43 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/2] userfaultfd: Fix checks for huge PMDs Content-Language: en-US To: Jann Horn Cc: Andrew Morton , Pavel Emelianov , Andrea Arcangeli , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand , stable@vger.kernel.org References: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com> <20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com> From: Qi Zheng In-Reply-To: <20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Stat-Signature: k8d8jxsgx9nm7b1xrnu1sceumohn6ojf X-Rspamd-Queue-Id: 4F27680005 X-Rspamd-Server: rspam11 X-HE-Tag: 1723602353-92093 X-HE-Meta: U2FsdGVkX18M/SFymnzjawE/u95s7VxdAbjkM5S3mNlqaMJL8WqPd2qjG7EWqOn+/s5o936XsMM6smdKrChVLtUrgNuEMmtgBG5l0ZfCJUr6FGjVPT6qVQwvP+Lew3KuOayGxfLdylIeBRXAnO/t6JFUuGbSsfWi9QKOfkFZpNETjDMa7OFDyX6hgpT+P06p7O3Ef9gdKctP0/kWpmiUR+FgIApkPgX6d5NII+tfl+UM1DKTe+XeWPuJj8Wz308HXo9NTnHuvsnkgsJrTymE4pfRYHfghNWC1Xiioxj5CeWmNMK5HE+PmTDrXipPMlp+EsaicG52bcfJ3bf3sggdnGfGmXsqtAi7AtqWcdYDTHVupFD0wyierVdDFs1VTjPw6YLUXWgO5MTPbyAINa1HrhHu28wMtuaQpFm+zDmyabjJpIB4B2DKlFl+3QKLTiGpJmhGT2HWPiQbuNzUAIHLim75O2X2CKl5q41t6tz5EOuc5oy/A6WM5+4VyECxiLlYlxyCy8lJJGo9p1D7KZv5EJ1dkpvPvjxQ7yFGtOq4XqouyepXJldrtbuTlbh6jA2eUdrGxcX0e6TpMa1SfwAUgWhu1Sld3z7hCJKZ6ZjzA6vmPNrvEbDHVnCSp8XVjmxYtjfqnXXj/A1NI2Tzf081NgN9/qpr6YdxxH5Alt6SCqBa0Fuv4SMhAx9dkeASCTqoIxmHDWz8kceVE970otFliZQjqtCvtRPEuT4JUPh9Q1JCoP1KPFAv8XWr1bUCuPC3PtUQxcb1OgQbNjmIBAIQmgtQ4Oa0eqqyv2r7n3PClXYUN40AADghet+sgCtGrIoUAYQ+1uETUG00ff+0HcXFnYve8WqACUY2pNdvBg56RlGJGfNeKHqOhr5jmx1dIeZsFrV+rb/rSl2ldA30cpj3KvKFWJIxtgTwMwajTk6+V1rOCbdum/ax9om90IF6NeSKsRcYD4p0KZOKL7jtgvy 067MQzJg 4/ORQIm8pOA6vC1dlH3mwgsq5laGQUjoQ+n/JB7cpANOVQoPUibC1+SZKqmFHM/TPhPIDDEzHK5wLbVPJ6BlPZAov6VyJRRzQEk/oGbSiex/l5wiMzSaV/LvQbm07WVFeZa9sQOFQKZhD7QSXUl73ev/JWRQz76rDTG+AjT+2u1V+QpAfjTSM2uxDM5b++t+2peWWf/t0OasnI1qqC7l/M1i3Dbu7SVCZrrgQU+6P0VWoo8NcZ3LtDiOKFQYvTrgaBGZThYOFBpJ41aOyhRhPDNpR57cQaowB89rcq47JCWPxHHtfkeTwQW5xanvXG7aJ5w5A/pQVmLPc9fy0SK9vPBWI9YyXsVjngA58PzAEHcSNGaq+cf++LeDtAi9kW2kD+rVkeXWM4daU5RTtRXmGme8aI9mrAPiUCzyHwxdTYgHqtF1h6st+kdoeIeywEjfert5X4qHlg2FLbQw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000051, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/8/14 04:25, Jann Horn wrote: > This fixes two issues. > > I discovered that the following race can occur: > > mfill_atomic other thread > ============ ============ > > pmdp_get_lockless() [reads none pmd] > > > > __pte_alloc [no-op] > > > BUG_ON(pmd_none(*dst_pmd)) > > I have experimentally verified this in a kernel with extra mdelay() calls; > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > pte_offset_map[_lock]() to fail"), this can't lead to anything worse than > a BUG_ON(), since the page table access helpers are actually designed to > deal with page tables concurrently disappearing; but on older kernels > (<=6.4), I think we could probably theoretically race past the two BUG_ON() > checks and end up treating a hugepage as a page table. > > The second issue is that, as Qi Zheng pointed out, there are other types of > huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs > (in particular, migration PMDs). > On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a > PMD that contains a migration entry (which just requires winning a single, > fairly wide race), it will pass the PMD to pte_offset_map_lock(), which > assumes that the PMD points to a page table. > Breakage follows: First, the kernel tries to take the PTE lock (which will > crash or maybe worse if there is no "struct page" for the address bits in > the migration entry PMD - I think at least on X86 there usually is no > corresponding "struct page" thanks to the PTE inversion mitigation, amd64 > looks different). > If that didn't crash, the kernel would next try to write a PTE into what it > wrongly thinks is a page table. > > As part of fixing these issues, get rid of the check for pmd_trans_huge() > before __pte_alloc() - that's redundant, we're going to have to check for > that after the __pte_alloc() anyway. > > Backport note: pmdp_get_lockless() is pmd_read_atomic() in older > kernels. > > Reported-by: Qi Zheng > Closes: https://lore.kernel.org/r/59bf3c2e-d58b-41af-ab10-3e631d802229@bytedance.com Ah, the issue fixed by this patch was not reported by me, so I think that this Reported-by does not need to be added. ;) > Cc: stable@vger.kernel.org > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > Signed-off-by: Jann Horn > --- > mm/userfaultfd.c | 22 ++++++++++++---------- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index e54e5c8907fa..290b2a0d84ac 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -787,21 +787,23 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, > } > > dst_pmdval = pmdp_get_lockless(dst_pmd); > - /* > - * If the dst_pmd is mapped as THP don't > - * override it and just be strict. > - */ > - if (unlikely(pmd_trans_huge(dst_pmdval))) { > - err = -EEXIST; > - break; > - } > if (unlikely(pmd_none(dst_pmdval)) && > unlikely(__pte_alloc(dst_mm, dst_pmd))) { > err = -ENOMEM; > break; > } > - /* If an huge pmd materialized from under us fail */ > - if (unlikely(pmd_trans_huge(*dst_pmd))) { > + dst_pmdval = pmdp_get_lockless(dst_pmd); > + /* > + * If the dst_pmd is THP don't override it and just be strict. > + * (This includes the case where the PMD used to be THP and > + * changed back to none after __pte_alloc().) > + */ > + if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) || > + pmd_devmap(dst_pmdval))) { > + err = -EEXIST; > + break; > + } > + if (unlikely(pmd_bad(dst_pmdval))) { > err = -EFAULT; > break; > } Reviewed-by: Qi Zheng >