From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 8 Jan 2025 00:34:46 -0700
Subject: Re: [PATCH mm-unstable v1] mm/hugetlb_vmemmap: fix memory loads ordering
To: David Hildenbrand
Cc: Andrew Morton, Mateusz Guzik, "Matthew Wilcox (Oracle)", Muchun Song,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Will Deacon
In-Reply-To: <85ea0bc9-0892-489d-b42b-430ff8a1f368@redhat.com>
References: <20250107043505.351925-1-yuzhao@google.com>
 <85ea0bc9-0892-489d-b42b-430ff8a1f368@redhat.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 7, 2025 at 1:49 AM David Hildenbrand wrote:
>
> On 07.01.25 05:35, Yu Zhao wrote:
> > Using x86_64 as an example, for a 32KB struct page[] area describing a
> > 2MB hugeTLB, HVO reduces the area to 4KB by the following steps:
> >
> > 1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
> > 2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped
> >    by PTE 0, and at the same time change the permission from r/w to
> >    r/o;
> > 3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
> >    to 4KB.
> >
> > However, the following race can happen due to improper ordering of
> > memory loads:
> >
> >   CPU 1 (HVO)                      CPU 2 (speculative PFN walker)
> >
> >   page_ref_freeze()
> >   synchronize_rcu()
> >                                    rcu_read_lock()
> >                                    page_is_fake_head() is false
> >   vmemmap_remap_pte()
> >   XXX: struct page[] becomes r/o
> >
> >   page_ref_unfreeze()
> >                                    page_ref_count() is not zero
> >
> >                                    atomic_add_unless(&page->_refcount)
> >                                    XXX: try to modify r/o struct page[]
> >
> > Specifically, page_is_fake_head() must be ordered after
> > page_ref_count() on CPU 2 so that it can only return true for this
> > case, to avoid the later attempt to modify r/o struct page[].
>
> I *think* this is correct.
>
> > This patch adds the missing memory barrier and ensures the tests on
> > page_is_fake_head() and page_ref_count() are done in the proper order.
> >
> > Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> > Reported-by: Will Deacon
> > Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/
> > Signed-off-by: Yu Zhao
> > ---
> >  include/linux/page-flags.h | 2 +-
> >  include/linux/page_ref.h   | 8 ++++++--
> >  2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 691506bdf2c5..6b8ecf86f1b6 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -212,7 +212,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
> >        * cold cacheline in some cases.
> >        */
> >       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > -         test_bit(PG_head, &page->flags)) {
> > +         test_bit_acquire(PG_head, &page->flags)) {
>
> This change will affect all page_fixed_fake_head() users, like ordinary
> PageTail even on !hugetlb.
>
> I assume you want an explicit memory barrier in the single problematic
> caller instead.

Let me make it HVO specific in v2.
It might look cleaner that way.