From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <707d8937-d4c8-43b3-bc19-70f0038522a9@linux.alibaba.com>
Date: Fri, 26 Jul 2024 11:03:09 +0800
Subject: Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, Muchun Song, Peter Xu,
 Oscar Salvador, stable@vger.kernel.org
References: <20240725183955.2268884-1-david@redhat.com>
 <20240725183955.2268884-3-david@redhat.com>
 <0067dfe6-b9a6-4e98-9eef-7219299bfe58@linux.alibaba.com>
In-Reply-To: <0067dfe6-b9a6-4e98-9eef-7219299bfe58@linux.alibaba.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 2024/7/26 10:33, Baolin Wang wrote:
> 
> 
> On 2024/7/26 02:39, David Hildenbrand wrote:
>> We recently made GUP's common page table walking code also walk
>> hugetlb VMAs without most hugetlb special-casing, preparing for the
>> future of having less hugetlb-specific page table walking code in the
>> codebase. Turns out that we missed one page table locking detail: page
>> table locking for hugetlb folios that are not mapped using a single
>> PMD/PUD.
>>
>> Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
>> hugetlb folios on arm64 with a 4 KiB base page size). GUP, as it walks
>> the page tables, will perform a pte_offset_map_lock() to grab the PTE
>> table lock.
>>
>> However, hugetlb code that concurrently modifies these page tables
>> would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS,
>> the locks would differ.
>> Something similar can happen right now with hugetlb folios that span
>> multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
>>
>> Let's make huge_pte_lockptr() effectively use the same PT locks as any
>> core-mm page table walker would.
> 
> Thanks for raising the issue again. I remember fixing this issue 2 years
> ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
> CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
> 
>> There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
>> folio being mapped using two PTE page tables. While hugetlb wants to
>> take the PMD table lock, core-mm would grab the PTE table lock of one
>> of the two PTE page tables. In such corner cases, we have to make sure
>> that both locks match, which is (fortunately!) currently guaranteed for
>> 8xx as it does not support SMP.
>>
>> Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
>>   1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index c9bf68c239a01..da800e56fe590 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -944,10 +944,29 @@ static inline bool htlb_allow_alloc_fallback(int reason)
>>   static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>                          struct mm_struct *mm, pte_t *pte)
>>   {
>> -    if (huge_page_size(h) == PMD_SIZE)
>> +    VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
>> +    VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
>> +
>> +    /*
>> +     * hugetlb must use the exact same PT locks as core-mm page table
>> +     * walkers would. When modifying a PTE table, hugetlb must take the
>> +     * PTE PT lock, when modifying a PMD table, hugetlb must take the
>> +     * PMD PT lock, etc.
>> +     *
>> +     * The expectation is that any hugetlb folio smaller than a PMD is
>> +     * always mapped into a single PTE table and that any hugetlb folio
>> +     * smaller than a PUD (but at least as big as a PMD) is always
>> +     * mapped into a single PMD table.
> 
> ARM64 also supports cont-PMD size hugetlb, which is 32 MiB with a 4 KiB
> base page size. This means the PT locks for 32 MiB hugetlb may race
> again, as we currently only hold one PMD lock for the several PMD
> entries of a cont-PMD size hugetlb.
> 
>> +     *
>> +     * If that does not hold for an architecture, then that architecture
>> +     * must disable split PT locks such that all *_lockptr() functions
>> +     * will give us the same result: the per-MM PT lock.
>> +     */
>> +    if (huge_page_size(h) < PMD_SIZE)
>> +        return pte_lockptr(mm, pte);
>> +    else if (huge_page_size(h) < PUD_SIZE)
>>           return pmd_lockptr(mm, (pmd_t *) pte);
> 
> IIUC, as I said above, this change doesn't fix the inconsistent lock for
> cont-PMD size hugetlb for GUP, and it will also break the lock rule for
> unmapping/migrating a cont-PMD size hugetlb (we used mm->page_table_lock
> for cont-PMD size hugetlb before).

After more thinking, I realized I had confused the PMD table with the
PMD entry. Therefore, using the PMD table's lock is safe for cont-PMD
size hugetlb. This change looks good to me. Sorry for the noise.
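To spell out the point I had mixed up (a quick userspace sketch with
made-up names, not the actual kernel helpers): as far as I understand,
pmd_lockptr() derives the split lock from the page that holds the PMD
*table*, not from the individual PMD entry, so all cont-PMD entries that
live in the same table resolve to one and the same spinlock:

#include <stdint.h>
#include <stdio.h>

/* Made-up stand-ins: one PMD table holds 512 8-byte entries, i.e. it
 * occupies exactly one 4 KiB page. */
#define PTRS_PER_PMD   512
#define PMD_ENTRY_SIZE 8

/*
 * Rough equivalent of the idea behind pmd_lockptr(): mask the address
 * of the entry down to the start of the table page; the split PT lock
 * is associated with that page, so the entry's slot within the table
 * does not matter.
 */
static uintptr_t table_lock_key(uintptr_t pmd_entry)
{
        return pmd_entry & ~(uintptr_t)(PTRS_PER_PMD * PMD_ENTRY_SIZE - 1);
}

int main(void)
{
        /* Pretend a cont-PMD folio is mapped by 16 consecutive entries
         * starting at slot 100 of one PMD table. */
        uintptr_t table = 0x40aab000UL;
        int i;

        for (i = 0; i < 16; i++) {
                uintptr_t entry = table + (100 + i) * PMD_ENTRY_SIZE;
                printf("entry %2d -> lock key %#lx\n", i,
                       (unsigned long)table_lock_key(entry));
        }
        /* All 16 entries print the same key: one table, one lock. */
        return 0;
}

So, as far as I can see, the 32 MiB cont-PMD case ends up serialized by
exactly one lock, and it is the same lock a core-mm page table walker
would take via pmd_lockptr().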
Please feel free to add:

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> 
>> -    VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
>> -    return &mm->page_table_lock;
>> +    return pud_lockptr(mm, (pud_t *) pte);
>>   }
>>
>>   #ifndef hugepages_supported
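As a final sanity check on my side, here is how the new size checks pan
out for the common arm64 4 KiB-granule hugetlb sizes (a standalone
sketch that only mirrors the comparisons above; the size constants are
hard-coded for illustration rather than taken from the real
PMD_SIZE/PUD_SIZE definitions):

#include <stdio.h>

/* arm64, 4 KiB base pages: 2 MiB PMD level, 1 GiB PUD level. */
#define SZ_64K   (64UL << 10)   /* cont-PTE hugetlb */
#define SZ_2M    (2UL << 20)    /* PMD hugetlb      */
#define SZ_32M   (32UL << 20)   /* cont-PMD hugetlb */
#define SZ_1G    (1UL << 30)    /* PUD hugetlb      */

#define PMD_SIZE SZ_2M
#define PUD_SIZE SZ_1G

/* Mirrors only the size comparison in the new huge_pte_lockptr(). */
static const char *lock_for(unsigned long sz)
{
        if (sz < PMD_SIZE)
                return "PTE table lock (pte_lockptr)";
        else if (sz < PUD_SIZE)
                return "PMD table lock (pmd_lockptr)";
        return "PUD lock (pud_lockptr)";
}

int main(void)
{
        const unsigned long sizes[] = { SZ_64K, SZ_2M, SZ_32M, SZ_1G };
        const char *names[] = { "64 KiB (cont-PTE)", "2 MiB (PMD)",
                                "32 MiB (cont-PMD)", "1 GiB (PUD)" };

        for (int i = 0; i < 4; i++)
                printf("%-18s -> %s\n", names[i], lock_for(sizes[i]));
        return 0;
}

That matches the expectation stated in the comment: anything below PMD
size uses the PTE table lock, anything from PMD size up to (but not
including) PUD size uses the PMD table lock, and PUD-sized folios use
the PUD lock.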