From: Yongqin Liu <yongqin.liu@linaro.org>
Date: Fri, 28 Jul 2023 21:53:29 +0800
Subject: Re: [PATCH v2 04/32] mm/pgtable: allow pte_offset_map[_lock]() to fail
To: Hugh Dickins
Cc: Andrew Morton, Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov",
 Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng,
 Yang Shi, Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao,
 Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
 Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy,
 Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
 Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu,
 Thomas Hellstrom, Ryan Roberts,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Hi, Hugh

It seems this change makes pte_offset_map_lock() impossible to call
from out-of-tree modules; building one reports an error like this:

    ERROR: modpost: "__pte_offset_map_lock" [../omap-modules/android-mainline/pvr/pvrsrvkm.ko] undefined!

Do you have any idea about this, or any suggestions on how to resolve it?

Thanks,
Yongqin Liu

On Fri, 9 Jun 2023 at 09:10, Hugh Dickins wrote:
>
> Make pte_offset_map() a wrapper for __pte_offset_map() (optionally
> outputs pmdval), and pte_offset_map_lock() a sparse __cond_lock wrapper
> for __pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.
>
> __pte_offset_map() does pmdval validation (including pmd_clear_bad()
> when pmd_bad()), returning NULL if pmdval is not for a page table.
> __pte_offset_map_lock() verifies pmdval unchanged after getting the
> lock, trying again if it changed.
>
> No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done
> to cover the imminent case, but we expect to generalize it later, and
> it makes a mess of where to do the pmd_bad() clearing.
>
> Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
> without actually taking the lock.  This will be preferred to open uses of
> pte_lockptr(), because (when split ptlock is in page table's struct page)
> it points to the right lock for the returned pte pointer, even if *pmd
> gets changed racily afterwards.
>
> Update corresponding Documentation.
>
> Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
> they have to wait until all architectures are balancing pte_offset_map()s
> with pte_unmap()s (as in the arch series posted earlier).  But comment
> where they will go, so that it's easy to add them for experiments.  And
> only when those are in place can transient racy failure cases be enabled.
> Add more safety for the PAE mismatched pmd_low pmd_high case at that time.
>
> Signed-off-by: Hugh Dickins
> ---
>  Documentation/mm/split_page_table_lock.rst | 17 ++++---
>  include/linux/mm.h                         | 27 +++++++----
>  include/linux/pgtable.h                    | 22 ++++++---
>  mm/pgtable-generic.c                       | 56 ++++++++++++++++++++++
>  4 files changed, 101 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
> index 50ee0dfc95be..a834fad9de12 100644
> --- a/Documentation/mm/split_page_table_lock.rst
> +++ b/Documentation/mm/split_page_table_lock.rst
> @@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
>
>  There are helpers to lock/unlock a table and other accessor functions:
>
>   - pte_offset_map_lock()
> -	maps pte and takes PTE table lock, returns pointer to the taken
> -	lock;
> +	maps PTE and takes PTE table lock, returns pointer to PTE with
> +	pointer to its PTE table lock, or returns NULL if no PTE table;
> + - pte_offset_map_nolock()
> +	maps PTE, returns pointer to PTE with pointer to its PTE table
> +	lock (not taken), or returns NULL if no PTE table;
> + - pte_offset_map()
> +	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
> + - pte_unmap()
> +	unmaps PTE table;
>   - pte_unmap_unlock()
> 	unlocks and unmaps PTE table;
>   - pte_alloc_map_lock()
> -	allocates PTE table if needed and take the lock, returns pointer
> -	to taken lock or NULL if allocation failed;
> - - pte_lockptr()
> -	returns pointer to PTE table lock;
> +	allocates PTE table if needed and takes its lock, returns pointer to
> +	PTE with pointer to its lock, or returns NULL if allocation failed;
>   - pmd_lock()
> 	takes PMD table lock, returns pointer to taken lock;
>   - pmd_lockptr()
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ce77080c79..3c2e56980853 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2787,14 +2787,25 @@ static inline void pgtable_pte_page_dtor(struct page *page)
>  	dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
>
> -#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
> -({							\
> -	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
> -	pte_t *__pte = pte_offset_map(pmd, address);	\
> -	*(ptlp) = __ptl;				\
> -	spin_lock(__ptl);				\
> -	__pte;						\
> -})
> +pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
> +static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
> +{
> +	return __pte_offset_map(pmd, addr, NULL);
> +}
> +
> +pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
> +			unsigned long addr, spinlock_t **ptlp);
> +static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t
*pmd,
> +			unsigned long addr, spinlock_t **ptlp)
> +{
> +	pte_t *pte;
> +
> +	__cond_lock(*ptlp, pte = __pte_offset_map_lock(mm, pmd, addr, ptlp));
> +	return pte;
> +}
> +
> +pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
> +			unsigned long addr, spinlock_t **ptlp);
>
>  #define pte_unmap_unlock(pte, ptl)	do {		\
>  	spin_unlock(ptl);				\
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 94235ff2706e..3fabbb018557 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -94,14 +94,22 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
>  #define pte_offset_kernel pte_offset_kernel
>  #endif
>
> -#if defined(CONFIG_HIGHPTE)
> -#define pte_offset_map(dir, address)			\
> -	((pte_t *)kmap_local_page(pmd_page(*(dir))) +	\
> -	 pte_index((address)))
> -#define pte_unmap(pte)	kunmap_local((pte))
> +#ifdef CONFIG_HIGHPTE
> +#define __pte_map(pmd, address)				\
> +	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
> +#define pte_unmap(pte)	do {				\
> +	kunmap_local((pte));				\
> +	/* rcu_read_unlock() to be added later */	\
> +} while (0)
>  #else
> -#define pte_offset_map(dir, address)	pte_offset_kernel((dir), (address))
> -#define pte_unmap(pte)	((void)(pte))	/* NOP */
> +static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
> +{
> +	return pte_offset_kernel(pmd, address);
> +}
> +static inline void pte_unmap(pte_t *pte)
> +{
> +	/* rcu_read_unlock() to be added later */
> +}
>  #endif
>
>  /* Find an entry in the second-level page table..
*/
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index d2fc52bffafc..c7ab18a5fb77 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -10,6 +10,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>  #include
>
> @@ -229,3 +231,57 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
>  }
>  #endif
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
> +{
> +	pmd_t pmdval;
> +
> +	/* rcu_read_lock() to be added later */
> +	pmdval = pmdp_get_lockless(pmd);
> +	if (pmdvalp)
> +		*pmdvalp = pmdval;
> +	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
> +		goto nomap;
> +	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
> +		goto nomap;
> +	if (unlikely(pmd_bad(pmdval))) {
> +		pmd_clear_bad(pmd);
> +		goto nomap;
> +	}
> +	return __pte_map(&pmdval, addr);
> +nomap:
> +	/* rcu_read_unlock() to be added later */
> +	return NULL;
> +}
> +
> +pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
> +			unsigned long addr, spinlock_t **ptlp)
> +{
> +	pmd_t pmdval;
> +	pte_t *pte;
> +
> +	pte = __pte_offset_map(pmd, addr, &pmdval);
> +	if (likely(pte))
> +		*ptlp = pte_lockptr(mm, &pmdval);
> +	return pte;
> +}
> +
> +pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
> +			unsigned long addr, spinlock_t **ptlp)
> +{
> +	spinlock_t *ptl;
> +	pmd_t pmdval;
> +	pte_t *pte;
> +again:
> +	pte = __pte_offset_map(pmd, addr, &pmdval);
> +	if (unlikely(!pte))
> +		return pte;
> +	ptl = pte_lockptr(mm, &pmdval);
> +	spin_lock(ptl);
> +	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> +		*ptlp = ptl;
> +		return pte;
> +	}
> +	pte_unmap_unlock(pte, ptl);
> +	goto again;
> +}
> --
> 2.35.3
>

--
Best Regards,
Yongqin Liu
---------------------------------------------------------------
#mailing list
linaro-android@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-android