* [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
@ 2023-01-25 1:57 Zach O'Keefe
2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Zach O'Keefe @ 2023-01-25 1:57 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Hugh Dickins, Yang Shi, Zach O'Keefe
During collapse, in a few places we check to see if a given small page
has any unaccounted references. If the refcount on the page doesn't
match our expectations, it must be there is an unknown user concurrently
interested in the page, and so it's not safe to move the contents
elsewhere. However, the unaccounted pins are likely an ephemeral state.
In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
collapse may succeed on retry.
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
mm/khugepaged.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e23619bfecc4..fa38cae240b9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
case SCAN_CGROUP_CHARGE_FAIL:
return -EBUSY;
/* Resource temporary unavailable - trying again might succeed */
+ case SCAN_PAGE_COUNT:
case SCAN_PAGE_LOCK:
case SCAN_PAGE_LRU:
case SCAN_DEL_PAGE_LRU:
--
2.39.1.405.gd4c25cc71f-goog
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups 2023-01-25 1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe @ 2023-01-25 1:57 ` Zach O'Keefe 2023-01-25 12:54 ` kernel test robot 2023-01-25 13:38 ` kernel test robot 2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi 2023-02-09 5:09 ` Hugh Dickins 2 siblings, 2 replies; 11+ messages in thread From: Zach O'Keefe @ 2023-01-25 1:57 UTC (permalink / raw) To: linux-mm Cc: linux-kernel, Andrew Morton, Hugh Dickins, Yang Shi, Zach O'Keefe, stable In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none(): - if (!pmd_present(pmde)) - return SCAN_PMD_NULL; + if (pmd_none(pmde)) + return SCAN_PMD_NONE; This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include: A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address. In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd. However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd. We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch. The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit. Covering all 4 cases for x86 (all checks done on the same pmd value): 1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger? The only relevant path is: madvise_collapse() -> collapse_pte_mapped_thp() Where we might just incorrectly report back "success", when really the memory isn't pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here. 2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated. 3) !pmd_present() && pmd_trans_huge() Not possible. 4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path). Also, add a comment above find_pmd_or_thp_or_none() to help future travelers reason about the validity of the code; namely, the possible mutations that might happen out from under us, depending on how mmap_lock is held (if at all). Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Zach O'Keefe <zokeefe@google.com> Cc: stable@vger.kernel.org --- Request that this be pulled into stable since it's theoretically possible (though I have no reproducer) that while mmap_lock is dropped, racing thp migration installs a pmd migration entry which then has a path to be consumed, unchecked, by pte_offset_map(). --- mm/khugepaged.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index fa38cae240b9..7ea668bbea70 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -941,6 +941,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, return SCAN_SUCCEED; } +/* + * See pmd_trans_unstable() for how the result may change out from + * underneath us, even if we hold mmap_lock in read. + */ static int find_pmd_or_thp_or_none(struct mm_struct *mm, unsigned long address, pmd_t **pmd) @@ -959,8 +963,12 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, #endif if (pmd_none(pmde)) return SCAN_PMD_NONE; + if (!pmd_present(pmde)) + return SCAN_PMD_NULL; if (pmd_trans_huge(pmde)) return SCAN_PMD_MAPPED; + if (pmd_devmap(pmd)) + return SCAN_PMD_NULL; if (pmd_bad(pmde)) return SCAN_PMD_NULL; return SCAN_SUCCEED; -- 2.39.1.405.gd4c25cc71f-goog ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups 2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe @ 2023-01-25 12:54 ` kernel test robot 2023-01-25 13:38 ` kernel test robot 1 sibling, 0 replies; 11+ messages in thread From: kernel test robot @ 2023-01-25 12:54 UTC (permalink / raw) To: Zach O'Keefe, linux-mm Cc: llvm, oe-kbuild-all, linux-kernel, Andrew Morton, Linux Memory Management List, Hugh Dickins, Yang Shi, Zach O'Keefe, stable Hi Zach, Thank you for the patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups config: x86_64-randconfig-r025-20230123 (https://download.01.org/0day-ci/archive/20230125/202301252033.HoFIRXm4-lkp@intel.com/config) compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954 git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> mm/khugepaged.c:972:17: error: passing 'pmd_t **' to parameter of incompatible type 'pmd_t' if (pmd_devmap(pmd)) ^~~ arch/x86/include/asm/pgtable.h:254:36: note: passing argument to parameter 'pmd' here static inline int pmd_devmap(pmd_t pmd) ^ 1 error generated. vim +972 mm/khugepaged.c 945 946 /* 947 * See pmd_trans_unstable() for how the result may change out from 948 * underneath us, even if we hold mmap_lock in read. 949 */ 950 static int find_pmd_or_thp_or_none(struct mm_struct *mm, 951 unsigned long address, 952 pmd_t **pmd) 953 { 954 pmd_t pmde; 955 956 *pmd = mm_find_pmd(mm, address); 957 if (!*pmd) 958 return SCAN_PMD_NULL; 959 960 pmde = pmdp_get_lockless(*pmd); 961 962 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 963 /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ 964 barrier(); 965 #endif 966 if (pmd_none(pmde)) 967 return SCAN_PMD_NONE; 968 if (!pmd_present(pmde)) 969 return SCAN_PMD_NULL; 970 if (pmd_trans_huge(pmde)) 971 return SCAN_PMD_MAPPED; > 972 if (pmd_devmap(pmd)) 973 return SCAN_PMD_NULL; 974 if (pmd_bad(pmde)) 975 return SCAN_PMD_NULL; 976 return SCAN_SUCCEED; 977 } 978 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups 2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe 2023-01-25 12:54 ` kernel test robot @ 2023-01-25 13:38 ` kernel test robot 1 sibling, 0 replies; 11+ messages in thread From: kernel test robot @ 2023-01-25 13:38 UTC (permalink / raw) To: Zach O'Keefe, linux-mm Cc: oe-kbuild-all, linux-kernel, Andrew Morton, Linux Memory Management List, Hugh Dickins, Yang Shi, Zach O'Keefe, stable Hi Zach, Thank you for the patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20230125015738.912924-2-zokeefe%40google.com patch subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20230125/202301252110.hFYRsbrm-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-8) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/6001eb9a8f1687a1d0b72831d991886106cac37b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Zach-O-Keefe/mm-MADV_COLLAPSE-catch-none-huge-bad-pmd-lookups/20230125-095954 git checkout 6001eb9a8f1687a1d0b72831d991886106cac37b # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 olddefconfig make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): mm/khugepaged.c: In function 'find_pmd_or_thp_or_none': >> mm/khugepaged.c:972:24: error: incompatible type for argument 1 of 'pmd_devmap' 972 | if (pmd_devmap(pmd)) | ^~~ | | | pmd_t ** In file included from include/linux/pgtable.h:6, from include/linux/mm.h:29, from mm/khugepaged.c:4: arch/x86/include/asm/pgtable.h:254:36: note: expected 'pmd_t' but argument is of type 'pmd_t **' 254 | static inline int pmd_devmap(pmd_t pmd) | ~~~~~~^~~ vim +/pmd_devmap +972 mm/khugepaged.c 945 946 /* 947 * See pmd_trans_unstable() for how the result may change out from 948 * underneath us, even if we hold mmap_lock in read. 949 */ 950 static int find_pmd_or_thp_or_none(struct mm_struct *mm, 951 unsigned long address, 952 pmd_t **pmd) 953 { 954 pmd_t pmde; 955 956 *pmd = mm_find_pmd(mm, address); 957 if (!*pmd) 958 return SCAN_PMD_NULL; 959 960 pmde = pmdp_get_lockless(*pmd); 961 962 #ifdef CONFIG_TRANSPARENT_HUGEPAGE 963 /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ 964 barrier(); 965 #endif 966 if (pmd_none(pmde)) 967 return SCAN_PMD_NONE; 968 if (!pmd_present(pmde)) 969 return SCAN_PMD_NULL; 970 if (pmd_trans_huge(pmde)) 971 return SCAN_PMD_MAPPED; > 972 if (pmd_devmap(pmd)) 973 return SCAN_PMD_NULL; 974 if (pmd_bad(pmde)) 975 return SCAN_PMD_NULL; 976 return SCAN_SUCCEED; 977 } 978 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-01-25 1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe 2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe @ 2023-01-25 18:06 ` Yang Shi 2023-01-25 19:15 ` Zach O'Keefe 2023-02-09 5:09 ` Hugh Dickins 2 siblings, 1 reply; 11+ messages in thread From: Yang Shi @ 2023-01-25 18:06 UTC (permalink / raw) To: Zach O'Keefe; +Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. The page may be DMA pinned (for example, pin_user_pages()), it is not worth retrying for such pages. But it may also not be worth optimizing for this case at this point. So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi @ 2023-01-25 19:15 ` Zach O'Keefe 0 siblings, 0 replies; 11+ messages in thread From: Zach O'Keefe @ 2023-01-25 19:15 UTC (permalink / raw) To: Yang Shi; +Cc: linux-mm, linux-kernel, Andrew Morton, Hugh Dickins On Wed, Jan 25, 2023 at 10:06 AM Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > The page may be DMA pinned (for example, pin_user_pages()), it is not > worth retrying for such pages. But it may also not be worth optimizing > for this case at this point. > > So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> Thanks as always, Yang, and good point about DMA pinning. As you mentioned, I don't know if it's worth considering that too much right now, as it's unlikely these two uses (MADV_COLLAPSE and DMA pining) would be used together. We can revisit if necessary later if it's an issue, but for now, I think it's a win that MADV_COLLAPSE (+ a bounded userspace retry loop based off erno) is more likely to succeed. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-01-25 1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe 2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe 2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi @ 2023-02-09 5:09 ` Hugh Dickins 2023-02-09 21:28 ` Andrew Morton 2 siblings, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2023-02-09 5:09 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Hugh Dickins, Yang Shi, Zach O'Keefe On Tue, 24 Jan 2023, Zach O'Keefe wrote: > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> This was Reviewed-by: Yang Shi <shy828301@gmail.com> and now I'll give it a nudge with Acked-by: Hugh Dickins <hughd@google.com> since it hasn't appeared in mm-unstable or linux-next yet: I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. Thanks! Hugh > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-02-09 5:09 ` Hugh Dickins @ 2023-02-09 21:28 ` Andrew Morton 2023-02-09 21:50 ` Hugh Dickins 0 siblings, 1 reply; 11+ messages in thread From: Andrew Morton @ 2023-02-09 21:28 UTC (permalink / raw) To: Hugh Dickins; +Cc: linux-mm, linux-kernel, Yang Shi, Zach O'Keefe On Wed, 8 Feb 2023 21:09:04 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > On Tue, 24 Jan 2023, Zach O'Keefe wrote: > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > > Reported-by: Hugh Dickins <hughd@google.com> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > This was > Reviewed-by: Yang Shi <shy828301@gmail.com> > and now I'll give it a nudge with > Acked-by: Hugh Dickins <hughd@google.com> > since it hasn't appeared in mm-unstable or linux-next yet: Buildbot failed on [2/2] so I skipped the whole series in expectation of a v2 series, which didn't happen. Instead, Zach trickily sent what was [2/2] as a standalone patch. So [1/2] got lost. Sigh, poor me. Thanks, I'll merge [1/2] into mm-hotfixes. > I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. I'm not seeing anything in the [1/2] changelog which indicates that a backport is needed. IOW, # cat .signature When fixing a bug, please describe the end-user visible effects of that bug. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-02-09 21:28 ` Andrew Morton @ 2023-02-09 21:50 ` Hugh Dickins 2023-02-09 22:12 ` Andrew Morton 0 siblings, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2023-02-09 21:50 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, linux-mm, linux-kernel, Yang Shi, Zach O'Keefe On Thu, 9 Feb 2023, Andrew Morton wrote: > > Thanks, I'll merge [1/2] into mm-hotfixes. Great, thanks. > > I'm not seeing anything in the [1/2] changelog which indicates that a > backport is needed. IOW, Correct: it's just changing the errno for some racy cases from "you're wrong, don't bother me again" to "it might be worth having another go": not fixing an instability, as 2/2 was. > > # cat .signature > When fixing a bug, please describe the end-user visible effects of that bug. If whatever's being run by the end-user is coded to try again on -EAGAIN, then the end-user will less often see occasional unexplained failures. Hugh ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-02-09 21:50 ` Hugh Dickins @ 2023-02-09 22:12 ` Andrew Morton 2023-02-09 22:29 ` Zach O'Keefe 0 siblings, 1 reply; 11+ messages in thread From: Andrew Morton @ 2023-02-09 22:12 UTC (permalink / raw) To: Hugh Dickins; +Cc: linux-mm, linux-kernel, Yang Shi, Zach O'Keefe On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > backport is needed. IOW, > > Correct: it's just changing the errno for some racy cases from "you're > wrong, don't bother me again" to "it might be worth having another go": > not fixing an instability, as 2/2 was. > > > > > # cat .signature > > When fixing a bug, please describe the end-user visible effects of that bug. > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > then the end-user will less often see occasional unexplained failures. > OK, thanks. I redid the changelog's final paragraph thusly: : In this situation, MADV_COLLAPSE returns -EINVAL when it should return : -EAGAIN. This could cause userspace to conclude that the syscall failed, : when it in fact could succeed by retrying. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount 2023-02-09 22:12 ` Andrew Morton @ 2023-02-09 22:29 ` Zach O'Keefe 0 siblings, 0 replies; 11+ messages in thread From: Zach O'Keefe @ 2023-02-09 22:29 UTC (permalink / raw) To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, linux-kernel, Yang Shi On Thu, Feb 9, 2023 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > > backport is needed. IOW, > > > > Correct: it's just changing the errno for some racy cases from "you're > > wrong, don't bother me again" to "it might be worth having another go": > > not fixing an instability, as 2/2 was. > > > > > > > > # cat .signature > > > When fixing a bug, please describe the end-user visible effects of that bug. > > > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > > then the end-user will less often see occasional unexplained failures. > > > > OK, thanks. I redid the changelog's final paragraph thusly: > > : In this situation, MADV_COLLAPSE returns -EINVAL when it should return > : -EAGAIN. This could cause userspace to conclude that the syscall failed, > : when it in fact could succeed by retrying. > This looks good to me. Thanks Andrew! Also thanks Hugh for being on the lookout for this patch -- I hastily read through my emails regarding which patches were merged where and had assumed this merged with 2/2. Also, apologies about the confusing v1 [1/2] and v2 [2/2] fiasco; in hindsight that probably wasn't the most decipherable thing to do :) Best, Zach ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-02-09 22:30 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-25 1:57 [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Zach O'Keefe 2023-01-25 1:57 ` [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Zach O'Keefe 2023-01-25 12:54 ` kernel test robot 2023-01-25 13:38 ` kernel test robot 2023-01-25 18:06 ` [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount Yang Shi 2023-01-25 19:15 ` Zach O'Keefe 2023-02-09 5:09 ` Hugh Dickins 2023-02-09 21:28 ` Andrew Morton 2023-02-09 21:50 ` Hugh Dickins 2023-02-09 22:12 ` Andrew Morton 2023-02-09 22:29 ` Zach O'Keefe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox