* [PATCH 0/3] Fix some issues when looking up hugetlb page
@ 2022-08-19 10:12 Baolin Wang
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel
Hi,
On the ARM64 architecture, CONT-PTE/PMD size hugetlb pages are supported. When
looking up a hugetlb page via follow_page(), we currently take the wrong
pte/pmd lock for a CONT-PTE/PMD size hugetlb page, so the pte/pmd entry can
change even while the lock is held, opening up potential races. This patch set
switches to the correct functions for picking the pte/pmd lock, so that the
pte/pmd entry is stable under the lock for CONT-PTE/PMD size hugetlb.
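To illustrate the problem, the lock-selection policy can be sketched in userspace as follows (a rough model with illustrative names and 4K-base-page ARM64 sizes, not kernel code): at the time of this series, huge_pte_lockptr() returns the per-PMD split lock only for an exact PMD-size hugetlb, and falls back to the coarse mm->page_table_lock for every other size, including CONT-PTE and CONT-PMD.

```c
#include <assert.h>

/*
 * Userspace sketch (not kernel code) of the lock-selection policy this
 * series relies on. Names and sizes are illustrative, assuming a
 * 4K-base-page ARM64 layout.
 */
#define CONT_PTE_SIZE_64K  (64UL << 10)
#define PMD_SIZE_2M        (2UL << 20)
#define CONT_PMD_SIZE_32M  (32UL << 20)
#define PUD_SIZE_1G        (1UL << 30)

enum lock_kind { LOCK_PMD_SPLIT, LOCK_MM_PAGE_TABLE };

/* Models huge_pte_lockptr(): which lock guards a hugetlb entry of size sz. */
static enum lock_kind huge_pte_lock_kind(unsigned long sz)
{
	return (sz == PMD_SIZE_2M) ? LOCK_PMD_SPLIT : LOCK_MM_PAGE_TABLE;
}
```

The bug is that follow_page() paths instead take the split lock unconditionally, which for CONT-PTE/PMD sizes is not the lock the rest of the hugetlb code uses.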
Baolin Wang (3):
mm/gup: fix races when looking up a CONT-PTE size hugetlb page
mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page
mm/hugetlb: add FOLL_MIGRATION validation before waiting for a
migration entry
include/linux/hugetlb.h | 4 ++--
mm/gup.c | 24 +++++++++++++++++++++---
mm/hugetlb.c | 17 ++++++++++++++---
3 files changed, 37 insertions(+), 8 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
2022-08-19 16:34 ` kernel test robot
2022-08-19 17:46 ` kernel test robot
2022-08-19 10:12 ` [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD " Baolin Wang
` (2 subsequent siblings)
3 siblings, 2 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb pages.
With a 4K base page size this means that, in addition to the PMD/PUD
hugetlb sizes of 2M and 1G, the CONT-PTE (64K) and CONT-PMD (32M) sizes
are also available.
When looking up a CONT-PTE size hugetlb page via follow_page(),
follow_page_pte() uses pte_offset_map_lock() to take the pte lock.
However, that is the wrong lock for a CONT-PTE size hugetlb page, which
should instead use mm->page_table_lock as returned by huge_pte_lockptr().
As a result, the pte entry of a CONT-PTE size hugetlb page is unstable
under the lock that follow_page_pte() takes: a concurrent migration or
poisoning can still modify the entry, so the subsequent pte_xxx()
checks operate on an unstable value even though they nominally run
under the 'pte lock'. This can lead to subtle races.
To fix this, first check whether the VMA is a hugetlb VMA; if so, use
huge_pte_lockptr() to take the correct lock and huge_ptep_get() to read
the pte value, making the pte entry stable under the correct lock.
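The path selection introduced here can be modeled in userspace as follows (an illustrative sketch, not the kernel implementation): the lock and the pte read must both come from the hugetlb-aware helpers whenever the VMA is a hugetlb VMA.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model (illustrative, not kernel code) of the branch this
 * patch adds to follow_page_pte().
 */
enum lookup_path {
	PATH_PTE_OFFSET_MAP_LOCK,  /* pte_offset_map_lock() + plain *ptep  */
	PATH_HUGE_PTE_LOCKPTR,     /* huge_pte_lockptr() + huge_ptep_get() */
};

static enum lookup_path pick_lookup_path(bool vma_is_hugetlb)
{
	return vma_is_hugetlb ? PATH_HUGE_PTE_LOCKPTR
			      : PATH_PTE_OFFSET_MAP_LOCK;
}
```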
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/gup.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 5aa7531..3b2fa86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -534,8 +534,26 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
- ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
- pte = *ptep;
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ struct hstate *hstate = hstate_vma(vma);
+ unsigned long size = huge_page_size(hstate);
+
+ ptep = huge_pte_offset(mm, address, size);
+ if (!ptep)
+ return no_page_table(vma, flags);
+
+ ptl = huge_pte_lockptr(hstate, mm, ptep);
+ spin_lock(ptl);
+ pte = huge_ptep_get(ptep);
+ } else {
+ ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+ pte = *ptep;
+ }
+
if (!pte_present(pte)) {
swp_entry_t entry;
/*
--
1.8.3.1
* [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page
2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
2022-08-19 10:12 ` [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
2022-08-20 0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb pages.
With a 4K base page size this means that, in addition to the PMD/PUD
hugetlb sizes of 2M and 1G, the CONT-PTE (64K) and CONT-PMD (32M) sizes
are also available.
When looking up a CONT-PMD size hugetlb page via follow_page(),
follow_huge_pmd() always takes the pmd lock to protect the pmd entry.
However, that is the wrong lock for a CONT-PMD size hugetlb page, which
should instead use mm->page_table_lock to keep the pmd entry stable.
Switch to huge_pte_lock() to pick up the correct lock for the pmd entry
of a CONT-PMD size hugetlb page, fixing the potential race.
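The helper this patch switches to can be modeled in userspace as follows (an illustrative sketch, not kernel code): huge_pte_lock() combines picking the right lock via huge_pte_lockptr() with acquiring it, so a caller like follow_huge_pmd() cannot pair the wrong lockptr with a CONT-PMD entry.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_SIZE_2M (2UL << 20)

struct mock_lock { bool held; };

/* One coarse mm lock and one per-PMD split lock, mirroring the model. */
static struct mock_lock mm_page_table_lock;
static struct mock_lock pmd_split_lock;

/* Models huge_pte_lock(): picks the lock by hugetlb size and takes it. */
static struct mock_lock *huge_pte_lock_model(unsigned long sz)
{
	struct mock_lock *ptl =
		(sz == PMD_SIZE_2M) ? &pmd_split_lock : &mm_page_table_lock;
	ptl->held = true;	/* models spin_lock(ptl) */
	return ptl;
}
```

With this shape, a 32M CONT-PMD lookup ends up holding mm_page_table_lock, the same lock the rest of the hugetlb code uses for that size.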
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/hugetlb.h | 4 ++--
mm/gup.c | 2 +-
mm/hugetlb.c | 7 ++++---
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3ec981a..dbc2773 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -207,7 +207,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+struct page *follow_huge_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
@@ -312,7 +312,7 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma,
return NULL;
}
-static inline struct page *follow_huge_pmd(struct mm_struct *mm,
+static inline struct page *follow_huge_pmd(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, int flags)
{
return NULL;
diff --git a/mm/gup.c b/mm/gup.c
index 3b2fa86..0856964 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -680,7 +680,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd(vma, address, pmd, flags);
if (page)
return page;
return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea1c7bf..efb53ba 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6960,9 +6960,11 @@ struct page * __weak
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+follow_huge_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, int flags)
{
+ struct mm_struct *mm = vma->vm_mm;
+ struct hstate *hstate = hstate_vma(vma);
struct page *page = NULL;
spinlock_t *ptl;
pte_t pte;
@@ -6975,8 +6977,7 @@ struct page * __weak
return NULL;
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
+ ptl = huge_pte_lock(hstate, mm, (pte_t *)pmd);
/*
* make sure that the address range covered by this pmd is not
* unmapped from other threads.
--
1.8.3.1
* [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry
2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
2022-08-19 10:12 ` [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD " Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
2022-08-20 0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel
hugetlb should follow the same logic as normal pages when waiting for a
migration pte entry: validate that the FOLL_MIGRATION flag is set before
waiting on a migration pte entry of a hugetlb page.
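The intended behavior can be sketched in userspace as follows (an illustrative model, not kernel code; FOLL_MIGRATION_BIT is a stand-in for the kernel's FOLL_MIGRATION flag value): a caller that did not pass FOLL_MIGRATION gets NULL back instead of blocking on a migration entry.

```c
#include <assert.h>
#include <stdbool.h>

#define FOLL_MIGRATION_BIT 0x1	/* stand-in for the kernel flag */

enum nonpresent_action { ACT_RETURN_NULL, ACT_WAIT_FOR_MIGRATION };

/* Models the non-present hugetlb entry handling after this patch. */
static enum nonpresent_action
handle_nonpresent_entry(unsigned int flags, bool is_migration_entry)
{
	if (!(flags & FOLL_MIGRATION_BIT))
		return ACT_RETURN_NULL;        /* the check this patch adds */
	if (is_migration_entry)
		return ACT_WAIT_FOR_MIGRATION; /* __migration_entry_wait()  */
	return ACT_RETURN_NULL;
}
```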
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/hugetlb.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index efb53ba..fac1b33 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7000,6 +7000,11 @@ struct page * __weak
goto out;
}
} else {
+ if (!(flags & FOLL_MIGRATION)) {
+ page = NULL;
+ goto out;
+ }
+
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
__migration_entry_wait_huge((pte_t *)pmd, ptl);
@@ -7038,6 +7043,11 @@ struct page * __weak
goto out;
}
} else {
+ if (!(flags & FOLL_MIGRATION)) {
+ page = NULL;
+ goto out;
+ }
+
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
__migration_entry_wait(mm, (pte_t *)pud, ptl);
--
1.8.3.1
* Re: [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
@ 2022-08-19 16:34 ` kernel test robot
2022-08-19 17:46 ` kernel test robot
1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2022-08-19 16:34 UTC (permalink / raw)
To: Baolin Wang, akpm, songmuchun, mike.kravetz
Cc: kbuild-all, baolin.wang, linux-mm, linux-kernel
Hi Baolin,
I love your patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.0-rc1 next-20220819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: microblaze-randconfig-r003-20220819 (https://download.01.org/0day-ci/archive/20220820/202208200030.9j5HjdtZ-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/c0add09e9de4b39c58633c89ea25ba10ed12d134
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
git checkout c0add09e9de4b39c58633c89ea25ba10ed12d134
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=microblaze SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
mm/gup.c: In function 'follow_page_pte':
>> mm/gup.c:551:23: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
551 | pte = huge_ptep_get(ptep);
| ^~~~~~~~~~~~~
>> mm/gup.c:551:23: error: incompatible types when assigning to type 'pte_t' from type 'int'
cc1: some warnings being treated as errors
vim +/huge_ptep_get +551 mm/gup.c
518
519 static struct page *follow_page_pte(struct vm_area_struct *vma,
520 unsigned long address, pmd_t *pmd, unsigned int flags,
521 struct dev_pagemap **pgmap)
522 {
523 struct mm_struct *mm = vma->vm_mm;
524 struct page *page;
525 spinlock_t *ptl;
526 pte_t *ptep, pte;
527 int ret;
528
529 /* FOLL_GET and FOLL_PIN are mutually exclusive. */
530 if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
531 (FOLL_PIN | FOLL_GET)))
532 return ERR_PTR(-EINVAL);
533 retry:
534 if (unlikely(pmd_bad(*pmd)))
535 return no_page_table(vma, flags);
536
537 /*
538 * Considering PTE level hugetlb, like continuous-PTE hugetlb on
539 * ARM64 architecture.
540 */
541 if (is_vm_hugetlb_page(vma)) {
542 struct hstate *hstate = hstate_vma(vma);
543 unsigned long size = huge_page_size(hstate);
544
545 ptep = huge_pte_offset(mm, address, size);
546 if (!ptep)
547 return no_page_table(vma, flags);
548
549 ptl = huge_pte_lockptr(hstate, mm, ptep);
550 spin_lock(ptl);
> 551 pte = huge_ptep_get(ptep);
552 } else {
553 ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
554 pte = *ptep;
555 }
556
557 if (!pte_present(pte)) {
558 swp_entry_t entry;
559 /*
560 * KSM's break_ksm() relies upon recognizing a ksm page
561 * even while it is being migrated, so for that case we
562 * need migration_entry_wait().
563 */
564 if (likely(!(flags & FOLL_MIGRATION)))
565 goto no_page;
566 if (pte_none(pte))
567 goto no_page;
568 entry = pte_to_swp_entry(pte);
569 if (!is_migration_entry(entry))
570 goto no_page;
571 pte_unmap_unlock(ptep, ptl);
572 migration_entry_wait(mm, pmd, address);
573 goto retry;
574 }
575 if ((flags & FOLL_NUMA) && pte_protnone(pte))
576 goto no_page;
577
578 page = vm_normal_page(vma, address, pte);
579
580 /*
581 * We only care about anon pages in can_follow_write_pte() and don't
582 * have to worry about pte_devmap() because they are never anon.
583 */
584 if ((flags & FOLL_WRITE) &&
585 !can_follow_write_pte(pte, page, vma, flags)) {
586 page = NULL;
587 goto out;
588 }
589
590 if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
591 /*
592 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
593 * case since they are only valid while holding the pgmap
594 * reference.
595 */
596 *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
597 if (*pgmap)
598 page = pte_page(pte);
599 else
600 goto no_page;
601 } else if (unlikely(!page)) {
602 if (flags & FOLL_DUMP) {
603 /* Avoid special (like zero) pages in core dumps */
604 page = ERR_PTR(-EFAULT);
605 goto out;
606 }
607
608 if (is_zero_pfn(pte_pfn(pte))) {
609 page = pte_page(pte);
610 } else {
611 ret = follow_pfn_pte(vma, address, ptep, flags);
612 page = ERR_PTR(ret);
613 goto out;
614 }
615 }
616
617 if (!pte_write(pte) && gup_must_unshare(flags, page)) {
618 page = ERR_PTR(-EMLINK);
619 goto out;
620 }
621
622 VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
623 !PageAnonExclusive(page), page);
624
625 /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
626 if (unlikely(!try_grab_page(page, flags))) {
627 page = ERR_PTR(-ENOMEM);
628 goto out;
629 }
630 /*
631 * We need to make the page accessible if and only if we are going
632 * to access its content (the FOLL_PIN case). Please see
633 * Documentation/core-api/pin_user_pages.rst for details.
634 */
635 if (flags & FOLL_PIN) {
636 ret = arch_make_page_accessible(page);
637 if (ret) {
638 unpin_user_page(page);
639 page = ERR_PTR(ret);
640 goto out;
641 }
642 }
643 if (flags & FOLL_TOUCH) {
644 if ((flags & FOLL_WRITE) &&
645 !pte_dirty(pte) && !PageDirty(page))
646 set_page_dirty(page);
647 /*
648 * pte_mkyoung() would be more correct here, but atomic care
649 * is needed to avoid losing the dirty bit: it is easier to use
650 * mark_page_accessed().
651 */
652 mark_page_accessed(page);
653 }
654 out:
655 pte_unmap_unlock(ptep, ptl);
656 return page;
657 no_page:
658 pte_unmap_unlock(ptep, ptl);
659 if (!pte_none(pte))
660 return NULL;
661 return no_page_table(vma, flags);
662 }
663
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
2022-08-19 16:34 ` kernel test robot
@ 2022-08-19 17:46 ` kernel test robot
1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2022-08-19 17:46 UTC (permalink / raw)
To: Baolin Wang, akpm, songmuchun, mike.kravetz
Cc: llvm, kbuild-all, baolin.wang, linux-mm, linux-kernel
Hi Baolin,
I love your patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.0-rc1 next-20220819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: riscv-randconfig-r042-20220819 (https://download.01.org/0day-ci/archive/20220820/202208200109.XEXN0wPy-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 0ac597f3cacf60479ffd36b03766fa7462dabd78)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/c0add09e9de4b39c58633c89ea25ba10ed12d134
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
git checkout c0add09e9de4b39c58633c89ea25ba10ed12d134
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> mm/gup.c:551:9: error: implicit declaration of function 'huge_ptep_get' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pte = huge_ptep_get(ptep);
^
>> mm/gup.c:551:7: error: assigning to 'pte_t' from incompatible type 'int'
pte = huge_ptep_get(ptep);
^ ~~~~~~~~~~~~~~~~~~~
2 errors generated.
vim +/huge_ptep_get +551 mm/gup.c
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH 0/3] Fix some issues when looking up hugetlb page
2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
` (2 preceding siblings ...)
2022-08-19 10:12 ` [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
@ 2022-08-20 0:08 ` Mike Kravetz
2022-08-21 5:54 ` Baolin Wang
3 siblings, 1 reply; 8+ messages in thread
From: Mike Kravetz @ 2022-08-20 0:08 UTC (permalink / raw)
To: Baolin Wang; +Cc: akpm, songmuchun, linux-mm, linux-kernel
On 08/19/22 18:12, Baolin Wang wrote:
> Hi,
>
> On ARM64 architecture, it can support CONT-PTE/PMD size hugetlb. When
> looking up hugetlb page by follow_page(), we will hold the incorrect
> pte/pmd lock for the CONT-PTE/PMD size hugetlb page, which will make
> the pte/pmd entry unstable even under the lock and cause some potential
> race issues. So considering the CONT-PTE/PMD size hugetlb, this patch set
> changes to use the correct function to get the correct pte/pmd entry lock
> to make the pte/pmd entry stable.
Thank you for looking at this.
I often get confused by the arm64 CONT-PTE/PMD layout, so my understanding may
be wrong. Can we use the PMD page lock for locking both CONT-PTE and CONT-PMD
entries? Again, I may be confused by the CONT-* page table layout, but it
seems these would all be referenced via that same PMD page of the page table.
Or, perhaps a CONT-PMD mapping can span multiple PMD pages?
If we can use the PMD page for locking, this would be much finer grained than
the lock in the mm.
--
Mike Kravetz
* Re: [PATCH 0/3] Fix some issues when looking up hugetlb page
2022-08-20 0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
@ 2022-08-21 5:54 ` Baolin Wang
0 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-21 5:54 UTC (permalink / raw)
To: Mike Kravetz; +Cc: akpm, songmuchun, linux-mm, linux-kernel
On 8/20/2022 8:08 AM, Mike Kravetz wrote:
> On 08/19/22 18:12, Baolin Wang wrote:
>> Hi,
>>
>> On ARM64 architecture, it can support CONT-PTE/PMD size hugetlb. When
>> looking up hugetlb page by follow_page(), we will hold the incorrect
>> pte/pmd lock for the CONT-PTE/PMD size hugetlb page, which will make
>> the pte/pmd entry unstable even under the lock and cause some potential
>> race issues. So considering the CONT-PTE/PMD size hugetlb, this patch set
>> changes to use the correct function to get the correct pte/pmd entry lock
>> to make the pte/pmd entry stable.
>
> Thank you for looking at this.
>
> I often get confused by arm64 CONT-PTE/PMD layout, so my understanding may be
> wrong. Can we use the PMD page lock for locking both CONT-PTE and CONT-PMD
> entries? Again, I may be confused by the CONT-* page table layout, but it
> seems these would all be referenced via that same PMD page of the page table.
> Or, perhaps CONT-PMD can span multiple PMD pages?
Good point. A CONT-PMD hugetlb cannot span multiple PMD pagetable pages, so I
think CONT-PMD can use the PMD pagetable page lock; likewise, a CONT-PTE
hugetlb cannot span multiple PTE pagetable pages, so CONT-PTE can use the PTE
pagetable page lock to get a finer-grained lock.
I will add the CONT-PTE and CONT-PMD cases in huge_pte_lockptr() in the next
version. Thanks for your comment.
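A rough sketch of what the finer-grained policy described above could look like (hypothetical; the actual next version of the series may differ): with 4K base pages, a 64K CONT-PTE hugetlb fits within one PTE table page, and a 2M PMD or 32M CONT-PMD hugetlb fits within one PMD table page.

```c
#include <assert.h>

/*
 * Hypothetical userspace sketch (not the actual v2 patch) of the
 * finer-grained lock policy discussed above, assuming 4K base pages.
 */
#define PMD_SIZE_2M (2UL << 20)
#define PUD_SIZE_1G (1UL << 30)

enum split_lock { SPLIT_PTE, SPLIT_PMD, SPLIT_MM_PAGE_TABLE };

static enum split_lock proposed_lock_kind(unsigned long sz)
{
	if (sz < PMD_SIZE_2M)
		return SPLIT_PTE;           /* e.g. 64K CONT-PTE    */
	if (sz < PUD_SIZE_1G)
		return SPLIT_PMD;           /* 2M PMD, 32M CONT-PMD */
	return SPLIT_MM_PAGE_TABLE;         /* 1G PUD and above     */
}
```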
> If we can use the PMD page for locking, this would be much finer grained
> than the lock in the mm.