linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix some issues when looking up hugetlb page
@ 2022-08-19 10:12 Baolin Wang
  2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
  To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel

Hi,

On ARM64 architecture, it can support CONT-PTE/PMD size hugetlb. When
looking up hugetlb page by follow_page(), we will hold the incorrect
pte/pmd lock for the CONT-PTE/PMD size hugetlb page, which will make
the pte/pmd entry unstable even under the lock and cause some potential
race issues. So considering the CONT-PTE/PMD size hugetlb, this patch set
changes to use the correct function to get the correct pte/pmd entry lock
to make the pte/pmd entry stable.

Baolin Wang (3):
  mm/gup: fix races when looking up a CONT-PTE size hugetlb page
  mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page
  mm/hugetlb: add FOLL_MIGRATION validation before waiting for a
    migration entry

 include/linux/hugetlb.h |  4 ++--
 mm/gup.c                | 24 +++++++++++++++++++++---
 mm/hugetlb.c            | 17 ++++++++++++++---
 3 files changed, 37 insertions(+), 8 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
  2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
  2022-08-19 16:34   ` kernel test robot
  2022-08-19 17:46   ` kernel test robot
  2022-08-19 10:12 ` [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD " Baolin Wang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
  To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel

On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size
specified.

When looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte lock is incorrect for
the CONT-PTE size hugetlb, since we should use mm->page_table_lock
by huge_pte_lockptr().

That means the pte entry of the CONT-PTE size hugetlb under current
pte lock is unstable in follow_page_pte(), we can continue to migrate
or poison the pte entry of the CONT-PTE size hugetlb, which can cause
some potential race issues, since the pte entry is unstable, and
following pte_xxx() validation is also incorrect in follow_page_pte(),
even though they are under the 'pte lock'.

To fix this issue, we should validate if it is a CONT-PTE size VMA
at first, and use huge_pte_lockptr() to get the correct pte lock
and get the pte value by huge_ptep_get() to make the pte entry stable
under the correct pte lock.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/gup.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 5aa7531..3b2fa86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -534,8 +534,26 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
 
-	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
-	pte = *ptep;
+	/*
+	 * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+	 * ARM64 architecture.
+	 */
+	if (is_vm_hugetlb_page(vma)) {
+		struct hstate *hstate = hstate_vma(vma);
+		unsigned long size = huge_page_size(hstate);
+
+		ptep = huge_pte_offset(mm, address, size);
+		if (!ptep)
+			return no_page_table(vma, flags);
+
+		ptl = huge_pte_lockptr(hstate, mm, ptep);
+		spin_lock(ptl);
+		pte = huge_ptep_get(ptep);
+	} else {
+		ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+		pte = *ptep;
+	}
+
 	if (!pte_present(pte)) {
 		swp_entry_t entry;
 		/*
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD size hugetlb page
  2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
  2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
  2022-08-19 10:12 ` [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
  2022-08-20  0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
  3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
  To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel

On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size
specified.

When looking up a CONT-PMD size hugetlb page by follow_page(), it will
always use the pmd lock to protect the pmd entry in follow_huge_pmd().
However this is not the correct lock for CONT-PMD size hugetlb, instead
we should use mm->page_table_lock for the CONT-PMD size hugetlb to make
sure the pmd entry is stable.

Thus changing to use huge_pte_lock() to get the correct pmd entry lock
for CONT-PMD size hugetlb to fix the potential race.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/hugetlb.h | 4 ++--
 mm/gup.c                | 2 +-
 mm/hugetlb.c            | 7 ++++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3ec981a..dbc2773 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -207,7 +207,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 struct page *follow_huge_pd(struct vm_area_struct *vma,
 			    unsigned long address, hugepd_t hpd,
 			    int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+struct page *follow_huge_pmd(struct vm_area_struct *vma, unsigned long address,
 				pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
 				pud_t *pud, int flags);
@@ -312,7 +312,7 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma,
 	return NULL;
 }
 
-static inline struct page *follow_huge_pmd(struct mm_struct *mm,
+static inline struct page *follow_huge_pmd(struct vm_area_struct *vma,
 				unsigned long address, pmd_t *pmd, int flags)
 {
 	return NULL;
diff --git a/mm/gup.c b/mm/gup.c
index 3b2fa86..0856964 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -680,7 +680,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (pmd_none(pmdval))
 		return no_page_table(vma, flags);
 	if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) {
-		page = follow_huge_pmd(mm, address, pmd, flags);
+		page = follow_huge_pmd(vma, address, pmd, flags);
 		if (page)
 			return page;
 		return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea1c7bf..efb53ba 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6960,9 +6960,11 @@ struct page * __weak
 }
 
 struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
+follow_huge_pmd(struct vm_area_struct *vma, unsigned long address,
 		pmd_t *pmd, int flags)
 {
+	struct mm_struct *mm = vma->vm_mm;
+	struct hstate *hstate = hstate_vma(vma);
 	struct page *page = NULL;
 	spinlock_t *ptl;
 	pte_t pte;
@@ -6975,8 +6977,7 @@ struct page * __weak
 		return NULL;
 
 retry:
-	ptl = pmd_lockptr(mm, pmd);
-	spin_lock(ptl);
+	ptl = huge_pte_lock(hstate, mm, (pte_t *)pmd);
 	/*
 	 * make sure that the address range covered by this pmd is not
 	 * unmapped from other threads.
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry
  2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
  2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
  2022-08-19 10:12 ` [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD " Baolin Wang
@ 2022-08-19 10:12 ` Baolin Wang
  2022-08-20  0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
  3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-19 10:12 UTC (permalink / raw)
  To: akpm, songmuchun, mike.kravetz; +Cc: baolin.wang, linux-mm, linux-kernel

The hugetlb should keep the same logics with normal page when waiting
for a migration pte entry, that means we should also validate if
the FOLL_MIGRATION flag is set before waiting for a migration pte entry
of a hugetlb page.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/hugetlb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index efb53ba..fac1b33 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7000,6 +7000,11 @@ struct page * __weak
 			goto out;
 		}
 	} else {
+		if (!(flags & FOLL_MIGRATION)) {
+			page = NULL;
+			goto out;
+		}
+
 		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
 			__migration_entry_wait_huge((pte_t *)pmd, ptl);
@@ -7038,6 +7043,11 @@ struct page * __weak
 			goto out;
 		}
 	} else {
+		if (!(flags & FOLL_MIGRATION)) {
+			page = NULL;
+			goto out;
+		}
+
 		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
 			__migration_entry_wait(mm, (pte_t *)pud, ptl);
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
  2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
@ 2022-08-19 16:34   ` kernel test robot
  2022-08-19 17:46   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2022-08-19 16:34 UTC (permalink / raw)
  To: Baolin Wang, akpm, songmuchun, mike.kravetz
  Cc: kbuild-all, baolin.wang, linux-mm, linux-kernel

Hi Baolin,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.0-rc1 next-20220819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: microblaze-randconfig-r003-20220819 (https://download.01.org/0day-ci/archive/20220820/202208200030.9j5HjdtZ-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/c0add09e9de4b39c58633c89ea25ba10ed12d134
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
        git checkout c0add09e9de4b39c58633c89ea25ba10ed12d134
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=microblaze SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/gup.c: In function 'follow_page_pte':
>> mm/gup.c:551:23: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
     551 |                 pte = huge_ptep_get(ptep);
         |                       ^~~~~~~~~~~~~
>> mm/gup.c:551:23: error: incompatible types when assigning to type 'pte_t' from type 'int'
   cc1: some warnings being treated as errors


vim +/huge_ptep_get +551 mm/gup.c

   518	
   519	static struct page *follow_page_pte(struct vm_area_struct *vma,
   520			unsigned long address, pmd_t *pmd, unsigned int flags,
   521			struct dev_pagemap **pgmap)
   522	{
   523		struct mm_struct *mm = vma->vm_mm;
   524		struct page *page;
   525		spinlock_t *ptl;
   526		pte_t *ptep, pte;
   527		int ret;
   528	
   529		/* FOLL_GET and FOLL_PIN are mutually exclusive. */
   530		if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
   531				 (FOLL_PIN | FOLL_GET)))
   532			return ERR_PTR(-EINVAL);
   533	retry:
   534		if (unlikely(pmd_bad(*pmd)))
   535			return no_page_table(vma, flags);
   536	
   537		/*
   538		 * Considering PTE level hugetlb, like continuous-PTE hugetlb on
   539		 * ARM64 architecture.
   540		 */
   541		if (is_vm_hugetlb_page(vma)) {
   542			struct hstate *hstate = hstate_vma(vma);
   543			unsigned long size = huge_page_size(hstate);
   544	
   545			ptep = huge_pte_offset(mm, address, size);
   546			if (!ptep)
   547				return no_page_table(vma, flags);
   548	
   549			ptl = huge_pte_lockptr(hstate, mm, ptep);
   550			spin_lock(ptl);
 > 551			pte = huge_ptep_get(ptep);
   552		} else {
   553			ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
   554			pte = *ptep;
   555		}
   556	
   557		if (!pte_present(pte)) {
   558			swp_entry_t entry;
   559			/*
   560			 * KSM's break_ksm() relies upon recognizing a ksm page
   561			 * even while it is being migrated, so for that case we
   562			 * need migration_entry_wait().
   563			 */
   564			if (likely(!(flags & FOLL_MIGRATION)))
   565				goto no_page;
   566			if (pte_none(pte))
   567				goto no_page;
   568			entry = pte_to_swp_entry(pte);
   569			if (!is_migration_entry(entry))
   570				goto no_page;
   571			pte_unmap_unlock(ptep, ptl);
   572			migration_entry_wait(mm, pmd, address);
   573			goto retry;
   574		}
   575		if ((flags & FOLL_NUMA) && pte_protnone(pte))
   576			goto no_page;
   577	
   578		page = vm_normal_page(vma, address, pte);
   579	
   580		/*
   581		 * We only care about anon pages in can_follow_write_pte() and don't
   582		 * have to worry about pte_devmap() because they are never anon.
   583		 */
   584		if ((flags & FOLL_WRITE) &&
   585		    !can_follow_write_pte(pte, page, vma, flags)) {
   586			page = NULL;
   587			goto out;
   588		}
   589	
   590		if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
   591			/*
   592			 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
   593			 * case since they are only valid while holding the pgmap
   594			 * reference.
   595			 */
   596			*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
   597			if (*pgmap)
   598				page = pte_page(pte);
   599			else
   600				goto no_page;
   601		} else if (unlikely(!page)) {
   602			if (flags & FOLL_DUMP) {
   603				/* Avoid special (like zero) pages in core dumps */
   604				page = ERR_PTR(-EFAULT);
   605				goto out;
   606			}
   607	
   608			if (is_zero_pfn(pte_pfn(pte))) {
   609				page = pte_page(pte);
   610			} else {
   611				ret = follow_pfn_pte(vma, address, ptep, flags);
   612				page = ERR_PTR(ret);
   613				goto out;
   614			}
   615		}
   616	
   617		if (!pte_write(pte) && gup_must_unshare(flags, page)) {
   618			page = ERR_PTR(-EMLINK);
   619			goto out;
   620		}
   621	
   622		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
   623			       !PageAnonExclusive(page), page);
   624	
   625		/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
   626		if (unlikely(!try_grab_page(page, flags))) {
   627			page = ERR_PTR(-ENOMEM);
   628			goto out;
   629		}
   630		/*
   631		 * We need to make the page accessible if and only if we are going
   632		 * to access its content (the FOLL_PIN case).  Please see
   633		 * Documentation/core-api/pin_user_pages.rst for details.
   634		 */
   635		if (flags & FOLL_PIN) {
   636			ret = arch_make_page_accessible(page);
   637			if (ret) {
   638				unpin_user_page(page);
   639				page = ERR_PTR(ret);
   640				goto out;
   641			}
   642		}
   643		if (flags & FOLL_TOUCH) {
   644			if ((flags & FOLL_WRITE) &&
   645			    !pte_dirty(pte) && !PageDirty(page))
   646				set_page_dirty(page);
   647			/*
   648			 * pte_mkyoung() would be more correct here, but atomic care
   649			 * is needed to avoid losing the dirty bit: it is easier to use
   650			 * mark_page_accessed().
   651			 */
   652			mark_page_accessed(page);
   653		}
   654	out:
   655		pte_unmap_unlock(ptep, ptl);
   656		return page;
   657	no_page:
   658		pte_unmap_unlock(ptep, ptl);
   659		if (!pte_none(pte))
   660			return NULL;
   661		return no_page_table(vma, flags);
   662	}
   663	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size hugetlb page
  2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
  2022-08-19 16:34   ` kernel test robot
@ 2022-08-19 17:46   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2022-08-19 17:46 UTC (permalink / raw)
  To: Baolin Wang, akpm, songmuchun, mike.kravetz
  Cc: llvm, kbuild-all, baolin.wang, linux-mm, linux-kernel

Hi Baolin,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.0-rc1 next-20220819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: riscv-randconfig-r042-20220819 (https://download.01.org/0day-ci/archive/20220820/202208200109.XEXN0wPy-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 0ac597f3cacf60479ffd36b03766fa7462dabd78)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/c0add09e9de4b39c58633c89ea25ba10ed12d134
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Baolin-Wang/Fix-some-issues-when-looking-up-hugetlb-page/20220819-182017
        git checkout c0add09e9de4b39c58633c89ea25ba10ed12d134
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/gup.c:551:9: error: implicit declaration of function 'huge_ptep_get' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                   pte = huge_ptep_get(ptep);
                         ^
>> mm/gup.c:551:7: error: assigning to 'pte_t' from incompatible type 'int'
                   pte = huge_ptep_get(ptep);
                       ^ ~~~~~~~~~~~~~~~~~~~
   2 errors generated.


vim +/huge_ptep_get +551 mm/gup.c

   518	
   519	static struct page *follow_page_pte(struct vm_area_struct *vma,
   520			unsigned long address, pmd_t *pmd, unsigned int flags,
   521			struct dev_pagemap **pgmap)
   522	{
   523		struct mm_struct *mm = vma->vm_mm;
   524		struct page *page;
   525		spinlock_t *ptl;
   526		pte_t *ptep, pte;
   527		int ret;
   528	
   529		/* FOLL_GET and FOLL_PIN are mutually exclusive. */
   530		if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
   531				 (FOLL_PIN | FOLL_GET)))
   532			return ERR_PTR(-EINVAL);
   533	retry:
   534		if (unlikely(pmd_bad(*pmd)))
   535			return no_page_table(vma, flags);
   536	
   537		/*
   538		 * Considering PTE level hugetlb, like continuous-PTE hugetlb on
   539		 * ARM64 architecture.
   540		 */
   541		if (is_vm_hugetlb_page(vma)) {
   542			struct hstate *hstate = hstate_vma(vma);
   543			unsigned long size = huge_page_size(hstate);
   544	
   545			ptep = huge_pte_offset(mm, address, size);
   546			if (!ptep)
   547				return no_page_table(vma, flags);
   548	
   549			ptl = huge_pte_lockptr(hstate, mm, ptep);
   550			spin_lock(ptl);
 > 551			pte = huge_ptep_get(ptep);
   552		} else {
   553			ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
   554			pte = *ptep;
   555		}
   556	
   557		if (!pte_present(pte)) {
   558			swp_entry_t entry;
   559			/*
   560			 * KSM's break_ksm() relies upon recognizing a ksm page
   561			 * even while it is being migrated, so for that case we
   562			 * need migration_entry_wait().
   563			 */
   564			if (likely(!(flags & FOLL_MIGRATION)))
   565				goto no_page;
   566			if (pte_none(pte))
   567				goto no_page;
   568			entry = pte_to_swp_entry(pte);
   569			if (!is_migration_entry(entry))
   570				goto no_page;
   571			pte_unmap_unlock(ptep, ptl);
   572			migration_entry_wait(mm, pmd, address);
   573			goto retry;
   574		}
   575		if ((flags & FOLL_NUMA) && pte_protnone(pte))
   576			goto no_page;
   577	
   578		page = vm_normal_page(vma, address, pte);
   579	
   580		/*
   581		 * We only care about anon pages in can_follow_write_pte() and don't
   582		 * have to worry about pte_devmap() because they are never anon.
   583		 */
   584		if ((flags & FOLL_WRITE) &&
   585		    !can_follow_write_pte(pte, page, vma, flags)) {
   586			page = NULL;
   587			goto out;
   588		}
   589	
   590		if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
   591			/*
   592			 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
   593			 * case since they are only valid while holding the pgmap
   594			 * reference.
   595			 */
   596			*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
   597			if (*pgmap)
   598				page = pte_page(pte);
   599			else
   600				goto no_page;
   601		} else if (unlikely(!page)) {
   602			if (flags & FOLL_DUMP) {
   603				/* Avoid special (like zero) pages in core dumps */
   604				page = ERR_PTR(-EFAULT);
   605				goto out;
   606			}
   607	
   608			if (is_zero_pfn(pte_pfn(pte))) {
   609				page = pte_page(pte);
   610			} else {
   611				ret = follow_pfn_pte(vma, address, ptep, flags);
   612				page = ERR_PTR(ret);
   613				goto out;
   614			}
   615		}
   616	
   617		if (!pte_write(pte) && gup_must_unshare(flags, page)) {
   618			page = ERR_PTR(-EMLINK);
   619			goto out;
   620		}
   621	
   622		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
   623			       !PageAnonExclusive(page), page);
   624	
   625		/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
   626		if (unlikely(!try_grab_page(page, flags))) {
   627			page = ERR_PTR(-ENOMEM);
   628			goto out;
   629		}
   630		/*
   631		 * We need to make the page accessible if and only if we are going
   632		 * to access its content (the FOLL_PIN case).  Please see
   633		 * Documentation/core-api/pin_user_pages.rst for details.
   634		 */
   635		if (flags & FOLL_PIN) {
   636			ret = arch_make_page_accessible(page);
   637			if (ret) {
   638				unpin_user_page(page);
   639				page = ERR_PTR(ret);
   640				goto out;
   641			}
   642		}
   643		if (flags & FOLL_TOUCH) {
   644			if ((flags & FOLL_WRITE) &&
   645			    !pte_dirty(pte) && !PageDirty(page))
   646				set_page_dirty(page);
   647			/*
   648			 * pte_mkyoung() would be more correct here, but atomic care
   649			 * is needed to avoid losing the dirty bit: it is easier to use
   650			 * mark_page_accessed().
   651			 */
   652			mark_page_accessed(page);
   653		}
   654	out:
   655		pte_unmap_unlock(ptep, ptl);
   656		return page;
   657	no_page:
   658		pte_unmap_unlock(ptep, ptl);
   659		if (!pte_none(pte))
   660			return NULL;
   661		return no_page_table(vma, flags);
   662	}
   663	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] Fix some issues when looking up hugetlb page
  2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
                   ` (2 preceding siblings ...)
  2022-08-19 10:12 ` [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
@ 2022-08-20  0:08 ` Mike Kravetz
  2022-08-21  5:54   ` Baolin Wang
  3 siblings, 1 reply; 8+ messages in thread
From: Mike Kravetz @ 2022-08-20  0:08 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, songmuchun, linux-mm, linux-kernel

On 08/19/22 18:12, Baolin Wang wrote:
> Hi,
> 
> On ARM64 architecture, it can support CONT-PTE/PMD size hugetlb. When
> looking up hugetlb page by follow_page(), we will hold the incorrect
> pte/pmd lock for the CONT-PTE/PMD size hugetlb page, which will make
> the pte/pmd entry unstable even under the lock and cause some potential
> race issues. So considering the CONT-PTE/PMD size hugetlb, this patch set
> changes to use the correct function to get the correct pte/pmd entry lock
> to make the pte/pmd entry stable.

Thank you for looking at this.

I often get confused by arm64 CONT-PTE/PMD layout, so my understanding may be
wrong.  Can we use the PMD page lock for locking both CONT-PTE and CONT-PMD
entries?  Again, I may be confused by the CONT-* page table layout, but it
seems these would all be referenced via that same PMD page of the page table.
Or, perhaps CONT-PMD can span multiple PMD pages?

If we can use PMD page for locking, this would be much finer grain that
lock in the mm.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] Fix some issues when looking up hugetlb page
  2022-08-20  0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
@ 2022-08-21  5:54   ` Baolin Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2022-08-21  5:54 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: akpm, songmuchun, linux-mm, linux-kernel



On 8/20/2022 8:08 AM, Mike Kravetz wrote:
> On 08/19/22 18:12, Baolin Wang wrote:
>> Hi,
>>
>> On ARM64 architecture, it can support CONT-PTE/PMD size hugetlb. When
>> looking up hugetlb page by follow_page(), we will hold the incorrect
>> pte/pmd lock for the CONT-PTE/PMD size hugetlb page, which will make
>> the pte/pmd entry unstable even under the lock and cause some potential
>> race issues. So considering the CONT-PTE/PMD size hugetlb, this patch set
>> changes to use the correct function to get the correct pte/pmd entry lock
>> to make the pte/pmd entry stable.
> 
> Thank you for looking at this.
> 
> I often get confused by arm64 CONT-PTE/PMD layout, so my understanding may be
> wrong.  Can we use the PMD page lock for locking both CONT-PTE and CONT-PMD
> entries?  Again, I may be confused by the CONT-* page table layout, but it
> seems these would all be referenced via that same PMD page of the page table.
> Or, perhaps CONT-PMD can span multiple PMD pages?

Good point. CONT-PMD can not span multiple PMD pages, so I think 
CONT-PMD can use PMD pagetable page lock, but CONT-PTE also can not span 
multiple PTE pagetable page lock, so I think CONT-PTE can use the PTE 
pagetable page lock to get a fine grained lock.

I will add CONT-PTE and CONT-PMD case in huge_pte_lockptr() in next 
version. Thanks for your comment.

> If we can use PMD page for locking, this would be much finer grain that
> lock in the mm.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-08-21  5:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 10:12 [PATCH 0/3] Fix some issues when looking up hugetlb page Baolin Wang
2022-08-19 10:12 ` [PATCH 1/3] mm/gup: fix races when looking up a CONT-PTE size " Baolin Wang
2022-08-19 16:34   ` kernel test robot
2022-08-19 17:46   ` kernel test robot
2022-08-19 10:12 ` [PATCH 2/3] mm/hugetlb: fix races when looking up a CONT-PMD " Baolin Wang
2022-08-19 10:12 ` [PATCH 3/3] mm/hugetlb: add FOLL_MIGRATION validation before waiting for a migration entry Baolin Wang
2022-08-20  0:08 ` [PATCH 0/3] Fix some issues when looking up hugetlb page Mike Kravetz
2022-08-21  5:54   ` Baolin Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox