* [PATCH v2] docs/mm: add more warnings around page table access
@ 2024-11-18 16:47 Jann Horn
2024-11-18 17:33 ` Lorenzo Stoakes
2024-11-19 6:53 ` Qi Zheng
0 siblings, 2 replies; 5+ messages in thread
From: Jann Horn @ 2024-11-18 16:47 UTC
To: Andrew Morton, Jonathan Corbet, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka
Cc: Alice Ryhl, Boqun Feng, Matthew Wilcox, Mike Rapoport,
Suren Baghdasaryan, Hillf Danton, Qi Zheng, SeongJae Park,
Bagas Sanjaya, linux-mm, linux-doc, linux-kernel, Matteo Rizzo,
Jann Horn
Make it clearer that holding the mmap lock in read mode is not enough
to traverse page tables, and that just having a stable VMA is not enough
to read PTEs.
Suggested-by: Matteo Rizzo <matteorizzo@google.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Jann Horn <jannh@google.com>
---
Changes in v2:
- improved based on feedback from Lorenzo
- Link to v1: https://lore.kernel.org/r/20241114-vma-docs-addition1-onv3-v1-1-ff177a0a2994@google.com
---
Documentation/mm/process_addrs.rst | 46 +++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 1bf7ad010fc063d003bb857bb3b695a3eafa0b55..1d416658d7f59ec595bd51018f42eec606f7e272 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -339,6 +339,11 @@ When **installing** page table entries, the mmap or VMA lock must be held to
keep the VMA stable. We explore why this is in the page table locking details
section below.
+.. warning:: Page tables are normally only traversed in regions covered by VMAs.
+ If you want to traverse page tables in areas that might not be
+ covered by VMAs, heavier locking is required.
+ See :c:func:`!walk_page_range_novma` for details.
+
**Freeing** page tables is an entirely internal memory management operation and
has special requirements (see the page freeing section below for more details).
@@ -450,6 +455,9 @@ the time of writing of this document.
Locking Implementation Details
------------------------------
+.. warning:: Locking rules for PTE-level page tables are very different from
+ locking rules for page tables at other levels.
+
Page table locking details
--------------------------
@@ -470,8 +478,12 @@ additional locks dedicated to page tables:
These locks represent the minimum required to interact with each page table
level, but there are further requirements.
-Importantly, note that on a **traversal** of page tables, no such locks are
-taken. Whether care is taken on reading the page table entries depends on the
+Importantly, note that on a **traversal** of page tables, sometimes no such
+locks are taken. However, at the PTE level, at least concurrent page table
+deletion must be prevented (using RCU) and, if the page table resides in high
+memory, it must be mapped into kernel memory; see below.
+
+Whether care is taken on reading the page table entries depends on the
architecture, see the section on atomicity below.
Locking rules
@@ -489,12 +501,6 @@ We establish basic locking rules when interacting with page tables:
the warning below).
* As mentioned previously, zapping can be performed while simply keeping the VMA
stable, that is holding any one of the mmap, VMA or rmap locks.
-* Special care is required for PTEs, as on 32-bit architectures these must be
- mapped into high memory and additionally, careful consideration must be
- applied to racing with THP, migration or other concurrent kernel operations
- that might steal the entire PTE table from under us. All this is handled by
- :c:func:`!pte_offset_map_lock` (see the section on page table installation
- below for more details).
.. warning:: Populating previously empty entries is dangerous as, when unmapping
VMAs, :c:func:`!vms_clear_ptes` has a window of time between
@@ -509,8 +515,28 @@ We establish basic locking rules when interacting with page tables:
There are additional rules applicable when moving page tables, which we discuss
in the section on this topic below.
-.. note:: Interestingly, :c:func:`!pte_offset_map_lock` holds an RCU read lock
- while the PTE page table lock is held.
+PTE-level page tables are different from page tables at other levels, and there
+are extra requirements for accessing them:
+
+* On 32-bit architectures, they may be in high memory (meaning they need to be
+ mapped into kernel memory to be accessible).
+* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
+ rmap lock for reading in combination with the PTE and PMD page table locks.
+ In particular, this happens in :c:func:`!retract_page_tables` when handling
+ :c:macro:`!MADV_COLLAPSE`.
+ So accessing PTE-level page tables requires at least holding an RCU read lock;
+ but that only suffices for readers that can tolerate racing with concurrent
+ page table updates such that an empty PTE is observed (in a page table that
+ has actually already been detached and marked for RCU freeing) while another
+ new page table has been installed in the same location and filled with
+ entries. Writers normally need to take the PTE lock and revalidate that the
+ PMD entry still refers to the same PTE-level page table.
+
+To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or
+:c:func:`!pte_offset_map` can be used depending on stability requirements.
+These map the page table into kernel memory if required, take the RCU lock, and
+depending on variant, may also look up or acquire the PTE lock.
+See the comment on :c:func:`!__pte_offset_map_lock`.
Atomicity
^^^^^^^^^
---
base-commit: 1e96a63d3022403e06cdda0213c7849b05973cd5
change-id: 20241114-vma-docs-addition1-onv3-32df4e6dffcf
--
Jann Horn <jannh@google.com>
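
The "take the PTE lock and revalidate that the PMD entry still refers to the
same PTE-level page table" rule described in the patch can be illustrated with
a small userspace model. This is only a sketch under stated assumptions: every
name in it (toy_pte_table, toy_pmd, toy_pmd_read, toy_lock_and_revalidate,
toy_unlock, toy_retract) is hypothetical and none of them is a kernel API. A
pthread mutex stands in for the PTE page table lock, and the tables are owned
by the caller, so the model never frees them; in the kernel it is RCU that
keeps a detached table's memory valid until readers are done. The __atomic
builtins assume GCC or Clang.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_pte_table {
	pthread_mutex_t lock;      /* stands in for the PTE page table lock */
	unsigned long entries[4];  /* stands in for the PTEs */
};

struct toy_pmd {
	struct toy_pte_table *table; /* NULL models a cleared PMD entry */
};

/* Sample the PMD entry once, as a lockless walker would. */
static struct toy_pte_table *toy_pmd_read(struct toy_pmd *pmd)
{
	return __atomic_load_n(&pmd->table, __ATOMIC_ACQUIRE);
}

/*
 * Lock the table that was sampled earlier, then check that the PMD entry
 * still refers to it. Returns the locked table, or NULL if the table was
 * detached in the meantime (a caller would then retry or bail out).
 */
static struct toy_pte_table *toy_lock_and_revalidate(struct toy_pmd *pmd,
						      struct toy_pte_table *sampled)
{
	if (!sampled)
		return NULL;
	pthread_mutex_lock(&sampled->lock);
	if (toy_pmd_read(pmd) != sampled) {
		/* Lost the race: the table is no longer linked from the PMD. */
		pthread_mutex_unlock(&sampled->lock);
		return NULL;
	}
	return sampled;
}

static void toy_unlock(struct toy_pte_table *table)
{
	assert(table != NULL);
	pthread_mutex_unlock(&table->lock);
}

/* Models retraction: detach the table from the PMD under the table lock. */
static void toy_retract(struct toy_pmd *pmd)
{
	struct toy_pte_table *table = toy_pmd_read(pmd);

	if (!table)
		return;
	pthread_mutex_lock(&table->lock);
	__atomic_store_n(&pmd->table, NULL, __ATOMIC_RELEASE);
	pthread_mutex_unlock(&table->lock);
}
```

A writer that sampled the PMD entry before toy_retract() ran gets NULL back
from toy_lock_and_revalidate() and so never modifies a detached table, which
is the property the revalidation step is meant to provide.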
* Re: [PATCH v2] docs/mm: add more warnings around page table access
From: Lorenzo Stoakes @ 2024-11-18 17:33 UTC
To: Jann Horn
Cc: Andrew Morton, Jonathan Corbet, Liam R . Howlett,
Vlastimil Babka, Alice Ryhl, Boqun Feng, Matthew Wilcox,
Mike Rapoport, Suren Baghdasaryan, Hillf Danton, Qi Zheng,
SeongJae Park, Bagas Sanjaya, linux-mm, linux-doc, linux-kernel,
Matteo Rizzo
On Mon, Nov 18, 2024 at 05:47:08PM +0100, Jann Horn wrote:
> Make it clearer that holding the mmap lock in read mode is not enough
> to traverse page tables, and that just having a stable VMA is not enough
> to read PTEs.
>
> Suggested-by: Matteo Rizzo <matteorizzo@google.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Jann Horn <jannh@google.com>
Nice, LGTM, thanks for this!
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> Changes in v2:
> - improved based on feedback from Lorenzo
> - Link to v1: https://lore.kernel.org/r/20241114-vma-docs-addition1-onv3-v1-1-ff177a0a2994@google.com
> ---
> Documentation/mm/process_addrs.rst | 46 +++++++++++++++++++++++++++++---------
> 1 file changed, 36 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
> index 1bf7ad010fc063d003bb857bb3b695a3eafa0b55..1d416658d7f59ec595bd51018f42eec606f7e272 100644
> --- a/Documentation/mm/process_addrs.rst
> +++ b/Documentation/mm/process_addrs.rst
> @@ -339,6 +339,11 @@ When **installing** page table entries, the mmap or VMA lock must be held to
> keep the VMA stable. We explore why this is in the page table locking details
> section below.
>
> +.. warning:: Page tables are normally only traversed in regions covered by VMAs.
> + If you want to traverse page tables in areas that might not be
> + covered by VMAs, heavier locking is required.
> + See :c:func:`!walk_page_range_novma` for details.
> +
> **Freeing** page tables is an entirely internal memory management operation and
> has special requirements (see the page freeing section below for more details).
>
> @@ -450,6 +455,9 @@ the time of writing of this document.
> Locking Implementation Details
> ------------------------------
>
> +.. warning:: Locking rules for PTE-level page tables are very different from
> + locking rules for page tables at other levels.
> +
> Page table locking details
> --------------------------
>
> @@ -470,8 +478,12 @@ additional locks dedicated to page tables:
> These locks represent the minimum required to interact with each page table
> level, but there are further requirements.
>
> -Importantly, note that on a **traversal** of page tables, no such locks are
> -taken. Whether care is taken on reading the page table entries depends on the
> +Importantly, note that on a **traversal** of page tables, sometimes no such
> +locks are taken. However, at the PTE level, at least concurrent page table
> +deletion must be prevented (using RCU) and, if the page table resides in high
> +memory, it must be mapped into kernel memory; see below.
> +
> +Whether care is taken on reading the page table entries depends on the
> architecture, see the section on atomicity below.
>
> Locking rules
> @@ -489,12 +501,6 @@ We establish basic locking rules when interacting with page tables:
> the warning below).
> * As mentioned previously, zapping can be performed while simply keeping the VMA
> stable, that is holding any one of the mmap, VMA or rmap locks.
> -* Special care is required for PTEs, as on 32-bit architectures these must be
> - mapped into high memory and additionally, careful consideration must be
> - applied to racing with THP, migration or other concurrent kernel operations
> - that might steal the entire PTE table from under us. All this is handled by
> - :c:func:`!pte_offset_map_lock` (see the section on page table installation
> - below for more details).
>
> .. warning:: Populating previously empty entries is dangerous as, when unmapping
> VMAs, :c:func:`!vms_clear_ptes` has a window of time between
> @@ -509,8 +515,28 @@ We establish basic locking rules when interacting with page tables:
> There are additional rules applicable when moving page tables, which we discuss
> in the section on this topic below.
>
> -.. note:: Interestingly, :c:func:`!pte_offset_map_lock` holds an RCU read lock
> - while the PTE page table lock is held.
> +PTE-level page tables are different from page tables at other levels, and there
> +are extra requirements for accessing them:
> +
> +* On 32-bit architectures, they may be in high memory (meaning they need to be
> + mapped into kernel memory to be accessible).
> +* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
> + rmap lock for reading in combination with the PTE and PMD page table locks.
> + In particular, this happens in :c:func:`!retract_page_tables` when handling
> + :c:macro:`!MADV_COLLAPSE`.
> + So accessing PTE-level page tables requires at least holding an RCU read lock;
> + but that only suffices for readers that can tolerate racing with concurrent
> + page table updates such that an empty PTE is observed (in a page table that
> + has actually already been detached and marked for RCU freeing) while another
> + new page table has been installed in the same location and filled with
> + entries. Writers normally need to take the PTE lock and revalidate that the
> + PMD entry still refers to the same PTE-level page table.
> +
> +To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or
> +:c:func:`!pte_offset_map` can be used depending on stability requirements.
> +These map the page table into kernel memory if required, take the RCU lock, and
> +depending on variant, may also look up or acquire the PTE lock.
> +See the comment on :c:func:`!__pte_offset_map_lock`.
>
> Atomicity
> ^^^^^^^^^
>
> ---
> base-commit: 1e96a63d3022403e06cdda0213c7849b05973cd5
> change-id: 20241114-vma-docs-addition1-onv3-32df4e6dffcf
>
> --
> Jann Horn <jannh@google.com>
>
* Re: [PATCH v2] docs/mm: add more warnings around page table access
From: Qi Zheng @ 2024-11-19 6:53 UTC
To: Jann Horn
Cc: Andrew Morton, Jonathan Corbet, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Alice Ryhl, Boqun Feng,
Matthew Wilcox, Mike Rapoport, Suren Baghdasaryan, Hillf Danton,
Qi Zheng, SeongJae Park, Bagas Sanjaya, linux-mm, linux-doc,
linux-kernel, Matteo Rizzo
On 2024/11/19 00:47, Jann Horn wrote:
> Make it clearer that holding the mmap lock in read mode is not enough
> to traverse page tables, and that just having a stable VMA is not enough
> to read PTEs.
>
> Suggested-by: Matteo Rizzo <matteorizzo@google.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
> +
> +* On 32-bit architectures, they may be in high memory (meaning they need to be
> + mapped into kernel memory to be accessible).
> +* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
> + rmap lock for reading in combination with the PTE and PMD page table locks.
> + In particular, this happens in :c:func:`!retract_page_tables` when handling
> + :c:macro:`!MADV_COLLAPSE`.
> + So accessing PTE-level page tables requires at least holding an RCU read lock;
> + but that only suffices for readers that can tolerate racing with concurrent
> + page table updates such that an empty PTE is observed (in a page table that
> + has actually already been detached and marked for RCU freeing) while another
> + new page table has been installed in the same location and filled with
> + entries. Writers normally need to take the PTE lock and revalidate that the
> + PMD entry still refers to the same PTE-level page table.
> +
In practice, this also happens in retract_page_tables(). Maybe we can
add a note about this after my patch [1] is merged. ;)
[1] https://lore.kernel.org/lkml/e5b321ffc3ebfcc46e53830e917ad246f7d2825f.1731566457.git.zhengqi.arch@bytedance.com/
Thanks!
>
* Re: [PATCH v2] docs/mm: add more warnings around page table access
From: Lorenzo Stoakes @ 2024-11-19 7:48 UTC
To: Qi Zheng
Cc: Jann Horn, Andrew Morton, Jonathan Corbet, Liam R . Howlett,
Vlastimil Babka, Alice Ryhl, Boqun Feng, Matthew Wilcox,
Mike Rapoport, Suren Baghdasaryan, Hillf Danton, SeongJae Park,
Bagas Sanjaya, linux-mm, linux-doc, linux-kernel, Matteo Rizzo
On Tue, Nov 19, 2024 at 02:53:52PM +0800, Qi Zheng wrote:
>
>
> On 2024/11/19 00:47, Jann Horn wrote:
> > Make it clearer that holding the mmap lock in read mode is not enough
> > to traverse page tables, and that just having a stable VMA is not enough
> > to read PTEs.
> >
> > Suggested-by: Matteo Rizzo <matteorizzo@google.com>
> > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
>
> > +
> > +* On 32-bit architectures, they may be in high memory (meaning they need to be
> > + mapped into kernel memory to be accessible).
> > +* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
> > + rmap lock for reading in combination with the PTE and PMD page table locks.
> > + In particular, this happens in :c:func:`!retract_page_tables` when handling
> > + :c:macro:`!MADV_COLLAPSE`.
> > + So accessing PTE-level page tables requires at least holding an RCU read lock;
> > + but that only suffices for readers that can tolerate racing with concurrent
> > + page table updates such that an empty PTE is observed (in a page table that
> > + has actually already been detached and marked for RCU freeing) while another
> > + new page table has been installed in the same location and filled with
> > + entries. Writers normally need to take the PTE lock and revalidate that the
> > + PMD entry still refers to the same PTE-level page table.
> > +
>
> In practice, this also happens in retract_page_tables(). Maybe we can
> add a note about this after my patch [1] is merged. ;)
>
> [1] https://lore.kernel.org/lkml/e5b321ffc3ebfcc46e53830e917ad246f7d2825f.1731566457.git.zhengqi.arch@bytedance.com/
You could even queue the doc change up there? :>)
I think one really nice thing with having docs in-tree like this is when we
change things that alter the doc's accuracy we can queue them up with the
patch so the doc always stays in sync.
I feel you may have accidentally self-volunteered there ;)
>
> Thanks!
>
> >
* Re: [PATCH v2] docs/mm: add more warnings around page table access
From: Qi Zheng @ 2024-11-19 8:02 UTC
To: Lorenzo Stoakes
Cc: Jann Horn, Andrew Morton, Jonathan Corbet, Liam R . Howlett,
Vlastimil Babka, Alice Ryhl, Boqun Feng, Matthew Wilcox,
Mike Rapoport, Suren Baghdasaryan, Hillf Danton, SeongJae Park,
Bagas Sanjaya, linux-mm, linux-doc, linux-kernel, Matteo Rizzo
On 2024/11/19 15:48, Lorenzo Stoakes wrote:
> On Tue, Nov 19, 2024 at 02:53:52PM +0800, Qi Zheng wrote:
>>
>>
>> On 2024/11/19 00:47, Jann Horn wrote:
>>> Make it clearer that holding the mmap lock in read mode is not enough
>>> to traverse page tables, and that just having a stable VMA is not enough
>>> to read PTEs.
>>>
>>> Suggested-by: Matteo Rizzo <matteorizzo@google.com>
>>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Signed-off-by: Jann Horn <jannh@google.com>
>>
>> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>
>>> +
>>> +* On 32-bit architectures, they may be in high memory (meaning they need to be
>>> + mapped into kernel memory to be accessible).
>>> +* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
>>> + rmap lock for reading in combination with the PTE and PMD page table locks.
>>> + In particular, this happens in :c:func:`!retract_page_tables` when handling
>>> + :c:macro:`!MADV_COLLAPSE`.
>>> + So accessing PTE-level page tables requires at least holding an RCU read lock;
>>> + but that only suffices for readers that can tolerate racing with concurrent
>>> + page table updates such that an empty PTE is observed (in a page table that
>>> + has actually already been detached and marked for RCU freeing) while another
>>> + new page table has been installed in the same location and filled with
>>> + entries. Writers normally need to take the PTE lock and revalidate that the
>>> + PMD entry still refers to the same PTE-level page table.
>>> +
>>
>> In practice, this also happens in retract_page_tables(). Maybe we can
>> add a note about this after my patch [1] is merged. ;)
>>
>> [1] https://lore.kernel.org/lkml/e5b321ffc3ebfcc46e53830e917ad246f7d2825f.1731566457.git.zhengqi.arch@bytedance.com/
>
> You could even queue the doc change up there? :>)
OK, I can add this note to my patch after this patch is merged.
>
> I think one really nice thing with having docs in-tree like this is when we
> change things that alter the doc's accuracy we can queue them up with the
> patch so the doc always stays in sync.
Agree.
>
> I feel you may have accidentally self-volunteered there ;)
>
>>
>> Thanks!
>>
>>>