From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Alice Ryhl <aliceryhl@google.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Hillf Danton <hdanton@sina.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	SeongJae Park <sj@kernel.org>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Matteo Rizzo <matteorizzo@google.com>
Subject: Re: [PATCH v2] docs/mm: add more warnings around page table access
Date: Mon, 18 Nov 2024 17:33:42 +0000
Message-ID: <904ef000-a2b0-461f-b5c3-6de0672c6ec2@lucifer.local>
In-Reply-To: <20241118-vma-docs-addition1-onv3-v2-1-c9d5395b72ee@google.com>

On Mon, Nov 18, 2024 at 05:47:08PM +0100, Jann Horn wrote:
> Make it clearer that holding the mmap lock in read mode is not enough
> to traverse page tables, and that just having a stable VMA is not enough
> to read PTEs.
>
> Suggested-by: Matteo Rizzo <matteorizzo@google.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Jann Horn <jannh@google.com>

Nice, LGTM, thanks for this!

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> Changes in v2:
> - improved based on feedback from Lorenzo
> - Link to v1: https://lore.kernel.org/r/20241114-vma-docs-addition1-onv3-v1-1-ff177a0a2994@google.com
> ---
>  Documentation/mm/process_addrs.rst | 46 +++++++++++++++++++++++++++++---------
>  1 file changed, 36 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
> index 1bf7ad010fc063d003bb857bb3b695a3eafa0b55..1d416658d7f59ec595bd51018f42eec606f7e272 100644
> --- a/Documentation/mm/process_addrs.rst
> +++ b/Documentation/mm/process_addrs.rst
> @@ -339,6 +339,11 @@ When **installing** page table entries, the mmap or VMA lock must be held to
>  keep the VMA stable. We explore why this is in the page table locking details
>  section below.
>
> +.. warning:: Page tables are normally only traversed in regions covered by VMAs.
> +             If you want to traverse page tables in areas that might not be
> +             covered by VMAs, heavier locking is required.
> +             See :c:func:`!walk_page_range_novma` for details.
> +
>  **Freeing** page tables is an entirely internal memory management operation and
>  has special requirements (see the page freeing section below for more details).
>
> @@ -450,6 +455,9 @@ the time of writing of this document.
>  Locking Implementation Details
>  ------------------------------
>
> +.. warning:: Locking rules for PTE-level page tables are very different from
> +             locking rules for page tables at other levels.
> +
>  Page table locking details
>  --------------------------
>
> @@ -470,8 +478,12 @@ additional locks dedicated to page tables:
>  These locks represent the minimum required to interact with each page table
>  level, but there are further requirements.
>
> -Importantly, note that on a **traversal** of page tables, no such locks are
> -taken. Whether care is taken on reading the page table entries depends on the
> +Importantly, note that on a **traversal** of page tables, sometimes no such
> +locks are taken. However, at the PTE level, at least concurrent page table
> +deletion must be prevented (using RCU) and the page table must be mapped into
> +high memory, see below.
> +
> +Whether care is taken on reading the page table entries depends on the
>  architecture, see the section on atomicity below.
>
>  Locking rules
> @@ -489,12 +501,6 @@ We establish basic locking rules when interacting with page tables:
>    the warning below).
>  * As mentioned previously, zapping can be performed while simply keeping the VMA
>    stable, that is holding any one of the mmap, VMA or rmap locks.
> -* Special care is required for PTEs, as on 32-bit architectures these must be
> -  mapped into high memory and additionally, careful consideration must be
> -  applied to racing with THP, migration or other concurrent kernel operations
> -  that might steal the entire PTE table from under us. All this is handled by
> -  :c:func:`!pte_offset_map_lock` (see the section on page table installation
> -  below for more details).
>
>  .. warning:: Populating previously empty entries is dangerous as, when unmapping
>               VMAs, :c:func:`!vms_clear_ptes` has a window of time between
> @@ -509,8 +515,28 @@ We establish basic locking rules when interacting with page tables:
>  There are additional rules applicable when moving page tables, which we discuss
>  in the section on this topic below.
>
> -.. note:: Interestingly, :c:func:`!pte_offset_map_lock` holds an RCU read lock
> -          while the PTE page table lock is held.
> +PTE-level page tables are different from page tables at other levels, and there
> +are extra requirements for accessing them:
> +
> +* On 32-bit architectures, they may be in high memory (meaning they need to be
> +  mapped into kernel memory to be accessible).
> +* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
> +  rmap lock for reading in combination with the PTE and PMD page table locks.
> +  In particular, this happens in :c:func:`!retract_page_tables` when handling
> +  :c:macro:`!MADV_COLLAPSE`.
> +  So accessing PTE-level page tables requires at least holding an RCU read lock;
> +  but that only suffices for readers that can tolerate racing with concurrent
> +  page table updates such that an empty PTE is observed (in a page table that
> +  has actually already been detached and marked for RCU freeing) while another
> +  new page table has been installed in the same location and filled with
> +  entries. Writers normally need to take the PTE lock and revalidate that the
> +  PMD entry still refers to the same PTE-level page table.
> +
> +To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or
> +:c:func:`!pte_offset_map` can be used depending on stability requirements.
> +These map the page table into kernel memory if required, take the RCU lock, and
> +depending on variant, may also look up or acquire the PTE lock.
> +See the comment on :c:func:`!__pte_offset_map_lock`.
>
>  Atomicity
>  ^^^^^^^^^
>
> ---
> base-commit: 1e96a63d3022403e06cdda0213c7849b05973cd5
> change-id: 20241114-vma-docs-addition1-onv3-32df4e6dffcf
>
> --
> Jann Horn <jannh@google.com>
>

