linux-mm.kvack.org archive mirror
* Re: [PATCH] Documentation/mm: Initial page table documentation
       [not found] ` <ZH6uolQWeyX9kb+j@casper.infradead.org>
@ 2023-06-08  8:13   ` Linus Walleij
  2023-06-08  9:00     ` Mike Rapoport
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Walleij @ 2023-06-08  8:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Jonathan Corbet, linux-mm, linux-doc, Mike Rapoport

Hi Matthew,

I fixed up most of the comments.

On Tue, Jun 6, 2023 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Jun 06, 2023 at 12:10:35AM +0200, Linus Walleij wrote:

> > +- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
> > +  The name is a bit confusing because while in Linux 1.0 this did refer to a
> > +  single page table entry in the top level page table, it was retrofitted
> > +  to be "what the level above points to". So when two-level page tables were
> > +  introduced, the *pte* became a list of pointers, which is why
> > +  `PTRS_PER_PTE` exists. This oxymoronic term can be mildly confusing.
>
> I don't think this is right.  PTRS_PER_PTE is how many pointers are in
> the PMD page table,

I don't get this. What does PTRS_PER_PMD mean then (and
then all the way up to PTRS_PER_PGD...)

> so it's how many pointers you can walk if you have a
> pte *.  Yes, it's complicated and confusing, but I don't think this
> explanation clears up any of that confusion.

I will try to reword it so this gets through.

> > +pointers*, so the **pgd** contains `PTRS_PER_PGD` pointers to the next level
> > +below, **p4d** contains `PTRS_PER_P4D` pointers to **pud** items and so on. The
> > +number of pointers on each level is architecture-defined. The most usual layout
>
> I don't think it's helpful to say this.  It's really not that usual
> (maybe half of our architectures behave that way?)
>
> I think a document like this that talks about page tables really needs to
> include a description of how some PMDs / PUDs / ... may not be pointers
> to lower levels, but direct pointers to the actual memory (ie THPs /
> hugetlb pages).

I don't understand that stuff. I suggest you patch this into the document
when the basics are in place.

> Sorry to take a wrecking ball to this, I'm sure you worked hard on it.

Don't worry about that, I'm an academic, I just rewrite.

Yours,
Linus Walleij



* Re: [PATCH] Documentation/mm: Initial page table documentation
  2023-06-08  8:13   ` [PATCH] Documentation/mm: Initial page table documentation Linus Walleij
@ 2023-06-08  9:00     ` Mike Rapoport
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Rapoport @ 2023-06-08  9:00 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Matthew Wilcox, Andrew Morton, Jonathan Corbet, linux-mm, linux-doc

On Thu, Jun 08, 2023 at 10:13:49AM +0200, Linus Walleij wrote:
> Hi Matthew,
> 
> I fixed up most of the comments.
> 
> On Tue, Jun 6, 2023 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > On Tue, Jun 06, 2023 at 12:10:35AM +0200, Linus Walleij wrote:
> 
> > > +- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
> > > +  The name is a bit confusing because while in Linux 1.0 this did refer to a
> > > +  single page table entry in the top level page table, it was retrofitted
> > > +  to be "what the level above points to". So when two-level page tables were
> > > +  introduced, the *pte* became a list of pointers, which is why
> > > +  `PTRS_PER_PTE` exists. This oxymoronic term can be mildly confusing.
> >
> > I don't think this is right.  PTRS_PER_PTE is how many pointers are in
> > the PMD page table,
> 
> I don't get this. What does PTRS_PER_PMD mean then (and
> then all the way up to PTRS_PER_PGD...)

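PTRS_PER_PTE is how many pointers there are in the lowest-level (pte) page
table, and pte_t is a "pointer" to an actual physical page mapped by the
page tables. Likewise, PTRS_PER_PMD is the number of entries in a pmd table,
and so on up to PTRS_PER_PGD.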
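
To make that concrete, here is a minimal userspace sketch, not kernel code.
It assumes a classic two-level 32-bit layout with 4KB pages and 1024 entries
per table. All `EX_`/`ex_` names are hypothetical and only mirror what the
similarly named kernel macros express:

```c
#include <stdint.h>

/* Assumed example layout: 32-bit virtual addresses, 4KB pages, two levels. */
#define EX_PAGE_SHIFT    12
#define EX_PTRS_PER_PTE  1024u  /* entries in one lowest-level (pte) table */
#define EX_PGDIR_SHIFT   22     /* EX_PAGE_SHIFT + log2(EX_PTRS_PER_PTE) */
#define EX_PTRS_PER_PGD  1024u  /* entries in the top-level (pgd) table */

/* Slot in the top-level table that covers this virtual address. */
static inline uint32_t ex_pgd_index(uint32_t vaddr)
{
	return (vaddr >> EX_PGDIR_SHIFT) & (EX_PTRS_PER_PGD - 1);
}

/* Slot in the pte table that covers this virtual address. */
static inline uint32_t ex_pte_index(uint32_t vaddr)
{
	return (vaddr >> EX_PAGE_SHIFT) & (EX_PTRS_PER_PTE - 1);
}

/* Bytes of virtual address space mapped by one full pte table. */
static inline uint32_t ex_pte_table_span(void)
{
	return EX_PTRS_PER_PTE << EX_PAGE_SHIFT;
}
```

With these assumed constants, one pte table holds 1024 entries that each map
one 4KB page, so a single table spans 4MB of virtual address space.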
 
> Yours,
> Linus Walleij

-- 
Sincerely yours,
Mike.



* Re: [PATCH] Documentation/mm: Initial page table documentation
       [not found] <20230605221035.3681812-1-linus.walleij@linaro.org>
       [not found] ` <ZH6uolQWeyX9kb+j@casper.infradead.org>
@ 2023-06-08  9:31 ` Kuan-Ying Lee (李冠穎)
  2023-06-08 11:51   ` Linus Walleij
  1 sibling, 1 reply; 4+ messages in thread
From: Kuan-Ying Lee (李冠穎) @ 2023-06-08  9:31 UTC (permalink / raw)
  To: corbet, linus.walleij, akpm; +Cc: linux-mm, rppt, linux-doc

On Tue, 2023-06-06 at 00:10 +0200, Linus Walleij wrote:
> This is based on an earlier blog post at people.kernel.org,
> it describes the concepts about page tables that were hardest
> for me to grasp when dealing with them for the first time,
> such as the prevalent three-letter acronyms pfn, pgd, p4d,
> pud, pmd and pte.
> 
> I don't know if this is what people want, but it's what I would
> have wanted.
> 
> I discussed at one point with Mike Rapoport to bring this into
> the kernel documentation, so here is a small proposal.
> 
> Cc: Mike Rapoport <rppt@kernel.org>
> Link: https://people.kernel.org/linusw/arm32-page-tables
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  Documentation/mm/page_tables.rst | 125
> +++++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
> 
> diff --git a/Documentation/mm/page_tables.rst
> b/Documentation/mm/page_tables.rst
> index 96939571d7bc..a2e1671a0f1d 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -3,3 +3,128 @@
>  ===========
>  Page Tables
>  ===========
> +
> +Paged virtual memory was invented along with virtual memory as a
> concept in
> +1962 on the Ferranti Atlas Computer which was the first computer
> with paged
> +virtual memory. The feature migrated to newer computers and became a
> de facto
> +feature of all Unix-like systems as time went by. In 1985 the
> feature was
> +included in the Intel 80386, which was the CPU Linux 1.0 was
> developed on.
> +
> +The first computers with virtual memory had one single page table,
> but the
> +increased size of physical memories demanded that the page tables be
> split in
> +two hierarchical levels. This happens because a single page table
> cannot cover
> +the desired amount of memory with the desired granularity, such as
> a page size
> +of 4KB.
> +
> +The physical address corresponding to the virtual address is
> commonly
> +defined by the index point in the hierarchy, and this is called a
> **page frame
> +number** or **pfn**. The first entry on the top level to the first
> entry in the
> +second and so on down the hierarchy will point out the virtual
> address for the
> +physical memory address 0, which will be *pfn 0* and the highest pfn
> will be
> +the last page of physical memory the external address bus of the CPU
> can
> +address.
> +
> +With a page granularity of 4KB and an address range of 32 bits, pfn 0
> is at
> +address 0x00000000, pfn 1 is at address 0x00004000, pfn 2 is at
> 0x00008000
> +and so on until we reach pfn 0x3ffff at 0xffffc000.

pfn 1 is at 0x00001000.
pfn 2 is at 0x00002000.

And so on until we reach pfn 0xfffff at 0xfffff000.
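
A small userspace sketch of the arithmetic behind these numbers, assuming
4KB pages and a 32-bit address range. The `EX_`/`ex_` names are made up for
illustration and are not kernel APIs:

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12                    /* assumed: 4KB pages */
#define EX_PAGE_SIZE  (1u << EX_PAGE_SHIFT) /* 0x1000 bytes */

/* Base address of the page frame with the given pfn. */
static inline uint32_t ex_pfn_to_addr(uint32_t pfn)
{
	return pfn << EX_PAGE_SHIFT;
}

/* Page frame number containing the given address. */
static inline uint32_t ex_addr_to_pfn(uint32_t addr)
{
	return addr >> EX_PAGE_SHIFT;
}

/* Highest pfn reachable with a 32-bit address. */
static inline uint32_t ex_max_pfn(void)
{
	return UINT32_MAX >> EX_PAGE_SHIFT;
}
```

Under those assumptions, `ex_pfn_to_addr(1)` gives 0x00001000 and
`ex_max_pfn()` gives 0xfffff, whose base address is 0xfffff000.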

> +
> +As you can see, with 4KB pages the page base address uses bits 12-31 
> of the
> +address, and this is why `PAGE_SHIFT` in this case is defined as 12
> and
> +`PAGE_SIZE` is usually defined in terms of the page shift as `(1 <<
> PAGE_SHIFT)`
> +
> +Over time a deeper hierarchy has been developed in response to
> increasing memory
> +sizes. When Linux was created, 4KB pages and a single page table
> called
> +`swapper_pg_dir` with 1024 entries was used, covering 4MB which
> coincided with
> +the fact that Torvalds's first computer had 4MB of physical memory.
> Entries in
> +this single table were referred to as *PTE*:s - page table entries.
> +
> +Over time the page table hierarchy has developed into this::
> +
> +  +-----+
> +  | PGD |
> +  +-----+
> +     ^
> +     |   +-----+
> +     +---| P4D |
> +         +-----+
> +            ^
> +            |   +-----+
> +            +---| PUD |
> +                +-----+
> +                   ^
> +                   |   +-----+
> +                   +---| PMD |
> +                       +-----+
> +                          ^
> +                          |   +-----+
> +                          +---| PTE |
> +                              +-----+
> +
> +
> +Symbols on the different levels of the page table hierarchy have the
> following
> +meaning:
> +
> +- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the
> Linux kernel
> +  main page table handling the PGD for the kernel memory is still
> found in
> +  `swapper_pg_dir`, but each userspace process in the system also
> has its own
> +  memory context and thus its own *pgd*, found in `struct mm_struct`
> which
> +  in turn is referenced to in each `struct task_struct`. So tasks
> have memory
> +  context in the form of a `struct mm_struct` and this in turn has a
> +  `pgd_t *pgd` pointer to the corresponding page global
> directory.
> +
> +- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was
> introduced to
> +  handle 5-level page tables after the *pud* was introduced. Now it
> was clear
> +  that we need to replace *pgd*, *pmd*, *pud* etc with a figure
> indicating the
> +  directory level and that we cannot go on with ad hoc names any
> more. This
> +  is only used on systems which actually have 5 levels of page
> tables.
> +
> +- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was
> introduced after
> +  the other levels to handle 4-level page tables. Like *p4d*, it is
> potentially
> +  unused.
> +
> +- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**.
> +
> +- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned
> earlier.
> +  The name is a bit confusing because while in Linux 1.0 this did
> refer to a
> +  single page table entry in the top level page table, it was
> retrofitted
> +  to be "what the level above points to". So when two-level page
> tables were
> +  introduced, the *pte* became a list of pointers, which is why
> +  `PTRS_PER_PTE` exists. This oxymoronic term can be mildly
> confusing.
> +
> +As already mentioned, each level in the page table hierarchy is a
> *list of
> +pointers*, so the **pgd** contains `PTRS_PER_PGD` pointers to the
> next level
> +below, **p4d** contains `PTRS_PER_P4D` pointers to **pud** items and
> so on. The
> +number of pointers on each level is architecture-defined. The most
> usual layout
> +is the `PAGE_SIZE` of the system divided by the number of bytes in a
> virtual
> +address on the system so each page table level is exactly one page
> worth of
> +pointers, which is usually what computer architects choose::
> +
> +    PMD
> +  +-----+           PTE
> +  | ptr |-------> +-----+
> +  | ptr |-        | ptr |-------> PAGE
> +  | ptr | \       | ptr |
> +  | ptr |  \        ...
> +  | ... |   \
> +  | ptr |    \         PTE
> +  +-----+     +----> +-----+
> +                     | ptr |-------> PAGE
> +                     | ptr |
> +                       ...
> +
> +
> +Each pointer in the lowest level of the page table hierarchy, i.e.
> each
> +`pteval_t`-entry of the `PTRS_PER_PTE` entries in a `pte_t *`, will
> map exactly
> +one `PAGE_SIZE`:d page of physical memory to exactly one page of
> virtual memory.
> +
> +The pte page table entries (pointers) on the lowest level of the
> hierarchy
> +typically contain the high bits of a virtual address in its high
> bits, and in
> +the lower bits it contains architecture-dependent control bits
> pertaining to
> +the page.
> +
> +If the architecture does not use all the page table levels, they can
> be *folded*
> +which means skipped, and all operations performed on page tables
> will be
> +compile-time augmented to just skip a level when accessing the next
> lower
> +level. Page table handling code that wishes to be architecture-neutral,
> such as
> +the virtual memory manager, will however need to be written so that
> it
> +traverses all of the currently five levels.


* Re: [PATCH] Documentation/mm: Initial page table documentation
  2023-06-08  9:31 ` Kuan-Ying Lee (李冠穎)
@ 2023-06-08 11:51   ` Linus Walleij
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Walleij @ 2023-06-08 11:51 UTC (permalink / raw)
  To: Kuan-Ying Lee (李冠穎)
  Cc: corbet, akpm, linux-mm, rppt, linux-doc

On Thu, Jun 8, 2023 at 11:32 AM Kuan-Ying Lee (李冠穎)
<Kuan-Ying.Lee@mediatek.com> wrote:

> > +With a page granularity of 4KB and an address range of 32 bits, pfn 0
> > is at
> > +address 0x00000000, pfn 1 is at address 0x00004000, pfn 2 is at
> > 0x00008000
> > +and so on until we reach pfn 0x3ffff at 0xffffc000.
>
> pfn 1 is at 0x00001000.
> pfn 2 is at 0x00002000.
>
> And so on until we reach pfn 0xfffff at 0xfffff000.

It seems I went immediately for 16K pages... Thanks, I'll fix it up.
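
For the record, a quick sketch of why the original numbers match 16K pages.
This is plain userspace C with hypothetical helper names, and the page shift
is passed in as a parameter rather than taken from any kernel header:

```c
#include <stdint.h>

/* Base address of a page frame for a given page shift (12 = 4K, 14 = 16K). */
static inline uint32_t ex_pfn_base(uint32_t pfn, unsigned int page_shift)
{
	return pfn << page_shift;
}

/* Highest pfn addressable with 32 bits for a given page shift. */
static inline uint32_t ex_last_pfn(unsigned int page_shift)
{
	return UINT32_MAX >> page_shift;
}
```

With a shift of 14, pfn 1 lands at 0x00004000 and the last pfn is 0x3ffff at
0xffffc000, which is what the patch said. With a shift of 12, the same
helpers give 0x00001000 and 0xfffff at 0xfffff000.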

Yours,
Linus Walleij

