linux-mm.kvack.org archive mirror
  • * Re: [PATCH] Documentation/mm: Initial page table documentation
           [not found] <20230605221035.3681812-1-linus.walleij@linaro.org>
           [not found] ` <ZH6uolQWeyX9kb+j@casper.infradead.org>
    @ 2023-06-08  9:31 ` Kuan-Ying Lee (李冠穎)
      2023-06-08 11:51   ` Linus Walleij
      1 sibling, 1 reply; 4+ messages in thread
    From: Kuan-Ying Lee (李冠穎) @ 2023-06-08  9:31 UTC (permalink / raw)
      To: corbet, linus.walleij, akpm; +Cc: linux-mm, rppt, linux-doc
    
    On Tue, 2023-06-06 at 00:10 +0200, Linus Walleij wrote:
    > This is based on an earlier blog post at people.kernel.org,
    > it describes the concepts about page tables that were hardest
    > for me to grasp when dealing with them for the first time,
    > such as the prevalent three-letter acronyms pfn, pgd, p4d,
    > pud, pmd and pte.
    > 
    > I don't know if this is what people want, but it's what I would
    > have wanted.
    > 
    > I discussed at one point with Mike Rapoport to bring this into
    > the kernel documentation, so here is a small proposal.
    > 
    > Cc: Mike Rapoport <rppt@kernel.org>
    > Link: https://people.kernel.org/linusw/arm32-page-tables
    > Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    > ---
    >  Documentation/mm/page_tables.rst | 125 +++++++++++++++++++++++++++++++
    >  1 file changed, 125 insertions(+)
    > 
    > diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
    > index 96939571d7bc..a2e1671a0f1d 100644
    > --- a/Documentation/mm/page_tables.rst
    > +++ b/Documentation/mm/page_tables.rst
    > @@ -3,3 +3,128 @@
    >  ===========
    >  Page Tables
    >  ===========
    > +
    > +Paged virtual memory was invented along with virtual memory as a concept in
    > +1962 on the Ferranti Atlas Computer which was the first computer with paged
    > +virtual memory. The feature migrated to newer computers and became a de facto
    > +feature of all Unix-like systems as time went by. In 1985 the feature was
    > +included in the Intel 80386, which was the CPU Linux 1.0 was developed on.
    > +
    > +The first computers with virtual memory had one single page table, but the
    > +increased size of physical memories demanded that the page tables be split in
    > +two hierarchical levels. This happens because a single page table cannot cover
    > +the desired amount of memory with the desired granularity, such as a page size
    > +of 4KB.
    > +
    > +The physical address corresponding to the virtual address is commonly
    > +defined by the index point in the hierarchy, and this is called a **page frame
    > +number** or **pfn**. The first entry on the top level to the first entry in the
    > +second and so on down the hierarchy will point out the virtual address for the
    > +physical memory address 0, which will be *pfn 0* and the highest pfn will be
    > +the last page of physical memory the external address bus of the CPU can
    > +address.
    > +
    > +With a page granularity of 4KB and an address range of 32 bits, pfn 0 is at
    > +address 0x00000000, pfn 1 is at address 0x00004000, pfn 2 is at 0x00008000
    > +and so on until we reach pfn 0x3ffff at 0xffffc000.
    
    These addresses correspond to 16KB pages, not 4KB pages. With 4KB pages:
    pfn 1 is at 0x00001000.
    pfn 2 is at 0x00002000.
    
    And so on until we reach pfn 0xfffff at 0xfffff000.
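    
    The arithmetic can be checked with a minimal C sketch. It assumes 4KB
    pages, and the helper names pfn_to_addr() and addr_to_pfn() are made up
    for illustration (the kernel has PFN_PHYS() and PHYS_PFN() for this):

```c
#include <stdint.h>

/* Assumed values for 4KB pages; real kernels take these from arch headers. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical helper: base physical address of a page frame. */
static inline uint64_t pfn_to_addr(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Hypothetical helper: page frame number containing a physical address. */
static inline uint64_t addr_to_pfn(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}
```

    With these definitions, pfn_to_addr(1) is 0x1000 and the last 4KB page
    of a 32-bit address space, 0xfffff000, has pfn 0xfffff.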
    
    > +
    > +As you can see, with 4KB pages the page base address uses bits 12-31 of the
    > +address, and this is why `PAGE_SHIFT` in this case is defined as 12 and
    > +`PAGE_SIZE` is usually defined in terms of the page shift as `(1 << PAGE_SHIFT)`
    > +
    > +Over time a deeper hierarchy has been developed in response to increasing memory
    > +sizes. When Linux was created, 4KB pages and a single page table called
    > +`swapper_pg_dir` with 1024 entries were used, covering 4MB which coincided with
    > +the fact that Torvalds's first computer had 4MB of physical memory. Entries in
    > +this single table were referred to as *PTE*:s - page table entries.
    > +
    > +Over time the page table hierarchy has developed into this::
    > +
    > +  +-----+
    > +  | PGD |
    > +  +-----+
    > +     ^
    > +     |   +-----+
    > +     +---| P4D |
    > +         +-----+
    > +            ^
    > +            |   +-----+
    > +            +---| PUD |
    > +                +-----+
    > +                   ^
    > +                   |   +-----+
    > +                   +---| PMD |
    > +                       +-----+
    > +                          ^
    > +                          |   +-----+
    > +                          +---| PTE |
    > +                              +-----+
    > +
    > +
    > +Symbols on the different levels of the page table hierarchy have the following
    > +meaning:
    > +
    > +- **pgd**, `pgd_t`, `pgdval_t` = **Page Global Directory** - the Linux kernel
    > +  main page table handling the PGD for the kernel memory is still found in
    > +  `swapper_pg_dir`, but each userspace process in the system also has its own
    > +  memory context and thus its own *pgd*, found in `struct mm_struct` which
    > +  in turn is referenced in each `struct task_struct`. So tasks have memory
    > +  context in the form of a `struct mm_struct` and this in turn has a
    > +  `pgd_t *pgd` pointer to the corresponding page global directory.
    > +
    > +- **p4d**, `p4d_t`, `p4dval_t` = **Page Level 4 Directory** was introduced to
    > +  handle 5-level page tables after the *pud* was introduced. Now it was clear
    > +  that we needed to replace *pgd*, *pmd*, *pud* etc with a figure indicating the
    > +  directory level and that we could not go on with ad hoc names any more. This
    > +  is only used on systems which actually have 5 levels of page tables.
    > +
    > +- **pud**, `pud_t`, `pudval_t` = **Page Upper Directory** was introduced after
    > +  the other levels to handle 4-level page tables. Like *p4d*, it is potentially
    > +  unused.
    > +
    > +- **pmd**, `pmd_t`, `pmdval_t` = **Page Middle Directory**.
    > +
    > +- **pte**, `pte_t`, `pteval_t` = **Page Table Entry** - mentioned earlier.
    > +  The name is a bit confusing because while in Linux 1.0 this did refer to a
    > +  single page table entry in the top level page table, it was retrofitted
    > +  to be "what the level above points to". So when two-level page tables were
    > +  introduced, the *pte* became a list of pointers, which is why
    > +  `PTRS_PER_PTE` exists. This oxymoronic term can be mildly confusing.
    > +
    > +As already mentioned, each level in the page table hierarchy is a *list of
    > +pointers*, so the **pgd** contains `PTRS_PER_PGD` pointers to the next level
    > +below, **p4d** contains `PTRS_PER_P4D` pointers to **pud** items and so on. The
    > +number of pointers on each level is architecture-defined. The most usual layout
    > +is the `PAGE_SIZE` of the system divided by the number of bytes in a virtual
    > +address on the system so each page table level is exactly one page worth of
    > +pointers, which is usually what computer architects choose::
    > +    PMD
    > +  +-----+           PTE
    > +  | ptr |-------> +-----+
    > +  | ptr |-        | ptr |-------> PAGE
    > +  | ptr | \       | ptr |
    > +  | ptr |  \        ...
    > +  | ... |   \
    > +  | ptr |    \         PTE
    > +  +-----+     +----> +-----+
    > +                     | ptr |-------> PAGE
    > +                     | ptr |
    > +                       ...
    > +
    > +
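    
    The "one page worth of pointers" rule can be illustrated with a small C
    sketch. It assumes 4KB pages and 32-bit entries, which gives the 1024
    entries and 4MB coverage mentioned earlier for the original single
    table. All EX_* and ex_* names are invented for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4KB pages and 4-byte table entries. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  ((size_t)1 << EX_PAGE_SHIFT)

typedef uint32_t ex_entry_t;	/* hypothetical 32-bit table entry */

/* One page worth of entries: page size divided by the entry size. */
#define EX_PTRS_PER_TABLE (EX_PAGE_SIZE / sizeof(ex_entry_t))

/* Bytes of memory mapped by one lowest-level table, one page per entry. */
static inline uint64_t ex_table_coverage(void)
{
	return (uint64_t)EX_PTRS_PER_TABLE * EX_PAGE_SIZE;
}
```

    On a 64-bit system with 8-byte entries the same division gives 512
    entries per table.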
    > +Each pointer in the lowest level of the page table hierarchy, i.e. each
    > +`pteval_t`-entry of the `PTRS_PER_PTE` entries in a `pte_t *`, will map exactly
    > +one `PAGE_SIZE`:d page of physical memory to exactly one page of virtual memory.
    > +
    > +The pte page table entries (pointers) on the lowest level of the hierarchy
    > +typically contain the high bits of a physical address in their high bits, and in
    > +the lower bits they contain architecture-dependent control bits pertaining to
    > +the page.
    > +
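    
    A rough C sketch of that split between address bits and control bits.
    It assumes 4KB pages and a 32-bit entry. The two flag bits and all
    DEMO_* and demo_* names are hypothetical, since the real control bits
    differ per architecture:

```c
#include <stdint.h>

/* Assumption: 4KB pages, so the low 12 bits of an entry are free for flags. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_MASK  (~((UINT32_C(1) << DEMO_PAGE_SHIFT) - 1))

/* Hypothetical control bits; real layouts are architecture-dependent. */
#define DEMO_PTE_PRESENT (UINT32_C(1) << 0)
#define DEMO_PTE_WRITE   (UINT32_C(1) << 1)

typedef uint32_t demo_pteval_t;

/* Combine a page-aligned address with control bits into one entry. */
static inline demo_pteval_t demo_mk_pte(uint32_t page_addr, uint32_t flags)
{
	return (page_addr & DEMO_PAGE_MASK) | (flags & ~DEMO_PAGE_MASK);
}

/* High bits: the base address of the mapped page. */
static inline uint32_t demo_pte_addr(demo_pteval_t pte)
{
	return pte & DEMO_PAGE_MASK;
}

/* Low bits: the control bits pertaining to the page. */
static inline uint32_t demo_pte_flags(demo_pteval_t pte)
{
	return pte & ~DEMO_PAGE_MASK;
}
```

    Because pages are aligned to PAGE_SIZE, the low PAGE_SHIFT bits of a
    page address are always zero, which is what leaves room for the flags.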
    > +If the architecture does not use all the page table levels, they can be *folded*
    > +which means skipped, and all operations performed on page tables will be
    > +compile-time augmented to just skip a level when accessing the next lower
    > +level. Page table handling code that wishes to be architecture-neutral, such as
    > +the virtual memory manager, will however need to be written so that it
    > +traverses all of the currently five levels.
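    
    To make the five-level traversal concrete, here is a minimal C sketch
    of how a walker could pick the table index at each level. It assumes
    the layout used by x86-64 with 5-level paging (12-bit page offset, 9
    index bits per level); other architectures use other widths, and the
    WALK_* and walk_index() names are invented for this example:

```c
#include <stdint.h>

/* Assumed layout: 4KB pages and 512 entries (9 bits) at every level. */
#define WALK_PAGE_SHIFT     12
#define WALK_BITS_PER_LEVEL 9
#define WALK_PTRS_PER_LEVEL (1u << WALK_BITS_PER_LEVEL)

/* Levels ordered from the lowest (pte) to the top (pgd). */
enum walk_level { WALK_PTE = 0, WALK_PMD, WALK_PUD, WALK_P4D, WALK_PGD };

/* Index into the table at the given level for a virtual address. */
static inline unsigned int walk_index(uint64_t vaddr, enum walk_level level)
{
	unsigned int shift = WALK_PAGE_SHIFT +
			     WALK_BITS_PER_LEVEL * (unsigned int)level;

	return (unsigned int)((vaddr >> shift) & (WALK_PTRS_PER_LEVEL - 1));
}
```

    A walker would call this once per level from WALK_PGD down to WALK_PTE.
    On an architecture with a folded level, the lookup for that level
    simply hands back the entry from the level above.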
    

