Re: [LSF/MM/BPF TOPIC] Multi-sized THP performance benchmarks and analysis on ARM64

From: "Christoph Lameter (Ampere)" <cl@linux.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>,
	 Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 lsf-pc@lists.linux-foundation.org,
	olivier.singla@amperecomputing.com,
	 Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
	 Dan Williams <dan.j.williams@intel.com>,
	 Matthew Wilcox <willy@infradead.org>, Zi Yan <ziy@nvidia.com>
Subject: Re: [LSF/MM/BPF TOPIC] Multi-sized THP performance benchmarks and analysis on ARM64
Date: Tue, 25 Jun 2024 11:11:46 -0700 (PDT)
Message-ID: <f7b60a17-4dd8-aef5-e2d5-8c268595f0d1@linux.com>
In-Reply-To: <7a8bcd48-47b4-4bc7-a38f-45cef9adc221@arm.com>

On Tue, 25 Jun 2024, Ryan Roberts wrote:

> But I also want to raise a more general point; We are not done with the
> optimizations yet. contpte can also improve performance for iTLB, but this
> requires a change to the page cache to store text in (at least) 64K folios.
> Typically the iTLB is under a lot of pressure and this can help reduce it. This
> change is not in mainline yet (and I still need to figure out how to make the
> patch acceptable), but is worth another ~1.5% for the 4KPS case. I suspect this
> will also move the needle on the other benchmarks you ran. See [3] - I'd
> appreciate any thoughts you have on how to get something like this accepted.
>
> [3] https://lore.kernel.org/lkml/20240111154106.3692206-1-ryan.roberts@arm.com/

The discussion here seems to indicate that readahead already works for
order-2 folios (16K mTHP size?). So this is only about 64K mTHP on 4K
kernels?

From what I read in the ARM64 manuals, it seems that CONT_PTE can only be
used for 64K mTHP on 4K kernels. Neither the 16K case nor any other size
below 64K will benefit from CONT_PTE.
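
To make the size arithmetic concrete, here is a rough sketch. It assumes
4K base pages and the 16-entry contiguous block from the manual text
quoted further down; the function and constant names are made up for
illustration and are not kernel identifiers.

```python
BASE_PAGE_SIZE = 4096   # assumption: 4K base pages
CONT_ENTRIES = 16       # entries covered by one contiguous hint (per the quoted manual text)


def folio_bytes(order: int) -> int:
    """Size in bytes of a folio of the given order (order 0 = one base page)."""
    return BASE_PAGE_SIZE << order


def fills_cont_block(order: int) -> bool:
    """True if a folio of this order is large enough to fill a whole contiguous block."""
    return folio_bytes(order) >= CONT_ENTRIES * BASE_PAGE_SIZE


if __name__ == "__main__":
    for order in range(5):
        print(f"order-{order}: {folio_bytes(order) // 1024}K, "
              f"fills a 64K contiguous block: {fills_cont_block(order)}")
```

Under these assumptions order-2 (16K) and order-3 (32K) folios are too
small to fill a 16-entry block, and order-4 (64K) is the first size that
can.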

Quoting:

https://developer.arm.com/documentation/ddi0406/c/System-Level-Architecture/Virtual-Memory-System-Architecture--VMSA-/Memory-region-attributes/Long-descriptor-format-memory-region-attributes?lang=en#BEIIBEIJ

"Contiguous hint

The Long-descriptor translation table format descriptors contain a 
Contiguous hint bit. Setting this bit to 1 indicates that 16 adjacent 
translation table entries point to a contiguous output address range. 
These 16 entries must be aligned in the translation table so that the top 
5 bits of their input addresses, that index their position in the 
translation table, are the same. For example, referring to Figure 12.21, 
to use this hint for a block of 16 entries in the third-level translation 
table, bits[20:16] of the input addresses for the 16 entries must be the 
same.

The contiguous output address range must be aligned to size of 16 
translation table entries at the same translation table level.

Use of this hint means that the TLB can cache a single entry to cover the 
16 translation table entries.

This bit is only a hint bit. The architecture does not require a processor 
to cache TLB entries in this way. To avoid TLB coherency issues, any TLB 
maintenance by address must not assume any optimization of the TLB tables 
that might result from use of the hint bit."
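
The alignment and contiguity conditions in the quoted text could be
checked along these lines. This is only a sketch under the same
assumptions (4K pages, 16 entries per hint, last-level table); the helper
name and its arguments are hypothetical and do not correspond to any
kernel function.

```python
PAGE_SHIFT = 12                           # assumption: 4K pages
CONT_ENTRIES = 16                         # entries covered by one contiguous hint
CONT_SIZE = CONT_ENTRIES << PAGE_SHIFT    # 64K span covered by the hint


def can_use_contiguous_hint(vaddr: int, paddrs: list[int]) -> bool:
    """Check whether 16 last-level entries qualify for the contiguous hint.

    vaddr  -- input (virtual) address mapped by the first entry
    paddrs -- output (physical) addresses of the 16 adjacent entries
    """
    if len(paddrs) != CONT_ENTRIES:
        return False
    # The 16 entries must be aligned in the table: the input addresses
    # share their upper index bits, so the first one starts a 64K block.
    if vaddr % CONT_SIZE != 0:
        return False
    # The output range must be aligned to the size of 16 entries.
    if paddrs[0] % CONT_SIZE != 0:
        return False
    # The output addresses must form one contiguous range.
    return all(paddrs[i] == paddrs[0] + (i << PAGE_SHIFT)
               for i in range(CONT_ENTRIES))
```

For example, a 64K-aligned virtual address backed by sixteen consecutive
4K frames starting at a 64K-aligned physical address passes the check,
while the same frames mapped at a virtual address offset by 4K do not.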
