On 19/03/2025 11:38, Ryan Roberts wrote:
> Hi All,
> 
> I know this is very last minute, but I was hoping that it might be possible to
> squeeze in a session to discuss the following?
> 
> Summary/Background:
> 
> On arm64, physically contiguous and naturally aligned regions can take advantage
> of contpte mappings (e.g. 64 KB) to reduce iTLB pressure. However, for file
> regions containing text, current readahead behaviour often yields small,
> misaligned folios, preventing this optimization. This proposal introduces a
> special-case path for executable mappings, performing synchronous reads of an
> architecture-chosen size into large folios (64 KB on arm64). Early performance
> tests on real-world workloads (e.g. nginx, redis, kernel compilation) show ~2-9%
> gains.
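
To make the "physically contiguous and naturally aligned" requirement concrete, here is a minimal sketch. It assumes 4 KB base pages and 16-entry (64 KB) contpte blocks. The helper contpte_eligible() and its parameters are hypothetical and are not a kernel API. It only checks whether the 64 KB block around a virtual address lies entirely inside one folio, with both its virtual and physical addresses 64 KB aligned. That is the condition a small or misaligned folio would fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: arm64 with 4 KB base pages, where a contpte block spans 16 PTEs. */
#define BASE_PAGE_SIZE 4096ULL
#define CONT_PTES      16ULL
#define CONT_PTE_SIZE  (BASE_PAGE_SIZE * CONT_PTES)   /* 64 KB */

/*
 * Hypothetical helper (not a kernel API): can the 64 KB block containing
 * virtual address `va` use a contpte mapping?  The folio is physically
 * contiguous, `folio_size` bytes long, starts at physical address
 * `folio_pa`, and is mapped at virtual address `folio_va`.
 */
static bool contpte_eligible(uint64_t va, uint64_t folio_va,
                             uint64_t folio_pa, uint64_t folio_size)
{
    /* Round va down to the start of its naturally aligned 64 KB block. */
    uint64_t block_va = va & ~(CONT_PTE_SIZE - 1);

    /* The whole block must lie inside this single folio... */
    if (block_va < folio_va)
        return false;
    uint64_t offset = block_va - folio_va;
    if (offset > folio_size || folio_size - offset < CONT_PTE_SIZE)
        return false;

    /* ...and its physical address must be 64 KB aligned as well. */
    return ((folio_pa + offset) & (CONT_PTE_SIZE - 1)) == 0;
}
```

A 64 KB folio mapped at a 64 KB-aligned virtual and physical address passes. A 16 KB folio, or a 64 KB folio mapped at a virtual address such as 0x403000, does not, because no complete aligned 64 KB block fits inside it.
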
> 
> I’ve previously posted attempts to enable this performance improvement ([1],
> [2]), but there were objections and the conversation fizzled out. Now that I
> have more compelling performance data, I’m hoping there is stronger
> justification and that we can find a path forwards.
> 
> What I’d Like to Cover:
> 
>  - Describe how text memory should ideally be mapped and why it benefits
>    performance.
> 
>  - Brief review of performance data.
> 
>  - Discuss options for the best way to encourage text into large folios:
>      - Let the architecture request a preferred size
>      - Extend VMA attributes to include preferred THP size hint
>      - Provide a sysfs knob
>      - Plug into the “mapping min folio order” infrastructure
>      - Other approaches?

Slides from the session are attached. They include a fix to the diagram on
slide 3: Matthew was correct that we don't align to the exact sync/async
boundary; instead, we extend the async region down to the previous folio
boundary.
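
The boundary adjustment can be sketched as follows, using page-cache indices. The names async_window() and struct ra_window are hypothetical. The sketch assumes a single fixed folio order. It illustrates the idea and is not the code from the posted patches.

```c
#include <stdint.h>

/* A readahead window in page-cache indices (assumed 4 KB pages). */
struct ra_window {
    uint64_t start;   /* first page index */
    uint64_t nr;      /* number of pages */
};

/*
 * Hypothetical helper: the async portion nominally spans
 * [async_start, end).  Rather than starting exactly at the sync/async
 * boundary, extend it down to the previous boundary of a folio of
 * 2^order pages, so the async region starts on a folio boundary.
 */
static struct ra_window async_window(uint64_t async_start, uint64_t end,
                                     unsigned int order)
{
    uint64_t folio_nr = 1ULL << order;
    uint64_t start = async_start & ~(folio_nr - 1);   /* round down */
    struct ra_window w = { start, end - start };
    return w;
}
```

For example, with order 4 (64 KB folios) and an async portion nominally spanning pages [24, 56), the window becomes start 16 with 40 pages. If the boundary is already folio-aligned (e.g. 32), it is left unchanged.
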

> 
> [1] https://lore.kernel.org/all/20240215154059.2863126-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/all/20240717071257.4141363-1-ryan.roberts@arm.com/
> 
> Thanks,
> Ryan