On 19/03/2025 11:38, Ryan Roberts wrote:
> Hi All,
>
> I know this is very last minute, but I was hoping that it might be possible to
> squeeze in a session to discuss the following?
>
> Summary/Background:
>
> On arm64, physically contiguous and naturally aligned regions can take advantage
> of contpte mappings (e.g. 64 KB) to reduce iTLB pressure. However, for file
> regions containing text, current readahead behaviour often yields small,
> misaligned folios, preventing this optimization. This proposal introduces a
> special-case path for executable mappings, performing synchronous reads of an
> architecture-chosen size into large folios (64 KB on arm64). Early performance
> tests on real-world workloads (e.g. nginx, redis, kernel compilation) show ~2-9%
> gains.
>
> I’ve previously posted attempts to enable this performance improvement ([1],
> [2]), but there were objections and conversation fizzled out. Now that I have
> more compelling performance data, I’m hoping there is now stronger
> justification, and we can find a path forwards.
>
> What I’d Like to Cover:
>
> - Describe how text memory should ideally be mapped and why it benefits
>   performance.
>
> - Brief review of performance data.
>
> - Discuss options for the best way to encourage text into large folios:
>   - Let the architecture request a preferred size
>   - Extend VMA attributes to include preferred THP size hint
>   - Provide a sysfs knob
>   - Plug into the “mapping min folio order” infrastructure
>   - Other approaches?

Slides from session attached. Includes fix to diagram on slide 3; Matthew was
correct that we don't align to the exact sync/async boundary, but extend the
async region down to the previous folio boundary.

>
> [1] https://lore.kernel.org/all/20240215154059.2863126-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/all/20240717071257.4141363-1-ryan.roberts@arm.com/
>
> Thanks,
> Ryan
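
For readers without the slides, the alignment arithmetic behind both points
(natural alignment for contpte, and extending a region down to the previous
folio boundary) can be sketched roughly as below. This is an illustration only,
not code from the patches in [1] or [2]: the example_* names are hypothetical,
and the 4 KB page size and 64 KB block size are assumptions based on the 64 KB
example in the summary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KB base pages, 64 KB contpte block (16 contiguous pages). */
#define EXAMPLE_PAGE_SIZE ((uint64_t)4096)
#define EXAMPLE_CONT_SIZE ((uint64_t)16 * EXAMPLE_PAGE_SIZE)

/*
 * Round addr down to the previous multiple of size.
 * size must be a power of two. Hypothetical helper, not a kernel API.
 */
static inline uint64_t example_align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/*
 * True if [start, start + len) is exactly one naturally aligned block of
 * the given size, i.e. start is a multiple of size and len equals size.
 * size must be a power of two. Hypothetical helper, not a kernel API.
 */
static inline bool example_is_naturally_aligned(uint64_t start, uint64_t len,
						uint64_t size)
{
	return len == size && (start & (size - 1)) == 0;
}
```

Under these assumptions, an offset such as 0x12345 rounds down to 0x10000, and
a 64 KB region starting at 0x11000 fails the natural-alignment check even
though it is page aligned, which is the kind of misalignment the summary
describes.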