linux-mm.kvack.org archive mirror
From: Luke Yang <luyang@redhat.com>
To: dev.jain@arm.com
Cc: jhladky@redhat.com, akpm@linux-foundation.org,
	Liam.Howlett@oracle.com, willy@infradead.org, surenb@google.com,
	vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [REGRESSION] mm/mprotect: 2x+ slowdown for >=400KiB regions since PTE batching (cac1db8c3aad)
Date: Fri, 13 Feb 2026 10:08:14 -0500
Message-ID: <aY8-XuFZ7zCvXulB@luyang-thinkpadp1gen7.toromso.csb>

Hello,

we have bisected a significant mprotect() performance regression in
6.17-rc1 to:

cac1db8c3aad ("mm: optimize mprotect() by PTE batching")

The regression becomes clearly visible at region sizes of around 400 KiB
and above. It is still present in the latest 6.19 kernel.

## Test description

The reproducer repeatedly toggles protection (PROT_NONE <->
PROT_READ|PROT_WRITE) over a single mapped region in a tight loop. All
pages change protection in each iteration.

The benchmark sweeps region sizes from 4 KiB up to 40 GiB.

We bisected between 6.16 and 6.17-rc1 and confirmed that reverting
cac1db8c3aad on top of 6.17-rc1 largely restores the 6.16 performance
characteristics.

## perf observations

In 6.17-rc1, commit_anon_folio_batch() becomes hot and accounts for a
significant portion of cycles inside change_pte_range(). Instruction
count in change_pte_range() increases noticeably in 6.17-rc1.
commit_anon_folio_batch() was added as part of cac1db8c3aad.

The regression is also present on the following systems: AMD EPYC 2 (Rome),
AMD EPYC 3 (Milan), AMD EPYC 3 (Milan-X), AMD EPYC 4 (Zen4c Bergamo), Ampere
Mt. Snow Altra with KVM virt type (ARM Neoverse-N1), Lenovo ThinkPad T460p
(Intel Skylake 6820HQ).

## Results (nsec per mprotect call), collected on an AMD EPYC Zen3 (Milan) server

v6.16
size_kib | nsec_per_call
4        | 1713
40       | 2071
400      | 3453
4000     | 18804
40000    | 172613
400000   | 1699301
4000000  | 17021882
40000000 | 169677478

v6.17-rc1
size_kib | nsec_per_call
4        | 1775
40       | 2362
400      | 5993
4000     | 44116
40000    | 427731
400000   | 4252714
4000000  | 42512805
40000000 | 424995500

v6.17-rc1 with cac1db8c3aad reverted
size_kib | nsec_per_call
4        | 1750
40       | 2126
400      | 3800
4000     | 22227
40000    | 205446
400000   | 2011634
4000000  | 20144468
40000000 | 200764472

This workload appears to be the worst case for the new batching logic,
where batching overhead dominates, and no amortization benefit is
achieved.

The following minimal reproducers are included below:

* mprot_tw4m_regsize_sweep_one_region.sh
* mprot_tw4m_regsize.c

Please let us know if additional data would be useful.

Reported-by: Luke Yang <luyang@redhat.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>

Thank you
Luke

Reproducer
----------


mprot_tw4m_regsize_sweep_one_region.sh
--- cut here ---
#!/bin/bash
# Fail on a compile error even if a stale binary from an earlier build exists.
if ! gcc -Wall -Wextra -O1 -o mprot_tw4m_regsize mprot_tw4m_regsize.c ||
   ! [ -x "./mprot_tw4m_regsize" ]; then
 echo "No ./mprot_tw4m_regsize binary, compilation failed?"
 exit 1
fi

DIR="$(date '+%Y-%b-%d_%Hh%Mm%Ss')_$(uname -r)"
mkdir -p "$DIR"

# Sweep region size from 4 KiB to 40000000 KiB (about 40 GB), 10x each step, 1 region.
# Iterations decrease by 10x to keep runtime roughly constant.
#   size_kib   iterations
runs=(
   "4          40000000"
   "40         4000000"
   "400        400000"
   "4000       40000"
   "40000      4000"
   "400000     400"
   "4000000    40"
   "40000000   4"
)

for entry in "${runs[@]}"; do
   read -r size_kib iters <<< "$entry"
   logfile="$DIR/regsize_${size_kib}k.log"
   echo "=== Region size: ${size_kib} KiB, iterations: ${iters} ==="
   sync; sync
   echo 3 > /proc/sys/vm/drop_caches
   taskset -c 0 ./mprot_tw4m_regsize "$size_kib" 1 "$iters" 2>&1 | tee "$logfile"
   echo ""
done

# Create CSV summary from log files
csv="$DIR/summary.csv"
echo "size_kib,runtime_sec,nsec_per_call" > "$csv"
for entry in "${runs[@]}"; do
   read -r size_kib _ <<< "$entry"
   logfile="$DIR/regsize_${size_kib}k.log"
   runtime=$(grep -oP 'Runtime: \K[0-9.]+' "$logfile")
   nsec=$(grep -oP 'Avg: \K[0-9.]+(?= nsec/call)' "$logfile")
   echo "${size_kib},${runtime},${nsec}" >> "$csv"
done

echo "Results saved in $DIR/"
echo "CSV summary:"
cat "$csv"
--- cut here ---

mprot_tw4m_regsize.c
--- cut here ---
/*
* Reproduce libmicro mprot_tw4m benchmark - Time mprotect() with configurable region size
* gcc -Wall -Wextra -O1 mprot_tw4m_regsize.c -o mprot_tw4m_regsize
* DEBUG: gcc -Wall -Wextra -g -fsanitize=undefined -O1 mprot_tw4m_regsize.c -o mprot_tw4m_regsize
* ./mprot_tw4m_regsize <region_size_kib> <region_count> <iterations>
*/

#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <strings.h>
#include <time.h>

typedef volatile char vchar_t;

static __inline__ u_int64_t start_clock();
static __inline__ u_int64_t stop_clock();

int main(int argc, char **argv)
{
   int i, j, ret;
   long long k;

   if (argc < 4) {
       printf("USAGE: %s region_size_kib region_count iterations\n", argv[0]);
       printf("Creates multiple regions and times mprotect() calls\n");
       return 1;
   }

   long region_size = atol(argv[1]) * 1024L;
   int region_count = atoi(argv[2]);
   int iterations = atoi(argv[3]);

   int pagesize = sysconf(_SC_PAGESIZE);

   vchar_t **regions = malloc(region_count * sizeof(vchar_t*));
   if (!regions) {
       perror("malloc");
       return 1;
   }

   for (i = 0; i < region_count; i++) {
       regions[i] = (vchar_t *) mmap(NULL, region_size,
                     PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0L);

       if (regions[i] == MAP_FAILED) {
           perror("mmap");
           exit(1);
       }

       for (k = 0; k < region_size; k += pagesize) {
           regions[i][k] = 1;
       }
   }

   printf("Created %d regions of %ldKiB each. Starting %d mprotect operations per region...\n",
          region_count, region_size / 1024, iterations);

   struct timespec start_time, end_time;
   clock_gettime(CLOCK_MONOTONIC, &start_time);
   u_int64_t start_rdtsc = start_clock();

   for (j = 0; j < iterations; j++) {
       for (i = 0; i < region_count; i++) {
           int prot;

           if ((i + j) % 2 == 0) {
               prot = PROT_NONE;
           } else {
               prot = PROT_READ | PROT_WRITE;
           }

           ret = mprotect((void *)regions[i], region_size, prot);
           if (ret != 0) {
               perror("mprotect");
               printf("mprotect error at region %d, iteration %d\n", i, j);
           }
       }
   }

   u_int64_t stop_rdtsc = stop_clock();
   clock_gettime(CLOCK_MONOTONIC, &end_time);
   u_int64_t diff = stop_rdtsc - start_rdtsc;

   long total_calls = (long)region_count * iterations;
   double runtime_sec = (end_time.tv_sec - start_time.tv_sec) +
                       (end_time.tv_nsec - start_time.tv_nsec) / 1000000000.0;

   double nsec_per_call = (runtime_sec * 1e9) / total_calls;

   printf("TSC for %ld mprotect calls on %d x %ldKiB regions: %ld K-cycles.  Avg: %g K-cycles/call\n",
          total_calls,
          region_count,
          region_size / 1024,
          diff/1000,
          ((double)(diff)/(double)(total_calls))/1000.0);
   printf("Runtime: %.6f seconds.  Avg: %.3f nsec/call\n", runtime_sec, nsec_per_call);

   for (i = 0; i < region_count; i++) {
       munmap((void *)regions[i], region_size);
   }
   free(regions);

   return 0;
}

static __inline__ u_int64_t start_clock() {
   // See: Intel Doc #324264, "How to Benchmark Code Execution Times on Intel...",
   u_int32_t hi, lo;
   __asm__ __volatile__ (
       "CPUID\n\t"
       "RDTSC\n\t"
       "mov %%edx, %0\n\t"
       "mov %%eax, %1\n\t": "=r" (hi), "=r" (lo)::
       "%rax", "%rbx", "%rcx", "%rdx");
   return ( (u_int64_t)lo) | ( ((u_int64_t)hi) << 32);
}

static __inline__ u_int64_t stop_clock() {
   // See: Intel Doc #324264, "How to Benchmark Code Execution Times on Intel...",
   u_int32_t hi, lo;
   __asm__ __volatile__(
       "RDTSCP\n\t"
       "mov %%edx, %0\n\t"
       "mov %%eax, %1\n\t"
       "CPUID\n\t": "=r" (hi), "=r" (lo)::
       "%rax", "%rbx", "%rcx", "%rdx");
   return ( (u_int64_t)lo) | ( ((u_int64_t)hi) << 32);
}
--- cut here ---


