* mm: swapin read-ahead and zram
@ 2025-08-15  5:51 Sergey Senozhatsky
From: Sergey Senozhatsky @ 2025-08-15  5:51 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Michal Hocko
  Cc: Suren Baghdasaryan, Suleiman Souhlal, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hello,

We are seeing unexpected behavior under a standard memory-pressure test
with zram configured as a swap device (I tested several LTS kernels:
6.12, 6.6, 5.4).  Namely, we observe multiple repetitive reads of the
same (compressed) zram entries, sometimes within a very short time span:

...
[ 1523.345784] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:188
[ 1523.365401] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:189
[ 1523.385934] zram: decompress entry idx:1307291 zsmalloc handle:ffffa28c70100b50 obj_size:788 num_reads:227
[ 1523.405098] zram: decompress entry idx:150916 zsmalloc handle:ffffa28c70114fc0 obj_size:436 num_reads:230
[ 1523.475162] zram: decompress entry idx:266372 zsmalloc handle:ffffa28c4566e5e0 obj_size:437 num_reads:192
[ 1523.476785] zram: decompress entry idx:1615262 zsmalloc handle:ffffa28c8be3efe0 obj_size:518 num_reads:99
[ 1523.476899] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:97
[ 1523.477323] zram: decompress entry idx:266373 zsmalloc handle:ffffa28c4566e828 obj_size:434 num_reads:111
[ 1523.478081] zram: decompress entry idx:1638538 zsmalloc handle:ffffa28c70100c40 obj_size:930 num_reads:40
[ 1523.478631] zram: decompress entry idx:1307301 zsmalloc handle:ffffa28c70100348 obj_size:0 num_reads:87
[ 1523.507349] zram: decompress entry idx:1307293 zsmalloc handle:ffffa28c701007c8 obj_size:989 num_reads:98
[ 1523.540930] zram: decompress entry idx:1294528 zsmalloc handle:ffffa28c47582e60 obj_size:441 num_reads:386
[ 1523.540930] zram: decompress entry idx:266372 zsmalloc handle:ffffa28c4566e5e0 obj_size:437 num_reads:193
[ 1523.540958] zram: decompress entry idx:1294534 zsmalloc handle:ffffa28c47582b30 obj_size:520 num_reads:176
[ 1523.540998] zram: decompress entry idx:1615262 zsmalloc handle:ffffa28c8be3efe0 obj_size:518 num_reads:100
[ 1523.541063] zram: decompress entry idx:1615259 zsmalloc handle:ffffa28c8be3e970 obj_size:428 num_reads:171
[ 1523.541101] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:98
[ 1523.541212] zram: decompress entry idx:150916 zsmalloc handle:ffffa28c70114fc0 obj_size:436 num_reads:231
[ 1523.541379] zram: decompress entry idx:1638538 zsmalloc handle:ffffa28c70100c40 obj_size:930 num_reads:41
[ 1523.541412] zram: decompress entry idx:1294521 zsmalloc handle:ffffa28c47582548 obj_size:936 num_reads:70
[ 1523.541771] zram: decompress entry idx:1592754 zsmalloc handle:ffffa28c43a94738 obj_size:0 num_reads:72
[ 1523.541840] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:190
[ 1523.547630] zram: decompress entry idx:1307298 zsmalloc handle:ffffa28c70100940 obj_size:797 num_reads:112
[ 1523.547771] zram: decompress entry idx:1307291 zsmalloc handle:ffffa28c70100b50 obj_size:788 num_reads:228
[ 1523.550138] zram: decompress entry idx:1307296 zsmalloc handle:ffffa28c70100f20 obj_size:682 num_reads:61
[ 1523.555016] zram: decompress entry idx:266385 zsmalloc handle:ffffa28c4566e7c0 obj_size:679 num_reads:103
[ 1523.566361] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:99
[ 1523.566428] zram: decompress entry idx:1294528 zsmalloc handle:ffffa28c47582e60 obj_size:441 num_reads:387
...

For instance, notice how entry 1615265 is read and decompressed, then
presumably evicted from memory, and read and decompressed again almost
immediately.  Also notice that entry 1615265 has already gone through
this cycle 189 times.  It's not entirely clear why this happens.

As far as I can tell, these extra zram reads come from swapin
read-ahead:
 handle_mm_fault
  do_swap_page
   swapin_readahead
    swap_read_folio
     submit_bio_wait
      submit_bio_noacct_nocheck
       __submit_bio
        zram_submit_bio
         zram_read_page
          zram_read_from_zspool
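
For what it's worth, mainline do_swap_page() already has a fast path
that should bypass swapin_readahead() for synchronous devices such as
zram, but only for exclusive entries.  A simplified paraphrase of the
mm/memory.c logic (not verbatim; exact conditions and return types
differ between the kernel versions above):

	if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
	    __swap_count(entry) == 1) {
		/* synchronous device, exclusive entry: skip the
		 * swap cache and read-ahead, read just the
		 * faulting page directly */
		...
	} else {
		/* shared entries (e.g. after fork()) fall back to
		 * the read-ahead path in the stack above */
		folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
					 vmf);
	}

So, if I'm reading it right, the repeated reads above would have to be
coming through the else branch, i.e. entries with swap count > 1.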

There are several issues with this.

First, on systems with zram-backed swap devices, these extra reads result
in extra decompressions, which translate into excessive CPU (software
decompression) and battery usage.  On top of that, each decompression
first requires a zsmalloc map() call, which may involve a memcpy() (if
the compressed object spans two physical pages).
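
Each such read burns a zcomp stream plus a map/unmap cycle.  Roughly
(a simplified paraphrase of zram_read_from_zspool(); the signatures
have moved around a bit across the kernel versions above):

	zstrm = zcomp_stream_get(zram->comp);
	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
	/* zs_map_object() may memcpy() the object into a per-CPU
	 * buffer when it spans two physical pages */
	dst = kmap_local_page(page);
	ret = zcomp_decompress(zstrm, src, size, dst);
	kunmap_local(dst);
	zs_unmap_object(zram->mem_pool, handle);
	zcomp_stream_put(zram->comp);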

Second, the read-ahead pages are likely to increase memory pressure:
each read-ahead object decompresses into a PAGE_SIZE page, while we
also keep holding the compressed object in the zsmalloc pool (until
the slot-free notification).
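
To put rough numbers on it, take entry idx:150916 from the log above
(obj_size:436): while its read-ahead page sits in the swap cache we pay
4096 bytes for the decompressed page plus the 436 bytes still held in
the zsmalloc pool, i.e. roughly 10x the compressed footprint per entry.
With the default page-cluster=3 (a window of 2^3 = 8 pages) a single
fault can transiently materialize up to 32KiB of decompressed data,
much of which may never be referenced.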

Setting `sysctl -w vm.page-cluster=0` doesn't seem to help, because
page-cluster=0 merely limits the read-ahead window to a single page,
so the read still goes through the read-ahead path.
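
That matches the window sizing code (a simplified paraphrase of
swapin_nr_pages() in mm/swap_state.c):

	max_pages = 1 << READ_ONCE(page_cluster);
	if (max_pages <= 1)
		return 1;	/* page-cluster=0: window of one page,
				 * but still via the read-ahead path */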

Can swapin read-ahead be entirely disabled for zram swap devices?

