* [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads
@ 2025-01-30 11:10 Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 01/11] zram: switch to non-atomic entry locking Sergey Senozhatsky
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
This is Part I, which only changes zram. In order to make zram read()
preemptible we also need Part II [1], which changes zsmalloc.
There are more details in the commit messages, but in short:
Currently zram runs compression and decompression in non-preemptible
sections, e.g.
zcomp_stream_get() // grabs CPU local lock
zcomp_compress()
or
zram_slot_lock() // grabs entry spin-lock
zcomp_stream_get() // grabs CPU local lock
zs_map_object() // grabs rwlock and CPU local lock
zcomp_decompress()
This is potentially troublesome for a number of reasons.
For instance, it makes it impossible to use async compression
algorithms and/or H/W compression algorithms, which can wait for OP
completion or resource availability. It also restricts what
compression algorithms can do internally; for example, zstd can
allocate internal state memory for C/D dictionaries:
do_fsync()
do_writepages()
zram_bio_write()
zram_write_page() // becomes non-preemptible
zcomp_compress()
zstd_compress()
ZSTD_compress_usingCDict()
ZSTD_compressBegin_usingCDict_internal()
ZSTD_resetCCtx_usingCDict()
ZSTD_resetCCtx_internal()
zstd_custom_alloc() // memory allocation
Not to mention that the system can be configured to maximize
compression ratio at the cost of CPU/HW time (e.g. lz4hc or deflate
with a very high compression level), so zram can stay in a
non-preemptible section (even under a spin-lock and/or rwlock) for an
extended period of time. Aside from compression algorithms, this also
restricts what zram itself can do. One particular example is zsmalloc
handle allocation in zram_write_page(), which has an optimistic
allocation path (disallowing direct reclaim) and a pessimistic
fallback path, which then forces zram to compress the page one more
time; a sketch of that flow follows.
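For reference, the current two-staged flow looks roughly like this (a
simplified sketch of zram_write_page() with error handling and stats
trimmed; patch 05 removes the actual code):

    handle = zs_malloc(zram->mem_pool, comp_len,
                       __GFP_KSWAPD_RECLAIM | __GFP_NOWARN |
                       __GFP_HIGHMEM | __GFP_MOVABLE);
    if (IS_ERR_VALUE(handle)) {
        /* slow path: put the per-CPU stream, permit direct reclaim */
        zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
        handle = zs_malloc(zram->mem_pool, comp_len,
                           GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
        if (IS_ERR_VALUE(handle))
            return PTR_ERR((void *)handle);
        /* the stream was dropped, so compress the page again */
        goto compress_again;
    }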
This series changes zram to not directly impose atomicity restrictions
on compression algorithms (and on itself), which makes zram write()
fully preemptible; zram read(), sadly, is not always preemptible. There
are still indirect atomicity restrictions imposed by zsmalloc. Changing
zsmalloc to permit preemption under zs_map_object() is a separate effort
(Part II [1]) and will be posted shortly.
[1] https://lore.kernel.org/linux-mm/20250130044455.2642465-1-senozhatsky@chromium.org
v2 -> v3:
- Fixed a potential double stream-put during recompression (Dan Carpenter)
- Added a patch to filter out recompression slots by compression priority
- Tweaked cmpxchg loops (based on similar zsmalloc feedback from Uros
Bizjak)
- Added some more sanity checks to recompression
Sergey Senozhatsky (11):
zram: switch to non-atomic entry locking
zram: do not use per-CPU compression streams
zram: remove crypto include
zram: remove max_comp_streams device attr
zram: remove two-staged handle allocation
zram: permit reclaim in zstd custom allocator
zram: permit reclaim in recompression handle allocation
zram: remove writestall zram_stats member
zram: limit max recompress prio to num_active_comps
zram: filter out recomp targets based on priority
zram: unlock slot during recompression
Documentation/ABI/testing/sysfs-block-zram | 8 -
Documentation/admin-guide/blockdev/zram.rst | 36 +-
drivers/block/zram/backend_zstd.c | 11 +-
drivers/block/zram/zcomp.c | 165 +++++----
drivers/block/zram/zcomp.h | 17 +-
drivers/block/zram/zram_drv.c | 353 ++++++++++----------
drivers/block/zram/zram_drv.h | 9 +-
include/linux/cpuhotplug.h | 1 -
8 files changed, 283 insertions(+), 317 deletions(-)
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 01/11] zram: switch to non-atomic entry locking
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 02/11] zram: do not use per-CPU compression streams Sergey Senozhatsky
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Concurrent modifications of meta table entries are currently handled
by a per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomicity requirements on compression backends.
zram can call both zcomp_compress() and zcomp_decompress() under the
entry spin-lock, which implies that we can only use compression
algorithms that don't schedule/sleep/wait during compression and
decompression. This, for instance, makes it impossible to use some
async compression algorithm implementations (H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under the entry
spin-lock. Given that we chain secondary compression algorithms, and
that some of them can be configured for the best compression ratio
(and the worst compression speed), zram can stay under the spin-lock
for quite some time.

Do not use per-entry spin-locks; instead, convert the entry lock to
an atomic_t variable that open-codes a reader-writer type of lock.
This permits preemption within the slot-lock section; the lock states
are summarized below.
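For reference, the new lock is a plain atomic_t; its states (a summary
of the code below, not a new API) are:

    /*
     * entry->lock values:
     *   ZRAM_ENTRY_UNLOCKED (0)    unlocked
     *   ZRAM_ENTRY_WRLOCKED (-1)   write-locked, single writer
     *   N > 0                      read-locked by N concurrent readers
     *
     * Writers cmpxchg() 0 -> -1, readers increment the counter unless
     * it is -1.  Contended paths loop with cond_resched(), which is
     * what makes the section preemptible.
     */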
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 148 ++++++++++++++++++++--------------
drivers/block/zram/zram_drv.h | 7 +-
2 files changed, 91 insertions(+), 64 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..a8d22ae2a066 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,50 @@ static void zram_free_page(struct zram *zram, size_t index);
static int zram_read_from_zspool(struct zram *zram, struct page *page,
u32 index);
-static int zram_slot_trylock(struct zram *zram, u32 index)
+static bool zram_slot_try_write_lock(struct zram *zram, u32 index)
{
- return spin_trylock(&zram->table[index].lock);
+ atomic_t *lock = &zram->table[index].lock;
+ int old = ZRAM_ENTRY_UNLOCKED;
+
+ return atomic_try_cmpxchg(lock, &old, ZRAM_ENTRY_WRLOCKED);
}
-static void zram_slot_lock(struct zram *zram, u32 index)
+static void zram_slot_write_lock(struct zram *zram, u32 index)
{
- spin_lock(&zram->table[index].lock);
+ atomic_t *lock = &zram->table[index].lock;
+ int old = atomic_read(lock);
+
+ do {
+ if (old != ZRAM_ENTRY_UNLOCKED) {
+ cond_resched();
+ old = atomic_read(lock);
+ continue;
+ }
+ } while (!atomic_try_cmpxchg(lock, &old, ZRAM_ENTRY_WRLOCKED));
+}
+
+static void zram_slot_write_unlock(struct zram *zram, u32 index)
+{
+ atomic_set(&zram->table[index].lock, ZRAM_ENTRY_UNLOCKED);
+}
+
+static void zram_slot_read_lock(struct zram *zram, u32 index)
+{
+ atomic_t *lock = &zram->table[index].lock;
+ int old = atomic_read(lock);
+
+ do {
+ if (old == ZRAM_ENTRY_WRLOCKED) {
+ cond_resched();
+ old = atomic_read(lock);
+ continue;
+ }
+ } while (!atomic_try_cmpxchg(lock, &old, old + 1));
}
-static void zram_slot_unlock(struct zram *zram, u32 index)
+static void zram_slot_read_unlock(struct zram *zram, u32 index)
{
- spin_unlock(&zram->table[index].lock);
+ atomic_dec(&zram->table[index].lock);
}
static inline bool init_done(struct zram *zram)
@@ -93,7 +124,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
zram->table[index].handle = handle;
}
-/* flag operations require table entry bit_spin_lock() being held */
static bool zram_test_flag(struct zram *zram, u32 index,
enum zram_pageflags flag)
{
@@ -229,9 +259,9 @@ static void release_pp_slot(struct zram *zram, struct zram_pp_slot *pps)
{
list_del_init(&pps->entry);
- zram_slot_lock(zram, pps->index);
+ zram_slot_write_lock(zram, pps->index);
zram_clear_flag(zram, pps->index, ZRAM_PP_SLOT);
- zram_slot_unlock(zram, pps->index);
+ zram_slot_write_unlock(zram, pps->index);
kfree(pps);
}
@@ -394,11 +424,11 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
*
* And ZRAM_WB slots simply cannot be ZRAM_IDLE.
*/
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
if (!zram_allocated(zram, index) ||
zram_test_flag(zram, index, ZRAM_WB) ||
zram_test_flag(zram, index, ZRAM_SAME)) {
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
continue;
}
@@ -410,7 +440,7 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
zram_set_flag(zram, index, ZRAM_IDLE);
else
zram_clear_flag(zram, index, ZRAM_IDLE);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
}
}
@@ -709,7 +739,7 @@ static int scan_slots_for_writeback(struct zram *zram, u32 mode,
INIT_LIST_HEAD(&pps->entry);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
if (!zram_allocated(zram, index))
goto next;
@@ -731,7 +761,7 @@ static int scan_slots_for_writeback(struct zram *zram, u32 mode,
place_pp_slot(zram, ctl, pps);
pps = NULL;
next:
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
}
kfree(pps);
@@ -822,7 +852,7 @@ static ssize_t writeback_store(struct device *dev,
}
index = pps->index;
- zram_slot_lock(zram, index);
+ zram_slot_read_lock(zram, index);
/*
* scan_slots() sets ZRAM_PP_SLOT and releases slot lock, so
* slots can change in the meantime. If slots are accessed or
@@ -833,7 +863,7 @@ static ssize_t writeback_store(struct device *dev,
goto next;
if (zram_read_from_zspool(zram, page, index))
goto next;
- zram_slot_unlock(zram, index);
+ zram_slot_read_unlock(zram, index);
bio_init(&bio, zram->bdev, &bio_vec, 1,
REQ_OP_WRITE | REQ_SYNC);
@@ -860,7 +890,7 @@ static ssize_t writeback_store(struct device *dev,
}
atomic64_inc(&zram->stats.bd_writes);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
/*
* Same as above, we release slot lock during writeback so
* slot can change under us: slot_free() or slot_free() and
@@ -882,7 +912,7 @@ static ssize_t writeback_store(struct device *dev,
zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
spin_unlock(&zram->wb_limit_lock);
next:
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
release_pp_slot(zram, pps);
cond_resched();
@@ -1001,7 +1031,7 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
for (index = *ppos; index < nr_pages; index++) {
int copied;
- zram_slot_lock(zram, index);
+ zram_slot_read_lock(zram, index);
if (!zram_allocated(zram, index))
goto next;
@@ -1019,13 +1049,13 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
ZRAM_INCOMPRESSIBLE) ? 'n' : '.');
if (count <= copied) {
- zram_slot_unlock(zram, index);
+ zram_slot_read_unlock(zram, index);
break;
}
written += copied;
count -= copied;
next:
- zram_slot_unlock(zram, index);
+ zram_slot_read_unlock(zram, index);
*ppos += 1;
}
@@ -1455,33 +1485,31 @@ static void zram_meta_free(struct zram *zram, u64 disksize)
static bool zram_meta_alloc(struct zram *zram, u64 disksize)
{
- size_t num_pages, index;
+ size_t num_ents, index;
- num_pages = disksize >> PAGE_SHIFT;
- zram->table = vzalloc(array_size(num_pages, sizeof(*zram->table)));
+ num_ents = disksize >> PAGE_SHIFT;
+ zram->table = vzalloc(array_size(num_ents, sizeof(*zram->table)));
if (!zram->table)
- return false;
+ goto error;
zram->mem_pool = zs_create_pool(zram->disk->disk_name);
- if (!zram->mem_pool) {
- vfree(zram->table);
- zram->table = NULL;
- return false;
- }
+ if (!zram->mem_pool)
+ goto error;
if (!huge_class_size)
huge_class_size = zs_huge_class_size(zram->mem_pool);
- for (index = 0; index < num_pages; index++)
- spin_lock_init(&zram->table[index].lock);
+ for (index = 0; index < num_ents; index++)
+ atomic_set(&zram->table[index].lock, ZRAM_ENTRY_UNLOCKED);
+
return true;
+
+error:
+ vfree(zram->table);
+ zram->table = NULL;
+ return false;
}
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
static void zram_free_page(struct zram *zram, size_t index)
{
unsigned long handle;
@@ -1602,17 +1630,17 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
{
int ret;
- zram_slot_lock(zram, index);
+ zram_slot_read_lock(zram, index);
if (!zram_test_flag(zram, index, ZRAM_WB)) {
/* Slot should be locked throughout the function call */
ret = zram_read_from_zspool(zram, page, index);
- zram_slot_unlock(zram, index);
+ zram_slot_read_unlock(zram, index);
} else {
/*
* The slot should be unlocked before reading from the backing
* device.
*/
- zram_slot_unlock(zram, index);
+ zram_slot_read_unlock(zram, index);
ret = read_from_bdev(zram, page, zram_get_handle(zram, index),
parent);
@@ -1655,10 +1683,10 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
static int write_same_filled_page(struct zram *zram, unsigned long fill,
u32 index)
{
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_set_flag(zram, index, ZRAM_SAME);
zram_set_handle(zram, index, fill);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
atomic64_inc(&zram->stats.same_pages);
atomic64_inc(&zram->stats.pages_stored);
@@ -1693,11 +1721,11 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
kunmap_local(src);
zs_unmap_object(zram->mem_pool, handle);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_set_flag(zram, index, ZRAM_HUGE);
zram_set_handle(zram, index, handle);
zram_set_obj_size(zram, index, PAGE_SIZE);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
atomic64_add(PAGE_SIZE, &zram->stats.compr_data_size);
atomic64_inc(&zram->stats.huge_pages);
@@ -1718,9 +1746,9 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
bool same_filled;
/* First, free memory allocated to this slot (if any) */
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_free_page(zram, index);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
mem = kmap_local_page(page);
same_filled = page_same_filled(mem, &element);
@@ -1790,10 +1818,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
zs_unmap_object(zram->mem_pool, handle);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_set_handle(zram, index, handle);
zram_set_obj_size(zram, index, comp_len);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
/* Update stats */
atomic64_inc(&zram->stats.pages_stored);
@@ -1850,7 +1878,7 @@ static int scan_slots_for_recompress(struct zram *zram, u32 mode,
INIT_LIST_HEAD(&pps->entry);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
if (!zram_allocated(zram, index))
goto next;
@@ -1871,7 +1899,7 @@ static int scan_slots_for_recompress(struct zram *zram, u32 mode,
place_pp_slot(zram, ctl, pps);
pps = NULL;
next:
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
}
kfree(pps);
@@ -2162,7 +2190,7 @@ static ssize_t recompress_store(struct device *dev,
if (!num_recomp_pages)
break;
- zram_slot_lock(zram, pps->index);
+ zram_slot_write_lock(zram, pps->index);
if (!zram_test_flag(zram, pps->index, ZRAM_PP_SLOT))
goto next;
@@ -2170,7 +2198,7 @@ static ssize_t recompress_store(struct device *dev,
&num_recomp_pages, threshold,
prio, prio_max);
next:
- zram_slot_unlock(zram, pps->index);
+ zram_slot_write_unlock(zram, pps->index);
release_pp_slot(zram, pps);
if (err) {
@@ -2217,9 +2245,9 @@ static void zram_bio_discard(struct zram *zram, struct bio *bio)
}
while (n >= PAGE_SIZE) {
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_free_page(zram, index);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
atomic64_inc(&zram->stats.notify_free);
index++;
n -= PAGE_SIZE;
@@ -2248,9 +2276,9 @@ static void zram_bio_read(struct zram *zram, struct bio *bio)
}
flush_dcache_page(bv.bv_page);
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_accessed(zram, index);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
bio_advance_iter_single(bio, &iter, bv.bv_len);
} while (iter.bi_size);
@@ -2278,9 +2306,9 @@ static void zram_bio_write(struct zram *zram, struct bio *bio)
break;
}
- zram_slot_lock(zram, index);
+ zram_slot_write_lock(zram, index);
zram_accessed(zram, index);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
bio_advance_iter_single(bio, &iter, bv.bv_len);
} while (iter.bi_size);
@@ -2321,13 +2349,13 @@ static void zram_slot_free_notify(struct block_device *bdev,
zram = bdev->bd_disk->private_data;
atomic64_inc(&zram->stats.notify_free);
- if (!zram_slot_trylock(zram, index)) {
+ if (!zram_slot_try_write_lock(zram, index)) {
atomic64_inc(&zram->stats.miss_free);
return;
}
zram_free_page(zram, index);
- zram_slot_unlock(zram, index);
+ zram_slot_write_unlock(zram, index);
}
static void zram_comp_params_reset(struct zram *zram)
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..3436ddf8ab23 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -15,7 +15,6 @@
#ifndef _ZRAM_DRV_H_
#define _ZRAM_DRV_H_
-#include <linux/rwsem.h>
#include <linux/zsmalloc.h>
#include <linux/crypto.h>
@@ -28,7 +27,6 @@
#define ZRAM_SECTOR_PER_LOGICAL_BLOCK \
(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
-
/*
* ZRAM is mainly used for memory efficiency so we want to keep memory
* footprint small and thus squeeze size and zram pageflags into a flags
@@ -58,13 +56,14 @@ enum zram_pageflags {
__NR_ZRAM_PAGEFLAGS,
};
-/*-- Data structures */
+#define ZRAM_ENTRY_UNLOCKED 0
+#define ZRAM_ENTRY_WRLOCKED (-1)
/* Allocated for each disk page */
struct zram_table_entry {
unsigned long handle;
unsigned int flags;
- spinlock_t lock;
+ atomic_t lock;
#ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
ktime_t ac_time;
#endif
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 02/11] zram: do not use per-CPU compression streams
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 01/11] zram: switch to non-atomic entry locking Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 03/11] zram: remove crypto include Sergey Senozhatsky
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Similarly to the per-entry spin-lock, per-CPU compression streams
also have a number of shortcomings.

First, per-CPU stream access has to be done from a non-preemptible
(atomic) section, which imposes the same atomicity requirements on
compression backends as the entry spin-lock does and makes it
impossible to use algorithms that can schedule/wait/sleep during
compression and decompression.

Second, per-CPU streams noticeably increase the memory usage
(wastage, really) of secondary compression streams. The problem is
that secondary compression streams are allocated per-CPU, just like
the primary streams are, yet we never use more than one secondary
stream at a time, because recompression is a single-threaded action.
This means that the remaining num_online_cpus() - 1 streams are
allocated for nothing, and that is per priority level (we can have
several secondary compression algorithms). Depending on the algorithm
this may lead to significant memory wastage; in addition, each stream
also carries a workmem buffer (2 physical pages).

Instead of per-CPU streams, maintain a list of idle compression
streams and allocate new streams on demand (something that we used
to do many years ago). This makes zram read() and write() non-atomic
and eases the requirements on compression algorithm implementations.
It also means that we now keep only one secondary stream per
priority; a usage sketch of the new API follows.
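With this change zcomp_stream_get() may sleep, and the stream must be
returned explicitly; a typical call sequence becomes (a sketch based
on the new API below, with comp_len standing in for the caller's
length variable):

    struct zcomp_strm *zstrm;

    zstrm = zcomp_stream_get(comp);    /* may wait for an idle stream */
    ret = zcomp_compress(comp, zstrm, src, &comp_len);
    zcomp_stream_put(comp, zstrm);     /* return it to the idle list */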
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zcomp.c | 164 +++++++++++++++++++---------------
drivers/block/zram/zcomp.h | 17 ++--
drivers/block/zram/zram_drv.c | 29 +++---
include/linux/cpuhotplug.h | 1 -
4 files changed, 109 insertions(+), 102 deletions(-)
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index bb514403e305..982c769d5831 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -6,7 +6,7 @@
#include <linux/slab.h>
#include <linux/wait.h>
#include <linux/sched.h>
-#include <linux/cpu.h>
+#include <linux/cpumask.h>
#include <linux/crypto.h>
#include <linux/vmalloc.h>
@@ -43,31 +43,40 @@ static const struct zcomp_ops *backends[] = {
NULL
};
-static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
+static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *strm)
{
- comp->ops->destroy_ctx(&zstrm->ctx);
- vfree(zstrm->buffer);
- zstrm->buffer = NULL;
+ comp->ops->destroy_ctx(&strm->ctx);
+ vfree(strm->buffer);
+ kfree(strm);
}
-static int zcomp_strm_init(struct zcomp *comp, struct zcomp_strm *zstrm)
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
{
+ struct zcomp_strm *strm;
int ret;
- ret = comp->ops->create_ctx(comp->params, &zstrm->ctx);
- if (ret)
- return ret;
+ strm = kzalloc(sizeof(*strm), GFP_KERNEL);
+ if (!strm)
+ return NULL;
+
+ INIT_LIST_HEAD(&strm->entry);
+
+ ret = comp->ops->create_ctx(comp->params, &strm->ctx);
+ if (ret) {
+ kfree(strm);
+ return NULL;
+ }
/*
- * allocate 2 pages. 1 for compressed data, plus 1 extra for the
- * case when compressed size is larger than the original one
+ * allocate 2 pages. 1 for compressed data, plus 1 extra in case
+ * compressed data is larger than the original one.
*/
- zstrm->buffer = vzalloc(2 * PAGE_SIZE);
- if (!zstrm->buffer) {
- zcomp_strm_free(comp, zstrm);
- return -ENOMEM;
+ strm->buffer = vzalloc(2 * PAGE_SIZE);
+ if (!strm->buffer) {
+ zcomp_strm_free(comp, strm);
+ return NULL;
}
- return 0;
+ return strm;
}
static const struct zcomp_ops *lookup_backend_ops(const char *comp)
@@ -109,13 +118,59 @@ ssize_t zcomp_available_show(const char *comp, char *buf)
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
- local_lock(&comp->stream->lock);
- return this_cpu_ptr(comp->stream);
+ struct zcomp_strm *strm;
+
+ might_sleep();
+
+ while (1) {
+ spin_lock(&comp->strm_lock);
+ if (!list_empty(&comp->idle_strm)) {
+ strm = list_first_entry(&comp->idle_strm,
+ struct zcomp_strm,
+ entry);
+ list_del(&strm->entry);
+ spin_unlock(&comp->strm_lock);
+ return strm;
+ }
+
+ /* cannot allocate new stream, wait for an idle one */
+ if (comp->avail_strm >= num_online_cpus()) {
+ spin_unlock(&comp->strm_lock);
+ wait_event(comp->strm_wait,
+ !list_empty(&comp->idle_strm));
+ continue;
+ }
+
+ /* allocate new stream */
+ comp->avail_strm++;
+ spin_unlock(&comp->strm_lock);
+
+ strm = zcomp_strm_alloc(comp);
+ if (strm)
+ break;
+
+ spin_lock(&comp->strm_lock);
+ comp->avail_strm--;
+ spin_unlock(&comp->strm_lock);
+ wait_event(comp->strm_wait, !list_empty(&comp->idle_strm));
+ }
+
+ return strm;
}
-void zcomp_stream_put(struct zcomp *comp)
+void zcomp_stream_put(struct zcomp *comp, struct zcomp_strm *strm)
{
- local_unlock(&comp->stream->lock);
+ spin_lock(&comp->strm_lock);
+ if (comp->avail_strm <= num_online_cpus()) {
+ list_add(&strm->entry, &comp->idle_strm);
+ spin_unlock(&comp->strm_lock);
+ wake_up(&comp->strm_wait);
+ return;
+ }
+
+ comp->avail_strm--;
+ spin_unlock(&comp->strm_lock);
+ zcomp_strm_free(comp, strm);
}
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
@@ -148,61 +203,19 @@ int zcomp_decompress(struct zcomp *comp, struct zcomp_strm *zstrm,
return comp->ops->decompress(comp->params, &zstrm->ctx, &req);
}
-int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
-{
- struct zcomp *comp = hlist_entry(node, struct zcomp, node);
- struct zcomp_strm *zstrm;
- int ret;
-
- zstrm = per_cpu_ptr(comp->stream, cpu);
- local_lock_init(&zstrm->lock);
-
- ret = zcomp_strm_init(comp, zstrm);
- if (ret)
- pr_err("Can't allocate a compression stream\n");
- return ret;
-}
-
-int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
-{
- struct zcomp *comp = hlist_entry(node, struct zcomp, node);
- struct zcomp_strm *zstrm;
-
- zstrm = per_cpu_ptr(comp->stream, cpu);
- zcomp_strm_free(comp, zstrm);
- return 0;
-}
-
-static int zcomp_init(struct zcomp *comp, struct zcomp_params *params)
-{
- int ret;
-
- comp->stream = alloc_percpu(struct zcomp_strm);
- if (!comp->stream)
- return -ENOMEM;
-
- comp->params = params;
- ret = comp->ops->setup_params(comp->params);
- if (ret)
- goto cleanup;
-
- ret = cpuhp_state_add_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
- if (ret < 0)
- goto cleanup;
-
- return 0;
-
-cleanup:
- comp->ops->release_params(comp->params);
- free_percpu(comp->stream);
- return ret;
-}
-
void zcomp_destroy(struct zcomp *comp)
{
- cpuhp_state_remove_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
+ struct zcomp_strm *strm;
+
+ while (!list_empty(&comp->idle_strm)) {
+ strm = list_first_entry(&comp->idle_strm,
+ struct zcomp_strm,
+ entry);
+ list_del(&strm->entry);
+ zcomp_strm_free(comp, strm);
+ }
+
comp->ops->release_params(comp->params);
- free_percpu(comp->stream);
kfree(comp);
}
@@ -229,7 +242,12 @@ struct zcomp *zcomp_create(const char *alg, struct zcomp_params *params)
return ERR_PTR(-EINVAL);
}
- error = zcomp_init(comp, params);
+ INIT_LIST_HEAD(&comp->idle_strm);
+ init_waitqueue_head(&comp->strm_wait);
+ spin_lock_init(&comp->strm_lock);
+
+ comp->params = params;
+ error = comp->ops->setup_params(comp->params);
if (error) {
kfree(comp);
return ERR_PTR(error);
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index ad5762813842..62330829db3f 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -3,10 +3,10 @@
#ifndef _ZCOMP_H_
#define _ZCOMP_H_
-#include <linux/local_lock.h>
-
#define ZCOMP_PARAM_NO_LEVEL INT_MIN
+#include <linux/wait.h>
+
/*
* Immutable driver (backend) parameters. The driver may attach private
* data to it (e.g. driver representation of the dictionary, etc.).
@@ -31,7 +31,7 @@ struct zcomp_ctx {
};
struct zcomp_strm {
- local_lock_t lock;
+ struct list_head entry;
/* compression buffer */
void *buffer;
struct zcomp_ctx ctx;
@@ -60,16 +60,15 @@ struct zcomp_ops {
const char *name;
};
-/* dynamic per-device compression frontend */
struct zcomp {
- struct zcomp_strm __percpu *stream;
+ struct list_head idle_strm;
+ spinlock_t strm_lock;
+ u32 avail_strm;
+ wait_queue_head_t strm_wait;
const struct zcomp_ops *ops;
struct zcomp_params *params;
- struct hlist_node node;
};
-int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node);
-int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node);
ssize_t zcomp_available_show(const char *comp, char *buf);
bool zcomp_available_algorithm(const char *comp);
@@ -77,7 +76,7 @@ struct zcomp *zcomp_create(const char *alg, struct zcomp_params *params);
void zcomp_destroy(struct zcomp *comp);
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
-void zcomp_stream_put(struct zcomp *comp);
+void zcomp_stream_put(struct zcomp *comp, struct zcomp_strm *strm);
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
const void *src, unsigned int *dst_len);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a8d22ae2a066..9ba3f8d97310 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -31,7 +31,6 @@
#include <linux/idr.h>
#include <linux/sysfs.h>
#include <linux/debugfs.h>
-#include <linux/cpuhotplug.h>
#include <linux/part_stat.h>
#include <linux/kernel_read_file.h>
@@ -1603,7 +1602,7 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
kunmap_local(dst);
zs_unmap_object(zram->mem_pool, handle);
- zcomp_stream_put(zram->comps[prio]);
+ zcomp_stream_put(zram->comps[prio], zstrm);
return ret;
}
@@ -1764,14 +1763,14 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
kunmap_local(mem);
if (unlikely(ret)) {
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+ zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
pr_err("Compression failed! err=%d\n", ret);
zs_free(zram->mem_pool, handle);
return ret;
}
if (comp_len >= huge_class_size) {
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+ zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
return write_incompressible_page(zram, page, index);
}
@@ -1795,7 +1794,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
__GFP_HIGHMEM |
__GFP_MOVABLE);
if (IS_ERR_VALUE(handle)) {
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+ zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
atomic64_inc(&zram->stats.writestall);
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
@@ -1807,7 +1806,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
}
if (!zram_can_store_page(zram)) {
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+ zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
zs_free(zram->mem_pool, handle);
return -ENOMEM;
}
@@ -1815,7 +1814,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
memcpy(dst, zstrm->buffer, comp_len);
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+ zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
zs_unmap_object(zram->mem_pool, handle);
zram_slot_write_lock(zram, index);
@@ -1974,7 +1973,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
kunmap_local(src);
if (ret) {
- zcomp_stream_put(zram->comps[prio]);
+ zcomp_stream_put(zram->comps[prio], zstrm);
return ret;
}
@@ -1984,7 +1983,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
/* Continue until we make progress */
if (class_index_new >= class_index_old ||
(threshold && comp_len_new >= threshold)) {
- zcomp_stream_put(zram->comps[prio]);
+ zcomp_stream_put(zram->comps[prio], zstrm);
continue;
}
@@ -2042,13 +2041,13 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
__GFP_HIGHMEM |
__GFP_MOVABLE);
if (IS_ERR_VALUE(handle_new)) {
- zcomp_stream_put(zram->comps[prio]);
+ zcomp_stream_put(zram->comps[prio], zstrm);
return PTR_ERR((void *)handle_new);
}
dst = zs_map_object(zram->mem_pool, handle_new, ZS_MM_WO);
memcpy(dst, zstrm->buffer, comp_len_new);
- zcomp_stream_put(zram->comps[prio]);
+ zcomp_stream_put(zram->comps[prio], zstrm);
zs_unmap_object(zram->mem_pool, handle_new);
@@ -2796,7 +2795,6 @@ static void destroy_devices(void)
zram_debugfs_destroy();
idr_destroy(&zram_index_idr);
unregister_blkdev(zram_major, "zram");
- cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
}
static int __init zram_init(void)
@@ -2806,15 +2804,9 @@ static int __init zram_init(void)
BUILD_BUG_ON(__NR_ZRAM_PAGEFLAGS > sizeof(zram_te.flags) * 8);
- ret = cpuhp_setup_state_multi(CPUHP_ZCOMP_PREPARE, "block/zram:prepare",
- zcomp_cpu_up_prepare, zcomp_cpu_dead);
- if (ret < 0)
- return ret;
-
ret = class_register(&zram_control_class);
if (ret) {
pr_err("Unable to register zram-control class\n");
- cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
return ret;
}
@@ -2823,7 +2815,6 @@ static int __init zram_init(void)
if (zram_major <= 0) {
pr_err("Unable to get major number\n");
class_unregister(&zram_control_class);
- cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
return -EBUSY;
}
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 6cc5e484547c..092ace7db8ee 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -119,7 +119,6 @@ enum cpuhp_state {
CPUHP_MM_ZS_PREPARE,
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
- CPUHP_ZCOMP_PREPARE,
CPUHP_TIMERS_PREPARE,
CPUHP_TMIGR_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 03/11] zram: remove crypto include
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 01/11] zram: switch to non-atomic entry locking Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 02/11] zram: do not use per-CPU compression streams Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 04/11] zram: remove max_comp_streams device attr Sergey Senozhatsky
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Remove a leftover crypto header include.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zcomp.c | 1 -
drivers/block/zram/zram_drv.c | 4 +++-
drivers/block/zram/zram_drv.h | 1 -
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 982c769d5831..efd5919808d9 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -7,7 +7,6 @@
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/cpumask.h>
-#include <linux/crypto.h>
#include <linux/vmalloc.h>
#include "zcomp.h"
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9ba3f8d97310..31b653ad6d0e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -43,6 +43,8 @@ static DEFINE_MUTEX(zram_index_mutex);
static int zram_major;
static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;
+#define ZRAM_MAX_ALGO_NAME_SZ 64
+
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
/*
@@ -1141,7 +1143,7 @@ static int __comp_algorithm_store(struct zram *zram, u32 prio, const char *buf)
size_t sz;
sz = strlen(buf);
- if (sz >= CRYPTO_MAX_ALG_NAME)
+ if (sz >= ZRAM_MAX_ALGO_NAME_SZ)
return -E2BIG;
compressor = kstrdup(buf, GFP_KERNEL);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 3436ddf8ab23..045032acb22f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -16,7 +16,6 @@
#define _ZRAM_DRV_H_
#include <linux/zsmalloc.h>
-#include <linux/crypto.h>
#include "zcomp.h"
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 04/11] zram: remove max_comp_streams device attr
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (2 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 03/11] zram: remove crypto include Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 05/11] zram: remove two-staged handle allocation Sergey Senozhatsky
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
The max_comp_streams device attribute has been defunct since
May 2016, when zram switched to per-CPU compression streams;
remove it.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
Documentation/ABI/testing/sysfs-block-zram | 8 -----
Documentation/admin-guide/blockdev/zram.rst | 36 ++++++---------------
drivers/block/zram/zram_drv.c | 23 -------------
3 files changed, 10 insertions(+), 57 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 1ef69e0271f9..36c57de0a10a 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -22,14 +22,6 @@ Description:
device. The reset operation frees all the memory associated
with this device.
-What: /sys/block/zram<id>/max_comp_streams
-Date: February 2014
-Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
-Description:
- The max_comp_streams file is read-write and specifies the
- number of backend's zcomp_strm compression streams (number of
- concurrent compress operations).
-
What: /sys/block/zram<id>/comp_algorithm
Date: February 2014
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 1576fb93f06c..9bdb30901a93 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -54,7 +54,7 @@ The list of possible return codes:
If you use 'echo', the returned value is set by the 'echo' utility,
and, in general case, something like::
- echo 3 > /sys/block/zram0/max_comp_streams
+ echo foo > /sys/block/zram0/comp_algorithm
if [ $? -ne 0 ]; then
handle_error
fi
@@ -73,21 +73,7 @@ This creates 4 devices: /dev/zram{0,1,2,3}
num_devices parameter is optional and tells zram how many devices should be
pre-created. Default: 1.
-2) Set max number of compression streams
-========================================
-
-Regardless of the value passed to this attribute, ZRAM will always
-allocate multiple compression streams - one per online CPU - thus
-allowing several concurrent compression operations. The number of
-allocated compression streams goes down when some of the CPUs
-become offline. There is no single-compression-stream mode anymore,
-unless you are running a UP system or have only 1 CPU online.
-
-To find out how many streams are currently available::
-
- cat /sys/block/zram0/max_comp_streams
-
-3) Select compression algorithm
+2) Select compression algorithm
===============================
Using comp_algorithm device attribute one can see available and
@@ -107,7 +93,7 @@ Examples::
For the time being, the `comp_algorithm` content shows only compression
algorithms that are supported by zram.
-4) Set compression algorithm parameters: Optional
+3) Set compression algorithm parameters: Optional
=================================================
Compression algorithms may support specific parameters which can be
@@ -138,7 +124,7 @@ better the compression ratio, it even can take negatives values for some
algorithms), for other algorithms `level` is acceleration level (the higher
the value the lower the compression ratio).
-5) Set Disksize
+4) Set Disksize
===============
Set disk size by writing the value to sysfs node 'disksize'.
@@ -158,7 +144,7 @@ There is little point creating a zram of greater than twice the size of memory
since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.
-6) Set memory limit: Optional
+5) Set memory limit: Optional
=============================
Set memory limit by writing the value to sysfs node 'mem_limit'.
@@ -177,7 +163,7 @@ Examples::
# To disable memory limit
echo 0 > /sys/block/zram0/mem_limit
-7) Activate
+6) Activate
===========
::
@@ -188,7 +174,7 @@ Examples::
mkfs.ext4 /dev/zram1
mount /dev/zram1 /tmp
-8) Add/remove zram devices
+7) Add/remove zram devices
==========================
zram provides a control interface, which enables dynamic (on-demand) device
@@ -208,7 +194,7 @@ execute::
echo X > /sys/class/zram-control/hot_remove
-9) Stats
+8) Stats
========
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
@@ -228,8 +214,6 @@ mem_limit WO specifies the maximum amount of memory ZRAM can
writeback_limit WO specifies the maximum amount of write IO zram
can write out to backing device as 4KB unit
writeback_limit_enable RW show and set writeback_limit feature
-max_comp_streams RW the number of possible concurrent compress
- operations
comp_algorithm RW show and change the compression algorithm
algorithm_params WO setup compression algorithm parameters
compact WO trigger memory compaction
@@ -310,7 +294,7 @@ a single line of text and contains the following stats separated by whitespace:
Unit: 4K bytes
============== =============================================================
-10) Deactivate
+9) Deactivate
==============
::
@@ -318,7 +302,7 @@ a single line of text and contains the following stats separated by whitespace:
swapoff /dev/zram0
umount /dev/zram1
-11) Reset
+10) Reset
=========
Write any positive value to 'reset' sysfs node::
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 31b653ad6d0e..0c9b68fa8783 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1096,27 +1096,6 @@ static void zram_debugfs_register(struct zram *zram) {};
static void zram_debugfs_unregister(struct zram *zram) {};
#endif
-/*
- * We switched to per-cpu streams and this attr is not needed anymore.
- * However, we will keep it around for some time, because:
- * a) we may revert per-cpu streams in the future
- * b) it's visible to user space and we need to follow our 2 years
- * retirement rule; but we already have a number of 'soon to be
- * altered' attrs, so max_comp_streams need to wait for the next
- * layoff cycle.
- */
-static ssize_t max_comp_streams_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
-}
-
-static ssize_t max_comp_streams_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
-{
- return len;
-}
-
static void comp_algorithm_set(struct zram *zram, u32 prio, const char *alg)
{
/* Do not free statically defined compression algorithms */
@@ -2535,7 +2514,6 @@ static DEVICE_ATTR_WO(reset);
static DEVICE_ATTR_WO(mem_limit);
static DEVICE_ATTR_WO(mem_used_max);
static DEVICE_ATTR_WO(idle);
-static DEVICE_ATTR_RW(max_comp_streams);
static DEVICE_ATTR_RW(comp_algorithm);
#ifdef CONFIG_ZRAM_WRITEBACK
static DEVICE_ATTR_RW(backing_dev);
@@ -2557,7 +2535,6 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_mem_limit.attr,
&dev_attr_mem_used_max.attr,
&dev_attr_idle.attr,
- &dev_attr_max_comp_streams.attr,
&dev_attr_comp_algorithm.attr,
#ifdef CONFIG_ZRAM_WRITEBACK
&dev_attr_backing_dev.attr,
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 05/11] zram: remove two-staged handle allocation
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (3 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 04/11] zram: remove max_comp_streams device attr Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 06/11] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Previously zram write() was atomic, which required us to pass
__GFP_KSWAPD_RECLAIM to the zsmalloc handle allocation on the fast
path and to attempt a slow-path allocation (with re-compression)
when the fast path failed.

Since write() is not atomic anymore, we can permit direct reclaim
during handle allocation, remove the fast allocation path and also
drop the re-compression path (which should reduce CPU/battery
usage).
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 41 +++++------------------------------
1 file changed, 6 insertions(+), 35 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0c9b68fa8783..28e5b02dc3a1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1718,11 +1718,11 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
static int zram_write_page(struct zram *zram, struct page *page, u32 index)
{
int ret = 0;
- unsigned long handle = -ENOMEM;
- unsigned int comp_len = 0;
+ unsigned long handle;
+ unsigned int comp_len;
void *dst, *mem;
struct zcomp_strm *zstrm;
- unsigned long element = 0;
+ unsigned long element;
bool same_filled;
/* First, free memory allocated to this slot (if any) */
@@ -1736,7 +1736,6 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
if (same_filled)
return write_same_filled_page(zram, element, index);
-compress_again:
zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
mem = kmap_local_page(page);
ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
@@ -1745,8 +1744,6 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
if (unlikely(ret)) {
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
- pr_err("Compression failed! err=%d\n", ret);
- zs_free(zram->mem_pool, handle);
return ret;
}
@@ -1755,36 +1752,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
return write_incompressible_page(zram, page, index);
}
- /*
- * handle allocation has 2 paths:
- * a) fast path is executed with preemption disabled (for
- * per-cpu streams) and has __GFP_DIRECT_RECLAIM bit clear,
- * since we can't sleep;
- * b) slow path enables preemption and attempts to allocate
- * the page with __GFP_DIRECT_RECLAIM bit set. we have to
- * put per-cpu compression stream and, thus, to re-do
- * the compression once handle is allocated.
- *
- * if we have a 'non-null' handle here then we are coming
- * from the slow path and handle has already been allocated.
- */
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
if (IS_ERR_VALUE(handle))
- handle = zs_malloc(zram->mem_pool, comp_len,
- __GFP_KSWAPD_RECLAIM |
- __GFP_NOWARN |
- __GFP_HIGHMEM |
- __GFP_MOVABLE);
- if (IS_ERR_VALUE(handle)) {
- zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
- atomic64_inc(&zram->stats.writestall);
- handle = zs_malloc(zram->mem_pool, comp_len,
- GFP_NOIO | __GFP_HIGHMEM |
- __GFP_MOVABLE);
- if (IS_ERR_VALUE(handle))
- return PTR_ERR((void *)handle);
-
- goto compress_again;
- }
+ return PTR_ERR((void *)handle);
if (!zram_can_store_page(zram)) {
zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP], zstrm);
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 06/11] zram: permit reclaim in zstd custom allocator
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (4 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 05/11] zram: remove two-staged handle allocation Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 07/11] zram: permit reclaim in recompression handle allocation Sergey Senozhatsky
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
When configured with pre-trained compression/decompression
dictionary support, zstd requires a custom memory allocator, which
it calls internally from its compression/decompression routines.
This was a tad problematic, because it meant allocation from atomic
context (either under the entry spin-lock, or the per-CPU
local-lock, or both). Now, with non-atomic zram write(), those
limitations are relaxed and we can allow direct and indirect reclaim
during allocations.

The tricky part is the zram read() path, which is still atomic in
one particular case (read_compressed_page()), due to zsmalloc's
handling of object mapping. However, in zram, in order to read()
something one has to write() it first, and write() is when zstd
allocates the required internal state memory, and the write() path
is non-atomic. Because of this write()-time allocation, in theory,
zstd should not call its allocator from the atomic read() path.
Keep the non-preemptible branch, just in case zstd allocates memory
from read(), but WARN_ON_ONCE() if it happens.
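For context, the allocator is plugged into zstd roughly like this (a
sketch; the exact setup code lives elsewhere in backend_zstd.c and is
not part of this patch, so names may differ slightly):

    zstd_custom_mem mem = {
        .customAlloc = zstd_custom_alloc,
        .customFree  = zstd_custom_free,
    };

    cctx = zstd_create_cctx_advanced(mem);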
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/backend_zstd.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/block/zram/backend_zstd.c b/drivers/block/zram/backend_zstd.c
index 1184c0036f44..53431251ea62 100644
--- a/drivers/block/zram/backend_zstd.c
+++ b/drivers/block/zram/backend_zstd.c
@@ -24,19 +24,14 @@ struct zstd_params {
/*
* For C/D dictionaries we need to provide zstd with zstd_custom_mem,
* which zstd uses internally to allocate/free memory when needed.
- *
- * This means that allocator.customAlloc() can be called from zcomp_compress()
- * under local-lock (per-CPU compression stream), in which case we must use
- * GFP_ATOMIC.
- *
- * Another complication here is that we can be configured as a swap device.
*/
static void *zstd_custom_alloc(void *opaque, size_t size)
{
- if (!preemptible())
+ /* Technically this should not happen */
+ if (WARN_ON_ONCE(!preemptible()))
return kvzalloc(size, GFP_ATOMIC);
- return kvzalloc(size, __GFP_KSWAPD_RECLAIM | __GFP_NOWARN);
+ return kvzalloc(size, GFP_NOIO | __GFP_NOWARN);
}
static void zstd_custom_free(void *opaque, void *address)
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 07/11] zram: permit reclaim in recompression handle allocation
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (5 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 06/11] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 08/11] zram: remove writestall zram_stats member Sergey Senozhatsky
` (4 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
The recompression path can now permit direct reclaim during new
zs_handle allocation, because it is not atomic anymore.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 28e5b02dc3a1..ac8b4f47a496 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1981,17 +1981,11 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
return 0;
/*
- * No direct reclaim (slow path) for handle allocation and no
- * re-compression attempt (unlike in zram_write_bvec()) since
- * we already have stored that object in zsmalloc. If we cannot
- * alloc memory for recompressed object then we bail out and
- * simply keep the old (existing) object in zsmalloc.
+ * If we cannot alloc memory for recompressed object then we bail out
+ * and simply keep the old (existing) object in zsmalloc.
*/
handle_new = zs_malloc(zram->mem_pool, comp_len_new,
- __GFP_KSWAPD_RECLAIM |
- __GFP_NOWARN |
- __GFP_HIGHMEM |
- __GFP_MOVABLE);
+ GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
if (IS_ERR_VALUE(handle_new)) {
zcomp_stream_put(zram->comps[prio], zstrm);
return PTR_ERR((void *)handle_new);
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 08/11] zram: remove writestall zram_stats member
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (6 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 07/11] zram: permit reclaim in recompression handle allocation Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 09/11] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
There is no zsmalloc handle allocation slow path now, so a
writestall is not possible any longer. Remove the counter from
zram_stats.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 3 +--
drivers/block/zram/zram_drv.h | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ac8b4f47a496..af4bce1e5fe1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1430,9 +1430,8 @@ static ssize_t debug_stat_show(struct device *dev,
down_read(&zram->init_lock);
ret = scnprintf(buf, PAGE_SIZE,
- "version: %d\n%8llu %8llu\n",
+ "version: %d\n0 %8llu\n",
version,
- (u64)atomic64_read(&zram->stats.writestall),
(u64)atomic64_read(&zram->stats.miss_free));
up_read(&zram->init_lock);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 045032acb22f..b7e250d6fa02 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -78,7 +78,6 @@ struct zram_stats {
atomic64_t huge_pages_since; /* no. of huge pages since zram set up */
atomic64_t pages_stored; /* no. of pages currently stored */
atomic_long_t max_used_pages; /* no. of maximum pages stored */
- atomic64_t writestall; /* no. of write slow paths */
atomic64_t miss_free; /* no. of missed free */
#ifdef CONFIG_ZRAM_WRITEBACK
atomic64_t bd_count; /* no. of pages in backing device */
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 09/11] zram: limit max recompress prio to num_active_comps
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (7 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 08/11] zram: remove writestall zram_stats member Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 10/11] zram: filter out recomp targets based on priority Sergey Senozhatsky
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Use the actual number of algorithms zram was configured with instead
of the theoretical limit of ZRAM_MAX_COMPS. Also make sure that the
min prio is not above the max prio.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index af4bce1e5fe1..3e91fd1dcfc7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2011,16 +2011,19 @@ static ssize_t recompress_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- u32 prio = ZRAM_SECONDARY_COMP, prio_max = ZRAM_MAX_COMPS;
struct zram *zram = dev_to_zram(dev);
char *args, *param, *val, *algo = NULL;
u64 num_recomp_pages = ULLONG_MAX;
struct zram_pp_ctl *ctl = NULL;
struct zram_pp_slot *pps;
u32 mode = 0, threshold = 0;
+ u32 prio, prio_max;
struct page *page;
ssize_t ret;
+ prio = ZRAM_SECONDARY_COMP;
+ prio_max = zram->num_active_comps;
+
args = skip_spaces(buf);
while (*args) {
args = next_arg(args, ¶m, &val);
@@ -2073,7 +2076,7 @@ static ssize_t recompress_store(struct device *dev,
if (prio == ZRAM_PRIMARY_COMP)
prio = ZRAM_SECONDARY_COMP;
- prio_max = min(prio + 1, ZRAM_MAX_COMPS);
+ prio_max = prio + 1;
continue;
}
}
@@ -2087,6 +2090,12 @@ static ssize_t recompress_store(struct device *dev,
goto release_init_lock;
}
+ prio_max = min(prio_max, (u32)zram->num_active_comps);
+ if (prio >= prio_max) {
+ ret = -EINVAL;
+ goto release_init_lock;
+ }
+
/* Do not permit concurrent post-processing actions. */
if (atomic_xchg(&zram->pp_in_progress, 1)) {
up_read(&zram->init_lock);
@@ -2101,7 +2110,7 @@ static ssize_t recompress_store(struct device *dev,
continue;
if (!strcmp(zram->comp_algs[prio], algo)) {
- prio_max = min(prio + 1, ZRAM_MAX_COMPS);
+ prio_max = prio + 1;
found = true;
break;
}
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 10/11] zram: filter out recomp targets based on priority
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (8 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 09/11] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 11:10 ` [PATCHv3 11/11] zram: unlock slot during recompression Sergey Senozhatsky
2025-01-30 14:18 ` [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Do not select for post-processing slots that are already
compressed with the same or a higher priority compression
algorithm.
This should save some memory, as previously we would still
put those entries into the corresponding post-processing
buckets and filter them out later in recompress_slot().
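A minimal standalone sketch of the scan-time filter
(should_recompress() is a hypothetical helper; slot_prio stands in
for what zram_get_priority() returns, and prio_max is the first
priority the request is not allowed to use):

    #include <stdio.h>

    /* Skip slots already compressed with same or higher priority. */
    static int should_recompress(unsigned int slot_prio, unsigned int prio_max)
    {
            return slot_prio + 1 < prio_max;
    }

    int main(void)
    {
            /* Request targets priority 1 only, i.e. prio_max == 2. */
            unsigned int prio_max = 2;
            unsigned int slot_prio[] = { 0, 1, 2 };

            for (unsigned int i = 0; i < 3; i++)
                    printf("slot %u (prio %u): %s\n", i, slot_prio[i],
                           should_recompress(slot_prio[i], prio_max) ?
                           "recompress" : "skip");
            return 0;
    }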
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3e91fd1dcfc7..500d6c8b17fc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1813,7 +1813,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
#define RECOMPRESS_IDLE (1 << 0)
#define RECOMPRESS_HUGE (1 << 1)
-static int scan_slots_for_recompress(struct zram *zram, u32 mode,
+static int scan_slots_for_recompress(struct zram *zram, u32 mode, u32 prio_max,
struct zram_pp_ctl *ctl)
{
unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
@@ -1845,6 +1845,10 @@ static int scan_slots_for_recompress(struct zram *zram, u32 mode,
zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE))
goto next;
+ /* Already compressed with same or higher priority */
+ if (zram_get_priority(zram, index) + 1 >= prio_max)
+ goto next;
+
pps->index = index;
place_pp_slot(zram, ctl, pps);
pps = NULL;
@@ -2134,7 +2138,7 @@ static ssize_t recompress_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_recompress(zram, mode, ctl);
+ scan_slots_for_recompress(zram, mode, prio_max, ctl);
ret = len;
while ((pps = select_pp_slot(ctl))) {
--
2.48.1.262.g85cc9f2d1e-goog
* [PATCHv3 11/11] zram: unlock slot during recompression
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (9 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 10/11] zram: filter out recomp targets based on priority Sergey Senozhatsky
@ 2025-01-30 11:10 ` Sergey Senozhatsky
2025-01-30 14:18 ` [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 11:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Minchan Kim, linux-mm, linux-kernel, Sergey Senozhatsky
Recompression, like writeback, makes a local copy of the slot data
(we need to decompress it anyway) before post-processing, so we
can unlock the slot entry once we have that local copy.
Unlock the entry write-lock before the recompression loop (secondary
algorithms are tried one by one, in order of priority) and
re-acquire it right after the loop.
There is one more potentially costly operation recompress_slot()
does: new zs_handle allocation, which can schedule(). Release
the slot-entry write-lock before the zsmalloc allocation and grab it
again after the allocation.
In both cases, once the slot lock is re-acquired we examine the
slot's ZRAM_PP_SLOT flag to make sure that the slot has not been
modified by a concurrent operation.
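The pattern in both cases is: drop the lock, do the expensive work,
re-take the lock, revalidate. A userspace model of just that pattern
(a pthread mutex stands in for the slot-entry lock and a generation
counter stands in for the ZRAM_PP_SLOT check; this is a sketch of
the idea, not of zram itself):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned long slot_gen; /* bumped by concurrent slot writers */

    static int recompress_one(void)
    {
            unsigned long gen;

            pthread_mutex_lock(&slot_lock);
            gen = slot_gen; /* snapshot the slot state */
            /* ... copy slot data into a local buffer ... */
            pthread_mutex_unlock(&slot_lock);

            /* Preemptible: compression and handle allocation go here. */

            pthread_mutex_lock(&slot_lock);
            if (gen != slot_gen) {
                    /* Slot was modified concurrently: bail out. */
                    pthread_mutex_unlock(&slot_lock);
                    return 0;
            }
            /* ... install the recompressed object ... */
            pthread_mutex_unlock(&slot_lock);
            return 1;
    }

    int main(void)
    {
            printf("installed: %d\n", recompress_one());
            return 0;
    }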
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 80 +++++++++++++++++++----------------
1 file changed, 44 insertions(+), 36 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 500d6c8b17fc..a6bc1c2dfbe6 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1871,14 +1871,13 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
u64 *num_recomp_pages, u32 threshold, u32 prio,
u32 prio_max)
{
- struct zcomp_strm *zstrm = NULL;
+ struct zcomp_strm *zstrm;
unsigned long handle_old;
unsigned long handle_new;
unsigned int comp_len_old;
unsigned int comp_len_new;
unsigned int class_index_old;
unsigned int class_index_new;
- u32 num_recomps = 0;
void *src, *dst;
int ret;
@@ -1905,6 +1904,13 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
zram_clear_flag(zram, index, ZRAM_IDLE);
class_index_old = zs_lookup_class_index(zram->mem_pool, comp_len_old);
+ prio = max(prio, zram_get_priority(zram, index) + 1);
+ /* Slot data copied out - unlock its bucket */
+ zram_slot_write_unlock(zram, index);
+ /* Recompression slot scan takes care of this, but just in case */
+ if (prio >= prio_max)
+ return 0;
+
/*
* Iterate the secondary comp algorithms list (in order of priority)
* and try to recompress the page.
@@ -1913,24 +1919,14 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
if (!zram->comps[prio])
continue;
- /*
- * Skip if the object is already re-compressed with a higher
- * priority algorithm (or same algorithm).
- */
- if (prio <= zram_get_priority(zram, index))
- continue;
-
- num_recomps++;
zstrm = zcomp_stream_get(zram->comps[prio]);
src = kmap_local_page(page);
ret = zcomp_compress(zram->comps[prio], zstrm,
src, &comp_len_new);
kunmap_local(src);
- if (ret) {
- zcomp_stream_put(zram->comps[prio], zstrm);
- return ret;
- }
+ if (ret)
+ break;
class_index_new = zs_lookup_class_index(zram->mem_pool,
comp_len_new);
@@ -1939,6 +1935,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
if (class_index_new >= class_index_old ||
(threshold && comp_len_new >= threshold)) {
zcomp_stream_put(zram->comps[prio], zstrm);
+ zstrm = NULL;
continue;
}
@@ -1946,14 +1943,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
break;
}
- /*
- * We did not try to recompress, e.g. when we have only one
- * secondary algorithm and the page is already recompressed
- * using that algorithm
- */
- if (!zstrm)
- return 0;
-
+ zram_slot_write_lock(zram, index);
/*
* Decrement the limit (if set) on pages we can recompress, even
* when current recompression was unsuccessful or did not compress
@@ -1963,37 +1953,55 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
if (*num_recomp_pages)
*num_recomp_pages -= 1;
- if (class_index_new >= class_index_old) {
+ /* Compression error */
+ if (ret) {
+ zcomp_stream_put(zram->comps[prio], zstrm);
+ return ret;
+ }
+
+ if (!zstrm) {
/*
* Secondary algorithms failed to re-compress the page
- * in a way that would save memory, mark the object as
- * incompressible so that we will not try to compress
- * it again.
+ * in a way that would save memory.
*
- * We need to make sure that all secondary algorithms have
- * failed, so we test if the number of recompressions matches
- * the number of active secondary algorithms.
+ * Mark the object incompressible if the max-priority
+ * algorithm couldn't re-compress it.
*/
- if (num_recomps == zram->num_active_comps - 1)
+ if (prio < zram->num_active_comps)
+ return 0;
+ if (zram_test_flag(zram, index, ZRAM_PP_SLOT))
zram_set_flag(zram, index, ZRAM_INCOMPRESSIBLE);
return 0;
}
- /* Successful recompression but above threshold */
- if (threshold && comp_len_new >= threshold)
+ /* Slot has been modified concurrently */
+ if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) {
+ zcomp_stream_put(zram->comps[prio], zstrm);
return 0;
+ }
- /*
- * If we cannot alloc memory for recompressed object then we bail out
- * and simply keep the old (existing) object in zsmalloc.
- */
+ /* zsmalloc handle allocation can schedule, unlock slot's bucket */
+ zram_slot_write_unlock(zram, index);
handle_new = zs_malloc(zram->mem_pool, comp_len_new,
GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
+ zram_slot_write_lock(zram, index);
+
+ /*
+ * If we couldn't allocate memory for recompressed object then bail
+ * out and simply keep the old (existing) object in mempool.
+ */
if (IS_ERR_VALUE(handle_new)) {
zcomp_stream_put(zram->comps[prio], zstrm);
return PTR_ERR((void *)handle_new);
}
+ /* Slot has been modified concurrently */
+ if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) {
+ zcomp_stream_put(zram->comps[prio], zstrm);
+ zs_free(zram->mem_pool, handle_new);
+ return 0;
+ }
+
dst = zs_map_object(zram->mem_pool, handle_new, ZS_MM_WO);
memcpy(dst, zstrm->buffer, comp_len_new);
zcomp_stream_put(zram->comps[prio], zstrm);
--
2.48.1.262.g85cc9f2d1e-goog
* Re: [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads
2025-01-30 11:10 [PATCHv3 00/11] zram: preemptible writes and occasionally preemptible reads Sergey Senozhatsky
` (10 preceding siblings ...)
2025-01-30 11:10 ` [PATCHv3 11/11] zram: unlock slot during recompression Sergey Senozhatsky
@ 2025-01-30 14:18 ` Sergey Senozhatsky
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2025-01-30 14:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Sergey Senozhatsky, Minchan Kim, linux-mm, linux-kernel
On (25/01/30 20:10), Sergey Senozhatsky wrote:
> This is Part I which only changes zram. In order to make reads()
> preemptible we also need Part II [1] which changes zsmalloc.
I think it's a good time to combine the two series (this one and
the zsmalloc one [1]) and send them out as one patch set, because
there is some dependency between them.
[1] https://lore.kernel.org/linux-mm/20250130044455.2642465-1-senozhatsky@chromium.org