From: Vitaly Wool <vitaly.wool@konsulko.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
Cc: Shakeel Butt <shakeelb@google.com>,
Minchan Kim <minchan@kernel.org>, Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
NitinGupta <ngupta@vflare.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
"tiantao (H)" <tiantao6@hisilicon.com>
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
Date: Tue, 22 Dec 2020 02:57:08 +0100
Message-ID: <CAM4kBBK=5eBdCjWc5VJXcdr=Z4PV1=ZQ2n8fZmJ6ahJbpUyv2A@mail.gmail.com>
In-Reply-To: <4490cb6a7e2243fba374e40652979e46@hisilicon.com>
On Tue, 22 Dec 2020, 02:42 Song Bao Hua (Barry Song)
<song.bao.hua@hisilicon.com> wrote:
>
>
> > -----Original Message-----
> > From: Song Bao Hua (Barry Song)
> > Sent: Tuesday, December 22, 2020 2:06 PM
> > To: 'Vitaly Wool' <vitaly.wool@konsulko.com>
> > Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim <minchan@kernel.org>;
> Mike
> > Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>; linux-mm
> > <linux-mm@kvack.org>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>;
> > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > <akpm@linux-foundation.org>
> > Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> >
> >
> > > -----Original Message-----
> > > From: Vitaly Wool [mailto:vitaly.wool@konsulko.com]
> > > Sent: Tuesday, December 22, 2020 2:00 PM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim <
> minchan@kernel.org>;
> > Mike
> > > Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>;
> linux-mm
> > > <linux-mm@kvack.org>; Sebastian Andrzej Siewior <bigeasy@linutronix.de
> >;
> > > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > <akpm@linux-foundation.org>
> > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > >
> > > On Tue, Dec 22, 2020 at 12:37 AM Song Bao Hua (Barry Song)
> > > <song.bao.hua@hisilicon.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Song Bao Hua (Barry Song)
> > > > > Sent: Tuesday, December 22, 2020 11:38 AM
> > > > > To: 'Vitaly Wool' <vitaly.wool@konsulko.com>
> > > > > Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim <
> minchan@kernel.org>;
> > > Mike
> > > > > Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>;
> linux-mm
> > > > > <linux-mm@kvack.org>; Sebastian Andrzej Siewior <
> bigeasy@linutronix.de>;
> > > > > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > > <akpm@linux-foundation.org>
> > > > > Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Vitaly Wool [mailto:vitaly.wool@konsulko.com]
> > > > > > Sent: Tuesday, December 22, 2020 11:12 AM
> > > > > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > > > > Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim
> > <minchan@kernel.org>;
> > > > > Mike
> > > > > > Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>;
> > linux-mm
> > > > > > <linux-mm@kvack.org>; Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de>;
> > > > > > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > > > <akpm@linux-foundation.org>
> > > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > > >
> > > > > > On Mon, Dec 21, 2020 at 10:30 PM Song Bao Hua (Barry Song)
> > > > > > <song.bao.hua@hisilicon.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > > > > > Sent: Tuesday, December 22, 2020 10:03 AM
> > > > > > > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > > > > > > Cc: Vitaly Wool <vitaly.wool@konsulko.com>; Minchan Kim
> > > > > > <minchan@kernel.org>;
> > > > > > > > Mike Galbraith <efault@gmx.de>; LKML <
> linux-kernel@vger.kernel.org>;
> > > > > > linux-mm
> > > > > > > > <linux-mm@kvack.org>; Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de>;
> > > > > > > > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > > > > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > > > > > <akpm@linux-foundation.org>
> > > > > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > > > > >
> > > > > > > > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > > > > > > > <song.bao.hua@hisilicon.com> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > > > > > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > > > > > > > To: Vitaly Wool <vitaly.wool@konsulko.com>
> > > > > > > > > > Cc: Minchan Kim <minchan@kernel.org>; Mike Galbraith
> > > <efault@gmx.de>;
> > > > > > LKML
> > > > > > > > > > <linux-kernel@vger.kernel.org>; linux-mm <
> linux-mm@kvack.org>;
> > > Song
> > > > > > Bao
> > > > > > > > Hua
> > > > > > > > > > (Barry Song) <song.bao.hua@hisilicon.com>; Sebastian
> Andrzej
> > > Siewior
> > > > > > > > > > <bigeasy@linutronix.de>; NitinGupta <ngupta@vflare.org>;
> Sergey
> > > > > > > > Senozhatsky
> > > > > > > > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > > > > > > > <akpm@linux-foundation.org>
> > > > > > > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool
> > > <vitaly.wool@konsulko.com>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <
> minchan@kernel.org>
> > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > > > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > > > > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > > > > > > > > about scheduling in atomic context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > > > > > > > bit spinlock completely and use a bit flag instead.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't want to use such open code for the lock.
> > > > > > > > > > > >
> > > > > > > > > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > > > > > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > > > > > > > > introduce this patch with allowing preemption enabling.
> > > > > > > > > > >
> > > > > > > > > > > This understanding is upside down. The code in zswap you are referring
> > > > > > > > > > > to is not buggy. You may claim that it is suboptimal but there is
> > > > > > > > > > > nothing wrong in taking a mutex.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > > > > > > > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > > > > > > > > zswap compressor will the [de]compression be async or sync?
> > > > > > > > >
> > > > > > > > > Right now, in the crypto subsystem, new drivers are required to be
> > > > > > > > > written against the async APIs. The old sync API can't be used in new
> > > > > > > > > accelerator drivers, as it is not supported there at all.
> > > > > > > > >
> > > > > > > > > Old drivers used to be sync, but they've got async wrappers to support
> > > > > > > > > the async APIs. E.g.:
> > > > > > > > > crypto: acomp - add support for lz4 via scomp
> > > > > > > > >
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > > > > > > > >
> > > > > > > > > crypto: acomp - add support for lzo via scomp
> > > > > > > > >
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > > > > > > > >
> > > > > > > > > so they are supporting async APIs but they are still working in sync
> > > > > > > > > mode as those old drivers don't sleep.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Good to know that those are sync because I want them to be sync.
> > > > > > > > Please note that zswap is a cache in front of a real swap and the load
> > > > > > > > operation is latency sensitive as it comes in the page fault path and
> > > > > > > > directly impacts the applications. I doubt decompressing synchronously
> > > > > > > > a 4k page on a cpu will be costlier than asynchronously decompressing
> > > > > > > > the same page from hardware accelerators.
> > > > > > >
> > > > > > > If you read the old paper:
> > > > > > >
> > > > > > > https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> > > > > > > Because the hardware accelerator speeds up compression, looking at the
> > > > > > > zswap metrics we observed that there were more store and load requests
> > > > > > > in a given amount of time, which filled up the zswap pool faster than a
> > > > > > > software compression run. Because of this behavior, we set the
> > > > > > > max_pool_percent parameter to 30 for the hardware compression runs - this
> > > > > > > means that zswap can use up to 30% of the 10GB of total memory.
> > > > > > >
> > > > > > > So using hardware accelerators, we get a chance to speed up compression
> > > > > > > while decreasing cpu utilization.
> > > > > > >
> > > > > > > BTW, If it is not easy to change zsmalloc, one quick workaround we might
> > > > > > > do in zswap is adding the below after applying Mike's original patch:
> > > > > > >
> > > > > > > if (in_atomic()) /* for zsmalloc */
> > > > > > >         while (!try_wait_for_completion(&req->done));
> > > > > > > else /* for zbud, z3fold */
> > > > > > >         crypto_wait_req(....);
> > > > > >
> > > > > > I don't think I'm going to ack this, sorry.
> > > > > >
> > > > >
> > > > > Fair enough. I am also wondering whether we can move
> > > > > zpool_unmap_handle() to right after zpool_map_handle(), as below:
> > > > >
> > > > >         dlen = PAGE_SIZE;
> > > > >         src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> > > > >         if (zpool_evictable(entry->pool->zpool))
> > > > >                 src += sizeof(struct zswap_header);
> > > > > +       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > > >
> > > > >         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > > > >         mutex_lock(acomp_ctx->mutex);
> > > > >         sg_init_one(&input, src, entry->length);
> > > > >         sg_init_table(&output, 1);
> > > > >         sg_set_page(&output, page, PAGE_SIZE, 0);
> > > > >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> > > > >         ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> > > > >         mutex_unlock(acomp_ctx->mutex);
> > > > >
> > > > > -       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > > >
> > > > > src is always low memory, and we only need its virtual address
> > > > > to get the page of src in sg_init_one(). We don't actually read it
> > > > > by CPU anywhere.
> > > >
> > > > The below code might be better:
> > > >
> > > >         dlen = PAGE_SIZE;
> > > >         src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> > > >         if (zpool_evictable(entry->pool->zpool))
> > > >                 src += sizeof(struct zswap_header);
> > > >
> > > >         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > > >
> > > > +       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > >
> > > >         mutex_lock(acomp_ctx->mutex);
> > > >         sg_init_one(&input, src, entry->length);
> > > >         sg_init_table(&output, 1);
> > > >         sg_set_page(&output, page, PAGE_SIZE, 0);
> > > >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> > > >         ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> > > >         mutex_unlock(acomp_ctx->mutex);
> > > >
> > > > -       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > >
> > > I don't see how this is going to work since we can't guarantee src
> > > will be a valid pointer after the zpool_unmap_handle() call, can we?
> > > Could you please elaborate?
> >
> > A valid pointer is for cpu to read and write. Here, cpu doesn't read
> > and write it, we only need to get page struct from the address.
> >
> > void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
> > {
> >         sg_init_table(sg, 1);
> >         sg_set_buf(sg, buf, buflen);
> > }
> >
> > static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
> >                               unsigned int buflen)
> > {
> > #ifdef CONFIG_DEBUG_SG
> >         BUG_ON(!virt_addr_valid(buf));
> > #endif
> >         sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
> > }
> >
> > sg_init_one() is always using an address which has a linear mapping
> > with physical address.
> > So once we get the value of src, we can get the page struct.
> >
> > src has a linear mapping with physical address. It doesn't require
> > page table walk which vmalloc_to_page() wants.
> >
> > The req only requires page to initialize sg table, I think if
> > we are going to use a cpu-based (de)compression, the crypto
> > driver will kmap it again.
>
> I probably introduced another bug here. For zsmalloc, it is possible to
> get highmem for zpool since its malloc_support_movable = true.
>
>         if (zpool_malloc_support_movable(entry->pool->zpool))
>                 gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
>         ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);
>
> On a 64-bit system there is never any highmem. On a 32-bit system we may
> trigger this bug.
>
> So zswap should actually have used kmap_to_page(), which supports
> both linear and non-linear mappings. sg_init_one() only supports
> linear mappings.
> But that doesn't change the fact: once req is initialized with the page
> struct, we can unmap src. If we are going to use a HW accelerator,
> it would be a DMA; if we are going to use CPU decompression, the crypto
> driver will kmap() again.
>
I'm still not convinced. It will kmap() what, src? At this point src might
have become just a bogus pointer. Why couldn't the object have been moved
somewhere else (by the compaction mechanism, for instance) by the time
DMA kicks in?
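
The concern can be illustrated with a toy userspace model. This is only a
sketch, not zsmalloc code: struct toy_pool and every toy_* function are
made up for illustration, and it assumes a pool that is free to relocate
any object that is not currently mapped. A pointer obtained from the map
call is then only meaningful until the matching unmap.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 64
#define MAX_OBJS  4
#define POISON    0xAA

/* Hypothetical toy pool: a handle is an index into the offset table. */
struct toy_pool {
	char   mem[POOL_SIZE];
	size_t off[MAX_OBJS];    /* current offset of each object */
	size_t len[MAX_OBJS];    /* 0 means the slot is free */
	int    mapped[MAX_OBJS]; /* nonzero while the object is pinned */
};

static void toy_store(struct toy_pool *p, int h, size_t off, const char *s)
{
	size_t len = strlen(s) + 1;

	assert(off + len <= POOL_SIZE);
	p->off[h] = off;
	p->len[h] = len;
	memcpy(p->mem + off, s, len);
}

static void toy_free(struct toy_pool *p, int h)
{
	p->len[h] = 0;
}

/* Pin the object and hand out a direct pointer to its current location. */
static const char *toy_map(struct toy_pool *p, int h)
{
	p->mapped[h] = 1;
	return p->mem + p->off[h];
}

static void toy_unmap(struct toy_pool *p, int h)
{
	p->mapped[h] = 0;
}

/*
 * Pack live objects towards offset 0. Mapped objects are left alone;
 * everything else may move, and the vacated bytes are poisoned.
 * Assumes handles were stored in ascending offset order.
 */
static void toy_compact(struct toy_pool *p)
{
	size_t next = 0;

	for (int h = 0; h < MAX_OBJS; h++) {
		char tmp[POOL_SIZE];

		if (!p->len[h])
			continue;
		if (!p->mapped[h] && p->off[h] != next) {
			memcpy(tmp, p->mem + p->off[h], p->len[h]);
			memset(p->mem + p->off[h], POISON, p->len[h]);
			memcpy(p->mem + next, tmp, p->len[h]);
			p->off[h] = next;
		}
		next = p->off[h] + p->len[h];
	}
}

/* Returns 1 if the pointer taken at map time still sees the data. */
int src_still_valid(int unmap_early)
{
	struct toy_pool p = {0};
	const char *src;
	int ok;

	toy_store(&p, 0, 0, "AAAA");
	toy_store(&p, 1, 32, "payload");

	src = toy_map(&p, 1);
	if (unmap_early)
		toy_unmap(&p, 1);  /* unmap before the data is consumed */

	toy_free(&p, 0);
	toy_compact(&p);           /* stands in for compaction running meanwhile */

	ok = strcmp(src, "payload") == 0;
	if (!unmap_early)
		toy_unmap(&p, 1);
	return ok;
}
```

In this model src_still_valid(0) keeps the object pinned across the
"compaction" and the saved pointer stays good, while src_still_valid(1)
unmaps first and the saved pointer ends up looking at poisoned memory.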
> >
> > >
> > > ~Vitaly
> >
>
> Thanks
> Barry
>
Thread overview: 47+ messages
2020-12-19 10:04 [patch] zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat Mike Galbraith
2020-12-19 10:12 ` Mike Galbraith
2020-12-19 10:20 ` Vitaly Wool
2020-12-19 10:27 ` Mike Galbraith
2020-12-19 10:46 ` Vitaly Wool
2020-12-19 10:59 ` Mike Galbraith
2020-12-19 11:03 ` Mike Galbraith
2020-12-20 0:22 ` [PATCH] zsmalloc: do not use bit_spin_lock Vitaly Wool
2020-12-20 1:18 ` Matthew Wilcox
2020-12-20 7:21 ` Vitaly Wool
2021-01-14 16:17 ` Sebastian Andrzej Siewior
2020-12-20 1:23 ` Mike Galbraith
2020-12-20 4:11 ` Mike Galbraith
2020-12-20 7:47 ` Mike Galbraith
2020-12-20 21:20 ` Song Bao Hua (Barry Song)
2020-12-20 22:10 ` Mike Galbraith
2020-12-20 1:56 ` Mike Galbraith
2020-12-21 17:24 ` Minchan Kim
2020-12-21 19:20 ` Vitaly Wool
2020-12-21 19:50 ` Shakeel Butt
2020-12-21 20:05 ` Song Bao Hua (Barry Song)
2020-12-21 21:02 ` Shakeel Butt
2020-12-21 21:25 ` Song Bao Hua (Barry Song)
2020-12-21 22:11 ` Vitaly Wool
2020-12-21 22:42 ` Song Bao Hua (Barry Song)
2020-12-21 23:35 ` Song Bao Hua (Barry Song)
2020-12-22 0:59 ` Vitaly Wool
2020-12-22 1:10 ` Song Bao Hua (Barry Song)
2020-12-22 1:42 ` Song Bao Hua (Barry Song)
2020-12-22 1:57 ` Vitaly Wool [this message]
2020-12-22 2:07 ` Song Bao Hua (Barry Song)
2020-12-22 2:10 ` Song Bao Hua (Barry Song)
2020-12-22 9:44 ` Vitaly Wool
2020-12-22 21:06 ` Song Bao Hua (Barry Song)
2020-12-23 0:11 ` Vitaly Wool
2020-12-23 12:44 ` tiantao (H)
2020-12-23 18:25 ` Vitaly Wool
2021-01-14 16:18 ` Sebastian Andrzej Siewior
2021-01-14 16:29 ` Vitaly Wool
2021-01-14 16:56 ` Sebastian Andrzej Siewior
2021-01-14 17:15 ` Vitaly Wool
2021-01-14 17:18 ` Sebastian Andrzej Siewior
2020-12-21 22:46 ` Shakeel Butt
2020-12-21 23:02 ` Song Bao Hua (Barry Song)
2020-12-22 9:20 ` David Laight
2020-12-22 9:32 ` Vitaly Wool
2020-12-21 20:22 ` Minchan Kim