From: Weijie Yang <weijie.yang.kh@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Weijie Yang <weijie.yang@samsung.com>,
Davidlohr Bueso <davidlohr@hp.com>,
Andrew Morton <akpm@linux-foundation.org>,
Seth Jennings <sjennings@variantweb.net>,
Nitin Gupta <ngupta@vflare.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Bob Liu <bob.liu@oracle.com>, Dan Streetman <ddstreet@ieee.org>,
Heesub Shin <heesub.shin@samsung.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] zram: remove global tb_lock by using lock-free CAS
Date: Wed, 7 May 2014 17:16:52 +0800 [thread overview]
Message-ID: <CAL1ERfOXNrfKqMVs-Yz8yJjKKU3L5fjUEOb0Aeyqc37py-BWEg@mail.gmail.com> (raw)
In-Reply-To: <20140507085743.GA31680@bbox>
On Wed, May 7, 2014 at 4:57 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Wed, May 07, 2014 at 03:51:35PM +0800, Weijie Yang wrote:
>> On Tue, May 6, 2014 at 6:22 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>> > On Mon, 2014-05-05 at 13:46 -0700, Andrew Morton wrote:
>> >> On Mon, 05 May 2014 11:00:44 -0700 Davidlohr Bueso <davidlohr@hp.com> wrote:
>> >>
>> >> > > > @@ -339,12 +338,14 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
>> >> > > > unsigned long handle;
>> >> > > > u16 size;
>> >> > > >
>> >> > > > - read_lock(&meta->tb_lock);
>> >> > > > + while(atomic_cmpxchg(&meta->table[index].state, IDLE, ACCESS) != IDLE)
>> >> > > > + cpu_relax();
>> >> > > > +
>> >> > >
>> >> > > So... this might be dumb question, but this looks like a spinlock
>> >> > > implementation.
>> >> > >
>> >> > > What advantage does this have over a standard spinlock?
>> >> >
>> >> > I was wondering the same thing. Furthermore by doing this you'll loose
>> >> > the benefits of sharing the lock... your numbers do indicate that it is
>> >> > for the better. Also, note that hopefully rwlock_t will soon be updated
>> >> > to be fair and perform up to par with spinlocks, something which is long
>> >> > overdue. So you could reduce the critical region by implementing the
>> >> > same granularity, just don't implement your own locking schemes, like
>> >> > this.
>>
>> Actually, the main reason I use a CAS rather than a standard lock here is
>> that I want to minimize the meta table memory overhead. A tiny reason is
>> my fuzzy memory that CAS is more efficient than spinlock (please correct me
>> if I am wrong).
>>
>> Anyway, I changed the CAS to spinlock and rwlock, re-test them:
>>
>> Test lock-free spinlock rwlock
>> ------------------------------------------------------
>> Initial write 1424141.62 1426372.84 1423019.21
>> Rewrite 1652504.81 1623307.14 1653682.04
>> Read 11404668.35 11242885.05 10938125.00
>> Re-read 11555483.75 11253906.6 10837773.50
>> Reverse Read 8394478.17 8277250.34 7768057.39
>> Stride read 9372229.95 9010498.53 8692871.77
>> Random read 9187221.90 8988080.55 8661184.60
>> Mixed workload 5843370.85 5414729.54 5451055.03
>> Random write 1608947.04 1572276.64 1588866.51
>> Pwrite 1311055.32 1302463.04 1302001.06
>> Pread 4652056.11 4555802.18 4469672.34
>
> I'd like to clear it out.
> The spinlock and rwlock you mentioned is per-meta entry lock like state
> you added or global lock for meta? If it's latter, rwlock means base?
The spinlock and rwlock is per-meta entry lock like state.
Because the base data is added in the first mail, I don't mention
them here.
>>
>> And I cann't say which one is the best, they have the similar performance.
>
> Most popular use of zram is the in-memory swap for small embedded system
> so I don't want to increase memory footprint without good reason although
> it makes synthetic benchmark. Alhought it's 1M for 1G, it isn't small if we
> consider compression ratio and real free memory after boot(But data I have
> an interest is mixed workload enhancement. It would be important for heavy
> memory pressure for latency so it attractives me a lot. Anyway, I need number
> for back up the justification with real swap usecase rather than zram-blk.
> Recently, I have considered per-process reclaim based on zram so maybe I will
> have a test for that).
> But recently, I have received private mail from some server folks to use
> zram-blk, not zram-swap so in case of that, such enhancement would be
> desirable so my point is I'm not saying the drop of the patch and let's
> find proper solution to meet both and gather more data.
>
>>
>> Wait, iozone will create temporary files for every test thread, so there is no
>> possibility that these threads access the same table[index] concurrenctly.
>> So, I use fio to test the raw zram block device.
>> To enhance the possibility of access the same table[index] conflictly, I set zram
>> with a small disksize(10M) and let thread run with large loop count.
>>
>> On the same test machine, the fio test command is:
>> fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
>> --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
>> --filename=/dev/zram0 --name=seq-write --rw=write --stonewall
>> --name=seq-read --rw=read --stonewall --name=seq-readwrite
>> --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
>>
>> Test base lock-free spinlock rwlock
>> ------------------------------------------------------
>> seq-write 935109.2 999580.5 998134.8 994384.6
>> seq-read 5598064.6 6444011.5 6243184.6 6197514.2
>> seq-rw 1403963.0 1635673.0 1633823.0 1635972.2
>> rand-rw 1389864.4 1612520.4 1613403.6 1612129.8
>
> What's the difference between base and rwlock?
> Base means global rwlock while rwlock means per-meta entry rwlock?
>
Base means global rwlock, it is the 3.15.0-rc3 code.
rwlock means per-meta entry rwlock.
Sorry to confuse you.
>>
>> This result(KB/s, average of 5 tests) shows the performance improvement
>> on base version, however, I cann't say which method is the best.
>>
>> >>
>> >> It sounds like seqlocks will match this access pattern pretty well?
>> >
>> > Indeed. And after a closer look, except for zram_slot_free_notify(),
>> > that lock is always shared. So, unless fine graining it implies taking
>> > the lock exclusively like in this patch (if so, that needs to be
>> > explicitly documented in the changelog), we would ideally continue to
>> > share it. That _should_ provide nicer performance numbers when using the
>> > correct lock.
>> >
>>
>> Andrew mentioned seqlocks, however, I think it is hard the use seqlocks here
>> after I recheck the codes. No matter use it as a meta global lock or a
>> table[index] lock. The main reason is the writer will free the handle rather than
>> just change some value.
>> If I misunderstand you, please let me know.
>>
>> Now, I am in a delimma. For minimizing the memory overhead, I like to use CAS.
>> However, it is not a standard way.
>>
>> Any complaint or suggestions are welcomed.
>>
>> Regards,
>>
>> >
>> >
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-05-07 9:16 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-05 4:01 Weijie Yang
2014-05-05 10:32 ` Sergey Senozhatsky
2014-05-05 15:20 ` Seth Jennings
2014-05-05 18:00 ` Davidlohr Bueso
2014-05-05 20:46 ` Andrew Morton
2014-05-05 22:22 ` Davidlohr Bueso
2014-05-07 7:51 ` Weijie Yang
2014-05-07 8:57 ` Minchan Kim
2014-05-07 9:16 ` Weijie Yang [this message]
2014-05-07 14:52 ` Joonsoo Kim
2014-05-08 6:24 ` Minchan Kim
2014-05-10 6:10 ` Weijie Yang
2014-05-12 5:15 ` Minchan Kim
2014-05-12 14:49 ` Davidlohr Bueso
2014-05-13 0:03 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL1ERfOXNrfKqMVs-Yz8yJjKKU3L5fjUEOb0Aeyqc37py-BWEg@mail.gmail.com \
--to=weijie.yang.kh@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bob.liu@oracle.com \
--cc=davidlohr@hp.com \
--cc=ddstreet@ieee.org \
--cc=heesub.shin@samsung.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=ngupta@vflare.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=sjennings@variantweb.net \
--cc=weijie.yang@samsung.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox