linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Weijie Yang <weijie.yang.kh@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Weijie Yang <weijie.yang@samsung.com>,
	Davidlohr Bueso <davidlohr@hp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Seth Jennings <sjennings@variantweb.net>,
	Nitin Gupta <ngupta@vflare.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Bob Liu <bob.liu@oracle.com>, Dan Streetman <ddstreet@ieee.org>,
	Heesub Shin <heesub.shin@samsung.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] zram: remove global tb_lock by using lock-free CAS
Date: Wed, 7 May 2014 17:16:52 +0800	[thread overview]
Message-ID: <CAL1ERfOXNrfKqMVs-Yz8yJjKKU3L5fjUEOb0Aeyqc37py-BWEg@mail.gmail.com> (raw)
In-Reply-To: <20140507085743.GA31680@bbox>

On Wed, May 7, 2014 at 4:57 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Wed, May 07, 2014 at 03:51:35PM +0800, Weijie Yang wrote:
>> On Tue, May 6, 2014 at 6:22 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>> > On Mon, 2014-05-05 at 13:46 -0700, Andrew Morton wrote:
>> >> On Mon, 05 May 2014 11:00:44 -0700 Davidlohr Bueso <davidlohr@hp.com> wrote:
>> >>
>> >> > > > @@ -339,12 +338,14 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
>> >> > > >         unsigned long handle;
>> >> > > >         u16 size;
>> >> > > >
>> >> > > > -       read_lock(&meta->tb_lock);
>> >> > > > +       while(atomic_cmpxchg(&meta->table[index].state, IDLE, ACCESS) != IDLE)
>> >> > > > +               cpu_relax();
>> >> > > > +
>> >> > >
>> >> > > So... this might be dumb question, but this looks like a spinlock
>> >> > > implementation.
>> >> > >
>> >> > > What advantage does this have over a standard spinlock?
>> >> >
>> >> > I was wondering the same thing. Furthermore by doing this you'll loose
>> >> > the benefits of sharing the lock... your numbers do indicate that it is
>> >> > for the better. Also, note that hopefully rwlock_t will soon be updated
>> >> > to be fair and perform up to par with spinlocks, something which is long
>> >> > overdue. So you could reduce the critical region by implementing the
>> >> > same granularity, just don't implement your own locking schemes, like
>> >> > this.
>>
>> Actually, the main reason I use a CAS rather than a standard lock here is
>> that I want to minimize the meta table memory overhead. A tiny reason is
>> my fuzzy memory that CAS is more efficient than spinlock (please correct me
>> if I am wrong).
>>
>> Anyway, I changed the CAS to spinlock and rwlock, re-test them:
>>
>>       Test       lock-free       spinlock     rwlock
>> ------------------------------------------------------
>>  Initial write   1424141.62   1426372.84   1423019.21
>>        Rewrite   1652504.81   1623307.14   1653682.04
>>           Read  11404668.35  11242885.05  10938125.00
>>        Re-read  11555483.75   11253906.6  10837773.50
>>   Reverse Read   8394478.17   8277250.34   7768057.39
>>    Stride read   9372229.95   9010498.53   8692871.77
>>    Random read   9187221.90   8988080.55   8661184.60
>> Mixed workload   5843370.85   5414729.54   5451055.03
>>   Random write   1608947.04   1572276.64   1588866.51
>>         Pwrite   1311055.32   1302463.04   1302001.06
>>          Pread   4652056.11   4555802.18   4469672.34
>
> I'd like to clear it out.
> The spinlock and rwlock you mentioned is per-meta entry lock like state
> you added or global lock for meta? If it's latter, rwlock means base?

The spinlock and rwlock is per-meta entry lock like state.
Because the base data is added in the first mail, I don't mention
them here.

>>
>> And I cann't say which one is the best, they have the similar performance.
>
> Most popular use of zram is the in-memory swap for small embedded system
> so I don't want to increase memory footprint without good reason although
> it makes synthetic benchmark. Alhought it's 1M for 1G, it isn't small if we
> consider compression ratio and real free memory after boot(But data I have
> an interest is mixed workload enhancement. It would be important for heavy
> memory pressure for latency so it attractives me a lot. Anyway, I need number
> for back up the justification with real swap usecase rather than zram-blk.
> Recently, I have considered per-process reclaim  based on zram so maybe I will
> have a test for that).
> But recently, I have received private mail from some server folks to use
> zram-blk, not zram-swap so in case of that, such enhancement would be
> desirable so my point is I'm not saying the drop of the patch and let's
> find proper solution to meet both and gather more data.
>
>>
>> Wait, iozone will create temporary files for every test thread, so there is no
>> possibility that these threads access the same table[index] concurrenctly.
>> So, I use fio to test the raw zram block device.
>> To enhance the possibility of access the same table[index] conflictly, I set zram
>> with a small disksize(10M) and let thread run with large loop count.
>>
>> On the same test machine, the fio test command is:
>> fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
>> --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
>> --filename=/dev/zram0 --name=seq-write --rw=write --stonewall
>> --name=seq-read --rw=read --stonewall --name=seq-readwrite
>> --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
>>
>>     Test      base    lock-free   spinlock   rwlock
>> ------------------------------------------------------
>> seq-write   935109.2   999580.5   998134.8   994384.6
>>  seq-read  5598064.6  6444011.5  6243184.6  6197514.2
>>    seq-rw  1403963.0  1635673.0  1633823.0  1635972.2
>>   rand-rw  1389864.4  1612520.4  1613403.6  1612129.8
>
> What's the difference between base and rwlock?
> Base means global rwlock while rwlock means per-meta entry rwlock?
>

Base means  global rwlock, it is the 3.15.0-rc3 code.
rwlock means per-meta entry rwlock.

Sorry to confuse you.

>>
>> This result(KB/s, average of 5 tests) shows the performance improvement
>> on base version, however, I cann't say which method is the best.
>>
>> >>
>> >> It sounds like seqlocks will match this access pattern pretty well?
>> >
>> > Indeed. And after a closer look, except for zram_slot_free_notify(),
>> > that lock is always shared. So, unless fine graining it implies taking
>> > the lock exclusively like in this patch (if so, that needs to be
>> > explicitly documented in the changelog), we would ideally continue to
>> > share it. That _should_ provide nicer performance numbers when using the
>> > correct lock.
>> >
>>
>> Andrew mentioned seqlocks, however, I think it is hard the use seqlocks here
>> after I recheck the codes. No matter use it as a meta global lock or a
>> table[index] lock. The main reason is the writer will free the handle rather than
>> just change some value.
>> If I misunderstand you, please let me know.
>>
>> Now, I am in a delimma. For minimizing the memory overhead, I like to use CAS.
>> However, it is not a standard way.
>>
>> Any complaint or suggestions are welcomed.
>>
>> Regards,
>>
>> >
>> >
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-05-07  9:16 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-05  4:01 Weijie Yang
2014-05-05 10:32 ` Sergey Senozhatsky
2014-05-05 15:20 ` Seth Jennings
2014-05-05 18:00   ` Davidlohr Bueso
2014-05-05 20:46     ` Andrew Morton
2014-05-05 22:22       ` Davidlohr Bueso
2014-05-07  7:51         ` Weijie Yang
2014-05-07  8:57           ` Minchan Kim
2014-05-07  9:16             ` Weijie Yang [this message]
2014-05-07 14:52               ` Joonsoo Kim
2014-05-08  6:24                 ` Minchan Kim
2014-05-10  6:10                   ` Weijie Yang
2014-05-12  5:15                     ` Minchan Kim
2014-05-12 14:49                       ` Davidlohr Bueso
2014-05-13  0:03                         ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL1ERfOXNrfKqMVs-Yz8yJjKKU3L5fjUEOb0Aeyqc37py-BWEg@mail.gmail.com \
    --to=weijie.yang.kh@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bob.liu@oracle.com \
    --cc=davidlohr@hp.com \
    --cc=ddstreet@ieee.org \
    --cc=heesub.shin@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=sjennings@variantweb.net \
    --cc=weijie.yang@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox