From: Vitaly Wool <vitalywool@gmail.com>
Date: Thu, 3 Aug 2017 01:49:38 +0200
Subject: Re: [PATCH] z3fold: use per-cpu unbuddied lists
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleksiy.Avramchenko@sony.com, Linux-MM, Dan Streetman, linux-kernel@vger.kernel.org

On Aug 3, 2017 01:07, "Andrew Morton" <akpm@linux-foundation.org> wrote:

> On Wed, 2 Aug 2017 12:25:05 +0200 Vitaly Wool <vitalywool@gmail.com> wrote:
>
> > z3fold operates on unbuddied lists in a simple manner: in fact, it
> > only takes the first entry off the list on the hot path. So if the
> > z3fold pool is big enough and well enough balanced, considering
> > only the lists local to the current CPU won't be an issue in any
> > way, while random I/O performance will go up.
>
> Has the performance benefit been measured?  It's a large patch.

Yes, mostly by running fio in randrw mode. We see performance more than
double on an 8-core ARM64 system.

> > This patch also introduces two worker threads: one for async in-page
> > object layout optimization and one for releasing freed pages.
>
> Why?  What are the runtime effects of this change?  Does this turn
> currently-synchronous operations into now-async operations?  If so,
> what are the implications of this if, say, the workqueue doesn't get
> serviced for a while?

The biggest benefit is that it usually ends up with one call to
compact_page instead of two. Also, we use z3fold as a zram backend, and
zram likes to free pages on a critical path, so removing compaction from
that critical path is definitely a nice thing.

If the compaction workqueue doesn't get serviced for a significant
while, the ratio will go down a bit, but nothing bad will happen. And
z3fold_alloc tries to take new pages from the stale list first, so even
if the release workqueue is not run, the pages will be reused by
z3fold_alloc.

> etc.  Sorry, but I'm not seeing anywhere near enough information and
> testing results to justify merging such a large and intrusive patch.

I understand. Would it help if I added the fio results and some of the
explanations from this reply to the commit message?

Thanks,
   Vitaly
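For readers following along, here is a minimal sketch of the per-CPU
unbuddied lookup being discussed. The names loosely follow mm/z3fold.c
(struct z3fold_header, NCHUNKS), but the code below is a simplified
illustration of the idea, not the actual patch; entry locking and the
rest of the pool structure are omitted.

#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

#define NCHUNKS 63	/* number of free-space size classes (simplified) */

struct z3fold_header {
	struct list_head buddy;		/* link in an unbuddied list */
	/* ... */
};

struct z3fold_pool {
	spinlock_t lock;
	/* a private array of unbuddied lists for each CPU */
	struct list_head __percpu *unbuddied;
	/* ... */
};

/* Hot path: consult only the lists local to the current CPU. */
static struct z3fold_header *get_unbuddied(struct z3fold_pool *pool,
					   int chunks)
{
	struct z3fold_header *zhdr = NULL;
	struct list_head *unbuddied;
	int i;

	unbuddied = get_cpu_ptr(pool->unbuddied);
	for (i = chunks; i < NCHUNKS; i++) {
		/* take the first entry off the first non-empty list */
		zhdr = list_first_entry_or_null(&unbuddied[i],
						struct z3fold_header, buddy);
		if (zhdr) {
			list_del_init(&zhdr->buddy);
			break;
		}
	}
	put_cpu_ptr(pool->unbuddied);
	return zhdr;
}

Since get_cpu_ptr() disables preemption, the lookup stays on one CPU's
lists for its whole duration, and a miss can fall back to a fresh page
rather than scanning the other CPUs' lists.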
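And a similar sketch of the stale-list reuse mentioned above, building
on the structures from the previous snippet and assuming the pool
additionally carries a "stale" list and a "stale_lock" (hypothetical,
simplified names). z3fold headers sit at the start of their page, so
the page can be recovered from the header:

/* Reuse a page queued for release before asking for a fresh one. */
static struct page *z3fold_get_page(struct z3fold_pool *pool, gfp_t gfp)
{
	struct page *page = NULL;

	spin_lock(&pool->stale_lock);
	if (!list_empty(&pool->stale)) {
		struct z3fold_header *zhdr =
			list_first_entry(&pool->stale,
					 struct z3fold_header, buddy);

		/* recycle a page the release worker has not freed yet */
		list_del(&zhdr->buddy);
		page = virt_to_page(zhdr);
	}
	spin_unlock(&pool->stale_lock);

	/* Fall back to the page allocator only on an empty stale list. */
	if (!page)
		page = alloc_page(gfp);

	return page;
}

This is why a stalled release workqueue is harmless: pages waiting on
the stale list are simply recycled by the allocation path instead of
being freed.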