From: Milosz Tanski
To: Andrew Morton
Cc: Christoph Hellwig, Matthew Wilcox, Matthew Wilcox, "linux-fsdevel@vger.kernel.org", LKML, linux-mm@kvack.org, Linus Torvalds
Subject: Re: [PATCH v12 00/20] DAX: Page cache bypass for filesystems on memory storage
Date: Thu, 8 Jan 2015 11:27:09 -0500
In-Reply-To: <20150106004714.6d63023c.akpm@linux-foundation.org>
References: <1414185652-28663-1-git-send-email-matthew.r.wilcox@intel.com>
 <20141210140347.GA23252@infradead.org>
 <20141210141211.GD2220@wil.cx>
 <20150105184143.GA665@infradead.org>
 <20150106004714.6d63023c.akpm@linux-foundation.org>

On Tue, Jan 6, 2015 at 3:47 AM, Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 5 Jan 2015 10:41:43 -0800 Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Wed, Dec 10, 2014 at 09:12:11AM -0500, Matthew Wilcox wrote:
> > > On Wed, Dec 10, 2014 at 06:03:47AM -0800, Christoph Hellwig wrote:
> > > > What is the status of this patch set?
> > >
> > > I have no outstanding bug reports against it.  Linus told me that he
> > > wants to see it come through Andrew's tree.  I have an email two weeks
> > > ago from Andrew saying that it's on his list.  I would love to see it
> > > merged since it's almost a year old at this point.
> >
> > And since then another month and another merge window has passed.  Is
> > there any way to speed up merging big patch sets like this one?
>
> I took a look at dax last time and found it to be unreviewable due to
> lack of design description, objectives and code comments.  Hopefully
> that's been addressed - I should get back to it fairly soon as I chew
> through merge window and holiday backlog.
>
> > Another one is non-blocking read one that has real life use on one
> > of the biggest server side webapp frameworks but doesn't seem to make
> > progress, which is a bit frustrating.
>
> I took a look at pread2() as well and I have two main issues:
>
> - The patchset includes a pwrite2() syscall which has nothing to do
>   with nonblocking reads and which was poorly described and had little
>   justification for inclusion.
>
> - We've talked for years about implementing this via fincore+pread
>   and at least two fincore implementations are floating about.  Now
>   along comes pread2() which does it all in one hit.
>
>   Which approach is best?  I expect fincore+pread is simpler, more
>   flexible and more maintainable.  But pread2() will have lower CPU
>   consumption and lower average-case latency.
>
>   But how *much* better is pread2()?  I expect the difference will be
>   minor because these operations are associated with a great big
>   cache-stomping memcpy.  If the pread2() advantage is "insignificant
>   for real world workloads" then perhaps it isn't the best way to go.
>
>   I just don't know, and diligence requires that we answer the
>   question.  But all I've seen in response to these questions is
>   handwaving.  It would be a shame to make a mistake because nobody
>   found the time to perform the investigation.
>
> Also, integration of pread2() into xfstests is (or was) happening and
> the results of that aren't yet known.

Andrew,

I got busier than anticipated with other job-related things between
Thanksgiving and Christmas. However, I have updated the patchset and split
it into two pieces (preadv2 and pwritev2). That should make evaluating the
two separately easier. With the help of Volker I hacked preadv2 support
into samba, and I hope to have some numbers from it soon. Finally, I'm
putting together a test case for the typical webapp middle-tier service
(epoll + threadpool for disk I/O).

I haven't stopped; I'm just progressing on that more slowly due to external
factors.

--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org