From: David Lang <david@lang.hm>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andres Freund <andres@2ndquadrant.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	lsf@lists.linux-foundation.org,
	Wu Fengguang <fengguang.wu@intel.com>,
	rhaas@anarazel.de
Subject: Re: [Lsf] Postgresql performance problems with IO latency, especially during fsync()
Date: Wed, 26 Mar 2014 15:35:42 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1403261532360.2190@nftneq.ynat.uz>
In-Reply-To: <CALCETrVEjpFpKhY6=CEG-9Prm=uBDLS936imb=+hyWN4fXPjtg@mail.gmail.com>

On Wed, 26 Mar 2014, Andy Lutomirski wrote:

>>> I'm not sure I understand the request queue stuff, but here's an idea.
>>>  The block core contains this little bit of code:
>>
>> I haven't read enough of the code yet, to comment intelligently ;)
>
> My little patch doesn't seem to help.  I'm either changing the wrong
> piece of code entirely or I'm penalizing readers and writers too much.
>
> Hopefully some real block layer people can comment as to whether a
> refinement of this idea could work.  The behavior I want is for
> writeback to be limited to using a smallish fraction of the total
> request queue size -- I think that writeback should be able to enqueue
> enough requests to get decent sorting performance but not enough
> requests to prevent the io scheduler from doing a good job on
> non-writeback I/O.

The thing is, if no reads are waiting, why not use every bit of available disk
I/O for writes? If you can do that reliably while using only part of the queue,
fine, but with such a restriction aren't you getting fairly close to just having
separate queues for reading and writing?
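
To make the question concrete, here is a rough user-space sketch of an
admission check that holds writeback back only while reads are actually
queued. This is not kernel code: struct toy_queue, write_cap() and
may_queue_write() are hypothetical names, and the quarter-of-the-queue cap is
just an assumed value for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a request queue; every name here is hypothetical. */
struct toy_queue {
	unsigned int nr_requests;   /* total number of queue slots */
	unsigned int queued_writes; /* writeback requests currently queued */
	unsigned int queued_reads;  /* read requests currently queued */
};

/* Assumed cap on writeback while reads are pending: a quarter, rounded up. */
static unsigned int write_cap(const struct toy_queue *q)
{
	return (q->nr_requests + 3) / 4;
}

/* Decide whether one more writeback request may enter the queue. */
static bool may_queue_write(const struct toy_queue *q)
{
	assert(q != NULL);

	if (q->queued_writes + q->queued_reads >= q->nr_requests)
		return false;   /* queue is full */
	if (q->queued_reads == 0)
		return true;    /* no reads waiting: writes may use every slot */
	return q->queued_writes < write_cap(q); /* reads waiting: hold back */
}
```

With 128 slots and 100 writes queued, the check admits another write while no
reads are queued and refuses it as soon as one is. The catch is that writes
admitted while the queue was idle are still ahead of any read that arrives
later, which is one reason this starts to look like separate queues.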

> As an even more radical idea, what if there was a way to submit truly
> enormous numbers of lightweight requests, such that the queue will
> give the requester some kind of callback when the request is nearly
> ready for submission so the requester can finish filling in the
> request?  This would allow things like dm-crypt to get the benefit of
> sorting without needing to encrypt hundreds of MB of data in advance
> of having that data actually be written to the backing device.  It might also
> allow writeback to submit multiple gigabytes of writes, in arbitrarily
> large pieces, but not to need to pin pages or do whatever expensive
> things are needed until the IO actually happens.

The problem with a callback is that you then need to wait for that source to get
the CPU and finish doing its work. What happens if that takes long enough for
you to run out of data to write? And is it worth the extra context switches to
bounce around when the writing process was already finished with that block?
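
A toy illustration of that concern, assuming a hypothetical "lazy" request
that only carries a sector number and a fill callback (none of these names
exist in the block layer). Every request that is still unfilled when it
reaches the head of the queue forces the dispatch path to wait for the
callback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lightweight request: only the target sector is known up front. */
struct lazy_request {
	unsigned long sector;
	bool filled;                           /* has the payload been prepared? */
	void (*fill)(struct lazy_request *rq); /* requester's callback */
};

/* Stand-in for the requester's real work (encrypting, pinning pages, ...). */
static void mark_filled(struct lazy_request *rq)
{
	rq->filled = true;
}

/*
 * Dispatch requests in order and return how many callbacks had to run on the
 * dispatch path. Each of those is a point where the device could sit idle
 * while the requester gets the CPU and finishes its work.
 */
static size_t dispatch_all(struct lazy_request *rqs, size_t n)
{
	size_t late_fills = 0;

	assert(rqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!rqs[i].filled) {
			rqs[i].fill(&rqs[i]); /* dispatch stalls here */
			late_fills++;
		}
	}
	return late_fills;
}
```

In this model the stall is a plain function call; in a real system it would be
a wakeup of another context plus whatever work the requester deferred.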

David Lang

> For reference, here's my patch that doesn't work well:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 4cd5ffc..c0dedc3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -941,11 +941,11 @@ static struct request *__get_request(struct request_list *
>        }
>
>        /*
> -        * Only allow batching queuers to allocate up to 50% over the defined
> -        * limit of requests, otherwise we could have thousands of requests
> -        * allocated with any setting of ->nr_requests
> +        * Only allow batching queuers to allocate up to 50% of the
> +        * defined limit of requests, so that non-batching queuers can
> +        * get into the queue and thus be scheduled properly.
>         */
> -       if (rl->count[is_sync] >= (3 * q->nr_requests / 2))
> +       if (rl->count[is_sync] >= (q->nr_requests + 3) / 4)
>                return NULL;
>
>        q->nr_rqs[is_sync]++;
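
For a sense of scale, the two threshold expressions from the patch can be
evaluated in user space. The helper names old_limit() and new_limit() are made
up for this sketch, and 128 is only an assumed example value for nr_requests:

```c
#include <assert.h>
#include <limits.h>

/* Threshold before the patch: 50% over the configured limit. */
static unsigned int old_limit(unsigned int nr_requests)
{
	assert(nr_requests <= UINT_MAX / 3); /* keep 3 * nr_requests from wrapping */
	return 3 * nr_requests / 2;
}

/* Threshold after the patch: a quarter of the configured limit, rounded up. */
static unsigned int new_limit(unsigned int nr_requests)
{
	assert(nr_requests <= UINT_MAX - 3); /* keep nr_requests + 3 from wrapping */
	return (nr_requests + 3) / 4;
}
```

For nr_requests = 128 the old expression gives 192 and the new one gives 32,
so the patched check cuts off at about a quarter of the queue rather than the
50% mentioned in the new comment.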
