From: Fengguang Wu <fengguang.wu@intel.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, Maxim Patlasov <MPatlasov@parallels.com>,
	hmh@hmh.eng.br, mel@csn.ul.ie, t.artem@lycos.com,
	Theodore Ts'o <tytso@mit.edu>, Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [patch 15/15] mm: add strictlimit knob
Date: Thu, 7 Dec 2017 18:15:47 +0800
Message-ID: <20171207101547.ljfayqfp3lczhfvi@wfg-t540p.sh.intel.com>
In-Reply-To: <CAJfpegsE-jUOWjpMVQv76cDxp3aLpAfxrMa-vutMFa0KhVKrHw@mail.gmail.com>

On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
>On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>> CC fuse maintainer, too.
>>
>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>>>
>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
>>>
>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
>>>> > Subject: mm: add strictlimit knob
>>>> >
>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>>>> > for FUSE which sets bdi max_ratio to 1% by default:
>>>> >
>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>> >
>>>> > However the feature can be useful for other relatively slow or untrusted
>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
>>>> > the feature:
>>>> >
>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>>>> >
>>>> > Being enabled, the feature enforces bdi max_ratio limit even if global
>>>> > (10%) dirty limit is not reached.  Of course, the effect is not visible
>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
>>>> > value.
>>>>
>>>> In principle I have nothing against this and the usecase sounds reasonable
>>>> (in fact I believe the lack of a feature like this is one of reasons why
>>>> desktop automounters usually mount USB devices with 'sync' mount option).
>>>> So feel free to add:
>>>>
>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>
>>>
>>> Cc Jens, who may be vaguely interested in plans to finally merge this
>>> three-year-old patch?
>>>
>>>
>>>
>>> From: Maxim Patlasov <MPatlasov@parallels.com>
>>> Subject: mm: add strictlimit knob
>>>
>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>>> for FUSE which sets bdi max_ratio to 1% by default:
>>>
>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
>>
>>
>> That link is invalid for now, possibly due to the gmane site rebuild.
>> I find an email thread here which looks relevant:
>>
>> https://sourceforge.net/p/fuse/mailman/message/35254883/
>>
>> Where Maxim has an interesting point:
>>
>>        > Did any one try increasing the limit and did see any better/worse
>>        > performance ?
>>
>>        We've used 20% as default value in OpenVZ kernel for a long while
>>        (1% was not enough to saturate our distributed parallel storage).
>>
>> So the knob will also enable people to _disable_ the 1% fuse limit to
>> increase performance.
>>
>> So people can use the exposed knob in 2 ways to fit their needs, which
>> is in general a good thing.
>>
>> However the comment in wb_position_ratio() says
>>
>>          * Without strictlimit feature, fuse writeback may
>>          * consume arbitrary amount of RAM because it is accounted in
>>          * NR_WRITEBACK_TEMP which is not involved in calculating
>>          * "nr_dirty".
>>
>> How dangerous would that be if some user disabled the 1% fuse limit
>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
>> beyond the user's expectation (20% max dirty limit)?
>>
>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
>> The congested flag will be set on
>>
>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
>>
>> So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
>> it's not included in the 20% dirty limit.
>
>Only balance_dirty_pages_ratelimited() is going to limit the
>generation of dirty pages, I don't think congestion flags will do
>that.

Right. However, my concern is what limits the generation of fuse's
_writeback_ pages.

The normal writeback pages are limited in 2 ways:

- balance_dirty_pages_ratelimited()'s dirty throttling:

  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit

- block layer's nr_requests queue limit
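
To make the first condition concrete, here is a toy Python model of that
check. It is only a sketch, not kernel code: the function name and the page
counts are made up for illustration. The point is which counters enter the
sum.

```python
def over_dirty_limit(nr_dirty, nr_writeback, nr_unstable, limit):
    """Toy version of the dirty throttling condition.

    Returns True when the writer should be throttled, i.e. when
    nr_dirty + nr_writeback + nr_unstable < limit no longer holds.
    """
    return nr_dirty + nr_writeback + nr_unstable >= limit


# Hypothetical page counts, chosen only for illustration.
counters = {"nr_dirty": 100, "nr_writeback": 50, "nr_unstable": 0}

# A temp-writeback counter is not a term in the sum, so however large it
# gets, it cannot make this check fire.
writeback_temp = 10_000

throttled = over_dirty_limit(limit=1000, **counters)
print(throttled)
```
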

However, fuse's NR_WRITEBACK_TEMP looks special and has neither of these
limits. The congested bit merely affects the vmscan pageout path.

        pageout
          may_write_to_inode
            inode_write_congested
              wb_congested

I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
Either explicitly or implicitly, there has to be some hard limit.

>And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
>accounting temp writeback pages when throttling dirty page generation.
>So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
>So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
>permit disabling it for fuse).

So fuse relies on a small nr_dirty. Does fuse impose any explicit or
implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.

For example, is it possible for some process (e.g. dd) to dirty pages
as fast as possible while some other kernel logic converts PG_dirty
pages to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
cannot stop all memory from being eaten up by an ever-growing
NR_WRITEBACK_TEMP?
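
To illustrate the scenario I have in mind, here is a toy simulation in
Python. It is purely hypothetical: the rates, the function name and the
per-step model are assumptions for illustration and say nothing about what
fuse actually does. Whether fuse really allows the conversion rate to stay
above the completion rate is exactly the open question.

```python
def simulate(steps, dirty_limit, dirty_rate, convert_rate, complete_rate):
    """Toy model of the worry above (not fuse's real behaviour).

    A writer dirties pages, something converts dirty pages into
    temp-writeback pages, and temp-writeback pages complete at a fixed
    rate. Only `dirty` is checked against `dirty_limit`; `wb_temp` is not.
    Returns (dirty, wb_temp) after `steps` iterations.
    """
    dirty = 0
    wb_temp = 0
    for _ in range(steps):
        # The writer is throttled only by the dirty limit.
        dirty += min(dirty_rate, max(dirty_limit - dirty, 0))
        # Conversion moves pages out of `dirty`, making room for more.
        moved = min(convert_rate, dirty)
        dirty -= moved
        wb_temp += moved
        # Temp-writeback pages complete at a fixed rate.
        wb_temp -= min(complete_rate, wb_temp)
    return dirty, wb_temp


# Conversion outpaces completion: wb_temp grows by 40 pages per step
# while dirty never exceeds the limit.
print(simulate(1000, dirty_limit=100, dirty_rate=50,
               convert_rate=50, complete_rate=10))   # (0, 40000)

# Completion keeps up with conversion: wb_temp stays bounded.
print(simulate(1000, dirty_limit=100, dirty_rate=50,
               convert_rate=50, complete_rate=50))   # (0, 0)
```

In this model the dirty limit alone bounds nothing but `dirty`, so some
other hard limit would have to tie conversion to completion.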

Thanks,
Fengguang


Thread overview: 8+ messages
2017-11-30 22:15 akpm
2017-12-01 12:29 ` Jan Kara
2017-12-07  1:09   ` Andrew Morton
2017-12-07  4:14     ` Fengguang Wu
2017-12-07  8:50       ` Miklos Szeredi
2017-12-07 10:15         ` Fengguang Wu [this message]
2017-12-07 10:32           ` Miklos Szeredi
2018-01-31 22:58             ` Andrew Morton
