* Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
From: Marc MERLIN @ 2016-11-30 18:16 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Btrfs BTRFS, Michal Hocko, Vlastimil Babka, linux-mm,
Joonsoo Kim, torvalds
+folks from linux-mm thread for your suggestion
On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote:
> > swraid5 < bcache < dmcrypt < btrfs
> >
> > Copying with btrfs send/receive causes massive hangs on the system.
> > Please see this explanation from Linus on why the workaround was
> > suggested:
> > https://lkml.org/lkml/2016/11/29/667
> And Linus' assessment is absolutely correct (at least, the general
> assessment is, I have no idea about btrfs_start_shared_extent, but I'm more
> than willing to bet he's correct that that's the culprit).
> > All of this mostly went away with Linus' suggestion:
> > echo 2 > /proc/sys/vm/dirty_ratio
> > echo 1 > /proc/sys/vm/dirty_background_ratio
> >
> > But that's hiding the symptom which I think is that btrfs is piling up too many I/O
> > requests during btrfs send/receive and btrfs scrub (probably balance too) and not
> > looking at resulting impact to system health.
> I see pretty much identical behavior using any number of other storage
> configurations on a USB 2.0 flash drive connected to a system with 16GB of
> RAM with the default dirty ratios because it's trying to cache up to 3.2GB
> of data for writeback. While BTRFS is doing highly sub-optimal things here,
> the ancient default writeback ratios are just as much a culprit. I would
> suggest that get changed to 200MB or 20% of RAM, whichever is smaller, which
> would give overall almost identical behavior to x86-32, which in turn works
> reasonably well for most cases. I sadly don't have the time, patience, or
> expertise to write up such a patch myself though.
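The cap proposed in the quoted text above ("200MB or 20% of RAM, whichever is smaller") can be sketched as a small calculation. This is only an illustration of the arithmetic, not a kernel patch: the function name `proposed_dirty_limit_mb` is hypothetical, and treating 16GB as 16000MB is an assumption made so that the 20% figure matches the 3.2GB quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): proposed_dirty_limit_mb RAM_MB
# Prints min(200 MB, 20% of RAM) in MB, following the suggestion quoted above.
proposed_dirty_limit_mb() {
    ram_mb=$1
    pct_mb=$(( ram_mb * 20 / 100 ))   # what a 20% ratio would allow (integer MB)
    cap_mb=200                        # fixed ceiling proposed in the thread
    if [ "$pct_mb" -lt "$cap_mb" ]; then
        echo "$pct_mb"
    else
        echo "$cap_mb"
    fi
}

proposed_dirty_limit_mb 16000   # 16GB example: 20% is 3200MB, so this prints 200
proposed_dirty_limit_mb 512     # small machine: 20% is below the cap, prints 102
```

On any machine with more than about 1GB of RAM the fixed 200MB ceiling wins, which is the point of the suggestion: the limit stops scaling with memory size.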
Dear linux-mm folks, is that something you could consider (changing the
dirty_ratio defaults) given that it affects at least bcache and btrfs
(with or without bcache)?
By the way, on the 200MB max suggestion, when I had 2 and 1% (or 480MB
and 240MB on my 24GB system), this was enough to make btrfs behave
sanely, but only if I had bcache turned off.
With bcache enabled, those values were just enough so that bcache didn't
crash my system, but not enough to prevent undesirable behaviour
(things hanging, 100+ bcache kworkers piled up, and more). However, the
copy did succeed, despite the relative impact on the system, so it's
better than nothing :)
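For reference, the 480MB and 240MB figures above follow from applying the 2% and 1% ratios to 24GB. A minimal sketch of that arithmetic is below; `ratio_to_mb` is a hypothetical name, and 24GB is taken as 24000MB (an assumption) to reproduce the numbers in this message. The kernel applies these ratios to available (free plus reclaimable) memory rather than total RAM, so the real thresholds would be somewhat lower.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): ratio_to_mb RAM_MB PERCENT
# Prints the approximate dirty threshold in MB for a given ratio.
# Assumption: the ratio is applied to the full RAM size; the kernel actually
# uses available (free + reclaimable) memory, so treat this as an upper bound.
ratio_to_mb() {
    echo $(( $1 * $2 / 100 ))
}

ratio_to_mb 24000 2    # dirty_ratio=2 on 24GB: prints 480
ratio_to_mb 24000 1    # dirty_background_ratio=1 on 24GB: prints 240
ratio_to_mb 24000 20   # the 20% figure discussed in the thread: prints 4800
```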
But the impact from bcache probably goes beyond what btrfs is
responsible for, so I have a separate thread on the bcache list:
http://marc.info/?l=linux-bcache&m=148052441423532&w=2
http://marc.info/?l=linux-bcache&m=148052620524162&w=2
On the plus side, btrfs did ok with 0 visible impact to my system with
those 480 and 240MB dirty ratio values.
Thanks for your reply, Austin.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
From: Michal Hocko @ 2016-12-01 15:49 UTC (permalink / raw)
To: Marc MERLIN
Cc: Austin S. Hemmelgarn, Btrfs BTRFS, Vlastimil Babka, linux-mm,
Joonsoo Kim, torvalds
On Wed 30-11-16 10:16:53, Marc MERLIN wrote:
> +folks from linux-mm thread for your suggestion
>
> On Wed, Nov 30, 2016 at 01:00:45PM -0500, Austin S. Hemmelgarn wrote:
> > > swraid5 < bcache < dmcrypt < btrfs
> > >
> > > Copying with btrfs send/receive causes massive hangs on the system.
> > > Please see this explanation from Linus on why the workaround was
> > > suggested:
> > > https://lkml.org/lkml/2016/11/29/667
> > And Linus' assessment is absolutely correct (at least, the general
> > assessment is, I have no idea about btrfs_start_shared_extent, but I'm more
> > than willing to bet he's correct that that's the culprit).
>
> > > All of this mostly went away with Linus' suggestion:
> > > echo 2 > /proc/sys/vm/dirty_ratio
> > > echo 1 > /proc/sys/vm/dirty_background_ratio
> > >
> > > But that's hiding the symptom which I think is that btrfs is piling up too many I/O
> > > requests during btrfs send/receive and btrfs scrub (probably balance too) and not
> > > looking at resulting impact to system health.
>
> > I see pretty much identical behavior using any number of other storage
> > configurations on a USB 2.0 flash drive connected to a system with 16GB of
> > RAM with the default dirty ratios because it's trying to cache up to 3.2GB
> > of data for writeback. While BTRFS is doing highly sub-optimal things here,
> > the ancient default writeback ratios are just as much a culprit. I would
> > suggest that get changed to 200MB or 20% of RAM, whichever is smaller, which
> > would give overall almost identical behavior to x86-32, which in turn works
> > reasonably well for most cases. I sadly don't have the time, patience, or
> > expertise to write up such a patch myself though.
>
> Dear linux-mm folks, is that something you could consider (changing the
> dirty_ratio defaults) given that it affects at least bcache and btrfs
> (with or without bcache)?
As much as the dirty_*ratio defaults are a major PITA, this is not something
that would be _easy_ to change without a high risk of regressions. This
topic has been discussed many times with many good ideas, nothing really
materialized from them though :/
To be honest I really do hate dirty_*ratio and have seen many issues on
very large machines. I have always suggested using dirty_bytes instead, but
picking a particular value has always been a challenge to get right. It has
always been very workload specific.
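As a rough illustration of the dirty_bytes approach mentioned above, the sketch below converts MB values into the byte counts that `vm.dirty_bytes` and `vm.dirty_background_bytes` expect, and prints the corresponding `sysctl -w` commands without running them. The helper name `print_dirty_bytes_settings` is hypothetical, and the 480/240 values are only borrowed from earlier in this thread as an example; they are not a recommendation, since the right value depends on the workload.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread):
#   print_dirty_bytes_settings DIRTY_MB BACKGROUND_MB
# Prints sysctl commands only; it does not change any kernel setting.
print_dirty_bytes_settings() {
    dirty_bytes=$(( $1 * 1024 * 1024 ))        # MB -> bytes
    background_bytes=$(( $2 * 1024 * 1024 ))   # MB -> bytes
    echo "sysctl -w vm.dirty_bytes=$dirty_bytes"
    echo "sysctl -w vm.dirty_background_bytes=$background_bytes"
}

# Example values taken from the 480MB/240MB figures earlier in the thread.
print_dirty_bytes_settings 480 240
```

Note that the byte-based and ratio-based knobs are alternatives: writing vm.dirty_bytes makes vm.dirty_ratio read back as 0, and vice versa, so only one of each pair is in effect at a time.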
That being said this is something more for IO people than MM IMHO.
--
Michal Hocko
SUSE Labs