From: Rainer Fiebig <jrf@mailbox.org>
To: Jan Kara <jack@suse.cz>, Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Rodolfo García Peñas (kix)" <kix@kix.es>,
"Oliver Winker" <oliverml1@oli1170.net>,
bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org,
"Maxim Patlasov" <mpatlasov@parallels.com>,
"Fengguang Wu" <fengguang.wu@intel.com>,
"Tejun Heo" <tj@kernel.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
killian.de.volder@megasoft.be, atillakaraca72@hotmail.com,
matheusfillipeag@gmail.com
Subject: Re: [Bug 75101] New: [bisected] s2disk / hibernate blocks on "Saving 506031 image data pages () ..."
Date: Wed, 3 Apr 2019 12:04:15 +0200
Message-ID: <1ea9f923-4756-85b2-6092-6d9e94d576a1@mailbox.org>
In-Reply-To: <20190403093432.GD8836@quack2.suse.cz>
On 03.04.19 at 11:34, Jan Kara wrote:
> On Tue 02-04-19 16:25:00, Andrew Morton wrote:
>>
>> I cc'ed a bunch of people from bugzilla.
>>
>> Folks, please please please remember to reply via emailed
>> reply-to-all. Don't use the bugzilla interface!
>>
>> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> wrote:
>>
>>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
>>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
>>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
>>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
>>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
>>>>>>>> Hi Oliver,
>>>>>>>>
>>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
>>>>>>>>> attached.
>>>>>>>>>
>>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
>>>>>>>>> It is probably more efficient if one of you guys looks at it directly.
>>>>>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
>>>>>>>> bdi_wq workqueue as it should:
>>>>>>>>
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
>>>>>>>>
>>>>>>>> but the worker wakeup doesn't actually do anything:
>>>>>>>>
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
>>>>>>>>
>>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
>>>>>>>> point and so the flush work never runs until resume, whereas before my
>>>>>>>> patch the effective dirty limit was high enough so that image could be
>>>>>>>> written in one go without being throttled; followed by an fsync() that
>>>>>>>> then writes the pages in the context of the unfrozen s2disk.
>>>>>>>>
>>>>>>>> Does this make sense? Rafael? Tejun?
>>>>>>> Well, it does seem to make sense to me.
>>>>>> From what I see, this is a deadlock in the userspace suspend model and
>>>>>> just happened to work by chance in the past.
>>>>> Well, it had been working for quite a while, so it was a rather large
>>>>> opportunity window it seems. :-)
>>>> No doubt about that, and I feel bad that it broke. But it's still a
>>>> deadlock that can't reasonably be accommodated from dirty throttling.
>>>>
>>>> It can't just put the flushers to sleep and then issue a large amount
>>>> of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
>>>> the messenger, this bug needs to be addressed, not get papered over.
>>>>
>>>>>> Can we patch suspend-utils as follows?
>>>>> Perhaps we can. Let's ask the new maintainer.
>>>>>
>>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
>>>>>
>>>>>> Alternatively, suspend-utils
>>>>>> could clear the dirty limits before it starts writing and restore them
>>>>>> post-resume.
>>>>> That (and the patch too) doesn't seem to address the problem with
>>>>> existing suspend-utils binaries, however.
>>>> It's userspace that freezes the system before issuing buffered IO, so
>>>> my conclusion was that the bug is in there. This is arguable. I also
>>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
>>>> from the ioctl that freezes the system or creates the image.
>>>
>>> OK, that sounds like a workable plan.
>>>
>>> How do I set those limits to infinity?
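
For readers who want to try the "clear the dirty limits before writing and
restore them post-resume" idea from userspace, here is a minimal sketch. It
is not necessarily the workaround from comment 14, and the thread does not
establish that it avoids the hang in every case. Assumptions: the limits are
configured through the ratio knobs /proc/sys/vm/dirty_ratio and
/proc/sys/vm/dirty_background_ratio (not the *_bytes variants), 100 is used
as the largest ratio userspace can set (there is no literal "infinity"), and
the helper names are made up for illustration.

```python
import os
from contextlib import contextmanager

# Assumption: ratio-based limits are in use. If vm.dirty_bytes is set instead,
# writing the ratio knobs would override it and this sketch would not restore it.
VM_DIR = "/proc/sys/vm"
KNOBS = ("dirty_background_ratio", "dirty_ratio")


def read_knob(name, vm_dir=VM_DIR):
    """Return the current value of one vm sysctl as a stripped string."""
    with open(os.path.join(vm_dir, name)) as f:
        return f.read().strip()


def write_knob(name, value, vm_dir=VM_DIR):
    """Write a new value to one vm sysctl (needs root on a real system)."""
    with open(os.path.join(vm_dir, name), "w") as f:
        f.write(f"{value}\n")


@contextmanager
def relaxed_dirty_limits(ratio=100, vm_dir=VM_DIR):
    """Raise the dirty limits for the duration of the block, then restore them."""
    saved = {name: read_knob(name, vm_dir) for name in KNOBS}
    try:
        for name in KNOBS:
            write_knob(name, ratio, vm_dir)
        yield saved
    finally:
        # Runs after resume as well, because the script continues where it stopped.
        for name, value in saved.items():
            write_knob(name, value, vm_dir)
```

A wrapper script would run s2disk inside `with relaxed_dirty_limits():` so
that the old values come back once the machine has resumed.
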
>>
>> Five years have passed and people are still hitting this.
>>
>> Killian described the workaround in comment 14 at
>> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
>>
>> People can apply this workaround by hand or in scripts. But we
>> really should find a proper solution. Maybe special-case the freezing
>> of the flusher threads until all the writeout has completed. Or
>> something else.
>
> I've refreshed my memory wrt this bug and I believe the bug is really on
> the side of suspend-utils (uswsusp or however it is called). They are low
> level system tools, they ask the kernel to freeze all processes
> (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
> relatively heavyweight infrastructure) to work. That is wrong in my
> opinion.
>
> I can see Johannes suggested in comment 11 to use O_SYNC in
> suspend-utils, which worked but was too slow. Indeed, O_SYNC is a rather big
> hammer, but O_DIRECT should be what they need and should perform
> better: no additional buffering in the kernel, no dirty throttling,
> etc. They only need their buffer and device offsets to be sector aligned; they
> seem to be even page aligned in suspend-utils, so they should be fine. And
> if the performance still sucks (currently they appear to do mostly random
> 4k writes, so it probably would on rotating disks), they could use AIO DIO
> to get multiple pages in flight (as many as they dare to allocate buffers for),
> and then the IO scheduler will reorder things as well as it can and they
> should get reasonable performance.
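
To make the alignment requirement concrete, here is a rough sketch of a
single page-aligned O_DIRECT write. It is not taken from suspend-utils (which
is written in C); the function names and the `direct` switch are hypothetical,
and it shows only one synchronous write, not the AIO DIO variant with several
pages in flight. The anonymous mmap is used simply because it yields a
page-aligned, zero-filled buffer.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def round_up(n, align=PAGE):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def write_page_direct(path, data, offset, direct=True):
    """Write data at a page-aligned offset, bypassing the page cache if direct.

    The payload is copied into a page-aligned buffer and padded with zeros to
    a whole number of pages, since O_DIRECT wants aligned buffers, lengths and
    offsets. Returns the number of bytes written, padding included.
    """
    if not data:
        raise ValueError("nothing to write")
    if offset % PAGE:
        raise ValueError("offset must be page aligned")
    length = round_up(len(data))
    buf = mmap.mmap(-1, length)  # anonymous mapping: page aligned, zero filled
    buf[:len(data)] = data
    flags = os.O_WRONLY
    if direct:
        # O_DIRECT is Linux-specific; fall back to 0 where the flag is absent.
        flags |= getattr(os, "O_DIRECT", 0)
    fd = os.open(path, flags)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

With `direct=True` the write does not go through buffered writeback, so
neither dirty throttling nor the frozen flusher workqueue is involved. Some
filesystems reject O_DIRECT at open time, in which case the call raises
OSError.
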
>
> Is there someone who works on suspend-utils these days? Because the repo
> I've found on kernel.org seems to be long dead (last commit in 2012).
>
> Honza
>
Whether suspend-utils (uswsusp) is at fault could be settled quickly by
uninstalling the package and using the in-kernel hibernation methods instead.
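
As a small, non-authoritative sketch of that check: the kernel lists the
sleep states it supports in /sys/power/state, and "disk" in that list means
in-kernel hibernation is available. The function name is made up for
illustration, and the snippet only reads the file; it does not suspend
anything.

```python
def hibernation_supported(state_file="/sys/power/state"):
    """Return True if the kernel advertises the 'disk' (hibernate) state."""
    try:
        with open(state_file) as f:
            return "disk" in f.read().split()
    except OSError:
        # File missing or unreadable: treat as "not supported".
        return False
```

If it returns True, writing `disk` to /sys/power/state as root hibernates via
the kernel's own code path, without s2disk.
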