From: Matheus Fillipe <matheusfillipeag@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Rodolfo García Peñas" <kix@kix.es>,
"Oliver Winker" <oliverml1@oli1170.net>,
"Jan Kara" <jack@suse.cz>,
bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org,
"Maxim Patlasov" <mpatlasov@parallels.com>,
"Fengguang Wu" <fengguang.wu@intel.com>,
"Tejun Heo" <tj@kernel.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
killian.de.volder@megasoft.be, atillakaraca72@hotmail.com,
jrf@mailbox.org
Subject: Re: [Bug 75101] New: [bisected] s2disk / hibernate blocks on "Saving 506031 image data pages () ..."
Date: Wed, 3 Apr 2019 00:54:05 -0300 [thread overview]
Message-ID: <CAFWuBvcAFhhPk4K-w7OLVBo8psWuDdUP4hJNLq3QeFUyg=_Mow@mail.gmail.com> (raw)
In-Reply-To: <20190402162500.def729ec05e6e267bff8a5da@linux-foundation.org>
Wow, here I am reviving this topic in 2019! I have exactly the same problem
on Ubuntu 18.04.2 with basically every kernel from 4.15.0-42 up to 5, which
is all I tested; I'm currently on 4.18.0-17-generic. I guess this has
nothing to do with the kernel version anyway.

It was working fine before, even with the proprietary nvidia drivers, which
would usually cause trouble on resume rather than while saving the RAM
snapshot. I've been trying to tell the Ubuntu folks about this; you can see
the whole story here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819915

In short, I tried with and without the nvidia modules (also intel-only and
nouveau), many different kernels, and with i915 disabled, and this is all I
get in every one of those combinations:
https://launchpadlibrarian.net/417327528/i915.jpg

The hang is pretty random, but it seems more likely once 2-4 GB of RAM has
been in use at some point (I have 16 GB total), and reducing memory usage
afterwards doesn't help. Still, it really is random: yesterday I hibernated
successfully with 11 GB in use, resumed, ran hibernate again 5 seconds later
without doing anything else, and it froze right there.

It also happens randomly with only 2-3 GB in use, usually on the second
attempt or once the machine has been up for more than 5 minutes. What could
be wrong here?
On Tue, Apr 2, 2019, 20:25 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I cc'ed a bunch of people from bugzilla.
>
> Folks, please please please remember to reply via emailed
> reply-to-all. Don't use the bugzilla interface!
>
> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> wrote:
>
> > On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> > > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> > >> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> > >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> > >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> > >>>>> Hi Oliver,
> > >>>>>
> > >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> > >>>>>> attached.
> > >>>>>>
> > >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
> > >>>>>> Probably more efficient when one of you guys looks directly.
> > >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
> > >>>>> bdi_wq workqueue as it should:
> > >>>>>
> > >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> > >>>>> but the worker wakeup doesn't actually do anything:
> > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
> > >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
> > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
> > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
> > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
> > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
> > >>>>>
> > >>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> > >>>>> point and so the flush work never runs until resume, whereas before my
> > >>>>> patch the effective dirty limit was high enough so that the image could
> > >>>>> be written in one go without being throttled; followed by an fsync()
> > >>>>> that then writes the pages in the context of the unfrozen s2disk.
> > >>>>>
> > >>>>> Does this make sense? Rafael? Tejun?
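
To make the ordering described above concrete: in the userspace suspend
model the freeze happens inside the snapshot ioctl, and the image is then
written out by the (still running) s2disk process with ordinary buffered
writes, which is exactly where dirty throttling can block once the flusher
workers are frozen. Below is a stripped-down sketch of that loop, not the
real s2disk; it assumes the usual /dev/snapshot interface and the ioctl
names from <linux/suspend_ioctls.h>, and drops all of the real tool's error
handling, compression and swap management.

/* Sketch only: the problematic ordering, not the real s2disk. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/suspend_ioctls.h>

static int write_image(const char *resume_dev)
{
    char buf[4096];
    int in_suspend = 0;   /* distinguishes suspend from resume in the real tool */
    ssize_t n;
    int snap = open("/dev/snapshot", O_RDONLY);
    int out = open(resume_dev, O_WRONLY);   /* plain buffered writes */

    if (snap < 0 || out < 0)
        return -1;

    ioctl(snap, SNAPSHOT_FREEZE, 0);          /* flusher workers get frozen too */
    ioctl(snap, SNAPSHOT_CREATE_IMAGE, &in_suspend);

    /*
     * Each write() below dirties page cache.  Once enough pages are dirty,
     * balance_dirty_pages() throttles this process and waits for writeback
     * that the frozen bdi_wq workers will never perform -- the hang
     * reported in this thread.
     */
    while ((n = read(snap, buf, sizeof(buf))) > 0)
        if (write(out, buf, (size_t)n) != n)
            break;

    fsync(out);   /* the fsync() mentioned above */
    close(out);
    close(snap);
    return 0;
}
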
> > >>>> Well, it does seem to make sense to me.
> > >>> From what I see, this is a deadlock in the userspace suspend model and
> > >>> just happened to work by chance in the past.
> > >> Well, it had been working for quite a while, so it was a rather large
> > >> opportunity window it seems. :-)
> > > No doubt about that, and I feel bad that it broke. But it's still a
> > > deadlock that can't reasonably be accommodated from dirty throttling.
> > >
> > > It can't just put the flushers to sleep and then issue a large amount
> > > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
> > > the messenger, this bug needs to be addressed, not get papered over.
> > >
> > >>> Can we patch suspend-utils as follows?
> > >> Perhaps we can. Let's ask the new maintainer.
> > >>
> > >> Rodolfo, do you think you can apply the patch below to suspend-utils?
> > >>
> > >>> Alternatively, suspend-utils
> > >>> could clear the dirty limits before it starts writing and restore them
> > >>> post-resume.
> > >> That (and the patch too) doesn't seem to address the problem with existing
> > >> suspend-utils binaries, however.
> > > It's userspace that freezes the system before issuing buffered IO, so
> > > my conclusion was that the bug is in there. This is arguable. I also
> > > wouldn't be opposed to a patch that sets the dirty limits to infinity
> > > from the ioctl that freezes the system or creates the image.
> >
> > OK, that sounds like a workable plan.
> >
> > How do I set those limits to infinity?
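
Only as an illustration of the "how": one way user space can move the dirty
limits out of the way before writing the image and put them back afterwards
is through /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes.
The sketch below is a minimal reading of that idea, assuming those two knobs
are the "limits" in question; it is not the suspend-utils patch discussed
above, and the value written is arbitrary, just "large enough that image
write-out is never throttled".

/*
 * Minimal sketch: raise the dirty limits before the image is written,
 * restore the old values after resume.  Error handling is intentionally thin.
 */
#include <stdio.h>

static const char *knobs[] = {
    "/proc/sys/vm/dirty_bytes",
    "/proc/sys/vm/dirty_background_bytes",
};
static char saved[2][64];

static void dirty_limits_raise(void)
{
    for (int i = 0; i < 2; i++) {
        FILE *f = fopen(knobs[i], "r");

        saved[i][0] = '\0';
        if (f) {
            if (!fgets(saved[i], sizeof(saved[i]), f))
                saved[i][0] = '\0';
            fclose(f);
        }
        f = fopen(knobs[i], "w");
        if (f) {
            /* ~1 TB: effectively unthrottled for the image write-out */
            fputs("1099511627776\n", f);
            fclose(f);
        }
    }
}

static void dirty_limits_restore(void)
{
    for (int i = 0; i < 2; i++) {
        FILE *f;

        if (saved[i][0] == '\0')
            continue;
        f = fopen(knobs[i], "w");
        if (f) {
            fputs(saved[i], f);   /* writing back 0 re-enables the ratio knobs */
            fclose(f);
        }
    }
}

dirty_limits_raise() would run right before the image write (or the same
values could be set from the freeze/create-image ioctl, as suggested above),
and dirty_limits_restore() after resume.
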
>
> Five years have passed and people are still hitting this.
>
> Killian described the workaround in comment 14 at
> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
>
> People can use this workaround manually by hand or in scripts. But we
> really should find a proper solution. Maybe special-case the freezing
> of the flusher threads until all the writeout has completed. Or
> something else.
>