* re: is hibernation usable?
@ 2020-02-11 19:50 Chris Murphy
  2020-02-11 22:23 ` Luigi Semenzato
  0 siblings, 1 reply; 13+ messages in thread

From: Chris Murphy @ 2020-02-11 19:50 UTC (permalink / raw)
To: linux-mm; +Cc: semenzato

Original thread:
https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/

This whole thread is a revelation. I have no doubt most users have no
idea that hibernation image creation is expected to fail if more than
50% of RAM is in use. Please bear with me while I ask some possibly
rudimentary questions to ensure I understand this in simple terms.

Example system: 32G RAM, all of it used, plus 2G of page outs (into
the swap device).

+ 2G already paged out to swap
+ 16GB that needs to be paged out to swap, to free up enough memory to
  create the hibernation image
+ 8-16GB for the (compressed) hibernation image, written to a
  *contiguous* range within the swap device

This suggests a 26G-34G swap device, correct? (I realize that this
swap device could, in another example, already contain more than 2G of
page outs, and that would only increase this requirement.)

Is there now (or planned) an automatic kernel facility that will do
the eviction automatically, to free up enough memory, so that the
hibernation image can always be successfully created in memory? If
not, does this suggest that some facility needs to be created, maybe
in systemd, coordinating with the desktop environment? I don't need to
understand the details, but I do want to understand whether this
exists, will exist, and where it will exist.

One idea floated on Fedora devel@ a few months ago by a systemd
developer is to activate a swap device at hibernation time. That way
the system is constrained to a smaller swap device during normal use,
e.g. swap on /dev/zram, but can still hibernate by activating a
suitably sized swap device on demand. Do you anticipate any problems
with this idea? Could it be subject to race conditions?
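[The sizing arithmetic above can be sketched as a quick shell calculation. This is a back-of-envelope estimate, not a kernel-documented formula; the assumption that the compressed image occupies 25-50% of RAM is carried over from the example figures.]

```shell
# Rough hibernation swap sizing (illustrative arithmetic only).
ram_gib=32          # total RAM in the example
paged_out_gib=2     # already paged out to swap
evict_gib=$(( ram_gib / 2 ))       # ~16G to evict so the image fits in free RAM
image_min_gib=$(( ram_gib / 4 ))   # compressed image, low estimate: 8G
image_max_gib=$(( ram_gib / 2 ))   # compressed image, high estimate: 16G
swap_min_gib=$(( paged_out_gib + evict_gib + image_min_gib ))
swap_max_gib=$(( paged_out_gib + evict_gib + image_max_gib ))
echo "suggested swap device: ${swap_min_gib}G-${swap_max_gib}G"
```

[For the example system this prints the same 26G-34G range as estimated above.]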
Is there any difference in hibernation reliability between swap
partitions versus swapfiles? I note there isn't a standard interface
for all file systems; notably, Btrfs has a unique requirement. [1]

Are there any prospects for signed hibernation images, in order to
support hibernation when UEFI Secure Boot is enabled?

What about the signing of swap? If there's a trust concern with the
hibernation image, and I agree that there is in the context of UEFI
SB, then it seems there's likewise a concern about active pages in
swap. Yes? No?

[1] https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/

Thanks!

--
Chris Murphy

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: is hibernation usable?
  2020-02-11 19:50 is hibernation usable? Chris Murphy
@ 2020-02-11 22:23 ` Luigi Semenzato
  2020-02-20  2:54   ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread

From: Luigi Semenzato @ 2020-02-11 22:23 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> Original thread:
> https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
>
> This whole thread is a revelation. I have no doubt most users have no
> idea that hibernation image creation is expected to fail if more than
> 50% RAM is used. Please bear with me while I ask some possibly
> rudimentary questions to ensure I understand this in simple terms.

To be clear, I am not completely sure of this. Other developers are
not in agreement with this (as you can see from the thread). However,
I can easily and consistently reproduce the memory allocation failure
when anon is >50% of total. According to others, the image allocation
should reclaim pages by forcing anon pages to swap. I don't
understand if/how the swap partition accommodates both swapped pages
and the hibernation image, but in any case, in my experiments I
allocate a swap disk the same size as RAM, which should be sufficient
(again, according to the threads).

> Example system: 32G RAM, all of it used, plus 2G of page outs (into
> the swap device).
>
> + 2G already paged out to swap
> + 16GB needs to be paged out to swap, to free up enough memory to
> create the hibernation image
> + 8-16GB for the (compressed) hibernation image to be written to a
> *contiguous* range within swap device
>
> This suggests a 26G-34G swap device, correct? (I realize that this
> swap device could, in another example, contain more than 2G of page
> outs already, and that would only increase this requirement.)
>
> Is there now (or planned) an automatic kernel facility that will do
> the eviction automatically, to free up enough memory, so that the
> hibernation image can always be successfully created in-memory? If
> not, does this suggest some facility needs to be created, maybe in
> systemd, coordinating with the desktop environment? I don't need to
> understand the details but I do want to understand if this exists,
> will exist, and where it will exist.

I have a workaround, but it needs memcgroups. You can

  echo $limit > .../$cgroup/memory.limit_in_bytes

and if your current usage is greater than $limit, and you have swap,
the operation will block until enough pages have been swapped out to
satisfy the limit. Even this isn't guaranteed to work, even with
enough free swap. The limit adjustment invokes
mem_cgroup_resize_limit(), which contains a loop with multiple retries
of a call to do_try_to_free_pages(). The number of retries looks like
a heuristic, and I've seen the resizing fail.

> One idea floated on Fedora devel@ a few months ago by a systemd
> developer, is to activate a swap device at hibernation time. That way
> the system is constrained to a smaller swap device, e.g. swap on
> /dev/zram during normal use, but can still hibernate by activating a
> suitably sized swap device on-demand. Do you anticipate any problems
> with this idea? Could it be subject to race conditions?
>
> Is there any difference in hibernation reliability between swap
> partitions, versus swapfiles? I note there isn't a standard interface
> for all file systems, notably Btrfs has a unique requirement [1]
>
> Are there any prospects for signed hibernation images, in order to
> support hibernation when UEFI Secure Boot is enabled?
>
> What about the signing of swap? If there's a trust concern with the
> hibernation image, and I agree that there is in the context of UEFI
> SB, then it seems there's likewise a concern about active pages in
> swap. Yes? No?
>
>
> [1]
> https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/
>
> Thanks!
>
> --
> Chris Murphy
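[The memcgroup eviction workaround described in the reply above can be sketched in shell. Hedged: the cgroup v1 mount point /sys/fs/cgroup/memory and the cgroup name "hibernate" are illustrative assumptions, not from the thread; only the unprivileged MemTotal arithmetic is meant to be exercised as-is.]

```shell
#!/bin/sh
# Compute a target limit of half of MemTotal, in bytes, from a
# meminfo-format file (pass /proc/meminfo on a real system).
target_bytes() {
    awk '/^MemTotal:/ { print int($2 * 1024 / 2) }' "$1"
}

# Apply the limit to a hypothetical "hibernate" cgroup (cgroup v1 path;
# requires root). The write blocks until usage drops below the limit,
# or fails after the kernel's internal retries, as noted above.
force_swap_out() {
    limit=$1
    echo "$limit" > /sys/fs/cgroup/memory/hibernate/memory.limit_in_bytes
}
```

[For the 2985944 kB VM used later in this thread, target_bytes computes 1528803328 bytes.]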
* Re: is hibernation usable?
  2020-02-11 22:23 ` Luigi Semenzato
@ 2020-02-20  2:54   ` Chris Murphy
  2020-02-20  2:56     ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread

From: Chris Murphy @ 2020-02-20 2:54 UTC (permalink / raw)
To: Luigi Semenzato; +Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 3:23 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Original thread:
> > https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
> >
> > This whole thread is a revelation. I have no doubt most users have no
> > idea that hibernation image creation is expected to fail if more than
> > 50% RAM is used. Please bear with me while I ask some possibly
> > rudimentary questions to ensure I understand this in simple terms.
>
> To be clear, I am not completely sure of this. Other developers are
> not in agreement with this (as you can see from the thread). However,
> I can easily and consistently reproduce the memory allocation failure
> when anon is >50% of total. According to others, the image allocation
> should reclaim pages by forcing anon pages to swap. I don't
> understand if/how the swap partition accommodates both swapped pages
> and the hibernation image, but in any case, in my experiments, I
> allocate a swap disk the same size of RAM, which should be sufficient
> (again, according to the threads).

I'm testing with this method:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

About 2/3 of the time on a test system, hibernation entry fails. It's
fatal. The last journal entry is:

[  349.732372] PM: hibernation: hibernation entry

The screen is blank, the system gets hot, fans go to high, and it
doesn't recover after 15 minutes. After forcing power off and
rebooting, there is no hibernation signature reported in the swap
partition, so I don't think the kernel ever reached reboot.
Shifting over to a qemu-kvm guest with PM support enabled, this is
working. If I fill up pretty much all of RAM and a small amount of
swap is used, the above two commands succeed, the VM reboots, and the
hibernation image is resumed without error. AnonPages is 73% of total.
Upon successful resume, it appears quite a lot of pages were pushed to
swap. It looks like about 1GiB was paged out.

Before hibernation:

$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:          148376 kB
MemAvailable:     220428 kB
Buffers:             172 kB
Cached:           366100 kB
SwapCached:         4632 kB
Active:          1962088 kB
Inactive:         592576 kB
Active(anon):    1842560 kB
Inactive(anon):   467904 kB
Active(file):     119528 kB
Inactive(file):   124672 kB
Unevictable:        1628 kB
Mlocked:            1628 kB
SwapTotal:       3117052 kB
SwapFree:        2899952 kB
Dirty:              6248 kB
Writeback:             0 kB
AnonPages:       2187236 kB
Mapped:           245800 kB
Shmem:            120504 kB
KReclaimable:      58016 kB
Slab:             203260 kB
SReclaimable:      58016 kB
SUnreclaim:       145244 kB
KernelStack:       13712 kB
PageTables:        23364 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    6019396 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27528 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB

After resume:

[chris@vm ~]$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:         1007132 kB
MemAvailable:    1069576 kB
Buffers:              76 kB
Cached:           400464 kB
SwapCached:       296112 kB
Active:           755856 kB
Inactive:         955624 kB
Active(anon):     731668 kB
Inactive(anon):   683352 kB
Active(file):      24188 kB
Inactive(file):   272272 kB
Unevictable:        1632 kB
Mlocked:            1632 kB
SwapTotal:       3117052 kB
SwapFree:        1874788 kB
Dirty:              2716 kB
Writeback:             0 kB
AnonPages:       1182108 kB
Mapped:           225352 kB
Shmem:            102480 kB
KReclaimable:      48968 kB
Slab:             183104 kB
SReclaimable:      48968 kB
SUnreclaim:       134136 kB
KernelStack:       14000 kB
PageTables:        22924 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    5937732 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27800 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB

There must be some other cause for the 50% limitation. Is it possible
it only starts once there's a certain amount of RAM present? E.g.
maybe it can only page out 4GiB of anon pages to swap? And after that
point, if at least 50% of RAM isn't available, hibernation image
creation fails?

--
Chris Murphy
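[The before/after comparison done by eye above can be automated with a short awk pass over two saved meminfo snapshots. This is a convenience sketch; the field names are the standard /proc/meminfo keys.]

```shell
# Print the SwapFree and AnonPages deltas between two /proc/meminfo
# snapshots, e.g. saved with `cat /proc/meminfo > before.txt`.
# Usage: meminfo_delta before.txt after.txt
meminfo_delta() {
    awk '
        FNR == NR { before[$1] = $2; next }   # first file: pre-hibernation
        { after[$1] = $2 }                    # second file: post-resume
        END {
            printf "swap used delta: %d kB\n", before["SwapFree:"] - after["SwapFree:"]
            printf "AnonPages delta: %d kB\n", before["AnonPages:"] - after["AnonPages:"]
        }' "$1" "$2"
}
```

[Fed the two snapshots above, it reports 2899952 - 1874788 = 1025164 kB of additional swap use, matching the "about 1GiB paged out" observation.]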
* Re: is hibernation usable?
  2020-02-20  2:54 ` Chris Murphy
@ 2020-02-20  2:56   ` Chris Murphy
  [not found]       ` <CAA25o9T2wwqoopoNRySdZoYkD+vtqRPsB1YPnag=TkOp5D9sYA@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread

From: Chris Murphy @ 2020-02-20 2:56 UTC (permalink / raw)
To: Linux Memory Management List; +Cc: Luigi Semenzato

Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?

Thanks,

Chris Murphy
[parent not found: <CAA25o9T2wwqoopoNRySdZoYkD+vtqRPsB1YPnag=TkOp5D9sYA@mail.gmail.com>]
* Re: is hibernation usable?
  [not found] ` <CAA25o9T2wwqoopoNRySdZoYkD+vtqRPsB1YPnag=TkOp5D9sYA@mail.gmail.com>
@ 2020-02-20 17:38 ` Luigi Semenzato
  2020-02-21  8:49   ` Michal Hocko
  [not found]       ` <CAJCQCtScZg1CP2WTDoOy4-urPbvP_5Hw0H-AKTwHugN9YhdxLg@mail.gmail.com>
  1 sibling, 1 reply; 13+ messages in thread

From: Luigi Semenzato @ 2020-02-20 17:38 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM

I was forgetting: forcing swap by eating up memory is dangerous
because it can lead to unexpected OOM kills, but you can mitigate that
by giving the memory-eaters a higher OOM kill score. Still, some way
of calling try_to_free_pages() directly from user level would be
preferable. I wonder if such an API has been discussed.

On Thu, Feb 20, 2020 at 9:16 AM Luigi Semenzato <semenzato@google.com> wrote:
>
> I think this is the right group for the memory issues.
>
> I suspect that the problem with failed allocations (ENOMEM) boils down
> to the unreliability of the page allocator. In my experience, under
> pressure (i.e. pages must be swapped out to be reclaimed) allocations
> can fail even when in theory they should succeed. (I wish I were
> wrong and that someone would convincingly correct me.)
>
> I have a workaround in which I use memcgroups to free pages before
> starting hibernation. The cgroup request "echo $limit >
> .../memory.limit_in_bytes" blocks until memory usage in the chosen
> cgroup is below $limit. However, I have seen this request fail even
> when there is extra available swap space.
>
> The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> looking at kernel version 4.3.5) and that code has a loop where
> try_to_free_pages() is called up to retry_count, which is at least 5.
> Why 5? One suspects that the writer of that code must have also
> realized that the page freeing request is unreliable and it's worth
> trying multiple times.
>
> So you could try something similar.
> I don't know if there are
> interfaces to try_to_free_pages() other than those in cgroups. If
> not, and you aren't using cgroups, one way might be to start several
> memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> sleep infinity") and monitor allocation, then when they use more than
> 50% of RAM kill them and immediately hibernate before the freed pages
> are reused. If you can build your custom kernel, maybe it's worth
> adding a sysfs entry to invoke try_to_free_pages(). You could also
> change the hibernation code to do that, but having the user-level hook
> may be more flexible.
>
>
> On Wed, Feb 19, 2020 at 6:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?
> >
> > Thanks,
> >
> > Chris Murphy
* Re: is hibernation usable?
  2020-02-20 17:38 ` Luigi Semenzato
@ 2020-02-21  8:49   ` Michal Hocko
  2020-02-21  9:04     ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread

From: Michal Hocko @ 2020-02-21 8:49 UTC (permalink / raw)
To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> I was forgetting: forcing swap by eating up memory is dangerous
> because it can lead to unexpected OOM kills

Could you be more specific about what you have in mind? swapoff
causing the OOM killer?

> , but you can mitigate that
> by giving the memory-eaters a higher OOM kill score. Still, some way
> of calling try_to_free_pages() directly from user-level would be
> preferable. I wonder if such API has been discussed.

No, there is no API to trigger the global memory reclaim. You could
start the reclaim by increasing min_free_kbytes, but I wouldn't really
recommend that unless you know exactly what you are doing, and also I
fail to see the point. If s2disk fails due to insufficient swap space,
then how can a pro-active reclaim help in the first place?

--
Michal Hocko
SUSE Labs
* Re: is hibernation usable?
  2020-02-21  8:49 ` Michal Hocko
@ 2020-02-21  9:04   ` Rafael J. Wysocki
  2020-02-21  9:36     ` Michal Hocko
  2020-02-21  9:46     ` Chris Murphy
  0 siblings, 2 replies; 13+ messages in thread

From: Rafael J. Wysocki @ 2020-02-21 9:04 UTC (permalink / raw)
To: Michal Hocko
Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > I was forgetting: forcing swap by eating up memory is dangerous
> > because it can lead to unexpected OOM kills
>
> Could you be more specific what you have in mind? swapoff causing the
> OOM killer?
>
> > , but you can mitigate that
> > by giving the memory-eaters a higher OOM kill score. Still, some way
> > of calling try_to_free_pages() directly from user-level would be
> > preferable. I wonder if such API has been discussed.
>
> No, there is no API to trigger the global memory reclaim. You could
> start the reclaim by increasing min_free_kbytes but I wouldn't really
> recommend that unless you know exactly what you are doing and also I
> fail to see the point. If s2disk fails due to insufficient swap space
> then how can a pro-active reclaim help in the first place?

My understanding of the problem is that the size of swap is
(theoretically) sufficient, but it is not used as expected during the
preallocation of image memory.

It was stated in one of the previous messages (not in this thread,
cannot find it now) that swap (of the same size as RAM) was activated
(swapon) right before hibernation, so theoretically that should be
sufficient AFAICS.
* Re: is hibernation usable?
  2020-02-21  9:04 ` Rafael J. Wysocki
@ 2020-02-21  9:36   ` Michal Hocko
  2020-02-21 17:13     ` Luigi Semenzato
  0 siblings, 1 reply; 13+ messages in thread

From: Michal Hocko @ 2020-02-21 9:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > > I was forgetting: forcing swap by eating up memory is dangerous
> > > because it can lead to unexpected OOM kills
> >
> > Could you be more specific what you have in mind? swapoff causing the
> > OOM killer?
> >
> > > , but you can mitigate that
> > > by giving the memory-eaters a higher OOM kill score. Still, some way
> > > of calling try_to_free_pages() directly from user-level would be
> > > preferable. I wonder if such API has been discussed.
> >
> > No, there is no API to trigger the global memory reclaim. You could
> > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > recommend that unless you know exactly what you are doing and also I
> > fail to see the point. If s2disk fails due to insufficient swap space
> > then how can a pro-active reclaim help in the first place?
>
> My understanding of the problem is that the size of swap is
> (theoretically) sufficient, but it is not used as expected during the
> preallocation of image memory.
>
> It was stated in one of the previous messages (not in this thread,
> cannot find it now) that swap (of the same size as RAM) was activated
> (swapon) right before hibernation, so theoretically that should be
> sufficient AFAICS.

Hmm, this is interesting. Let me have a closer look...

pm_restrict_gfp_mask, which would completely rule out any IO, happens
after hibernate_preallocate_memory is done, and my limited
understanding tells me that this is where all the reclaim happens
(via shrink_all_memory). It is quite possible that the MM decides not
to swap in that path - depending on the memory usage - and misses its
target. More details would be needed. E.g. vmscan tracepoints could
tell us more.

--
Michal Hocko
SUSE Labs
* Re: is hibernation usable?
  2020-02-21  9:36 ` Michal Hocko
@ 2020-02-21 17:13   ` Luigi Semenzato
  0 siblings, 0 replies; 13+ messages in thread

From: Luigi Semenzato @ 2020-02-21 17:13 UTC (permalink / raw)
To: Michal Hocko
Cc: Rafael J. Wysocki, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 1:36 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> > On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > > > I was forgetting: forcing swap by eating up memory is dangerous
> > > > because it can lead to unexpected OOM kills
> > >
> > > Could you be more specific what you have in mind? swapoff causing the
> > > OOM killer?

No, not swapoff, just fast allocation. Also, in some earlier
experiments I tried gradually increasing min_free_kbytes (precisely as
suggested) and this would randomly trigger OOM kills when swap space
was still available.

> > > > , but you can mitigate that
> > > > by giving the memory-eaters a higher OOM kill score. Still, some way
> > > > of calling try_to_free_pages() directly from user-level would be
> > > > preferable. I wonder if such API has been discussed.
> > >
> > > No, there is no API to trigger the global memory reclaim. You could
> > > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > > recommend that unless you know exactly what you are doing and also I
> > > fail to see the point. If s2disk fails due to insufficient swap space
> > > then how can a pro-active reclaim help in the first place?
> >
> > My understanding of the problem is that the size of swap is
> > (theoretically) sufficient, but it is not used as expected during the
> > preallocation of image memory.
> >
> > It was stated in one of the previous messages (not in this thread,
> > cannot find it now) that swap (of the same size as RAM) was activated
> > (swapon) right before hibernation, so theoretically that should be
> > sufficient AFAICS.

Correct, those were my experiments. Search the archives for
"semenzato"; there are a couple of threads on the topic.

But really, why not have a user-level interface for reclaim? I find it
very difficult to understand the behavior of the reclaim code, and any
attempt to reclaim from user level (memory-eating processes, raising
min_free_kbytes) can end in the OOM-kill path. Using cgroups'
memory.limit_in_bytes doesn't have this problem, precisely because it
only calls try_to_free_pages(), which doesn't trigger OOM killing. If
I could make that call from user level (without cgroups) it would
greatly simplify my current workaround, and it would be useful in
other situations as well. Something like

echo $page_count > /proc/sys/vm/try_to_free_pages
cat /proc/sys/vm/pages_freed  # the number of pages freed at the latest request

> Hmm, this is interesting. Let me have a closer look...
>
> pm_restrict_gfp_mask, which would completely rule out any IO, happens
> after hibernate_preallocate_memory is done, and my limited
> understanding tells me that this is where all the reclaim happens
> (via shrink_all_memory). It is quite possible that the MM decides not
> to swap in that path - depending on the memory usage - and misses its
> target. More details would be needed. E.g. vmscan tracepoints could
> tell us more.
>
> --
> Michal Hocko
> SUSE Labs
* Re: is hibernation usable?
  2020-02-21  9:04 ` Rafael J. Wysocki
  2020-02-21  9:36   ` Michal Hocko
@ 2020-02-21  9:46   ` Chris Murphy
  1 sibling, 0 replies; 13+ messages in thread

From: Chris Murphy @ 2020-02-21 9:46 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Michal Hocko, Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 2:04 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> My understanding of the problem is that the size of swap is
> (theoretically) sufficient, but it is not used as expected during the
> preallocation of image memory.

Right. I have no idea how locality of pages is determined in the swap
device. But if it's sufficiently fragmented that there isn't enough
contiguous free space for a hibernation image, then hibernation could
fail.

> It was stated in one of the previous messages (not in this thread,
> cannot find it now) that swap (of the same size as RAM) was activated
> (swapon) right before hibernation, so theoretically that should be
> sufficient AFAICS.

I mentioned it as an idea floated by systemd developers. I'm not sure
if it's mentioned elsewhere. Some folks wonder if such functionality
could be prone to racing.
https://lore.kernel.org/linux-mm/CAJCQCtSx0FOX7q0p=9XgDLJ6O0+hF_vc-wU4KL=c9xoSGGkstA@mail.gmail.com/T/#m4d47d127da493f998b232d42d81621335358aee1

Another idea that's been suggested for a while is formally separating
hibernation and paging into separate files (or partitions).

a. Guarantees the hibernation image has the necessary contiguous free
   space.
b. Might be easier to create (or even obviate) a sane interface for
   hibernation images in swapfiles; that is, if it were a dedicated
   hibernation file rather than being inserted into a swapfile. Right
   now that interface doesn't exist, so e.g. on Btrfs, while it can
   support swapfiles and hibernation images, the offset has to be
   figured out manually so resume can succeed.

https://github.com/systemd/systemd/issues/11939#issuecomment-471684411

--
Chris Murphy
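[The manual offset discovery mentioned above can be sketched with filefrag(1), the approach the kernel's swap-files documentation describes for conventional filesystems. Hedged: the physical offset of the swapfile's first extent becomes the resume_offset= kernel parameter when the filesystem block size matches the 4 KiB page size; on Btrfs the offsets filefrag reported were historically not trustworthy, which is exactly the unique requirement referenced in this thread, and filefrag's exact output format varies by version.]

```shell
#!/bin/sh
# Extract the physical offset of a swapfile's first extent from
# `filefrag -v` output, e.g.:
#   filefrag -v /swapfile | resume_offset_from_filefrag
# The "0:" row is the first extent; field 4 is its physical offset,
# printed with a trailing ".." that we strip.
resume_offset_from_filefrag() {
    awk '$1 == "0:" { gsub(/\.\./, "", $4); print $4; exit }'
}
```

[The printed value would then go on the kernel command line as resume_offset=, alongside resume= naming the filesystem's block device.]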
[parent not found: <CAJCQCtScZg1CP2WTDoOy4-urPbvP_5Hw0H-AKTwHugN9YhdxLg@mail.gmail.com>]
* Re: is hibernation usable?
  [not found] ` <CAJCQCtScZg1CP2WTDoOy4-urPbvP_5Hw0H-AKTwHugN9YhdxLg@mail.gmail.com>
@ 2020-02-20 19:44 ` Luigi Semenzato
  2020-02-20 21:48   ` Chris Murphy
  2020-02-27  6:43   ` Chris Murphy
  0 siblings, 2 replies; 13+ messages in thread

From: Luigi Semenzato @ 2020-02-20 19:44 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> >
> > I think this is the right group for the memory issues.
> >
> > I suspect that the problem with failed allocations (ENOMEM) boils down
> > to the unreliability of the page allocator. In my experience, under
> > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > can fail even when in theory they should succeed. (I wish I were
> > wrong and that someone would convincingly correct me.)
>
> What is vm.swappiness set to on your system? A fellow Fedora
> contributor who has consistently reproduced what you describe has
> discovered he has vm.swappiness=0, and even if it's set to 1, the
> problem no longer happens. And this is not a documented consequence of
> using a value of 0.

I am using the default value of 60.

A zero value should cause all file pages to be discarded before any
anonymous pages are swapped. I wonder if the fellow Fedora
contributor's workload has a lot of file pages, so that discarding
them is enough for the image allocator to succeed. In that case "sync;
echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
the same result. (By the way, in my experiments I do that just before
hibernating.)

> > I have a workaround in which I use memcgroups to free pages before
> > starting hibernation. The cgroup request "echo $limit >
> > .../memory.limit_in_bytes" blocks until memory usage in the chosen
> > cgroup is below $limit. However, I have seen this request fail even
> > when there is extra available swap space.
> >
> > The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> > looking at kernel version 4.3.5) and that code has a loop where
> > try_to_free_pages() is called up to retry_count, which is at least 5.
> > Why 5? One suspects that the writer of that code must have also
> > realized that the page freeing request is unreliable and it's worth
> > trying multiple times.
> >
> > So you could try something similar. I don't know if there are
> > interfaces to try_to_free_pages() other than those in cgroups. If
> > not, and you aren't using cgroups, one way might be to start several
> > memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> > sleep infinity") and monitor allocation, then when they use more than
> > 50% of RAM kill them and immediately hibernate before the freed pages
> > are reused. If you can build your custom kernel, maybe it's worth
> > adding a sysfs entry to invoke try_to_free_pages(). You could also
> > change the hibernation code to do that, but having the user-level hook
> > may be more flexible.
>
> Fedora 31+ now uses cgroups v2. In any case, my use case is making
> sure this works correctly, sanely, with mainline kernels, because
> Fedora doesn't do custom things with the kernel.
>
>
> --
> Chris Murphy
* Re: is hibernation usable?
  2020-02-20 19:44 ` Luigi Semenzato
@ 2020-02-20 21:48   ` Chris Murphy
  0 siblings, 0 replies; 13+ messages in thread

From: Chris Murphy @ 2020-02-20 21:48 UTC (permalink / raw)
To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator. In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed. (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system? A fellow Fedora
> > contributor who has consistently reproduced what you describe, has
> > discovered he has vm.swappiness=0, and even if it's set to 1, the
> > problem no longer happens. And this is not a documented consequence of
> > using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped. I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result. (By the way, in my experiments I do that just before
> hibernating.)

Unfortunately, I can't reproduce the graceful failure you describe
myself. I either get a successful hibernation/resume or some kind of
non-deterministic and fatal failure to enter hibernation - and any
dmesg/journal that might contain evidence of the failure is lost.

I've had better success with qemu-kvm testing, but even in that case I
see, about 1/4 of the time (with a ridiculously small sample size), a
failure to complete hibernation entry. I can't tell if the failure
happens during page out, hibernation image creation, or hibernation
image write out - but the result is a black screen (virt-manager
console), and the VM never shuts down or reboots; it just hangs and
spins at ~400% CPU (even though it's only assigned 3 CPUs). It's
sufficiently unreliable that I can't really consider it supported or
supportable.

Microsoft and Apple have put more emphasis lately on S0 low-power
idle, faster booting, and application state saving. The behavior in
Windows 10 with hiberfil.sys is a limited environment, essentially
that of the login window (no user environment state is saved in it),
and it is used both for resuming from S4 and for fast boot. A separate
file, pagefile.sys, is used for paging, so there's never a conflict
where a use case that depends on significant page out can prevent
hibernation from succeeding. It's also Secure Boot compatible, whereas
on Linux x86_64 it isn't. Between kernel, ACPI, and firmware bugs,
it's going to take a lot more effort to make it reliable and
trustworthy for the general case. Or it should just be abandoned; it
seems to be mostly that way already.

--
Chris Murphy
* Re: is hibernation usable?
  2020-02-20 19:44 ` Luigi Semenzato
  2020-02-20 21:48   ` Chris Murphy
@ 2020-02-27  6:43   ` Chris Murphy
  0 siblings, 0 replies; 13+ messages in thread

From: Chris Murphy @ 2020-02-27 6:43 UTC (permalink / raw)
To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator. In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed. (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system? A fellow Fedora
> > contributor who has consistently reproduced what you describe, has
> > discovered he has vm.swappiness=0, and even if it's set to 1, the
> > problem no longer happens. And this is not a documented consequence of
> > using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped. I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result. (By the way, in my experiments I do that just before
> hibernating.)

He reports hibernation failure even if he drops caches beforehand.
https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.org/message/XYWYF33RFVISVZTPYSJRRXP7TFXPV4GD/

--
Chris Murphy
Thread overview: 13+ messages
2020-02-11 19:50 is hibernation usable? Chris Murphy
2020-02-11 22:23 ` Luigi Semenzato
2020-02-20 2:54 ` Chris Murphy
2020-02-20 2:56 ` Chris Murphy
[not found] ` <CAA25o9T2wwqoopoNRySdZoYkD+vtqRPsB1YPnag=TkOp5D9sYA@mail.gmail.com>
2020-02-20 17:38 ` Luigi Semenzato
2020-02-21 8:49 ` Michal Hocko
2020-02-21 9:04 ` Rafael J. Wysocki
2020-02-21 9:36 ` Michal Hocko
2020-02-21 17:13 ` Luigi Semenzato
2020-02-21 9:46 ` Chris Murphy
[not found] ` <CAJCQCtScZg1CP2WTDoOy4-urPbvP_5Hw0H-AKTwHugN9YhdxLg@mail.gmail.com>
2020-02-20 19:44 ` Luigi Semenzato
2020-02-20 21:48 ` Chris Murphy
2020-02-27 6:43 ` Chris Murphy