32bit architectures and __HAVE_ARCH_PTE_SWP_EXCLUSIVE


From: David Hildenbrand <david@redhat.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: 32bit architectures and __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Date: Tue, 22 Nov 2022 15:05:24 +0100
Message-ID: <ceb85a8b-d6e8-830f-eddb-69ae1531e10e@redhat.com>

Hi all,

Spoiler: is there a real use case for > 16 GiB of swap in a single file 
on 32bit architectures?

I'm currently looking into implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE 
support for all remaining architectures. So far, I only implemented it 
for the most relevant enterprise architectures.

With __HAVE_ARCH_PTE_SWP_EXCLUSIVE, we remember when unmapping a page 
and replacing the present PTE by a swap PTE for swapout whether the 
anonymous page that was mapped was exclusive (PageAnonExclusive(), i.e., 
not COW-shared). When refaulting that page, whereby we replace the swap 
PTE by a present PTE, we can reuse that information to map that page 
writable and avoid unnecessary page copies due to COW, even if there are 
still unexpected references on the page.
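
To illustrate the idea, here is a minimal userspace sketch of a 32bit
swap PTE that carries such an exclusive marker. The bit layout and all
model_* names are made up for illustration; real architectures use
their own layouts and provide the kernel helpers
pte_swp_mkexclusive(), pte_swp_exclusive() and
pte_swp_clear_exclusive() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 32bit swap PTE layout (illustration only):
 *   bit  0     : present bit, always 0 for a swap PTE
 *   bit  1     : exclusive marker (the additional bit)
 *   bits 2-6   : swap type (5 bits)
 *   bits 7-31  : swap offset (25 bits)
 */
typedef uint32_t swp_pte_t;

#define MODEL_SWP_EXCLUSIVE	(UINT32_C(1) << 1)
#define MODEL_TYPE_SHIFT	2
#define MODEL_TYPE_MASK		UINT32_C(0x1f)
#define MODEL_OFFSET_SHIFT	7
#define MODEL_OFFSET_BITS	25

/* Build a non-exclusive swap PTE from a swap type and a swap offset. */
static inline swp_pte_t model_mk_swap_pte(uint32_t type, uint32_t offset)
{
	assert(type <= MODEL_TYPE_MASK);
	assert(offset < (UINT32_C(1) << MODEL_OFFSET_BITS));
	return (type << MODEL_TYPE_SHIFT) | (offset << MODEL_OFFSET_SHIFT);
}

/* Swapout: remember that the unmapped anonymous page was exclusive. */
static inline swp_pte_t model_swp_mkexclusive(swp_pte_t pte)
{
	return pte | MODEL_SWP_EXCLUSIVE;
}

/* Refault: query the marker to decide whether the page can be reused. */
static inline bool model_swp_exclusive(swp_pte_t pte)
{
	return pte & MODEL_SWP_EXCLUSIVE;
}

static inline swp_pte_t model_swp_clear_exclusive(swp_pte_t pte)
{
	return pte & ~MODEL_SWP_EXCLUSIVE;
}

static inline uint32_t model_swp_type(swp_pte_t pte)
{
	return (pte >> MODEL_TYPE_SHIFT) & MODEL_TYPE_MASK;
}

static inline uint32_t model_swp_offset(swp_pte_t pte)
{
	return pte >> MODEL_OFFSET_SHIFT;
}
```

Setting or clearing the marker leaves the swap type and offset
untouched, which is why it needs a bit of its own in the swap PTE.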

While this would usually be a pure optimization, O_DIRECT currently
still (wrongly) uses FOLL_GET instead of FOLL_PIN and can trigger
memory corruption in corner cases. So for that case, it is also a
temporary fix until O_DIRECT properly uses FOLL_PIN. More details can be
found in [1].

Ideally, I'd just implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all 
architectures. However, __HAVE_ARCH_PTE_SWP_EXCLUSIVE requires an 
additional bit in the swap PTE. While mostly unproblematic on 64bit, for 
32bit this implies that we'll have to "steal" one bit from the swap 
offset on most architectures, reducing the maximum swap size per file.

Assuming we previously supported 32 GiB per swap file (e.g., hexagon, 
csky), this number would get reduced to 16 GiB. The kernel would 
automatically truncate the oversized swap area and the system would 
continue working by using less space of that swapfile, but ... well, is 
there a but?
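
As a rough sketch of that arithmetic: assuming 4 KiB pages, a 32 GiB
limit corresponds to 23 swap offset bits, and giving one of them up
leaves 22. Both the page size and the bit counts are assumptions made
for this example, and the helper name is made up.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Maximum number of bytes addressable in a single swap file when the
 * swap PTE provides 'offset_bits' bits for the page offset.
 */
static inline uint64_t max_swapfile_bytes(unsigned int offset_bits)
{
	assert(offset_bits + ASSUMED_PAGE_SHIFT < 64);
	return UINT64_C(1) << (offset_bits + ASSUMED_PAGE_SHIFT);
}
```

With these assumptions, max_swapfile_bytes(23) is 32 GiB and
max_swapfile_bytes(22) is 16 GiB: every stolen bit halves the limit.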

Usually (well, there is PAE on x86 ...), a 32bit system can address 4
GiB of memory. Maximum swap size recommendations seem to be around 2-3
times the memory size (2x without hibernation, 3x with hibernation). So
it sounds like there is barely a use case for more swap space. Of course
one can use multiple swap files.
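
A quick back-of-the-envelope check of that reasoning, as a small
sketch. The helper names are made up, and the factors are just the
rule-of-thumb recommendations mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Recommended swap size: 'factor' times the memory size. */
static inline uint64_t recommended_swap_bytes(uint64_t ram_bytes,
					      unsigned int factor)
{
	return ram_bytes * factor;
}

/* Number of swap files needed when each one is capped at 'max_bytes'. */
static inline uint64_t swapfiles_needed(uint64_t swap_bytes,
					uint64_t max_bytes)
{
	assert(max_bytes > 0);
	return (swap_bytes + max_bytes - 1) / max_bytes;	/* round up */
}
```

With 4 GiB of memory and a factor of 3, the recommendation is 12 GiB,
which still fits into a single 16 GiB swap file. Anything larger can be
covered by adding more swap files.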

So, is anybody aware of excessive swap space requirements on 32bit?

Note that I thought about storing the exclusive marker in the swap_map 
instead of in the swap PTE, but quickly decided to discard that idea 
because it results in significantly more complexity and the swap code is 
already horrible enough.

[1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com

-- 
Thanks,

David / dhildenb
