* [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM
@ 2025-02-01 2:15 Sourav Panda
2025-02-03 2:54 ` David Rientjes
2025-02-04 9:44 ` David Hildenbrand
0 siblings, 2 replies; 5+ messages in thread
From: Sourav Panda @ 2025-02-01 2:15 UTC (permalink / raw)
To: lsf-pc, Linux-MM, Pavel Tatashin, Yu Zhao, shr, David Rientjes
Hi,
KSM is a powerful tool for deduplicating memory, reducing usage by merging
identical pages across processes. However, certain aspects of its interface
and implementation prevent its deployment in our use case, where security
and efficiency (the CPU overhead of background scanning) are of greater
importance.
We propose Selective KSM, a mechanism to control when merging takes place
and which pages can be merged together. We do this by partitioning the
merge space by security domain and carrying out the merging as part of a
synchronous syscall. This ensures that sensitive content is not merged
with non-sensitive content.
Our overall goal is to optimize memory utilization in virtualized
environments, where there is significant duplication across guest
instances (e.g., the guest kernel). By giving the operator better control
over grouping pages by security domain and similarity, Selective KSM
improves both security and efficiency.
Beyond virtualized environments, we also want Selective KSM to work well
in containerized environments.
An example API could look like this (alternatively, we could do it through
sysfs without adding syscalls):
// This feature shall be gated by a Kconfig option: CONFIG_SELECTIVE_KSM
// Create a unique identifier known to userland.
const char *ksm_name = "some_name";
// ksm_open() creates and opens a new KSM partition object, or opens an
// existing one.
// flags is a bit mask that determines whether merging is synchronous, etc.:
// KSM_SYNC: Carry out synchronous merging (no background scanning).
// KSM_CREAT: Create the KSM partition object if it does not exist.
// KSM_EXCL: If KSM_CREAT is also specified and a KSM partition object
// with this name already exists, return an error.
// mode is used to handle permissions:
// O_RDONLY, O_WRONLY, O_RDWR, S_IRUSR, S_IWUSR, S_IXUSR
// On success, returns a file descriptor (a nonnegative integer) and creates
// the sysfs path:
// /sys/kernel/mm/ksm/partition/<ksm_name>/
// On failure, returns -1 and sets errno to indicate the error.
int ksm_fd = ksm_open(ksm_name, flags, mode);
// Destroy the name. The named object is removed only after all open
// references are closed. On success, ksm_unlink() returns 0.
// On failure, it returns -1 and sets errno to indicate the error.
ksm_unlink(ksm_name);
// Trigger merge. Only valid if KSM_SYNC was set during ksm_open().
ksm_merge(ksm_fd, pid, addr, size);
// Trigger unmerge. Only valid if KSM_SYNC was set during ksm_open().
ksm_unmerge(ksm_fd, pid, addr, size);
With regards,
Sourav Panda
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM
2025-02-01 2:15 [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM Sourav Panda
@ 2025-02-03 2:54 ` David Rientjes
2025-02-03 7:20 ` Sourav Panda
2025-02-04 9:44 ` David Hildenbrand
1 sibling, 1 reply; 5+ messages in thread
From: David Rientjes @ 2025-02-03 2:54 UTC (permalink / raw)
To: Sourav Panda; +Cc: lsf-pc, Linux-MM, Pavel Tatashin, Yu Zhao, shr
On Fri, 31 Jan 2025, Sourav Panda wrote:
> Hi,
>
> KSM is a powerful tool for deduplicating memory, reducing usage by merging
> identical pages across processes. However, there are certain interface and
> implementation aspect that prevents its deployment in our use case; wherein
> security and efficiency (CPU overhead - due to background scanning) are of
> greater importance.
>
> We propose Selective KSM, a mechanism to control when the merging takes
> place and what pages can be merged together. We do this by partitioning the
> merge-space as per security-domains and carryout the merging as part of a
> synchronous syscall. Doing so, we ensure sensitive-content is not merged
> with non-sensitive content.
>
Thanks for proposing this, Sourav, it sounds like a useful topic to
discuss.
Regarding the above, it looks like this is analogous to doing a
synchronous MADV_COLLAPSE in process context rather than relying on
khugepaged as the sole mechanism for that collapse? In your case, it's
userspace doing a merge in process context without relying on ksmd.
Is s/Selective/Userspace/ the way to think about it?
Does this require a fully cooperative guest for it to work properly?
* Re: [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM
2025-02-03 2:54 ` David Rientjes
@ 2025-02-03 7:20 ` Sourav Panda
0 siblings, 0 replies; 5+ messages in thread
From: Sourav Panda @ 2025-02-03 7:20 UTC (permalink / raw)
To: David Rientjes; +Cc: lsf-pc, Linux-MM, Pavel Tatashin, Yu Zhao, shr
On Sun, Feb 2, 2025 at 6:54 PM David Rientjes <rientjes@google.com> wrote:
> On Fri, 31 Jan 2025, Sourav Panda wrote:
>
> > Hi,
> >
> > KSM is a powerful tool for deduplicating memory, reducing usage by merging
> > identical pages across processes. However, there are certain interface and
> > implementation aspect that prevents its deployment in our use case; wherein
> > security and efficiency (CPU overhead - due to background scanning) are of
> > greater importance.
> >
> > We propose Selective KSM, a mechanism to control when the merging takes
> > place and what pages can be merged together. We do this by partitioning the
> > merge-space as per security-domains and carryout the merging as part of a
> > synchronous syscall. Doing so, we ensure sensitive-content is not merged
> > with non-sensitive content.
>
> Thanks for proposing this, Sourav, it sounds like a useful topic to
> discuss.
> Regarding the above, this looks like this is analogous to doing
> synchronous MADV_COLLAPSE in process context and not relying on khugepaged
> as the sole mechanism for doing that collapse? In your case, it's
> userspace doing a merge in process context without relying on ksmd.
>
> Is s/Selective/Userspace/ the way to think about it?
>
Yes, this is a good analogy.
>
> Does this require a fully cooperative guest for it to work properly?
>
A guest VM would have to be fully cooperative to achieve this (as per the
current proposal). Later on, we could also consider implementing an
advisor for optimization purposes (e.g., like the KSM advisors we have
today for adapting some parameters).
* Re: [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM
2025-02-01 2:15 [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM Sourav Panda
2025-02-03 2:54 ` David Rientjes
@ 2025-02-04 9:44 ` David Hildenbrand
2025-02-04 18:21 ` Sourav Panda
1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2025-02-04 9:44 UTC (permalink / raw)
To: Sourav Panda, lsf-pc, Linux-MM, Pavel Tatashin, Yu Zhao, shr,
David Rientjes
On 01.02.25 03:15, Sourav Panda wrote:
> Hi,
Hi,
>
> KSM is a powerful tool for deduplicating memory, reducing usage by merging
> identical pages across processes. However, there are certain interface and
> implementation aspect that prevents its deployment in our use case; wherein
> security and efficiency (CPU overhead - due to background scanning) are of
> greater importance.
>
> We propose Selective KSM, a mechanism to control when the merging takes
> place and what pages can be merged together. We do this by partitioning the
> merge-space as per security-domains and carryout the merging as part of a
> synchronous syscall. Doing so, we ensure sensitive-content is not merged
> with non-sensitive content.
I'll note that there was an RFC for uKSM [1] last year. Unfortunately, I
didn't have time to look into it in more detail, and there was never any
push for it.
In particular, it proposed an interface:
- /proc/uksm/merge enables the merging of two pages given their process
IDs and addresses.
- /proc/uksm/unmerge allows unmerging a previously merged KSM page.
- /proc/uksm/cmp provides a lightweight mechanism to check page content
equivalence before invoking a merge operation.
[1]
https://lore.kernel.org/linux-mm/20240329104035.62942-1-teawater@antgroup.com/T/
--
Cheers,
David / dhildenb
* Re: [LSF/MM/BPF TOPIC] KSM Enhancements: Selective KSM
2025-02-04 9:44 ` David Hildenbrand
@ 2025-02-04 18:21 ` Sourav Panda
0 siblings, 0 replies; 5+ messages in thread
From: Sourav Panda @ 2025-02-04 18:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: lsf-pc, Linux-MM, Pavel Tatashin, Yu Zhao, shr, David Rientjes
On Tue, Feb 4, 2025 at 1:44 AM David Hildenbrand <david@redhat.com> wrote:
> On 01.02.25 03:15, Sourav Panda wrote:
> > Hi,
>
> Hi,
>
> > KSM is a powerful tool for deduplicating memory, reducing usage by merging
> > identical pages across processes. However, there are certain interface and
> > implementation aspect that prevents its deployment in our use case; wherein
> > security and efficiency (CPU overhead - due to background scanning) are of
> > greater importance.
> >
> > We propose Selective KSM, a mechanism to control when the merging takes
> > place and what pages can be merged together. We do this by partitioning the
> > merge-space as per security-domains and carryout the merging as part of a
> > synchronous syscall. Doing so, we ensure sensitive-content is not merged
> > with non-sensitive content.
>
> I'll note that there was an RFC for uKSM [1] last year. Unfortunately, I
> didn't have time to look into it in more detail, and there was never any
> push for it.
>
Thank you, David. I took a look at it. One major callout is that it is
extremely fine-grained: you specify the exact two pages you want merged.
I prefer triggering a merge at a coarser granularity, where you just
specify the address range you want merged. Furthermore, you are not
required to specify what to merge against in the same invocation (e.g.,
the kernel inserts into or searches the unstable tree instead).