linux-mm.kvack.org archive mirror
* [PATCH v2 1/1] psi: remove 500ms min window size limitation for triggers
From: Suren Baghdasaryan @ 2023-03-03  1:13 UTC (permalink / raw)
  To: tj
  Cc: hannes, lizefan.x, peterz, johunt, mhocko, keescook,
	quic_sudaraja, cgroups, linux-mm, linux-kernel, surenb

The current 500ms minimum window size for psi triggers limits the polling
interval to 50ms (each window is updated 10 times), to prevent polling
threads from using too much CPU bandwidth by polling too frequently.
However, the number of cgroups with triggers is unlimited, so this
protection can be defeated by creating multiple cgroups with psi triggers
(triggers in each cgroup are served by a single "psimon" kernel thread).

Instead of limiting the minimum polling period, which also limits the
latency of psi events, it's better to limit psi trigger creation to
authorized users only, as we do for system-wide psi triggers
(/proc/pressure/* files can be written only by processes with the
CAP_SYS_RESOURCE capability). This also makes the access rules for cgroup
psi files consistent with the system-wide ones.

Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
remove the psi window minimum size limitation.

Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
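Illustration only, not part of the patch: a minimal userspace sketch of
what the relaxed window check permits. It assumes the trigger format
described in Documentation/accounting/psi.rst
("<some|full> <stall us> <window us>"). The helper names
(psi_window_valid, psi_poll_interval_us, psi_register_trigger) are
hypothetical and exist only for this sketch; the two constants mirror
the ones left in kernel/sched/psi.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Mirrors of the kernel/sched/psi.c constants kept by this patch. */
#define WINDOW_MAX_US      10000000u /* max window size is 10s */
#define UPDATES_PER_WINDOW 10u       /* 10 updates per window */

/* Model of the window check after this patch: only 0 and >10s fail. */
static bool psi_window_valid(unsigned int window_us)
{
	return window_us != 0 && window_us <= WINDOW_MAX_US;
}

/* Polling period implied by a window: 500ms -> 50ms, 100ms -> 10ms. */
static unsigned int psi_poll_interval_us(unsigned int window_us)
{
	return window_us / UPDATES_PER_WINDOW;
}

/*
 * Hypothetical helper: register a trigger on a pressure file.
 * Returns an fd to poll() for POLLPRI, or -errno on failure.
 */
static int psi_register_trigger(const char *path, const char *kind,
				unsigned int stall_us,
				unsigned int window_us)
{
	char buf[64];
	int fd, len;

	if (!psi_window_valid(window_us))
		return -EINVAL;

	len = snprintf(buf, sizeof(buf), "%s %u %u",
		       kind, stall_us, window_us);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	/* Write the terminating NUL too, as the psi.rst example does. */
	if (write(fd, buf, (size_t)len + 1) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```

For example, psi_register_trigger("/sys/fs/cgroup/mygroup/memory.pressure",
"some", 10000, 100000) asks for a 10ms stall threshold within a 100ms
window, which the old 500ms floor would have rejected with -EINVAL. The
cgroup path is made up for the example. With this patch applied, a caller
lacking CAP_SYS_RESOURCE should get -EPERM from the open() instead.
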
 kernel/cgroup/cgroup.c | 10 ++++++++++
 kernel/sched/psi.c     |  4 +---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..b600a6baaeca 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3867,6 +3867,12 @@ static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 	return psi_trigger_poll(&ctx->psi.trigger, of->file, pt);
 }
 
+static int cgroup_pressure_open(struct kernfs_open_file *of)
+{
+	return (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) ?
+		-EPERM : 0;
+}
+
 static void cgroup_pressure_release(struct kernfs_open_file *of)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
@@ -5266,6 +5272,7 @@ static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "io.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_io_pressure_show,
 		.write = cgroup_io_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5274,6 +5281,7 @@ static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "memory.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_memory_pressure_show,
 		.write = cgroup_memory_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5282,6 +5290,7 @@ static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "cpu.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_cpu_pressure_show,
 		.write = cgroup_cpu_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5291,6 +5300,7 @@ static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "irq.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_irq_pressure_show,
 		.write = cgroup_irq_pressure_write,
 		.poll = cgroup_pressure_poll,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 02e011cabe91..0945f956bf80 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -160,7 +160,6 @@ __setup("psi=", setup_psi);
 #define EXP_300s	2034		/* 1/exp(2s/300s) */
 
 /* PSI trigger definitions */
-#define WINDOW_MIN_US 500000	/* Min window size is 500ms */
 #define WINDOW_MAX_US 10000000	/* Max window size is 10s */
 #define UPDATES_PER_WINDOW 10	/* 10 updates per window */
 
@@ -1278,8 +1277,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	if (state >= PSI_NONIDLE)
 		return ERR_PTR(-EINVAL);
 
-	if (window_us < WINDOW_MIN_US ||
-		window_us > WINDOW_MAX_US)
+	if (window_us == 0 || window_us > WINDOW_MAX_US)
 		return ERR_PTR(-EINVAL);
 
 	/* Check threshold */
-- 
2.40.0.rc0.216.gc4246ad0f0-goog
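
Illustration only, not part of the patch: a small userspace model of the
access rule that cgroup_pressure_open() enforces, plus a probe that tries
a writable open. Both function names (model_pressure_open,
probe_write_open) are hypothetical. The model takes the capability as a
plain boolean rather than querying the kernel, so it only restates the
logic of the check above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Userspace model (not kernel code) of cgroup_pressure_open():
 * only opens that request write access need CAP_SYS_RESOURCE,
 * read-only opens are unaffected.
 */
static int model_pressure_open(bool wants_write, bool has_cap)
{
	return (wants_write && !has_cap) ? -EPERM : 0;
}

/*
 * Hypothetical probe: try to open a pressure file for writing.
 * Returns 0 if the open succeeded, otherwise the errno value.
 */
static int probe_write_open(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}
```

On a kernel with this patch, probe_write_open() on a cgroup's
memory.pressure would be expected to return EPERM for a process without
CAP_SYS_RESOURCE, while reading the file with O_RDONLY keeps working.
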




* Re: [PATCH v2 1/1] psi: remove 500ms min window size limitation for triggers
From: Suren Baghdasaryan @ 2023-03-03  1:16 UTC (permalink / raw)
  To: peterz
  Cc: tj, hannes, lizefan.x, johunt, mhocko, keescook, quic_sudaraja,
	cgroups, linux-mm, linux-kernel

On Thu, Mar 2, 2023 at 5:13 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Current 500ms min window size for psi triggers limits polling interval
> to 50ms to prevent polling threads from using too much cpu bandwidth by
> polling too frequently. However the number of cgroups with triggers is
> unlimited, so this protection can be defeated by creating multiple
> cgroups with psi triggers (triggers in each cgroup are served by a single
> "psimon" kernel thread).
> Instead of limiting min polling period, which also limits the latency of
> psi events, it's better to limit psi trigger creation to authorized users
> only, like we do for system-wide psi triggers (/proc/pressure/* files can
> be written only by processes with CAP_SYS_RESOURCE capability). This also
> makes access rules for cgroup psi files consistent with system-wide ones.
> Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
> remove the psi window min size limitation.
>
> Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
> Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Forgot to change the --to field from Tejun to PeterZ.
Peter, just to clarify, this change is targeted for inclusion in your tree.
Thanks!




* Re: [PATCH v2 1/1] psi: remove 500ms min window size limitation for triggers
From: Suren Baghdasaryan @ 2023-05-02 17:20 UTC (permalink / raw)
  To: peterz
  Cc: tj, hannes, lizefan.x, johunt, mhocko, keescook, quic_sudaraja,
	cgroups, linux-mm, linux-kernel

On Thu, Mar 2, 2023 at 5:16 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Mar 2, 2023 at 5:13 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Current 500ms min window size for psi triggers limits polling interval
> > to 50ms to prevent polling threads from using too much cpu bandwidth by
> > polling too frequently. However the number of cgroups with triggers is
> > unlimited, so this protection can be defeated by creating multiple
> > cgroups with psi triggers (triggers in each cgroup are served by a single
> > "psimon" kernel thread).
> > Instead of limiting min polling period, which also limits the latency of
> > psi events, it's better to limit psi trigger creation to authorized users
> > only, like we do for system-wide psi triggers (/proc/pressure/* files can
> > be written only by processes with CAP_SYS_RESOURCE capability). This also
> > makes access rules for cgroup psi files consistent with system-wide ones.
> > Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
> > remove the psi window min size limitation.
> >
> > Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
> > Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Forgot to change the --to field from Tejun to PeterZ.
> Peter, just to clarify, this change is targeted for inclusion in your tree.

I think this patch slipped through the cracks. Peter, could you please
take it into your tree?
Thanks,
Suren.




* Re: [PATCH v2 1/1] psi: remove 500ms min window size limitation for triggers
From: Peter Zijlstra @ 2023-05-02 17:24 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: tj, hannes, lizefan.x, johunt, mhocko, keescook, quic_sudaraja,
	cgroups, linux-mm, linux-kernel

On Tue, May 02, 2023 at 10:20:34AM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 2, 2023 at 5:16 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Thu, Mar 2, 2023 at 5:13 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > Current 500ms min window size for psi triggers limits polling interval
> > > to 50ms to prevent polling threads from using too much cpu bandwidth by
> > > polling too frequently. However the number of cgroups with triggers is
> > > unlimited, so this protection can be defeated by creating multiple
> > > cgroups with psi triggers (triggers in each cgroup are served by a single
> > > "psimon" kernel thread).
> > > Instead of limiting min polling period, which also limits the latency of
> > > psi events, it's better to limit psi trigger creation to authorized users
> > > only, like we do for system-wide psi triggers (/proc/pressure/* files can
> > > be written only by processes with CAP_SYS_RESOURCE capability). This also
> > > makes access rules for cgroup psi files consistent with system-wide ones.
> > > Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
> > > remove the psi window min size limitation.
> > >
> > > Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
> > > Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > Forgot to change the --to field from Tejun to PeterZ.
> > Peter, just to clarify, this change is targeted for inclusion in your tree.
> 
> I think this patch slipped through the cracks. Peter, could you please
> take it into your tree?

Sorry, yes, it got lost. I'll go queue it for post -rc1. No urgency with
this, right?



* Re: [PATCH v2 1/1] psi: remove 500ms min window size limitation for triggers
From: Suren Baghdasaryan @ 2023-05-02 17:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tj, hannes, lizefan.x, johunt, mhocko, keescook, quic_sudaraja,
	cgroups, linux-mm, linux-kernel

On Tue, May 2, 2023 at 10:24 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, May 02, 2023 at 10:20:34AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Mar 2, 2023 at 5:16 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Mar 2, 2023 at 5:13 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > Current 500ms min window size for psi triggers limits polling interval
> > > > to 50ms to prevent polling threads from using too much cpu bandwidth by
> > > > polling too frequently. However the number of cgroups with triggers is
> > > > unlimited, so this protection can be defeated by creating multiple
> > > > cgroups with psi triggers (triggers in each cgroup are served by a single
> > > > "psimon" kernel thread).
> > > > Instead of limiting min polling period, which also limits the latency of
> > > > psi events, it's better to limit psi trigger creation to authorized users
> > > > only, like we do for system-wide psi triggers (/proc/pressure/* files can
> > > > be written only by processes with CAP_SYS_RESOURCE capability). This also
> > > > makes access rules for cgroup psi files consistent with system-wide ones.
> > > > Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
> > > > remove the psi window min size limitation.
> > > >
> > > > Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
> > > > Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > >
> > > Forgot to change the --to field from Tejun to PeterZ.
> > > Peter, just to clarify, this change is targeted for inclusion in your tree.
> >
> > I think this patch slipped through the cracks. Peter, could you please
> > take it into your tree?
>
> Sorry, yes, got lost. I'll go queue it for post -rc1. No urgency with
> this right?

Right, no urgency. I'll be merging it into Android branches, counting on
it making it upstream later on :) Greg will hate me for that, but I'll
survive.
Thanks!

