linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jann Horn <jannh@google.com>, Davidlohr Bueso <dave@stgolabs.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Christoph Lameter <clameter@sgi.com>,
	Kemi Wang <kemi.wang@intel.com>,
	Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH AUTOSEL 3.18 8/9] mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size
Date: Fri, 16 Nov 2018 14:19:10 -0500	[thread overview]
Message-ID: <20181116191910.GJ1706@sasha-vm> (raw)
In-Reply-To: <20181116184457.GA11906@dhcp22.suse.cz>

On Fri, Nov 16, 2018 at 07:44:57PM +0100, Michal Hocko wrote:
>On Fri 16-11-18 13:19:04, Sasha Levin wrote:
>> On Fri, Nov 16, 2018 at 09:55:25AM +0100, Michal Hocko wrote:
>[...]
>> > > Race condition with memory hotplug due to missing locks:
>> > >
>> > > 	https://marc.info/?l=linux-mm&m=154211934011188&w=2
>> >
>> > Memory hotplug locking is dubious at best and this patch doesn't really
>> > fix it. It fixes a theoretical problem. I am not aware anybody would be
>> > hitting in practice. We need to rework the locking quite extensively.
>>
>> The word "theoretical" used in the stable rules file does not mean
>> that we need to have actual reports of users hitting bugs before we
>> start backporting the relevant patch, it simply suggests that there
>> needs to be a reasonable explanation of how this issue can be hit.
>>
>> For this memory hotplug patch in particular, I use the hv_balloon driver
>> at this very moment (running a linux guest on windows, with "dynamic
>> memory" enabled). Should I wait for it to crash before I can fix it?
>>
>> Is the upstream code perfect? No, but that doesn't mean that it's not
>> working at all, and if there are users they expect to see fixes going in
>> and not just sitting idly waiting for a big rewrite that will come in a
>> few years.
>>
>> Memory hotplug fixes are not something you think should go to stable?
>> Andrew sent a few of them to stable, so that can't be the case.
>
>I am not arguing about hotplug fixes in general. I was arguing that this
>particular one is a theoretical one and hotplug locking is quite subtle.
>E.g. 381eab4a6ee mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
>http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv
>So in general unless the issue is really triggered easily I am rather
>conservative.

We have millions of machines running linux, everything is triggered
"easily" at that scale.

>> > > Raising an OOM event that causes issues in userspace when no OOM has
>> > > actually occured:
>> > >
>> > > 	https://marc.info/?l=linux-mm&m=154211939811582&w=2
>> >
>> > The patch makes sense I just do not think this is a stable material. The
>> > semantic of the event was and still is suboptimal.
>>
>> I really fail to understand your reasoning about -stable here. This
>> patch is something people actually hit in the field, spent time on
>> triaging and analysing it, and submitting a fix which looks reasonably
>> straightforward.
>>
>> That fix was acked by quite a few folks (including yourself) and merged
>> in. And as far as we can tell, it actually fixed the problem.
>>
>> Why is it not stable material?
>
>Because the semantic of the OOM event is quite tricky itself. We have
>discussed this patch and concluded that the updated one is more
>sensible. But it is not yet clear whether this is actually what other
>users expect as well. That to me does sound quite risky for a stable
>kernel.

So there's another patch following this one that fixes it? Sure - can I
take both?

Users expect to not have their containers die randomly, if you're saying
that you're still working on a fix for that then that is a different
story than saying "we fixed it, but it should not go to stable".

And let's also draw a line there, users will not wait for the OOM event
logic to be perfect before they can expect their workloads to run
without issues.

>> My understanding is that you're concerned with the patch itself being
>> "suboptimal", but in that case - why did you ack it?
>>
>> > > I think that all 3 cases represent a "real" bug users can hit, and I
>> > > honestly don't know why they were not tagged for stable.
>> >
>> > It would be much better to ask in the respective email thread rather
>> > than spamming mailing with AUTOSEL patches which rarely get any
>> > attention.
>>
>> I actually tried it, but the comments I got is that it gets in the way
>> and people preferred something they can filter.
>
>which means that AUTOSEL just goes to /dev/null...

Or just not get mixed with the process? for some people it's easier to
see AUTOSEL mails with the way it works now rather than if they suddenly
show up as a continuation of a weeks old thread.

>> > We have been through this discussion several times already and I thought
>> > we have agreed that those subsystems which are seriously considering stable
>> > are opted out from the AUTOSEL automagic. Has anything changed in that
>> > regards.
>>
>> I checked in with Andrew to get his input on this, he suggested that
>> these patches should be sent to linux-mm and he'll give it a close look.
>
>If Andrew is happy to get AUTOSEL patches then I will not object of
>course but let's not merge these patches without and expclicit OK.

This is fair. I think that the process has caused some unnecessary
friction: we all want the same result but just disagree on the means :)

I won't merge any mm/ AUTOSEL patches until this gets clearer.

--
Thanks,
Sasha

  reply	other threads:[~2018-11-16 19:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20181113055252.79406-1-sashal@kernel.org>
2018-11-13  5:52 ` Sasha Levin
2018-11-15 22:08   ` Andrew Morton
2018-11-15 22:37     ` Sasha Levin
2018-11-15 22:47       ` Andrew Morton
2018-11-15 23:01         ` Sasha Levin
2018-11-16  8:55           ` Michal Hocko
2018-11-16 18:19             ` Sasha Levin
2018-11-16 18:44               ` Michal Hocko
2018-11-16 19:19                 ` Sasha Levin [this message]
2018-11-16 19:34                   ` Michal Hocko
2018-11-13  5:52 ` [PATCH AUTOSEL 3.18 9/9] mm: don't warn about large allocations for slab Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181116191910.GJ1706@sasha-vm \
    --to=sashal@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=dave@stgolabs.net \
    --cc=jannh@google.com \
    --cc=kemi.wang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox