From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 14 Sep 2023 21:26:15 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Suren Baghdasaryan
cc: Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot,
    akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [mm?]
 kernel BUG in vma_replace_policy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
On Thu, 14 Sep 2023, Suren Baghdasaryan wrote:
> On Thu, Sep 14, 2023 at 9:24 PM Matthew Wilcox wrote:
> > On Thu, Sep 14, 2023 at 08:53:59PM +0000, Suren Baghdasaryan wrote:
> > > On Thu, Sep 14, 2023 at 8:00 PM Suren Baghdasaryan wrote:
> > > > On Thu, Sep 14, 2023 at 7:09 PM Matthew Wilcox wrote:
> > > > >
> > > > > On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> > > > > > I think I found the problem and the explanation is much simpler. While
> > > > > > walking the page range, queue_folios_pte_range() encounters an
> > > > > > unmovable page and queue_folios_pte_range() returns 1.
> > > > > > That causes a
> > > > > > break from the loop inside walk_page_range() and no more VMAs get
> > > > > > locked. After that the loop calling mbind_range() walks over all VMAs,
> > > > > > even the ones which were skipped by queue_folios_pte_range() and that
> > > > > > causes this BUG assertion.
> > > > > >
> > > > > > Thinking what's the right way to handle this situation (what's the
> > > > > > expected behavior here)...
> > > > > > I think the safest way would be to modify walk_page_range() and make
> > > > > > it continue calling process_vma_walk_lock() for all VMAs in the range
> > > > > > even when __walk_page_range() returns a positive err. Any objection or
> > > > > > alternative suggestions?
> > > > >
> > > > > So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
> > > > > specified.  That means we're going to return an error, no matter what,
> > > > > and there's no point in calling mbind_range().  Right?
> > > > >
> > > > > +++ b/mm/mempolicy.c
> > > > > @@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
> > > > >         ret = queue_pages_range(mm, start, end, nmask,
> > > > >                         flags | MPOL_MF_INVERT, &pagelist, true);
> > > > >
> > > > > +       if (ret == 1)
> > > > > +               ret = -EIO;
> > > > >         if (ret < 0) {
> > > > >                 err = ret;
> > > > >                 goto up_out;
> > > > >
> > > > > (I don't really understand this code, so it can't be this simple, can
> > > > > it?  Why don't we just return -EIO from queue_folios_pte_range() if
> > > > > this is the right answer?)
> > > >
> > > > Yeah, I'm trying to understand the expected behavior of this function
> > > > to make sure we are not missing anything. I tried a simple fix that I
> > > > suggested in my previous email and it works but I want to understand a
> > > > bit more about this function's logic before posting the fix.
> > > >
> > > So, current functionality is that after queue_pages_range() encounters
> > > an unmovable page, terminates the loop and returns 1, mbind_range()
> > > will still be called for the whole range
> > > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1345),
> > > all pages in the pagelist will be migrated
> > > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1355)
> > > and only after that the -EIO code will be returned
> > > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1362).
> > > So, if we follow Matthew's suggestion we will be altering the current
> > > behavior which I assume is not what we want to do.
> >
> > Right, I'm intentionally changing the behaviour.  My thinking is
> > that mbind(MPOL_MF_MOVE | MPOL_MF_STRICT) is going to fail.  Should
> > such a failure actually move the movable pages before reporting that
> > it failed?  I don't know.
> >
> > > The simple fix I was thinking about that would not alter this behavior
> > > is smth like this:
> >
> > I don't like it, but can we run it past syzbot to be sure it solves the
> > issue and we're not chasing a ghost here?
>
> Yes, I just finished running the reproducer on both upstream and
> linux-next builds listed in
> https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
> problem does not happen anymore.
> I'm fine with your suggestion too, just wanted to point out it would
> introduce change in the behavior. Let me know how you want to proceed.

Well done, identifying the mysterious cause of this problem:
I'm glad to hear that you've now verified that hypothesis.

You're right, it would be a regression to follow Matthew's suggestion.
Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
phase of do_mbind() has done the best it can, gathering all the pages it
can that need migration, even if some were missed; and proceeds to do the
mbind_range() phase if there was nothing "seriously" wrong (a gap causing
-EFAULT).  Then at the end, if MPOL_MF_STRICT was set, and not all the
pages could be migrated (or MOVE was not specified and not all pages were
well placed), it returns -EIO rather than 0 to inform the caller that not
all could be done.

There have been numerous tweaks, but I think most importantly 5.3's
d883544515aa ("mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
which stop the pagewalk early.  In my opinion, not an improvement - makes
it harder to get mbind() to do the best job it can (or is it justified as
what you're asking for if you say STRICT?).  But whatever, it would be a
further regression for mbind() not to have done the mbind_range(), even
though it goes on to return -EIO.

I had a bad first reaction to your walk_page_range() patch (was expecting
to see vma_start_write()s in mbind_range()), but perhaps your patch is
exactly what process_mm_walk_lock() does now demand.

[Why is Hugh responding on this?  Because I have some long-standing
mm/mempolicy.c patches to submit next week, but in reviewing what I could
or could not afford to get into at this time, had decided I'd better stay
out of queue_pages_range() for now - beyond the trivial preferring an
MPOL_MF_WRLOCK flag to your bool lock_vma.]

Hugh