From: Suren Baghdasaryan <surenb@google.com>
Date: Thu, 14 Sep 2023 20:53:59 +0000
Subject: Re: [syzbot] [mm?] kernel BUG in vma_replace_policy
To: Matthew Wilcox
Cc: syzbot, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
On Thu, Sep 14, 2023 at 8:00 PM Suren Baghdasaryan wrote:
>
> On Thu, Sep 14, 2023 at 7:09 PM Matthew Wilcox wrote:
> >
> > On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> > > I think I found the problem and the explanation is much simpler. While
> > > walking the page range, queue_folios_pte_range() encounters an
> > > unmovable page and queue_folios_pte_range() returns 1. That causes a
> > > break from the loop inside walk_page_range() and no more VMAs get
> > > locked. After that the loop calling mbind_range() walks over all VMAs,
> > > even the ones which were skipped by queue_folios_pte_range() and that
> > > causes this BUG assertion.
> > >
> > > Thinking what's the right way to handle this situation (what's the
> > > expected behavior here)...
> > > I think the safest way would be to modify walk_page_range() and make
> > > it continue calling process_vma_walk_lock() for all VMAs in the range
> > > even when __walk_page_range() returns a positive err. Any objection or
> > > alternative suggestions?
> >
> > So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
> > specified. That means we're going to return an error, no matter what,
> > and there's no point in calling mbind_range(). Right?
> >
> > +++ b/mm/mempolicy.c
> > @@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
> >  	ret = queue_pages_range(mm, start, end, nmask,
> >  			flags | MPOL_MF_INVERT, &pagelist, true);
> >
> > +	if (ret == 1)
> > +		ret = -EIO;
> >  	if (ret < 0) {
> >  		err = ret;
> >  		goto up_out;
> >
> > (I don't really understand this code, so it can't be this simple, can
> > it? Why don't we just return -EIO from queue_folios_pte_range() if
> > this is the right answer?)
>
> Yeah, I'm trying to understand the expected behavior of this function
> to make sure we are not missing anything. I tried a simple fix that I
> suggested in my previous email and it works but I want to understand a
> bit more about this function's logic before posting the fix.
So, the current behavior is that after queue_pages_range() encounters an
unmovable page, terminates its loop, and returns 1, mbind_range() will
still be called for the whole range
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1345),
all pages in the pagelist will be migrated
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1355),
and only after that will the -EIO code be returned
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1362).
So, if we follow Matthew's suggestion we will be altering the current
behavior, which I assume is not what we want to do.

The simple fix I was thinking about that would not alter this behavior
is something like this:

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index b7d7e4fcfad7..c37a7e8be4cb 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -493,11 +493,17 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 		if (!vma) { /* after the last vma */
 			walk.vma = NULL;
 			next = end;
+			if (err)
+				continue;
+
 			if (ops->pte_hole)
 				err = ops->pte_hole(start, next, -1, &walk);
 		} else if (start < vma->vm_start) { /* outside vma */
 			walk.vma = NULL;
 			next = min(end, vma->vm_start);
+			if (err)
+				continue;
+
 			if (ops->pte_hole)
 				err = ops->pte_hole(start, next, -1, &walk);
 		} else { /* inside vma */
@@ -505,6 +511,8 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 			walk.vma = vma;
 			next = min(end, vma->vm_end);
 			vma = find_vma(mm, vma->vm_end);
+			if (err)
+				continue;
 
 			err = walk_page_test(start, next, &walk);
 			if (err > 0) {
@@ -520,8 +528,6 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 				break;
 			err = __walk_page_range(start, next, &walk);
 		}
-		if (err)
-			break;
 	} while (start = next, start < end);
 	return err;
 }

WDYT?