Date: Fri, 15 Sep 2023 20:57:22 -0700 (PDT)
From: Hugh Dickins
To: Yang Shi
Cc: Hugh Dickins, Suren Baghdasaryan, Matthew Wilcox, Michal Hocko, Vlastimil Babka, syzbot, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

On Fri, 15 Sep 2023, Yang Shi wrote:
>
> Hi Suren and Hugh,
>
> Thanks for figuring this out. The mbind behavior is a little bit messy
> and hard to follow. I tried my best to recall all the changes.

Messy and confusing yes; and for every particular behavior, I suspect
that by now there exists some release which has done it that way.

>
> IIUC, mbind did break the vma iteration early in the first place, then
> commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") changed the behavior (didn't break vma iteration
> early for some cases anymore), but it messed up the return value and
> caused some test cases failure, also violated the manual. The return
> value issue was fixed by commit a7f40cfe3b7a ("mm: mempolicy: make
> mbind() return -EIO when MPOL_MF_STRICT is specified"), this commit
> also restored the oldest behavior (break loop early). But it also
> breaks the loop early when MPOL_MF_MOVE|MOVEALL is set, kernel should
> actually continue the loop to try to migrate all existing pages per
> the manual.

Oh, I missed that aspect in my description: yes, I think that's the
worst of it: MPOL_MF_STRICT alone could break out early because it had
nothing more to learn by going further, but it was simply a mistake for
the MOVEs to break out early (and arguable what MOVE|STRICT should do).

I thought you and I were going to have a debate about this, but we
appear to be in agreement. And I'm not sure whether I agree with
myself about whether do_mbind() should apply the mbind_range()s when
STRICT queue_pages_range() found an unmovable - there are consistency
and regression arguments both ways.

(I've been repeatedly puzzled by your comment in queue_folios_pte_range()
        if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
            /* MPOL_MF_STRICT must be specified if we get here */
            if (!vma_migratable(vma)) {
Does that comment about MPOL_MF_STRICT actually belong inside the
!vma_migratable(vma) block? Sometimes I think so, but sometimes I
remember that the interaction of those flags, and the skipping arranged
by queue_pages_test_walk(), is subtler than I imagine.)

> It sounds like a regression. I will take a look at it.

Thanks! Please do, I don't have the time for it.

>
> So the logic should conceptually look like:
>
> if (MPOL_MF_MOVE|MOVEALL)
>      continue;
> if (MPOL_MF_STRICT)
>      break;
>
> So it is still possible that some VMAs are not locked if only
> MPOL_MF_STRICT is set.

Conditionally, I'll agree; but it's too easy for me to agree in the
course of trying to get an email out, but on later reflection come to
disagree. STRICT|MOVE behavior arguable.

I think the best I can do is send you (privately) my approx-v5.2 patch
for this (which I never got time to put into even a Google-internal
kernel, though an earlier version was there). In part because I did
more research back then, and its commit message cites several even
older commits than you cite above, which might help to shed more light
on the history (or might just be wrong). And in part because it may
give you some more ideas of what needs doing: notably qp->nr_failed,
because "man 2 migrate_pages" says "On success migrate_pages() returns
the number of pages that could not be moved", but we seem to have lost
sight of that (from which one may conclude that it's not very important,
but I did find it useful when testing); but of course the usual doubts
about the right way to count a page when compound.

I'll check how easily that patch applies to a known base such as v5.2,
maybe trim it to fit better, then send it off to you.

Hugh
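
For anyone following the thread, the conceptual logic quoted above
(continue when a MOVE flag is set, break when only STRICT is set) can be
sketched in plain userspace C. This is only an illustration, not kernel
code and not the patch mentioned above: the SK_* flags, struct sk_result
and sk_walk() are hypothetical names, the "ranges" are just an array of
booleans, and the nr_failed counter merely borrows the name used in the
discussion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mbind() mode flags discussed above. */
enum {
	SK_STRICT   = 1 << 0,
	SK_MOVE     = 1 << 1,
	SK_MOVE_ALL = 1 << 2,
};

/* Hypothetical result of one walk over the ranges. */
struct sk_result {
	size_t visited;   /* ranges examined before the walk stopped */
	size_t nr_failed; /* ranges that held something unmovable */
};

/*
 * Walk n ranges. movable[i] == false marks a range in which the walk
 * found a page it could not queue for migration.
 */
static struct sk_result sk_walk(const bool *movable, size_t n,
				unsigned int flags)
{
	struct sk_result r = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		r.visited++;
		if (movable[i])
			continue;
		r.nr_failed++;
		/* With a MOVE flag, keep going and try the remaining ranges. */
		if (flags & (SK_MOVE | SK_MOVE_ALL))
			continue;
		/* STRICT alone has nothing more to learn: stop early. */
		if (flags & SK_STRICT)
			break;
	}
	return r;
}
```

Under these assumptions, walking four ranges of which the second and
fourth are unmovable stops after two ranges with SK_STRICT alone, but
visits all four (counting two failures) with SK_STRICT | SK_MOVE. What
MOVE|STRICT ought to do in the real code is, as said above, arguable;
the sketch simply follows the quoted pseudocode.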