Date: Tue, 26 Sep 2023 13:47:33 -0700 (PDT)
From: Hugh Dickins
To: Matthew Wilcox
cc: Hugh Dickins, Andrew Morton, Andi Kleen, Christoph Lameter,
    Mike Kravetz, David Hildenbrand, Suren Baghdasaryan, Yang Shi,
    Sidhartha Kumar, Vishal Moola, Kefeng Wang, Greg Kroah-Hartman,
    Tejun Heo, Mel Gorman, Michal Hocko,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 03/12] mempolicy: fix migrate_pages(2) syscall return nr_failed
In-Reply-To: 
Message-ID: <20de9b0-39fd-76bf-ea7f-3e9df0dd79d9@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Mon, 25 Sep 2023, Matthew Wilcox wrote:
> On Mon, Sep 25, 2023 at 01:24:02AM -0700, Hugh Dickins wrote:
> > "man 2 migrate_pages" says "On success migrate_pages() returns the number
> > of pages that could not be moved".  Although 5.3 and 5.4 commits fixed
> > mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*) to fail with EIO when not all pages
> > could be moved (because some could not be isolated for migration),
> > migrate_pages(2) was left still reporting only those pages failing at the
> > migration stage, forgetting those failing at the earlier isolation stage.
> > 
> > Fix that by accumulating a long nr_failed count in struct queue_pages,
> > returned by queue_pages_range() when it's not returning an error, for
> > adding on to the nr_failed count from migrate_pages() in mm/migrate.c.
> > A count of pages?  It's more a count of folios, but changing it to pages
> > would entail more work (also in mm/migrate.c): does not seem justified.
> 
> I certainly see what you're saying.  If a folio is only partially mapped
> (in an extreme case, the VMA is PAGE_SIZE and maps one page of a 512-page
> folio), then setting nr_failed to folio_nr_pages() is misleading at best.

Actually, that wasn't what I was thinking when I said that: but thank you
for the comment, you've helped me to see that what I'm actually doing is
not what is claimed there.

What I was thinking, something I'm taking as an axiom, is that the units
of failure when isolating must match the units of failure when migrating,
whatever they are.  And migrate_pages(), the internal one, has this
helpfully explicit comment:

 * Returns the number of {normal folio, large folio, hugetlb} that were not
 * migrated, or an error code. The number of large folio splits will be
 * considered as the number of non-migrated large folio, no matter how many
 * split folios of the large folio are migrated successfully.

(TBH I haven't spent long enough to actually understand what the second
sentence is saying: I do realize that splits complicate the issue, but the
function wouldn't be expected to return a "number of large folio splits"
anyway.  One day, I should work out what the code is actually doing, and
try to reword that sentence better.)

So above I was trying to say that migrate_pages(), the syscall, returns
that quantity: totalling the failed-isolation and failed-migration folios.

But you've alerted me to how in fact I'm doing an nr_failed++ for each PTE
of a failing-to-isolate folio, not as claimed.  It looks like I need to
record "qp->large" in the case of failure as well as success.  (And then
bother about when isolation fails on the first PTE, but succeeds by the
time of a later PTE?  maybe, or maybe that just gets silly.)

I must fix that in v2.
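
Roughly the direction I'm considering for v2 - an untested sketch only,
and the helper name is invented here just for illustration (qp->large,
qp->flags, qp->pagelist, qp->nr_failed and migrate_folio_add() are this
series' names):

/*
 * Untested sketch: count at most one failure per folio, not one per PTE,
 * by recording the large folio in qp->large on the failure path as well
 * as on the success path.
 */
static void queue_folio_or_fail(struct folio *folio,
				struct vm_area_struct *vma,
				struct queue_pages *qp)
{
	if (folio_test_large(folio)) {
		/* Act only once per large folio, whatever its fate */
		if (folio == qp->large)
			return;
		qp->large = folio;
	}
	if (!(qp->flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
	    !vma_migratable(vma) ||
	    !migrate_folio_add(folio, qp->pagelist, qp->flags))
		qp->nr_failed++;	/* one per folio, matching migrate_pages() */
}

That deliberately gives up after the first attempt on a large folio, so it
still ignores the fails-on-first-PTE-but-succeeds-on-a-later-PTE case above.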

> 
> > +static void queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
> > 			unsigned long end, struct mm_walk *walk)
...
> > +	if (!(qp->flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
> > +	    !vma_migratable(walk->vma) ||
> > +	    !migrate_folio_add(folio, qp->pagelist, qp->flags))
> > +		qp->nr_failed++;
> 
> However, I think here, we would do well to increment by HPAGE_PMD_NR.
> Or whatever equivalent is flavour of the week.

I *really* wanted to do that (and increment nr_failed PTE by PTE as I'm
doing, rather than as I claimed), and gave it some thought: but I don't
think it can be done - or not without abandoning the axiom (in which case
it's impossible to say what migrate_pages(2) is counting), or adding a
layer of complication which simply isn't justifiable.

Certainly we could change the definition of what migrate_pages(internal)
returns (though I haven't researched who depends on it: IIRC-long-ago
there's maybe only one other caller who cares, to update a stat); but that
still would not help.  Because whether migrate_pages(internal) returns 1
or HPAGE_PMD_NR for an unmigratable and unsplittable THP, it has no idea
whether that THP got into the pagelist via a PMD or via one or some number
more of PTEs.  More info would have to be passed down separately, folio by
folio: an auxiliary xarray perhaps, but let's not.

If it turns out that I'm deluded, and it can be easily done, please
clarify one point: you made this comment on queue_folios_pmd(), but what
about queue_folios_hugetlb()?  Would you nowadays prefer hugetlb to count
1 or folio_nr_pages()?  I think the latter.

> 
> Bravo to the other changes.

Thanks - I'm guessing your enthusiasm is mainly due to that "qp->large"
realization, which we ought to have thought of before.  I'm afraid it's
going to get more complicated, once COWs are feeding on Ryan's ALFalfA -
might need large[MAX_ORDER], or some better way.  But no great hurry,
nothing will crash if it's occasionally not-quite-right.

> 
> Reviewed-by: Matthew Wilcox (Oracle)

Many thanks for all these rapid and encouraging reviews.

Hugh