From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7658C7EE37 for ; Tue, 6 Jun 2023 19:59:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 210D38E0001; Tue, 6 Jun 2023 15:59:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1C1046B0072; Tue, 6 Jun 2023 15:59:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 061F58E0001; Tue, 6 Jun 2023 15:59:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id E579D6B0071 for ; Tue, 6 Jun 2023 15:59:27 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id A45261A07DF for ; Tue, 6 Jun 2023 19:59:27 +0000 (UTC) X-FDA: 80873387574.11.60AEBA8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf01.hostedemail.com (Postfix) with ESMTP id 78E454000D for ; Tue, 6 Jun 2023 19:59:25 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=G1BioU04; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf01.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686081565; a=rsa-sha256; cv=none; b=WhtMFH8OjANdwz/t/EHSw3zrQnlIqoLQKTkccMTOQ+W/vgzD7hKK0PpwRiy1O9lEO35h3J R9EK545YuIQ6n5cfpT5KT2QUoj8NFZfaV1opSmApufSqcAbiSfXSy++HCA8MYB+Soy1Roe 3wmFyq3TRuJ6z6UVScZFPKMGAKK/7HY= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=G1BioU04; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf01.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686081565; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7UrpOrQtg56crpEBvYaRSePkWzv3vUaAoezpCMlFFYA=; b=3zkkYU2TGIDPEGUNny2dCRbrW0ZfnTb7r8zVZvMzWCM6t2xEofJx/VwvVRKHHcCI3WMLiC hejSA0V3rnpdYqv87HpQN+tLRPkBsZL0PAn1Zvgl4WpEDwcvlETKZemAlmr2pif3Ca64kA arW9jsEu4/iS0RqBwENpH1FtJ8fiEjo= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1686081564; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=7UrpOrQtg56crpEBvYaRSePkWzv3vUaAoezpCMlFFYA=; b=G1BioU04Vewmud+fZXAXuzn+pldzhBUzytaBEH2Mz2EE5rduLeeXfjzSGG7zZYDC+GS6jZ iXBe/9Z3tiemGEORiC67Yh32qzUzrlawRpmiauIhbAkUWhn26qgBc3qnUQqdy5ic3YbqMc bWyymO/Hiurd6Cp4o51YC6jiVF4YvK4= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-636-yB9bZhnONKmhCXnnt0QSqg-1; Tue, 06 Jun 2023 15:59:21 -0400 X-MC-Unique: yB9bZhnONKmhCXnnt0QSqg-1 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-75b147a2548so132151385a.1 for ; Tue, 06 Jun 2023 12:59:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686081561; x=1688673561; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=7UrpOrQtg56crpEBvYaRSePkWzv3vUaAoezpCMlFFYA=; b=Vg+HYszgk2g6LnNy/rq9bom9hD+4TZlyffV3Z1Jx39q8guKa6sUag5ed7jaB1qFEk2 iO8/2KzCbgB9DHwlGBkzf3Dppoo5taf/oIbuGeNLa4ZRji4cH2XP15KYTbuMLaC8m9FO qmcYXpltmBYu/UE2hSO7TDFYcWuUxLcLjqxpTFEF9GzVW6T87RXOWOSQNtgBB3cExcUH VwHTdMJZRPHLlvAsJKX5HgFfREM2ZCap+WqgxhCTRIBBdfHI4PHhJxAfpQSBOwO/Gygm 8sQyu8HYaNKp7DMiliepLyqsMuSgodW1zVYVA57x8PWUE1b29jEnn4sDOEwvZ7iz1WIJ W1ug== X-Gm-Message-State: AC+VfDyA/3h+LBCR/IIkcmEyiFxFZTRKSBIjPAewRpbW+/LPhDP4VssF /qhoCMKTLo91VPpKtwAMjXSoWWBfF5Iele1ZSV4nygtsHyLLok5PCet51L+4/7jpcva5H4td1Pe A5019K4aO/3g= X-Received: by 2002:a05:620a:950:b0:75b:23a1:69e9 with SMTP id w16-20020a05620a095000b0075b23a169e9mr2966960qkw.0.1686081560844; Tue, 06 Jun 2023 12:59:20 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7BF0OXQlbyP+trA92coc66WCmRXcRkq82pLEgKOfgWrEmj9w4q8xBPweU77nPaiyb+F7D/og== X-Received: by 2002:a05:620a:950:b0:75b:23a1:69e9 with SMTP id w16-20020a05620a095000b0075b23a169e9mr2966939qkw.0.1686081560557; Tue, 06 Jun 2023 12:59:20 -0700 (PDT) Received: from x1n (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id n15-20020a05620a152f00b0075932cd3ca0sm5109347qkk.69.2023.06.06.12.59.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 06 Jun 2023 12:59:19 -0700 (PDT) Date: Tue, 6 Jun 2023 15:59:18 -0400 From: Peter Xu To: Yang Shi Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Hildenbrand , Alistair Popple , Andrew Morton , Andrea Arcangeli , "Kirill A . Shutemov" , Johannes Weiner , John Hubbard , Naoya Horiguchi , Muhammad Usama Anjum , Hugh Dickins , Mike Rapoport Subject: Re: [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry Message-ID: References: <20230602230552.350731-1-peterx@redhat.com> <20230602230552.350731-5-peterx@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 78E454000D X-Stat-Signature: njuwaq37enw9aednxhck5maghj7anf6k X-HE-Tag: 1686081565-179098 X-HE-Meta: U2FsdGVkX1/ET5j0rsRONd0bBzgnfirPIAjuL3a2Mn8xcBKgQx6tDu068NG6uqI+L1korIUWaBsXYHpli3EGt3iDOpd9f32Gj3nM203YNrAoRUOEGFd5u1+1cqrcWUHRAR93zfjYpzGO7oCpGsfkcfQHozPOGzkay9FH/scakaJ4+l5RHnKXRXMvC0E2OK/IsVmtW/ZS11Sap/D7iXVjjdK8kBjSl3ujkRCV0yrRAR2/09O+ya/yv9G0TAATOXt86qNSgdqR1q7F+lxham8jgufH83YCiP3W1TuPkITiLUXYkZiibTVzHPTEKLldAtj8ncHxEXvlequWm5dWC9lV24dQMhtTrP3EEn97LcypHU3wU2gEcB6Ph0f5ZTpZYx6GskK7XffDNPkPJA6pySmUVe/wxvon1hJaNOYeI4q066o9KpyYk/mJbRWtvX+8Eo42L7Bvc3O9bqbZ6D0pdCXiluEysWLw+QPUbHkQdYxZB0Ty43y50IuPmNzc4jp096jlqs48AyukhJBFka9uPtXP3D510ZmpjriR/I3Nyn/MrgtGnt3XepbgxWdKdPfPULLucIrHfWZDZfw4zFdifqcyUuVE/riPKMY3iUsajVBZ/1Ld1/DRKh6yHDb66oXK5i9rUnEX1JcIWwBYTL0UAPL5xu+Qzlwx7CNVH1ddCWRH11wWl88p21qDEyKIpISt5sBXVm31w7LZM2ddGWJbpJ0cR9MI2siMbApNITlg4Fpq0SJKHwaO3MUIRXB1OeEqp2Twl4FRJhxObvBAuX50HHZnXh8r0VgrkfUiHaxI2kpeuQLtoChhf7uBkpzvkMdTm4XGPjHhabBWj1hLTK81G5ROqwY4aOwsvqybk3LeoJ1aI2scdoCLEgXLRCxww3qe57LKmosRDzzQmJzFG0gNNc7LcALHE0QpEuccZSudkZJMGxgbBb6nJKU4hTKpUhJyUrFfna9uS/r2hlU0atJgYko zh/vhL5p u/HAhOCjKuuanAnpGNWhQ0CRrwduehnG01+/n3ppqUXXB0Vv5l9SzroYoQJIqoyW0apWZ7HGYrHJdh7dE9b8qbnWbVF721FTA26Y2iIHkCYAJLAuqvAaBox7/HgwTqXEJKuRENO/zns9wTpaJhvu4h6ZFgU0RC5tc6l6fY8r2Pr5y1QMNDY6N64VfwhfGeUoz6zOauIY01DUtFZQiFDr+h/hJCYemwhzPY5Gop5Gea4IyvUOOaJTJ+LPUIH8233MMho0XfsZVa8IdA24EV9FMtNtOEjtsOlxP8+OJsTNVE2pNjf4tDaB23/rIpiH9MUo1Jqf9x3lnlrP5/IGfXuLzWk/hCkmCWXo9KdfpYlIZcI+TryBcl5yNRtrgQXHa2Fr1Rcsy1IQp4HVRz04C1nSsu+Pac3aRT59HLuZH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jun 06, 2023 at 12:12:03PM -0700, Yang Shi wrote: > > > > @@ -1539,8 +1544,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, > > > > return err; > > > > } > > > > > > > > - if (pmd_trans_unstable(pmdp)) > > > > + if (pmd_trans_unstable(pmdp)) { > > > > + walk->action = ACTION_AGAIN; > > > > return 0; > > > > > > Had a quick look at the pagemap code, I agree with your analysis, > > > "returning 0" may mess up pagemap, retry should be fine. But I'm > > > wondering whether we should just fill in empty entries. Anyway I don't > > > have a strong opinion on this, just a little bit concerned by > > > potential indefinite retry. > > > > Yes, none pte is still an option. But if we're going to fix this anyway, > > it seems better to fix it with the accurate new thing that poped up, and > > it's even less change (just apply walk->action rather than doing random > > stuff in different call sites). > > > > I see that you have worry on deadloop over this, so I hope to discuss > > altogether here. > > > > Unlike normal checks, pmd_trans_unstable() check means something must have > > changed in the past very short period or it should just never if nothing > > changed concurrently from under us, so it's not a "if (flag==true)" check > > which is even more likely to loop. > > > > If we see the places that I didn't touch, most of them suggested a retry in > > one form or another. So if there's a worry this will also not the first > > time to do a retry (and for such a "unstable" API, that's really the most > > natural thing to do which is to retry until it's stable). > > IIUC other than do_anonymous_page() suggests retry (retry page fault), > others may not, for example: > - follow_pmd_mask: return -EBUSY I assumed a -EBUSY would mean if the caller still needs the page it'll just need to retry. It's actually a very rare errno to return in follow page... e.g. some gup callers may not even be able to handle -EBUSY afaiu, neither does the gup core (__get_user_pages), afaiu it'll just forwarded upwards. My bet is it's just so rare and only used with FOLL_SPLIT_PMD for now. I had a quick look, at least kvm handles -EBUSY as a real fault (hva_to_pfn, where it should translate that -EBUSY into a KVM_PFN_ERR_FAULT), I think it'll crash the hypervisor directly if returned from gup one day (not for now if never with !FOLL_SPLIT_PMD).. > - wp_clean_pmd_entry: actually just retry for pmd_none case, but the > pagewalk code does handle pmd_none by skipping it, so it basically > just retry once This one is very special IMHO, pmd_trans_unstable() should in most cases be used after someone checked pmd value before walking ptes. I had a feeling it's kind of abused, to check whether it's a huge pmd in whatever format.. IMHO it should just use the other pmd_*() helpers but I won't touch it in this series. > - min_core_pte_range: treated as unmapped range by calling > __mincore_unmapped_range Correct. Also pmd_devmap_trans_unstable(), called in 3 call sites: pmd_devmap_trans_unstable[1418] return pmd_devmap(*pmd) || pmd_trans_unstable(pmd); filemap_map_pmd[3423] if (pmd_devmap_trans_unstable(vmf->pmd)) { finish_fault[4390] if (pmd_devmap_trans_unstable(vmf->pmd)) handle_pte_fault[4922] if (pmd_devmap_trans_unstable(vmf->pmd)) They all suggest a retry on the page faults, afaiu. So indeed not all of them retries, but I doubt those ones that are not and whether that's the best we should have. Again, I'll leave that out of this series. > > Anyway I really don't have a strong opinion on this. I may be just > over-concerned. I just thought if nobody cares whether the result is > accurate or not, why do we bother fixing those cases? Because anyway we're already at it and it's easier and cleaner to do so? :) I would say if to fix this I think the most important ones are pagemap and memcg paths so far, but since I'm at it and anyway I checked all of them, I figured maybe I should just make everywhere do right and in the same pattern when handling unstable pmd. Especially, what this patch touched are all using walk_page*() routines (I left special ones in first patches), so it's quite straightforward IMHO to switch altogether using ACTION_AGAIN. Thanks, -- Peter Xu