From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E277C77B7A for ; Wed, 17 May 2023 21:50:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5060B900004; Wed, 17 May 2023 17:50:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B4AF900003; Wed, 17 May 2023 17:50:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 354B4900004; Wed, 17 May 2023 17:50:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 22B38900003 for ; Wed, 17 May 2023 17:50:42 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id E312CA0725 for ; Wed, 17 May 2023 21:50:41 +0000 (UTC) X-FDA: 80801091882.27.E91F46D Received: from mail-yw1-f178.google.com (mail-yw1-f178.google.com [209.85.128.178]) by imf10.hostedemail.com (Postfix) with ESMTP id 1F606C0015 for ; Wed, 17 May 2023 21:50:39 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=PPBehuov; spf=pass (imf10.hostedemail.com: domain of hughd@google.com designates 209.85.128.178 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684360240; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CYrqZZXifWUb49UwqN4ktjCYYTpIiyxjiceM91X1owo=; b=Bd4+UiPHApvLH1pcuLcG1joy412HBozBzUraZcsSJDuONXKMht66dzlhA1uMl11An2U5YY z2/rooamYjc1riFPhAM26oRciWBiVOSfmXRZZDd/vrROyc/p42vXiPMMK+s6RQTFuOH5m6 VkcgQjmGbAqGY1basXu1Ncm4QtNKbNo= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=PPBehuov; spf=pass (imf10.hostedemail.com: domain of hughd@google.com designates 209.85.128.178 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684360240; a=rsa-sha256; cv=none; b=j3iL47vSHDqXS1SkenKShP9dQP0GXXxmhLbeAEl2Csm0oFq0C5Z64U1D30GjYEcaLYWOGA 9BjZ4xvqMg5KQUOp/67AfhXVxMqjE4uLzYKQVdQKsJVZu+thZz4RDGXmQTl78Oo9NN3hTW 36rxfgNxo0QPV0v8R+Qik8lrO+8amFY= Received: by mail-yw1-f178.google.com with SMTP id 00721157ae682-561d5b34e10so10333307b3.1 for ; Wed, 17 May 2023 14:50:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684360239; x=1686952239; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=CYrqZZXifWUb49UwqN4ktjCYYTpIiyxjiceM91X1owo=; b=PPBehuovPkLBa6/NGbQAXn/eE0OjqEyROrqLX97nN6jBSNnfJoc5x6zVzWooSZlBuR ygh7zMisKFOdApdpWLXdE38Ez48wbdl6bRgelI1M2q7oF1fjbdo+FT/cryTh0RkA2PgM PVNv73NfYOacgZraULasH6MPisPmrZxBB5SdyINIUR+rCWC/xGGFoY/3V4eCh48ezelN vlspmk4A/B3orH5IwCfSCJfMbueYznmKfST0fzCY2VeJqOWgXVBfHyrQGp7gwU4xxk/e 4DAXLkf7AssRwZV1LOldxmp7mNGGXTey4IiHanP7lOcu21DjwGEEnizw4Nh4dLrJMR0W 24RQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684360239; x=1686952239; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=CYrqZZXifWUb49UwqN4ktjCYYTpIiyxjiceM91X1owo=; b=IkLEeusESy7LzJAGN9sshI5Jkb73oF/sOztZshsjNiyoAuxdu6vZisYs3GsS+u98P2 yMMxA2W6nlKSVh1f/QWgo7JqK7VwoHgayMuXV+sNz936JjtKfAqHF+T/LAUyOMgxyqKH VNdk+Nw+5e7dyiyUsGWWxfRUvI3mgnY1wNary/WX2fO70Lk4Fqa4u5LiFgFYxoP+J4jI 80TdNzqLR66wBPAjJG9ZvKEAqVJa1x49J6a05+Z4MG856qgmYZuB5GDB9BmF8pZneFjs 2YSmxfcFtwX6CHqNYKCIZQNj6wyLmcwSgV3U3GzLpnRVtb+YLPVZPgrpjL3zLNWM3L92 K8OQ== X-Gm-Message-State: AC+VfDx6obmzLvaZRp/Fl71rq2VsUBB1qVS6cbSX3k5Nb1V5YyIAtz2C FMMrxAbjQ7fqXBEJR4+pwc5BjQ== X-Google-Smtp-Source: ACHHUZ5K1W6BqH54lP1fyWtEUpWwZ0lmy+14bsbtce6I/HyB5LShp2aTZ/lznkrMNM/kuQdbHXvf0A== X-Received: by 2002:a81:6744:0:b0:55a:af:2486 with SMTP id b65-20020a816744000000b0055a00af2486mr2691032ywc.18.1684360239031; Wed, 17 May 2023 14:50:39 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id 129-20020a251187000000b00ba6ffc7ef35sm785837ybr.65.2023.05.17.14.50.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 May 2023 14:50:38 -0700 (PDT) Date: Wed, 17 May 2023 14:50:28 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Claudio Imbrenda cc: Hugh Dickins , Andrew Morton , Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Russell King , Catalin Marinas , Will Deacon , Geert Uytterhoeven , Greg Ungerer , Michal Simek , Thomas Bogendoerfer , Helge Deller , John David Anglin , "Aneesh Kumar K.V" , Michael Ellerman , Alexandre Ghiti , Palmer Dabbelt , Heiko Carstens , Christian Borntraeger , John Paul Adrian Glaubitz , "David S. Miller" , Chris Zankel , Max Filippov , x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 15/23] s390: allow pte_offset_map_lock() to fail In-Reply-To: <20230517123546.672fb9b0@p-imbrenda> Message-ID: <4a15dbaa-1614-ce-ce1f-f73959cef895@google.com> References: <77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com> <94aec8fe-383f-892-dcbf-d4c14e460a7@google.com> <20230517123546.672fb9b0@p-imbrenda> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1F606C0015 X-Stat-Signature: 1da8a1npshf9xi656pum75zm3uuyurwy X-HE-Tag: 1684360239-956921 X-HE-Meta: U2FsdGVkX18IxKG4Av19nfp5iy2to2LXuQVjb21Urmr3KQDkl8hjvGws8WP1eveFB0zzTPmAERYcgoTVByG+vRdGLqAWuyPY56B72lyHa/DvWg8fgMzmC9FuMqcX66S/tqX1nYNJbKDaW1bH6oUPyrIubiAaKEczHBfKf09DmHz+/Sg1ZE6m9qIc2/fRrEC2v5jmIE18Cw9tMxtZNR9+owq/3hIUYgfdGkseK1p+FMs+AVC+KN0Ws8LpC2n2ar++h1PhuybqarvX8V40hR7rgd6QXNyn6fIvYjNLMtOGKoNC366t5orguqNoWUJeiGVAz1216DTXmxc4IuOT1yvff16yoME1I6IJC0cDvmTiTPiUl5IXwW/otiEzrfvVGu5lc7GCAgdUSdo4K7buFR2RxgdTEPm/WA0szD5y05kOUy81/W7ZrK7nas6LzTor7Dn7+PWCXsEL3DDklsBBRu+Emb2Sd1fLVvw/SWYjIDaXuk1k+450qX6ZbqAD9QvgofORRaIJWeWHMgeKNLK8ba3R+SonvKjifIHvesF1QyyDHTNCS7JZ9rJDth3vix2KbTpdxMDQGmXN+tmibonEu73MQnvvX4mj3ZhH9PprKrn3V2hR8vbRxTMhFcYe4xCyRoj6evwq9l8WUCu0Xq6aPiuB5WPtP9xL68LBmBqUnwzIKfJPgmnLoFJ87ie36TUurDVcbAW8EAyfnkk5jp+3wIOWyDLPTmSwRTrA+pjsq5k6vDilrhmLLxa/FNWE5D0zWCXn1q1rvq+y6jLPcxESaPv7+nuEPwSiAnNEQbA5zE72+cqCkhzGkFdCMte6fm90N+hGtt70W3KeCIh8rDQqTjn5/tXEiP523zud/i6NSZ4V4zuL5BT8z+Vl7qcEy0zmJe7jhAnJmY5x5+R+ixSx/gdDYvjHvlQ/jR1xH8iws+LiIQ3fn7mn34Z1WDhMVofw17nDerzDKicUv275wbWw1W9 C7Kk5mAW rYD5mtTz2iNZjTs4NA7d0qN3kEheORum0QX5HxB/dIsuqBskEQyd+PPPuvF1tvz/4Yv+v7NP6JcwSrl/xhd7Ap2TZkr6wvOjSpWHJOm80ao+bZS5OBo7TeggA08CJbdJYoty3ab2dUIJt6PvU4LVPbf8OLJSGRSHjZqACcI3ejT5ARj9f3nv2KNHws/An88lpPaYN7gNMAigG16sWtHYPuGh8N8emTrWyFmazeGA8e/kUNQq+zGB4B+HsCHalVsEkad0Qz9cPqV2oxxiNrl5YKRqa90dpMOtPPd1pH2imApH0lg+YlSDIa+U8gAnUfFjxziV9BytVDkXk5nSrjrRCBz4O5/RR6ooUAwf6Ivk7SXdsZRn5K/gZuCQ+J/2m7HALn+AaSwxk8aUYGAOGyFdecMzT9X1po53dVi9AdPV3Tli6dxkvEYvTwc7h+g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, 17 May 2023, Claudio Imbrenda wrote: > On Tue, 9 May 2023 22:01:16 -0700 (PDT) > Hugh Dickins wrote: > > > In rare transient cases, not yet made possible, pte_offset_map() and > > pte_offset_map_lock() may not find a page table: handle appropriately. > > > > Signed-off-by: Hugh Dickins > > --- > > arch/s390/kernel/uv.c | 2 ++ > > arch/s390/mm/gmap.c | 2 ++ > > arch/s390/mm/pgtable.c | 12 +++++++++--- > > 3 files changed, 13 insertions(+), 3 deletions(-) > > > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c > > index cb2ee06df286..3c62d1b218b1 100644 > > --- a/arch/s390/kernel/uv.c > > +++ b/arch/s390/kernel/uv.c > > @@ -294,6 +294,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb) > > > > rc = -ENXIO; > > ptep = get_locked_pte(gmap->mm, uaddr, &ptelock); > > + if (!ptep) > > + goto out; You may or may not be asking about this instance too. When I looked at how the code lower down handles -ENXIO (promoting it to -EFAULT if an access fails, or to -EAGAIN to ask for a retry), this looked just right (whereas using -EAGAIN here would be wrong: that expects a "page" which has not been initialized at this point). > > if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) { > > page = pte_page(*ptep); > > rc = -EAGAIN; > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > > index dc90d1eb0d55..d198fc9475a2 100644 > > --- a/arch/s390/mm/gmap.c > > +++ b/arch/s390/mm/gmap.c > > @@ -2549,6 +2549,8 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start, > > spinlock_t *ptl; > > > > ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); > > + if (!ptep) > > + break; > > so if pte_offset_map_lock fails, we abort and skip both the failed > entry and the rest of the entries? Yes. > > can pte_offset_map_lock be retried immediately if it fails? (consider > that we currently don't allow THP with KVM guests) > > Would something like this: > > do { > ptep = pte_offset_map_lock(...); > mb(); /* maybe? */ > } while (!ptep); > > make sense? No. But you're absolutely right to be asking: thank you for looking into it so carefully - and I realize that it's hard at this stage to judge what's appropriate, when I've not yet even posted the endpoint of these changes, the patches which make it possible not to find a page table here. And I'm intentionally keeping that vague, because although I shall only introduce a THP case, I do expect it to be built upon later in reclaiming empty page tables: it would be nice not to have to change the arch code again when extending further. My "rare transient cases" phrase may be somewhat misleading: one thing that's wrong with your tight pte_offset_map_lock() loop above is that the pmd entry pointing to page table may have been suddenly replaced by a pmd_none() entry; and there's nothing in your loop above to break out if that is so. But if a page table is suddenly removed, that would be because it was either empty, or replaced by a THP entry, or easily reconstructable on demand (by that, I probably mean it was only mapping shared file pages, which can just be refaulted if needed again). The case you're wary of, is if the page table were removed briefly, then put back shortly after: and still contains zero pages further down. That's not something mm does now, nor at the end of my several series, nor that I imagine us wanting to do in future: but I am struggling to find a killer argument to persuade you that it could never be done - most pages in a page table do need rmap tracking, which will BUG if it's broken, but that argument happens not to apply to the zero page. (Hmm, there could be somewhere, where we would find it convenient to remove a page table with intent to do ...something, then validation of that isolated page table fails, so we just put it back again.) Is it good enough for me to promise you that we won't do that? There are several ways in which we could change __zap_zero_pages(), but I don't see them as actually dealing with the concern at hand. One change, I've tended to make at the mm end but did not dare to interfere here: it would seem more sensible to do a single pte_offset_map_lock() outside the loop, return if that fails, increment ptep inside the loop, pte_unmap_unlock() after the loop. But perhaps you have preemption reasons for not wanting that; and although it would eliminate the oddity of half-processing a page table, it would not really resolve the problem at hand: because, what if this page table got removed just before __zap_zero_pages() tries to take the lock, then got put back just after? Another change: I see __zap_zero_pages() is driven by walk_page_range(), and over at the mm end I'm usually setting walk->action to ACTION_AGAIN in these failure cases; but thought that an unnecessary piece of magic here, and cannot see how it could actually help. Your "retry the whole walk_page_range()" suggestion below would be a heavier equivalent of that: but neither way gives confidence, if a page table could actually be removed then reinserted without mmap_write_lock(). I think I want to keep this s390 __zap_zero_pages() issue in mind, it is important and thank you for raising it; but don't see any change to the patch as actually needed. Hugh > > > otherwise maybe it's better to return an error and retry the whole > walk_page_range() in s390_enable_sie() ? it's a slow path anyway. > > > if (is_zero_pfn(pte_pfn(*ptep))) > > ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID)); > > pte_unmap_unlock(ptep, ptl); > > [...]