Date: Tue, 6 Dec 2022 16:51:03 -0500
From: Peter Xu
To: John Hubbard
Cc: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	James Houghton, Jann Horn, Andrew Morton, Andrea Arcangeli,
	Rik van Riel, Nadav Amit, Miaohe Lin, Muchun Song, David Hildenbrand
Subject: Re: [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
References: <20221129193526.3588187-1-peterx@redhat.com>
	<20221129193526.3588187-9-peterx@redhat.com>
	<0813b9ed-3c92-088c-4fb9-45fb648c6e73@nvidia.com>
	<97e3a8f2-df75-306e-2edf-85976c168955@nvidia.com>
In-Reply-To: <97e3a8f2-df75-306e-2edf-85976c168955@nvidia.com>

On Tue, Dec 06, 2022 at 01:03:45PM -0800, John Hubbard wrote:
> On 12/6/22 08:45, Peter Xu wrote:
> > I've got a fixup attached.  John, since this got your attention please also
> > have a look too in case there's further issues.
> >
>
> Well, one question: Normally, the pattern of "release_lock(A); call f();
> acquire_lock(A);" is tricky, because one must revalidate that the state
> protected by A has not changed while the lock was released. However, in
> this case, it's letting page fault handling proceed, which already
> assumes that pages might be gone, so generally that seems OK.

Yes, it's tricky, but less so in this case.

I hope my documentation covers that (in the fixup patch):

+ * @hugetlb_entry:     if set, called for each hugetlb entry. Note that
+ *                     currently the hook function is protected by hugetlb
+ *                     vma lock to make sure pte_t* and the spinlock is valid
+ *                     to access.  If the hook function needs to yield the
+ *                     thread or retake the vma lock for some reason, it
+ *                     needs to properly release the vma lock manually,
+ *                     and retake it before the function returns.

The vma lock here makes sure the pte_t and the pgtable spinlock are
stable.  Without the lock, they're prone to being freed in parallel.

>
> However, I'm lagging behind on understanding what the vma lock actually
> protects. It seems to be a hugetlb-specific protection for concurrent
> freeing of the page tables?

Not exactly freeing, but unsharing.  Mike probably has more to say.  The
series is here:

https://lore.kernel.org/all/20220914221810.95771-1-mike.kravetz@oracle.com/#t

> If so, then running a page fault handler seems safe. If there's something
> else it protects, then we might need to revalidate that after
> re-acquiring the vma lock.

Nothing to validate here.  The only reason to retake the vma lock is to
match the caller, which assumes the lock is held: either the lock will be
released very soon, or it protects the next hugetlb pgtable walk
(huge_pte_offset).

>
> Also, scattering hugetlb-specific locks throughout mm seems like an
> unfortunate thing, I wonder if there is a longer term plan to Not Do
> That?

So far HMM is really the only one - normally the hugetlb_entry() hook is
pretty light, so it's not really throughout the whole mm yet.  It's not
even urgently needed for the other two places calling cond_resched(); I
added it mostly for completeness, and with the slight hope that maybe we
can yield earlier for some pmd unsharers.

But yes, it's unfortunate; I just didn't come up with a good solution.
Suggestions are always welcome.

Thanks,

-- 
Peter Xu