Date: Wed, 23 Nov 2022 13:56:01 -0500
From: Peter Xu
To: Mike Kravetz
Cc: David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Rik van Riel, Muchun Song, Andrew Morton, James Houghton, Nadav Amit,
    Andrea Arcangeli, Miaohe Lin
Subject: Re: [PATCH RFC v2 00/12] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare
References: <20221118011025.2178986-1-peterx@redhat.com> <70376d57-7924-8ac9-9e93-1831248115a0@redhat.com>

On Wed, Nov 23, 2022 at 10:21:30AM -0800, Mike Kravetz wrote:
> On 11/23/22 10:09, Peter Xu wrote:
> > On Wed, Nov 23, 2022 at 10:40:40AM +0100, David Hildenbrand wrote:
> > > Let me try understand the basic problem first:
> > >
> > > hugetlb walks page tables semi-lockless: while we hold the mmap lock, we
> > > don't grab the page table locks. That's very hugetlb specific handling and I
> > > assume hugetlb uses different mechanisms to sync against MADV_DONTNEED,
> > > concurrent page faults... but that's no news. hugetlb is weird in many ways
> > > :)
> > >
> > > So, IIUC, you want a mechanism to synchronize against PMD unsharing. Can't
> > > we use some very basic locking for that?
> >
> > Yes we can in most cases. Please refer to above paragraph [1] where I
> > referred Mike's recent work on vma lock. That's the basic locking we need
> > so far to protect pmd unsharing. I'll attach the link too in the next
> > post, which is here:
> >
> > https://lore.kernel.org/r/20220914221810.95771-1-mike.kravetz@oracle.com
> >
> > >
> > > Using RCU / disabling local irqs seems a bit excessive because we *are*
> > > holding the mmap lock and only care about concurrent unsharing
> >
> > The series wanted to address where the vma lock is not easy to take.
> > It originates from when I was reading Mike's other patch, I forgot why I did
> > that but I just noticed there's some code path that we may not want to take
> > a sleepable lock, e.g. in follow page code.
>
> Yes, it was the patch suggested by David,
>
> https://lore.kernel.org/linux-mm/20221030225825.40872-1-mike.kravetz@oracle.com/
>
> The issue was that FOLL_NOWAIT could be passed into follow_page_mask. If so,
> then we do not want potentially sleep on the mutex.
>
> Since you both are on this thread, I thought of/noticed a related issue. In
> follow_hugetlb_page, it looks like we can call hugetlb_fault if FOLL_NOWAIT
> is set. hugetlb_fault certainly has the potential for sleeping. Is this also
> a similar issue?

Yeah, maybe the clean way to do this is that whenever
FAULT_FLAG_RETRY_NOWAIT is set we should always try not to sleep at all.
But maybe that's also not urgently needed. So far I don't see any real
non-sleepable caller of it; the only one (kvm) can actually sleep.

It's definitely not wanted, as kvm only attaches NOWAIT for an async fault,
so ideally any wait should be offloaded into async threads. Now with the
hugetlb code being able to sleep with NOWAIT, the waiting time will be
accounted to the real fault time of the vcpu, which partly invalidates
async page fault handling. That said, it also means no immediate fault
would trigger either.

It's just that for the pmd unshare we can at least start to use the
non-sleeping version of the locks. Now I'm more concerned with
huge_pmd_share(), which seems to have no good option other than the RCU
approach.

One other thing I noticed is that I cannot quickly figure out whether
follow_hugetlb_page() is needed anymore, since follow_page_mask() also
seems to be fine with walking hugetlb pgtables. follow_hugetlb_page() can
be traced back to the initial git commit. I have a feeling that the old
version of follow_page_mask() didn't support hugetlb, but now that it does,
maybe we can drop follow_hugetlb_page() as a whole?

-- 
Peter Xu
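
To illustrate the "non-sleeping version of the locks" point for NOWAIT
callers, here is a minimal userspace sketch only. It uses a pthread mutex
as a stand-in for a sleepable per-vma lock; struct demo_vma and demo_walk()
are hypothetical names made up for this example and are not kernel APIs.
The idea is simply that a NOWAIT caller tries the lock and backs off with
-EAGAIN rather than blocking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vma with a sleepable lock; not a kernel type. */
struct demo_vma {
	pthread_mutex_t lock;
	int value;	/* stands in for the state the lock protects */
};

/*
 * Hypothetical helper: read the protected state.
 * With nowait set, only try the lock and return -EAGAIN on contention,
 * so the caller never blocks. Without nowait, block until the lock is free.
 */
static int demo_walk(struct demo_vma *vma, bool nowait, int *out)
{
	assert(vma && out);

	if (nowait) {
		/* pthread_mutex_trylock() returns 0 on success, EBUSY if held. */
		if (pthread_mutex_trylock(&vma->lock) != 0)
			return -EAGAIN;
	} else {
		pthread_mutex_lock(&vma->lock);	/* may block */
	}

	*out = vma->value;
	pthread_mutex_unlock(&vma->lock);
	return 0;
}
```

A caller that gets -EAGAIN would be expected to retry later or hand the
work to a context that is allowed to sleep, which is roughly what an async
fault path wants.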