From: Suren Baghdasaryan
Date: Tue, 3 May 2022 16:15:46 -0700
Subject: Re: Memory allocation on speculative fastpaths
To: Matthew Wilcox
Cc: "Paul E. McKenney", Michal Hocko, "Liam R. Howlett",
 Michel Lespinasse, Johannes Weiner, linux-mm, LKML,
 David Hildenbrand, Davidlohr Bueso

On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox wrote:
>
> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Just following up from off-list discussions yesterday.
> > > >
> > > > The requirements to allocate on an RCU-protected speculative fastpath
> > > > seem to be as follows:
> > > >
> > > > 1. Never sleep.
> > > > 2. Never reclaim.
> > > > 3. Leave emergency pools alone.
> > > >
> > > > Any others?
> > > >
> > > > If those rules suffice, and if my understanding of the GFP flags is
> > > > correct (ha!!!), then the following GFP flags should cover this:
> > > >
> > > > __GFP_NOMEMALLOC | __GFP_NOWARN
> > >
> > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> >
> > Ah, good point on GFP_NOWAIT, thank you!
>
> Johannes (I think it was?) made the point to me that if we have another
> task very slowly freeing memory, a task in this path can take advantage
> of that other task's hard work and never go into reclaim.
> So the approach we should take is:
>
> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>
> if (failure) {
>         rcu_read_unlock();
>         do_reclaim();
>         return FAULT_FLAG_RETRY;
> }
>
> ... but all this is now moot since the approach we agreed to yesterday
> is:

I think the discussion was about the above approach, and Johannes
suggested falling back to the normal page fault handling with mmap_lock
held if the PMD does not exist. Please correct me if I misunderstood
here.

>
> rcu_read_lock();
> vma = vma_lookup();
> if (down_read_trylock(&vma->sem)) {
>         rcu_read_unlock();
> } else {
>         rcu_read_unlock();
>         mmap_read_lock(mm);
>         vma = vma_lookup();
>         down_read(&vma->sem);
> }
>
> ... and we then execute the page table allocation under the protection of
> the vma->sem.
>
> At least, that's what I think we agreed to yesterday.

Honestly, I don't remember discussing vma->sem at all. My
understanding is that the two approaches differed in the section
covered by rcu_read_lock/rcu_read_unlock. The solution that you
suggested would handle the page fault completely under RCU for as long
as possible and would fall back to the mmap_lock when that is
impossible, while Michel's implementation took
rcu_read_lock/rcu_read_unlock for smaller sections and used
vma->seq_number to detect any vma changes between these sections. Your
suggested approach sounds simpler and, as I understood the comments, we
should give it a try. Did I miss anything?

Thanks,
Suren.