From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=BAYES_00,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FD92C07E95 for ; Wed, 7 Jul 2021 21:06:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0CF8361C77 for ; Wed, 7 Jul 2021 21:06:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0CF8361C77 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D21796B0011; Wed, 7 Jul 2021 17:06:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CD1D86B0070; Wed, 7 Jul 2021 17:06:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B99266B0071; Wed, 7 Jul 2021 17:06:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0041.hostedemail.com [216.40.44.41]) by kanga.kvack.org (Postfix) with ESMTP id 8F05B6B0011 for ; Wed, 7 Jul 2021 17:06:18 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id E5DA226DE4 for ; Wed, 7 Jul 2021 21:06:17 +0000 (UTC) X-FDA: 78337024794.02.4FC2774 Received: from mail-io1-f53.google.com (mail-io1-f53.google.com [209.85.166.53]) by imf15.hostedemail.com (Postfix) with ESMTP id 86A08D000080 for ; Wed, 7 Jul 2021 21:06:17 +0000 (UTC) Received: by mail-io1-f53.google.com with SMTP id k11so5446723ioa.5 for ; Wed, 07 Jul 2021 14:06:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=+0wM/2+93HPOOZpIMUtQMPBQcZGKatuACzxxoFN87Do=; b=aLQqcVTVO4qJmBZn8w0t/YXHboJPIwZ1BBtC0HOOcyidJuyvq140rPxjrfZTOUTJYH B0eC4cRQyCHvtFIPZZogxy+e4jD4GPXpzZJnYJiTaPj4cGxBVOQdjPkzO5fIdlctO+JR th4UBUAtgyBOVZsIF+0y2yRc+mFrdImxtD69VpzD4dJTvJLkSO/Op24Elr1uxcVc1tQK QNS3i50EdbpTh7TfiossnjTQ0YTDep38fK7lHoGIlwkwtqSOcM8BoZoO894upQ+gwlJa 5hytuMBCKaUOse33+CfKtAsDpyB8eUAEeSeM8V0sD0lhe9g8VMH8fXD3YNenE5TFANLB +E2A== X-Gm-Message-State: AOAM5332ZQ3U76ncgs6/GOHzlJ45l+PtokXHIjZfdcjzq2NwPAXqMe6E RBCFW0CdkEh9bRmQAelK0eg= X-Google-Smtp-Source: ABdhPJxeTrGLZn2M7PhsKro+3iFOnM2bXcS3AptZckdRj5vtaqs5UD3Yn1wmRIRpoNQMl8O7LtrGhA== X-Received: by 2002:a02:cd1a:: with SMTP id g26mr11273707jaq.38.1625691976865; Wed, 07 Jul 2021 14:06:16 -0700 (PDT) Received: from google.com (243.199.238.35.bc.googleusercontent.com. [35.238.199.243]) by smtp.gmail.com with ESMTPSA id k5sm10203ilu.70.2021.07.07.14.06.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Jul 2021 14:06:16 -0700 (PDT) Date: Wed, 7 Jul 2021 21:06:15 +0000 From: Dennis Zhou To: Linus Torvalds Cc: Tejun Heo , Christoph Lameter , Linux-MM , Linux Kernel Mailing List Subject: Re: [GIT PULL] percpu fixes for v5.14-rc1 Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Authentication-Results: imf15.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none); spf=pass (imf15.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.166.53 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com X-Rspamd-Server: rspam05 X-Stat-Signature: 1is8qjgama6dofrbfgyqs85ekqh91tji X-Rspamd-Queue-Id: 86A08D000080 X-Rspam-User: nil X-HE-Tag: 1625691977-350679 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jul 07, 2021 at 11:40:45AM -0700, Linus Torvalds wrote: > On Wed, Jul 7, 2021 at 6:00 AM Dennis Zhou wrote: > > > > This is just a single change to fix percpu depopulation. The code relied > > on depopulation code written specifically for the free path and relied > > on vmalloc to do the tlb flush lazily. As we're modifying the backing > > pages during the lifetime of a chunk, we need to also flush the tlb > > accordingly. > > I pulled this, but I ended up unpulling after looking at the fix. > > The fix may be perfectly correct, but I'm looking at that > pcpu_reclaim_populated() function, and I want somebody to explain to > me what it's ok to drop and re-take the 'pcpu_lock' and just continue. > > Because whatever it was protecting is now not protected any more. > > It *looks* like it's intended to protect the pcpu_chunk_lists[] > content, and some other functions that do this look ok. So for > example, pcpu_balance_free() at least removes the 'chunk' from the > pcpu_chunk_lists[] before it drops the lock and then works on the > chunk contents. > > But pcpu_reclaim_populated() seems to *leave* chunk on the > pcpu_chunk_lists[], drop the lock, and then continue to use 'chunk'. > > That odd "release lock and continue to use the data it's supposed to > protect" seems to be pre-existing, but > > (a) this is the code that caused problems to begin with > > and > > (b) it seems to now happen even more. > > So maybe this code is right. But it looks very odd to me, and I'd like > to get more explanations of _why_ it would be ok before I pull this > fix, since there seems to be a deeper underlying problem in the code > that this tries to fix. > Yeah I agree. I think I've inadvertently made this more complex than it needs to be and intend to clean it up. Percpu has a mutex lock and spinlock. The mutex lock is responsible for lifetime of a chunk and correctness of the backing pages, chunk->populated[]. The spinlock protects the other members of the chunk as well as pcpu_chunk_lists[]. The pcpu_balance_workfn() is used to serialize the freeing of chunks and maintenance of a floating # of pages to suffice atomic allocations. This is where depopulation is being bolted onto. It uses the serialization of: mutex_lock(&pcpu_alloc_mutex); pcpu_balance_free() pcpu_reclaim_populated() pcpu_balance_populated() mutex_unlock(&pcpu_alloc_mutex); while holding the mutex lock to know that the chunk we are modifying will not be freed. It's this that makes it okay to just pick up the lock and continue. Depopulation adds complexity to the lifecycle of a chunk through others freeing back percpu allocations. So, unlike pcpu_balance_free(), taking the chunk off of pcpu_chunk_lists[] doesn't guarantee someone else isn't accessing the chunk because they are freeing an object back. To handle this, a notion of isolated chunks is introduced and these chunks become effectively hidden to the allocation path. We are also holding the mutex lock throughout this process which prevents the allocation path from simultaenously adding backing pages to a chunk. I hope this explanation makes sense. Thanks, Dennis