Date: Sun, 15 Oct 2023 21:52:51 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Nikolay Borisov
Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Sean Christopherson,
	Andrew Morton, Joerg Roedel, Ard Biesheuvel, Andi Kleen,
	Kuppuswamy Sathyanarayanan, David Rientjes, Vlastimil Babka,
	Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Paolo Bonzini,
	Ingo Molnar, Dario Faggioli, Mike Rapoport, David Hildenbrand,
	Mel Gorman, marcelo.cerri@canonical.com, tim.gardner@canonical.com,
	khalid.elmously@canonical.com, philip.cox@canonical.com,
	aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
	linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@kernel.org
Subject: Re: [PATCH] efi/unaccepted: Fix soft lockups caused by parallel memory acceptance
Message-ID: <20231015185251.umkdsr7jx2qrlm2x@box.shutemov.name>
References: <20231014204040.28765-1-kirill.shutemov@linux.intel.com>
On Sun, Oct 15, 2023 at 08:02:16PM +0300, Nikolay Borisov wrote:
> 
> 
> On 14.10.23 at 23:40, Kirill A. Shutemov wrote:
> > Michael reported soft lockups on a system that has unaccepted memory.
> > This occurs when a user attempts to allocate and accept memory on
> > multiple CPUs simultaneously.
> >
> > The root cause of the issue is that memory acceptance is serialized with
> > a spinlock, allowing only one CPU to accept memory at a time. The other
> > CPUs spin and wait for their turn, leading to starvation and soft lockup
> > reports.
> >
> > To address this, the code has been modified to release the spinlock
> > while accepting memory. This allows for parallel memory acceptance on
> > multiple CPUs.
> >
> > A newly introduced "accepting_list" keeps track of which memory is
> > currently being accepted. This is necessary to prevent parallel
> > acceptance of the same memory block. If a collision occurs, the lock is
> > released and the process is retried.
> >
> > Such collisions should rarely occur. The main path for memory acceptance
> > is the page allocator, which accepts memory in MAX_ORDER chunks. As long
> > as MAX_ORDER is equal to or larger than the unit_size, collisions will
> > never occur because the caller fully owns the memory block being
> > accepted.
> >
> > Aside from the page allocator, only memblock and deferred_free_range()
> > accept memory, but this only happens during boot.
> >
> > The code has been tested with unit_size == 128MiB to trigger collisions
> > and validate the retry codepath.
> >
> > Signed-off-by: Kirill A. Shutemov
> > Reported-by: Michael Roth
> Fixes: 2053bc57f367 ("efi: Add unaccepted memory support")
> > Cc:
> > ---
> >  drivers/firmware/efi/unaccepted_memory.c | 55 ++++++++++++++++++++++--
> >  1 file changed, 51 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> > index 853f7dc3c21d..8af0306c8e5c 100644
> > --- a/drivers/firmware/efi/unaccepted_memory.c
> > +++ b/drivers/firmware/efi/unaccepted_memory.c
> > @@ -5,9 +5,17 @@
> >  #include
> >  #include
> >
> > -/* Protects unaccepted memory bitmap */
> > +/* Protects unaccepted memory bitmap and accepting_list */
> >  static DEFINE_SPINLOCK(unaccepted_memory_lock);
> >
> > +struct accept_range {
> > +	struct list_head list;
> > +	unsigned long start;
> > +	unsigned long end;
> > +};
> > +
> > +static LIST_HEAD(accepting_list);
> > +
> >  /*
> >   * accept_memory() -- Consult bitmap and accept the memory if needed.
> >   *
> > @@ -24,6 +32,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
> >  {
> >  	struct efi_unaccepted_memory *unaccepted;
> >  	unsigned long range_start, range_end;
> > +	struct accept_range range, *entry;
> >  	unsigned long flags;
> >  	u64 unit_size;
> >
> > @@ -78,20 +87,58 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
> >  	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> >  		end = unaccepted->size * unit_size * BITS_PER_BYTE;
> >
> > -	range_start = start / unit_size;
> > -
> > +	range.start = start / unit_size;
> > +	range.end = DIV_ROUND_UP(end, unit_size);
> > +retry:
> >  	spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > +
> > +	/*
> > +	 * Check if anybody else is working on accepting the same range of
> > +	 * memory.
> > +	 *
> > +	 * The check is done with unit_size granularity. It is crucial to
> > +	 * catch all accept requests to the same unit_size block, even if
> > +	 * they don't overlap on the physical address level.
> > +	 */
> > +	list_for_each_entry(entry, &accepting_list, list) {
> > +		if (entry->end < range.start)
> > +			continue;
> > +		if (entry->start >= range.end)
> > +			continue;
> > +
> > +		/*
> > +		 * Somebody else is accepting the range, or at least part
> > +		 * of it.
> > +		 *
> > +		 * Drop the lock and retry until it is complete.
> > +		 */
> > +		spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> > +		cond_resched();
> > +		goto retry;
> > +	}
> 
> So this works for the cases where we have concurrent acceptance of the same
> range. What about the same range being accepted multiple times, one after
> the other? The current code doesn't prevent this.

That's why we have the bitmap. The bits get cleared there after the first
accept. On the second attempt, for_each_set_bitrange_from() will skip the
range.

> What if you check whether the current range is fully contained within the
> duplicate entry and, if it's fully covered, simply return?

If it is fully covered we still need to wait until somebody else finishes
the accept, so we cannot "just return".
We could return early if we had seen the range on the accepting_list
before and it has since disappeared, indicating that the accept has
completed. But I don't think this optimization is worthwhile. As I
mentioned before, collisions should hardly ever happen. One more spin and
bitmap check would not make a difference, and it adds complexity.

> > +
> > +	/*
> > +	 * Register that the range is about to be accepted.
> > +	 * Make sure nobody else will accept it.
> > +	 */
> > +	list_add(&range.list, &accepting_list);
> > +
> > +	range_start = range.start;
> >  	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> > -				   DIV_ROUND_UP(end, unit_size)) {
> > +				   range.end) {
> >  		unsigned long phys_start, phys_end;
> >  		unsigned long len = range_end - range_start;
> >
> >  		phys_start = range_start * unit_size + unaccepted->phys_base;
> >  		phys_end = range_end * unit_size + unaccepted->phys_base;
> >
> > +		spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> > +
> >  		arch_accept_memory(phys_start, phys_end);
> > +
> > +		spin_lock_irqsave(&unaccepted_memory_lock, flags);
> >  		bitmap_clear(unaccepted->bitmap, range_start, len);
> >  	}
> > +
> > +	list_del(&range.list);
> >  	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> >  }

-- 
 Kiryl Shutsemau / Kirill A. Shutemov