Date: Tue, 21 Mar 2023 21:05:26 +1100
From: Dave Chinner <david@fromorbit.com>
To: Lorenzo Stoakes
Cc: Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Andrew Morton, Baoquan He,
 Matthew Wilcox, David Hildenbrand, Liu Shixin, Jiri Olsa
Subject: Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for
 vmap_area_lock and vmap_block->lock
References: <6c7f1ac0aeb55faaa46a09108d3999e4595870d9.1679209395.git.lstoakes@gmail.com>
 <8cd31bcd-dad4-44e3-920f-299a656aea98@lucifer.local>
In-Reply-To: <8cd31bcd-dad4-44e3-920f-299a656aea98@lucifer.local>

On Tue, Mar 21, 2023 at 07:45:56AM +0000, Lorenzo Stoakes wrote:
> On Tue, Mar 21, 2023 at 06:23:39AM +0100, Uladzislau Rezki wrote:
> > On Tue, Mar 21, 2023 at 12:09:12PM +1100, Dave Chinner wrote:
> > > On Sun, Mar 19, 2023 at 07:09:31AM +0000, Lorenzo Stoakes wrote:
> > > > vmalloc() is, by design, not permitted to be used in atomic context and
> > > > already contains components which may sleep, so avoiding spin locks is not
> > > > a problem from the perspective of atomic context.
> > > >
> > > > The global vmap_area_lock is held when the red/black tree rooted in
> > > > vmap_area_root is accessed and thus is rather long-held and under
> > > > potentially high contention. It is likely to be under contention for reads
> > > > rather than writes, so replace it with a rwsem.
> > > >
> > > > Each individual vmap_block->lock is likely to be held for less time but
> > > > under low contention, so a mutex is not an outrageous choice here.
> > > >
> > > > A subset of test_vmalloc.sh performance results:-
> > > >
> > > >   fix_size_alloc_test             0.40%
> > > >   full_fit_alloc_test             2.08%
> > > >   long_busy_list_alloc_test       0.34%
> > > >   random_size_alloc_test         -0.25%
> > > >   random_size_align_alloc_test    0.06%
> > > >   ...
> > > >   all tests cycles                0.2%
> > > >
> > > > This represents a tiny reduction in performance that sits barely above
> > > > noise.
> > >
> > > I'm travelling right now, but give me a few days and I'll test this
> > > against the XFS workloads that hammer the global vmalloc spin lock
> > > really, really badly. XFS can use vm_map_ram and vmalloc really
> > > heavily for metadata buffers and hit the global spin lock from every
> > > CPU in the system at the same time (i.e. highly concurrent
> > > workloads). vmalloc is also heavily used in the hottest path
> > > through the journal where we process and calculate delta changes to
> > > several million items every second, again spread across every CPU in
> > > the system at the same time.
> > >
> > > We really need the global spinlock to go away completely, but in the
> > > mean time a shared read lock should help a little bit....
> >
> Hugely appreciated Dave, however I must disappoint on the rwsem as I have
> now reworked my patch set to use the original locks in order to satisfy
> Willy's desire to make vmalloc atomic in future, and Uladzislau's desire
> to not have a ~6% performance hit -
> https://lore.kernel.org/all/cover.1679354384.git.lstoakes@gmail.com/

Yeah, I'd already read that. What I want to do, though, is to
determine whether the problem is shared access contention or
exclusive access contention. If it's exclusive access contention,
then an rwsem will do nothing to alleviate the problem, and that's
kinda critical to know before any fix for the contention problems
is worked out...
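
To be concrete about the distinction, here's a rough, untested sketch
in mm/vmalloc.c terms (the __vmap_area_*() helpers are stand-ins for
whatever the real rb-tree walk and insert code is called, and
vmap_area_link() is just a made-up name for the linking step): a pure
lookup can take the converted lock shared, but anything that changes
the tree still has to take it exclusive, so the alloc and free paths
keep serialising against each other and against lookups:

        #include <linux/rwsem.h>

        static DECLARE_RWSEM(vmap_area_lock);

        /* Lookup side: read-only rb-tree walk, can run shared. */
        struct vmap_area *find_vmap_area(unsigned long addr)
        {
                struct vmap_area *va;

                down_read(&vmap_area_lock);
                va = __vmap_area_find(addr);    /* stand-in for the tree walk */
                up_read(&vmap_area_lock);

                return va;
        }

        /* Alloc/free side: modifies the rb-tree, still fully exclusive. */
        static void vmap_area_link(struct vmap_area *va)
        {
                down_write(&vmap_area_lock);
                __vmap_area_insert(va);         /* stand-in for the tree insert */
                up_write(&vmap_area_lock);
        }

Which of those two sides we are actually spinning on is what the XFS
workloads should tell us.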

> > I am working on it. I submitted a proposal on how to eliminate it:
> >
> > Hello, LSF.
> >
> > Title: Introduce a per-cpu-vmap-cache to eliminate a vmap lock contention
> >
> > Description:
> > Currently the vmap code does not scale to the number of CPU cores in a
> > system because the global vmap space is protected by a single spinlock.
> > Such an approach has a clear bottleneck if many CPUs simultaneously
> > access one resource.
> >
> > In this talk I would like to describe the drawback, show some data
> > related to contention and the places where it occurs in the code.
> > Apart from that I would like to share ideas on how to eliminate it,
> > presenting a few approaches and comparing them.

If you want data about contention problems with vmalloc, the XFS
workloads I mentioned above will provide plenty of it.

> > Requirements:
> > * It should be a per-cpu approach;

Hmmmm. My 2c worth on this:

That is not a requirement. That's a -solution-. The requirement is
that independent concurrent vmalloc/vfree operations do not severely
contend with each other.

Yes, the solution will probably involve sharding the resource space
across multiple independent structures (as we do in filesystems with
block groups, allocation groups, etc) but that does not necessarily
need the structures to be per-cpu. e.g. per-node vmalloc arenas might
be sufficient and allow more expensive but more efficient indexing
structures to be used because we don't have to care about the
explosion of memory that fine-grained per-cpu indexing generally
entails.

This may also fit into the existing per-node structure of the memory
reclaim infrastructure to manage things like compaction, balancing,
etc. of the vmalloc space assigned to the given node.

Hence I think saying "per-cpu is a requirement" kinda prevents
exploration of other novel solutions that may have advantages other
than "just solves the concurrency problem"...

> > * Search of freed ptrs should not interfere with other freeing (as much as we can);
> > * Offload allocated areas (busy ones) per-cpu;
> > * Cache ready-sized objects or merge them into one big per-cpu space (split on demand);
> > * Lazily-freed areas either drained per-cpu individually or by one CPU for all;
> > * Prefetch a fixed size in front and allocate per-cpu

I'd call these desired traits and/or potential optimisations, not
hard requirements.

> > Goals:
> > * Implement a per-cpu way of allocation to eliminate contention.

The goal should be to "allow contention-free vmalloc operations", not
that we implement a specific solution.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com