Date: Thu, 19 Sep 2024 16:30:42 +0200
From: Mateusz Guzik
To: Neeraj Upadhyay
Cc: Linus Torvalds, Boqun Feng, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@vger.kernel.org,
	"Paul E. McKenney", Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Mark Rutland,
	Thomas Gleixner, Kent Overstreet, Vlastimil Babka,
	maged.michael@gmail.com
Subject: Re: [RFC PATCH 0/4] Add hazard pointers to kernel
References: <20240917143402.930114-1-boqun.feng@gmail.com>
	<050d17f6-7db4-4a05-b4a5-6d5ab4f361cf@amd.com>
In-Reply-To: <050d17f6-7db4-4a05-b4a5-6d5ab4f361cf@amd.com>

On Thu, Sep 19, 2024 at 04:14:05AM +0530, Neeraj Upadhyay wrote:
> On 9/18/2024 12:48 PM, Linus Torvalds wrote:
> > On Tue, 17 Sept 2024 at 16:34, Boqun Feng wrote:
> >>
> >> This series introduces hazard pointers [1] to kernel space. A TL;DR
> >> description of hazard pointers is "a scalable refcounting mechanism
> >> with RCU-like API". More information can be found at [2].
> >
> > Please give actual "this is useful for X, and here is an actual real
> > load with numbers showing why it matters".
> >
>
> One of the use case where we had seen improvement is - Nginx
> web server throughput scalability with AppArmor enabled. For this use
> case we see refcount scalability problem when kref operations
> are done for AppArmor label object in Nginx worker's context. More
> details about this are captured @ [1] [2].
>
> When we switch from kref to hazard pointer in apparmor_file_open(),
> we see ~7% improvement in Nginx throughput for this use case.
>
> While we were working on this problem, this refcount scalability issue got
> resolved recently with conditional ref acquisition [3] (however, there are new
> developments in apparmor code which might bring back the refcount problem [4]).
>

The open/close path still serializes across different processes; the
slowdown just got smaller.
As in, apparmor *as is* continues to be a problem at big enough scale.

Per my messages in the area in the past, I'm confident this is fixable
by changing the refcount model to cache ref changes per-thread. I
employed this very scheme $elsewhere. Since an equivalent mechanism is
applicable to creds, this may want to be implemented as something under
lib/.

I even started to work on it for Linux, but real life got in the way
and then I could not be arsed to finish.

It is a little reminiscent of per-cpu refs.

Here is the outline again:

kref usage gets replaced with a tuple of { kref users; s64 refs; }

task_struct grows a pointer to the cached label and a refs counter on it

when a new thread is created it bumps users and stores the pointer. On
destruction it decrements users and rolls up the local changes.
Similarly, if it turns out the label has to change during the thread's
lifetime, the same thing happens.

In pseudo-code for apparmor_file_open():

	if (unlikely(current->aa_cached_label != check_label())) {
		/* do a replacement here */
	}
	/* just bump the local counter, no synchronisation with other
	 * cpus in the common case */
	current->aa_cached_label_refs++;

In apparmor_file_close():

	/* common case fast path */
	if (file->aa_label == current->aa_cached_label) {
		current->aa_cached_label_refs--;
		return;
	}
	/* we get here if apparmor got reconfigured or this is a file we
	 * inherited from another proc which had a different label and
	 * this is the last fput */
	kref_put(file->aa_label);

Conceptually there is almost nothing to see here.

As outlined above, stale labels would clear themselves out as threads
open files. However, a thread which stubbornly refuses to allocate a
new file obj may hold on to a stale label indefinitely. One way to sort
it out: I presume there is a spot somewhere in user<->kernel transition
handling which updates the credentials pointer, should it have changed.
$elsewhere I patched it up with a "cow" generation counter.
If it does not match the one in the real task struct, you know you need
to take the slow path and check creds, apparmor and whatever else. No
extra branches in the fast path, but a new int does have to be read.
Given that task_struct is a little bit of a cluster fuck I don't think
it's a problem.

That would be a rough sketch, anyone interested can fill in the details.

This still performs serializing atomics in *certain* cases, but avoids
them in almost all cases and there is nothing complicated about this
that I see, just some effort to implement.

So I don't believe patching up RCU with hazard pointers is warranted if
apparmor is the only justification.

Anyway no ETA from my end, anyone interested is free to take the idea or
do better.
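
For illustration only, the per-thread caching outline above can be
modelled in plain userspace C11. This is a sketch under loud
assumptions, not kernel code and not the scheme as actually implemented
anywhere: struct label, struct task, struct file and every function name
below are hypothetical stand-ins, the "kref users" half of the tuple is
modelled as a plain atomic counter, and the question of how to free a
label safely once both counters hit zero is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the { users; refs; } tuple on a label. */
struct label {
	atomic_long users;	/* threads currently caching this label */
	atomic_llong refs;	/* rolled-up file refs; may dip below zero */
};

/* Hypothetical stand-in for the two new task_struct fields. */
struct task {
	struct label *cached_label;
	int64_t cached_refs;	/* local delta, touched only by its owner */
};

/* Hypothetical stand-in for the label pointer kept on a file. */
struct file {
	struct label *label;
};

/* Thread creation (or label switch): bump users, store the pointer. */
static void task_attach(struct task *t, struct label *l)
{
	atomic_fetch_add(&l->users, 1);
	t->cached_label = l;
	t->cached_refs = 0;
}

/* Thread destruction (or label switch): roll up, then drop users. */
static void task_detach(struct task *t)
{
	struct label *l = t->cached_label;

	assert(l != NULL);
	/* Publish the local delta before users goes down, so nobody can
	 * observe users == 0 while changes are still unaccounted for. */
	atomic_fetch_add(&l->refs, t->cached_refs);
	atomic_fetch_sub(&l->users, 1);
	t->cached_label = NULL;
	t->cached_refs = 0;
}

/* Open: replace the cached label if stale, then bump locally. */
static void file_open(struct task *t, struct file *f, struct label *cur)
{
	if (t->cached_label != cur) {
		task_detach(t);
		task_attach(t, cur);
	}
	t->cached_refs++;	/* no shared-memory traffic in the common case */
	f->label = cur;
}

/* Close: local decrement if labels match, shared atomic otherwise. */
static void file_close(struct task *t, struct file *f)
{
	if (f->label == t->cached_label)
		t->cached_refs--;
	else
		atomic_fetch_sub(&f->label->refs, 1);
	f->label = NULL;
}

/* Simplified liveness check; racy against concurrent attach as written. */
static int label_unused(struct label *l)
{
	return atomic_load(&l->users) == 0 && atomic_load(&l->refs) == 0;
}
```

The shared refs counter is signed because a file opened by one thread
and closed by another leaves +1 in the opener's local delta and -1 in
the closer's (or directly in the shared counter on the slow path); the
sum only balances once everything has been rolled up.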