From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20211122211327.5931-1-posk@google.com>
 <20211122211327.5931-4-posk@google.com>
 <20211124200822.GF721624@worktop.programming.kicks-ass.net>
 <20211129210841.GO721624@worktop.programming.kicks-ass.net>
In-Reply-To: <20211129210841.GO721624@worktop.programming.kicks-ass.net>
From: Peter Oskolkov
Date: Mon, 29 Nov 2021 15:38:38 -0800
Subject: Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
To: Peter Zijlstra
Cc: Ingo Molnar, Thomas Gleixner, Andrew Morton, Dave Hansen,
 Andy Lutomirski, Linux Memory Management List,
 Linux Kernel Mailing List, linux-api@vger.kernel.org, Paul Turner,
 Ben Segall, Peter Oskolkov, Andrei Vagin, Jann Horn, Thierry Delisle
Content-Type: text/plain; charset="UTF-8"

On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra wrote:

[...]

> > > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> > >
> > > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > > supposed to do.

[...]

>
> So then A does:
>
>   A::next_tid = C.tid;
>   sys_umcg_wait();
>
> Which will:
>
>   pin(A);
>   pin(S0);
>
>   cmpxchg(A::state, RUNNING, RUNNABLE);

Hmm.... That's another difference between your patch and mine: my
approach was "the side that initiates the change updates the state".

So in my code the userspace changes the current task's state
RUNNING => RUNNABLE and the next task's state, or the server's state,
RUNNABLE => RUNNING before calling sys_umcg_wait(). The kernel changed
worker states to BLOCKED/RUNNABLE during block/wake detection, and
marked servers RUNNING when waking them during block/wake detection;
but all applicable state changes for sys_umcg_wait() happen in the
userspace.

The reasoning behind this approach was:
- do in kernel only that which cannot be done in the userspace, to
  make the kernel code smaller/simpler
- similar to how futexes work: futex_wait does not change the futex
  value to the desired value, but just checks whether the futex value
  matches the desired value
- similar to how futexes work, concurrent state changes can happen in
  the userspace without calling into the kernel at all
  for example:
    - (a): worker A goes to sleep into sys_umcg_wait()
    - (b): worker B wants to context switch into worker A "a moment"
      later
    - due to preemption/interrupts/pagefaults/whatnot, (b) happens in
      reality before (a)

in my patchset, the situation above happily resolves in the userspace
so that worker A keeps running without ever calling sys_umcg_wait().
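
To make the (a)/(b) race concrete, here is a minimal single-threaded sketch of one possible interleaving. It is not code from either patchset: the sketch_* names and the state values are made up for illustration, and the "wait" step is only a comparison, mirroring the futex_wait analogy above rather than the real sys_umcg_wait().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values; the real UMCG constants are not shown here. */
enum sketch_state { SK_RUNNING, SK_RUNNABLE, SK_BLOCKED };

struct sketch_task {
	atomic_int state;
};

/* The side that initiates a change performs it, with a compare-and-swap. */
static bool sketch_transition(struct sketch_task *t, int from, int to)
{
	return atomic_compare_exchange_strong(&t->state, &from, to);
}

/*
 * Futex-like rule: the wait path only compares, it never writes.
 * Returns true if the caller would still have to enter the kernel.
 */
static bool sketch_must_wait(struct sketch_task *t)
{
	return atomic_load(&t->state) == SK_RUNNABLE;
}

/*
 * Replays the race in one thread, in the order it "really" happened.
 * Returns true if worker A ended up needing the syscall.
 */
static bool sketch_race_needs_syscall(void)
{
	struct sketch_task a;
	bool ok;

	atomic_init(&a.state, SK_RUNNING);

	/* Worker A starts going to sleep: RUNNING => RUNNABLE. */
	ok = sketch_transition(&a, SK_RUNNING, SK_RUNNABLE);
	assert(ok);

	/* Worker B switches into A before A reaches sys_umcg_wait(). */
	ok = sketch_transition(&a, SK_RUNNABLE, SK_RUNNING);
	assert(ok);

	/* A rechecks its own state: it is RUNNING again, so no syscall. */
	return sketch_must_wait(&a);
}
```

In this replay sketch_race_needs_syscall() returns false: by the time A would sleep, its state no longer matches RUNNABLE, so it just keeps running.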

Again, I don't think this is deal breaking, and your approach will
work, just a bit less efficiently in some cases :)

I'm still not sure we can live without UMCG_TF_LOCKED. What if worker
A transfers its server to worker B that A intends to context switch
into, and then worker A pagefaults or gets interrupted before calling
sys_umcg_wait()? The server will be woken up and will see that it is
assigned to worker B; now what? If worker A is "locked" before the
whole thing starts, the pagefault/interrupt will not trigger
block/wake detection, worker A will keep RUNNING for all intents and
purposes, and eventually will call sys_umcg_wait() as it had
intended...

[...]
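
A toy model of the UMCG_TF_LOCKED concern, under loud assumptions: the flag value, the struct, and the sketch_on_interrupt() hook are all hypothetical, and "waking the server" is reduced to setting a boolean. It only illustrates the claim that a locked worker is skipped by block/wake detection.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_TF_LOCKED 0x1u	/* hypothetical bit; stands in for UMCG_TF_LOCKED */

struct sketch_worker {
	unsigned int flags;
	bool server_woken;	/* set when block/wake detection fires */
};

/* Hypothetical hook run when the worker pagefaults or is interrupted. */
static void sketch_on_interrupt(struct sketch_worker *w)
{
	if (w->flags & SK_TF_LOCKED)
		return;		/* locked: detection is suppressed */
	w->server_woken = true;	/* unlocked: the server gets woken */
}

/*
 * Worker A hands its server to worker B, then is interrupted before it
 * reaches sys_umcg_wait(). Returns true if the server was woken mid-handoff.
 */
static bool sketch_handoff_wakes_server(bool lock_first)
{
	struct sketch_worker a = { .flags = 0, .server_woken = false };

	if (lock_first)
		a.flags |= SK_TF_LOCKED;

	/* ... A assigns its server to B here (not modelled) ... */
	sketch_on_interrupt(&a);

	/* With the lock held, the interrupt must not have woken the server. */
	assert(!lock_first || !a.server_woken);
	return a.server_woken;
}
```

Without the lock the server is woken while it is already assigned to B, which is the "now what?" case; with the lock the interrupt passes unnoticed and A can still reach sys_umcg_wait().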