Date: Tue, 14 Jul 2020 02:48:15 +1000
From: Nicholas Piggin
Subject: Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Andy Lutomirski
Cc: Anton Blanchard, Arnd Bergmann, linux-arch, LKML, Linux-MM, linuxppc-dev, Mathieu Desnoyers, Peter Zijlstra, X86 ML
References: <20200710015646.2020871-1-npiggin@gmail.com> <20200710015646.2020871-8-npiggin@gmail.com>
Message-Id: <1594658283.qabzoxga67.astroid@bobo.none>

Excerpts from Andy Lutomirski's message of July 14, 2020 1:59 am:
> On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote:
>>
>> On big systems, the mm refcount can become highly contended when doing
>> a lot of context switching with threaded applications (particularly
>> switching between the idle thread and an application thread).
>>
>> Abandoning lazy tlb slows switching down quite a bit in the important
>> user->idle->user cases, so instead implement a non-refcounted scheme
>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>> any remaining lazy ones.
>>
>> On a 16-socket 192-core POWER8 system, a context switching benchmark
>> with as many software threads as CPUs (so each switch will go in and
>> out of idle), upstream can achieve a rate of about 1 million context
>> switches per second. After this patch it goes up to 118 million.
>>
>
> I read the patch a couple of times, and I have a suggestion that could
> be nonsense. You are, effectively, using mm_cpumask() as a sort of
> refcount. You're saying "hey, this mm has no more references, but it
> still has nonempty mm_cpumask(), so let's send an IPI and shoot down
> those references too." I'm wondering whether you actually need the
> IPI. What if, instead, you actually treated mm_cpumask as a refcount
> for real? Roughly, in __mmdrop(), you would only free the page tables
> if mm_cpumask() is empty. And, in the code that removes a CPU from
> mm_cpumask(), you would check if mm_users == 0 and, if so, check if
> you just removed the last bit from mm_cpumask and potentially free the
> mm.
>
> Getting the locking right here could be a bit tricky -- you need to
> avoid two CPUs simultaneously exiting lazy TLB and thinking they
> should free the mm, and you also need to avoid an mm with mm_users
> hitting zero concurrently with the last remote CPU using it lazily
> exiting lazy TLB. Perhaps this could be resolved by having mm_count
> == 1 mean "mm_cpumask() might contain bits and, if so, it owns the
> mm" and mm_count == 0 meaning "now it's dead" and using some careful
> cmpxchg or dec_return to make sure that only one CPU frees it.
>
> Or maybe you'd need a lock or RCU for this, but the idea would be to
> only ever take the lock after mm_users goes to zero.

I don't think it's nonsense, it could be a good way to avoid IPIs.

I haven't seen much problem here that made me too concerned about IPIs
yet, so I think the simple patch may be good enough to start with for
powerpc. I'm looking at avoiding/reducing the IPIs by combining the
unlazying with the exit TLB flush without doing anything fancy with
ref counting, but we'll see.

Thanks,
Nick
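
For readers following the thread, here is a rough userspace sketch of the "mm_cpumask as a refcount" idea quoted above, using C11 atomics. It is not kernel code and not taken from the patch: struct mm_model, mm_model_init(), try_free(), drop_last_user() and exit_lazy() are hypothetical names made up for illustration, a 32-bit word stands in for mm_cpumask(), and the count field follows the suggested convention (1 means the cpumask may still own the mm, 0 means dead). It only shows how a cmpxchg from 1 to 0 can make sure a single caller frees; it does not settle the locking questions raised in the mail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mm_model {
    atomic_uint cpumask; /* bit n set: CPU n may still use the mm lazily */
    atomic_int  users;   /* stands in for mm_users */
    atomic_int  count;   /* 1: cpumask may own the mm, 0: dead */
    int         frees;   /* number of frees performed, must never exceed 1 */
};

static void mm_model_init(struct mm_model *mm, unsigned mask, int users)
{
    atomic_init(&mm->cpumask, mask);
    atomic_init(&mm->users, users);
    atomic_init(&mm->count, 1);
    mm->frees = 0;
}

/* Only the caller that moves count from 1 to 0 may free the mm. */
static bool try_free(struct mm_model *mm)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&mm->count, &expected, 0))
        return false;        /* somebody else already freed it */
    assert(mm->frees == 0);  /* a double free would trip here */
    mm->frees++;             /* placeholder for freeing page tables */
    return true;
}

/* A user reference goes away; free only if no CPU is still lazy. */
static bool drop_last_user(struct mm_model *mm)
{
    if (atomic_fetch_sub(&mm->users, 1) != 1)
        return false;        /* other users remain */
    if (atomic_load(&mm->cpumask) != 0)
        return false;        /* lazy CPUs now own the mm */
    return try_free(mm);
}

/* A CPU leaves lazy TLB mode: clear its bit, free if it was the last. */
static bool exit_lazy(struct mm_model *mm, unsigned cpu)
{
    unsigned bit = 1u << cpu;
    unsigned old = atomic_fetch_and(&mm->cpumask, ~bit);

    if ((old & ~bit) != 0)
        return false;        /* other CPUs are still lazy */
    if (atomic_load(&mm->users) != 0)
        return false;        /* mm is still in use by its users */
    return try_free(mm);
}
```

Starting from a mask with CPUs 0 and 2 set and one user, drop_last_user() returns false, exit_lazy() for CPU 0 returns false, and exit_lazy() for CPU 2 performs the one and only free. If the last user drop and the last lazy exit race, both may reach try_free(), and the cmpxchg lets exactly one of them win.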