From: Yosry Ahmed
Date: Mon, 11 Sep 2023 13:28:00 -0700
Subject: Re: [PATCH v4 4/4] mm: memcg: use non-unified stats flushing for userspace reads
To: Tejun Heo
Cc: Wei Xu, Michal Hocko, Andrew Morton, Johannes Weiner, Roman Gushchin,
    Shakeel Butt, Muchun Song, Ivan Babrou, Michal Koutný, Waiman Long,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    Greg Thelen
References: <20230831165611.2610118-1-yosryahmed@google.com>
    <20230831165611.2610118-5-yosryahmed@google.com>

On Mon, Sep 11, 2023 at 1:21 PM Tejun Heo wrote:
>
> Hello,
>
> On Mon, Sep 11, 2023 at 01:01:25PM -0700, Wei Xu wrote:
> > Yes, it is the same test (10K contending readers). The kernel change
> > is to remove stats_user_flush_mutex from mem_cgroup_user_flush_stats()
> > so that the concurrent mem_cgroup_user_flush_stats() requests directly
> > contend on cgroup_rstat_lock in cgroup_rstat_flush().
>
> I don't think it'd be a good idea to twist rstat and other kernel internal
> code to accommodate 10k parallel readers.
> If we want to support that, let's
> explicitly support that by implementing better batching in the read path.
> The only guarantee you need is that there has been at least one flush since
> the read attempt started, so we can do sth like the following in the read
> path:
>
> 1. Grab a waiter lock. Remember the current timestamp.
>
> 2. Try lock flush mutex. If obtained, drop the waiter lock, flush. Regrab
>    the waiter lock, update the latest flush time to my start time, wake up
>    waiters on the waitqueue (maybe do custom wakeups based on start time?).
>
> 3. Release the waiter lock and sleep on the waitqueue.
>
> 4. When woken up, regrab the waiter lock, compare whether the latest flush
>    timestamp is later than my start time, if so, return the latest result.
>    If not go back to #2.
>
> Maybe the above isn't the best way to do it but you get the general idea.
> When you have that many concurrent readers, most of them won't need to
> actually flush.

I am testing something conceptually similar to this, but it doesn't
depend on timestamps. I replaced the mutex with a semaphore, and I
added fallback logic to unified flushing with a timeout:

static void mem_cgroup_user_flush_stats(struct mem_cgroup *memcg)
{
        static DEFINE_SEMAPHORE(user_flush_sem, 1);

        if (atomic_read(&stats_flush_order) <= STATS_FLUSH_THRESHOLD)
                return;

        /* down_timeout() returns 0 if the semaphore was acquired in time. */
        if (!down_timeout(&user_flush_sem, msecs_to_jiffies(1))) {
                /* Got the semaphore within 1ms: flush only our subtree. */
                do_stats_flush(memcg);
                up(&user_flush_sem);
        } else {
                /* Timed out: fall back to unified flushing, with wait set. */
                do_unified_stats_flush(true);
        }
}

In do_unified_stats_flush(), I added a wait argument. If set, the
caller will wait for any ongoing flushers before returning (but it
never attempts to flush, so no contention on the underlying rstat
lock). I implemented this using completions. I am running some tests
now, but this should make sure userspace read latency is bounded by
1ms + unified flush time. We basically attempt to flush our subtree
only; if we can't after 1ms, we fall back to unified flushing.
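
For readers who want to try the timeout-then-fallback pattern outside
the kernel, here is a rough userspace analogy using POSIX semaphores.
It is only a sketch under stated assumptions, not the patch itself:
flush_with_timeout(), deadline_in_ms(), struct flush_counts and the
two counting callbacks are hypothetical names, and sem_timedwait()
stands in for down_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type standing in for the kernel flush helpers. */
typedef void (*flush_fn)(void *arg);

/* Hypothetical counters so the two paths can be observed. */
struct flush_counts {
        int targeted;
        int fallback;
};

static void count_targeted(void *arg) { ((struct flush_counts *)arg)->targeted++; }
static void count_fallback(void *arg) { ((struct flush_counts *)arg)->fallback++; }

/* Build an absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
        struct timespec ts;

        assert(ms >= 0);
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += ms * 1000000L;
        ts.tv_sec += ts.tv_nsec / 1000000000L;
        ts.tv_nsec %= 1000000000L;
        return ts;
}

/*
 * Wait up to `timeout_ms` for `sem`. On success, run the targeted flush
 * and release the semaphore. On timeout (or any other error), run the
 * fallback instead. Returns true if the targeted flush ran.
 */
static bool flush_with_timeout(sem_t *sem, long timeout_ms,
                               flush_fn targeted, flush_fn fallback,
                               void *arg)
{
        struct timespec ts = deadline_in_ms(timeout_ms);
        int ret;

        /* The deadline is absolute, so retrying after EINTR is safe. */
        do {
                ret = sem_timedwait(sem, &ts);
        } while (ret == -1 && errno == EINTR);

        if (ret == 0) {
                targeted(arg);
                sem_post(sem);
                return true;
        }

        fallback(arg);
        return false;
}
```

The caller's wait is bounded by the timeout plus whatever the fallback
costs, which mirrors the "1ms + unified flush time" bound described
above.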

Another benefit I am seeing: I tried switching in-kernel flushers to
also use the completion in do_unified_stats_flush(). Instead of
skipping entirely when someone else is flushing, they just wait for
the ongoing flush to finish (without being serialized or contending
on the lock). I see no regressions in my parallel reclaim benchmark.
This should make sure no one ever skips a flush, while still avoiding
too much serialization/contention. I suspect this should make reclaim
heuristics (and other in-kernel flushers) slightly better.

I will run Wei's benchmark next to see how userspace read latency is
affected.

>
> Thanks.
>
> --
> tejun
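
The "wait for the ongoing flusher instead of skipping" behaviour
described for do_unified_stats_flush() can be illustrated in userspace
roughly as follows. This is a sketch under assumptions, not the kernel
implementation: struct unified_flusher, unified_flush() and
count_unified_flush() are hypothetical names, and a pthread condition
variable with a generation counter plays the role of the completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state for a single shared ("unified") flush. */
struct unified_flusher {
        pthread_mutex_t lock;
        pthread_cond_t done;            /* stands in for the completion */
        bool flushing;                  /* true while someone is flushing */
        unsigned long generation;       /* bumped when a flush finishes */
};

#define UNIFIED_FLUSHER_INIT \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Hypothetical flush body: just counts how often it ran. */
static void count_unified_flush(void *arg) { ++*(int *)arg; }

/*
 * If nobody is flushing, flush and wake all waiters. If a flush is
 * already in progress, never start a second one: either wait for it
 * to finish (wait == true) or return immediately (wait == false).
 * Returns true only if this caller performed the flush.
 */
static bool unified_flush(struct unified_flusher *f, bool wait,
                          void (*do_flush)(void *), void *arg)
{
        pthread_mutex_lock(&f->lock);
        if (f->flushing) {
                unsigned long gen = f->generation;

                /* Sleep until the in-progress flush completes. */
                while (wait && f->generation == gen)
                        pthread_cond_wait(&f->done, &f->lock);
                pthread_mutex_unlock(&f->lock);
                return false;
        }
        f->flushing = true;
        pthread_mutex_unlock(&f->lock);

        /* Flush outside the lock so waiters only block on the condvar. */
        do_flush(arg);

        pthread_mutex_lock(&f->lock);
        assert(f->flushing);            /* only the flusher clears the flag */
        f->flushing = false;
        f->generation++;
        pthread_cond_broadcast(&f->done);
        pthread_mutex_unlock(&f->lock);
        return true;
}
```

With wait set, a caller that finds a flush in progress returns only
after that flush has finished, so it observes a flushed result without
taking part in the flush itself.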