From: Waiman Long
Subject: Re: [PATCH v4 4/5] mm/memcg: Save both reclaimable & unreclaimable bytes in object stock
To: Johannes Weiner
Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Tejun Heo,
 Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
 Vlastimil Babka, Roman Gushchin, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt, Muchun Song,
 Alex Shi, Chris Down, Yafang Shao, Wei Yang, Masayoshi Mizuma,
 Xing Zhengjun, Matthew Wilcox
References: <20210419000032.5432-1-longman@redhat.com>
 <20210419000032.5432-5-longman@redhat.com>
Message-ID: <73992e36-376f-b7d3-dde5-d287c7696e72@redhat.com>
Date: Tue, 20 Apr 2021 15:09:01 -0400

On 4/19/21 12:55 PM, Johannes Weiner wrote:
> On Sun, Apr 18, 2021 at 08:00:31PM -0400, Waiman Long wrote:
>> Currently, the object stock structure caches either reclaimable or
>> unreclaimable vmstat bytes, but not both at the same time. The hit
>> rate can be improved if both types of vmstat data are cached,
>> especially on single-node systems.
>>
>> This patch supports caching both types of vmstat data, at the expense
>> of slightly increased complexity in the caching code. For large
>> objects (>= PAGE_SIZE), the vmstat array is updated directly without
>> going through the stock caching step.
>>
>> On a 2-socket Cascade Lake server with instrumentation enabled, the
>> miss rates are shown in the table below.
>>
>> Initial bootup:
>>
>>   Kernel            __mod_objcg_state    mod_objcg_state    %age
>>   ------            -----------------    ---------------    ----
>>   Before patch            634400             3243830        19.6%
>>   After patch             419810             3182424        13.2%
>>
>> Parallel kernel build:
>>
>>   Kernel            __mod_objcg_state    mod_objcg_state    %age
>>   ------            -----------------    ---------------    ----
>>   Before patch          24329265           142512465        17.1%
>>   After patch           24051721           142445825        16.9%
>>
>> The miss rate decreased after the initial system bootup. However, the
>> miss rate for the parallel kernel build remained about the same,
>> probably because most of the touched kmem_cache objects were
>> reclaimable inodes and dentries.
>>
>> Signed-off-by: Waiman Long
>> ---
>>   mm/memcontrol.c | 79 +++++++++++++++++++++++++++++++------------------
>>   1 file changed, 51 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index c13502eab282..a6dd18f6d8a8 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2212,8 +2212,8 @@ struct obj_stock {
>>   	struct obj_cgroup *cached_objcg;
>>   	struct pglist_data *cached_pgdat;
>>   	unsigned int nr_bytes;
>> -	int vmstat_idx;
>> -	int vmstat_bytes;
>> +	int reclaimable_bytes;		/* NR_SLAB_RECLAIMABLE_B */
>> +	int unreclaimable_bytes;	/* NR_SLAB_UNRECLAIMABLE_B */
> How about
>
> 	int nr_slab_reclaimable_b;
> 	int nr_slab_unreclaimable_b;
>
> so you don't need the comments?

Sure, will make the change.
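With the suggested names, the struct would then look something like this
(a sketch only, assuming the same #ifdef guard and dummy[0] fallback the
patch already uses):

struct obj_stock {
#ifdef CONFIG_MEMCG_KMEM
	struct obj_cgroup *cached_objcg;
	struct pglist_data *cached_pgdat;
	unsigned int nr_bytes;
	int nr_slab_reclaimable_b;	/* cached NR_SLAB_RECLAIMABLE_B delta */
	int nr_slab_unreclaimable_b;	/* cached NR_SLAB_UNRECLAIMABLE_B delta */
#else
	int dummy[0];
#endif
};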
>>   #else
>>   	int dummy[0];
>>   #endif
>> @@ -3217,40 +3217,56 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>>   			enum node_stat_item idx, int nr)
>>   {
>>   	unsigned long flags;
>> -	struct obj_stock *stock = get_obj_stock(&flags);
>> +	struct obj_stock *stock;
>> +	int *bytes, *alt_bytes, alt_idx;
>> +
>> +	/*
>> +	 * Directly update vmstat array for big object.
>> +	 */
>> +	if (unlikely(abs(nr) >= PAGE_SIZE))
>> +		goto update_vmstat;
> This looks like an optimization independent of the vmstat item split?

It may not be that helpful. I am going to take it out in the next
version.

>> +	stock = get_obj_stock(&flags);
>> +	if (idx == NR_SLAB_RECLAIMABLE_B) {
>> +		bytes     = &stock->reclaimable_bytes;
>> +		alt_bytes = &stock->unreclaimable_bytes;
>> +		alt_idx   = NR_SLAB_UNRECLAIMABLE_B;
>> +	} else {
>> +		bytes     = &stock->unreclaimable_bytes;
>> +		alt_bytes = &stock->reclaimable_bytes;
>> +		alt_idx   = NR_SLAB_RECLAIMABLE_B;
>> +	}
>>
>>   	/*
>> -	 * Save vmstat data in stock and skip vmstat array update unless
>> -	 * accumulating over a page of vmstat data or when pgdat or idx
>> +	 * Try to save vmstat data in stock and skip vmstat array update
>> +	 * unless accumulating over a page of vmstat data or when pgdat
>>   	 * changes.
>>   	 */
>>   	if (stock->cached_objcg != objcg) {
>>   		/* Output the current data as is */
>> -	} else if (!stock->vmstat_bytes) {
>> -		/* Save the current data */
>> -		stock->vmstat_bytes = nr;
>> -		stock->vmstat_idx = idx;
>> -		stock->cached_pgdat = pgdat;
>> -		nr = 0;
>> -	} else if ((stock->cached_pgdat != pgdat) ||
>> -		   (stock->vmstat_idx != idx)) {
>> -		/* Output the cached data & save the current data */
>> -		swap(nr, stock->vmstat_bytes);
>> -		swap(idx, stock->vmstat_idx);
>> +	} else if (stock->cached_pgdat != pgdat) {
>> +		/* Save the current data and output cached data, if any */
>> +		swap(nr, *bytes);
>>   		swap(pgdat, stock->cached_pgdat);
>> +		if (*alt_bytes) {
>> +			__mod_objcg_state(objcg, pgdat, alt_idx,
>> +					  *alt_bytes);
>> +			*alt_bytes = 0;
>> +		}
> As per the other email, I really don't think optimizing the pgdat
> switch (in a percpu cache) is worth this level of complexity.

I am going to simplify it in the next version.

>>   	} else {
>> -		stock->vmstat_bytes += nr;
>> -		if (abs(stock->vmstat_bytes) > PAGE_SIZE) {
>> -			nr = stock->vmstat_bytes;
>> -			stock->vmstat_bytes = 0;
>> +		*bytes += nr;
>> +		if (abs(*bytes) > PAGE_SIZE) {
>> +			nr = *bytes;
>> +			*bytes = 0;
>>   		} else {
>>   			nr = 0;
>>   		}
>>   	}
>> -	if (nr)
>> -		__mod_objcg_state(objcg, pgdat, idx, nr);
>> -
>>   	put_obj_stock(flags);
>> +	if (!nr)
>> +		return;
>> +update_vmstat:
>> +	__mod_objcg_state(objcg, pgdat, idx, nr);
>>   }
>>
>>   static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
>> @@ -3303,12 +3319,19 @@ static void drain_obj_stock(struct obj_stock *stock)
>>   	/*
>>   	 * Flush the vmstat data in current stock
>>   	 */
>> -	if (stock->vmstat_bytes) {
>> -		__mod_objcg_state(old, stock->cached_pgdat, stock->vmstat_idx,
>> -				  stock->vmstat_bytes);
>> +	if (stock->reclaimable_bytes || stock->unreclaimable_bytes) {
>> +		int bytes;
>> +
>> +		if ((bytes = stock->reclaimable_bytes))
>> +			__mod_objcg_state(old, stock->cached_pgdat,
>> +					  NR_SLAB_RECLAIMABLE_B, bytes);
>> +		if ((bytes = stock->unreclaimable_bytes))
>> +			__mod_objcg_state(old, stock->cached_pgdat,
>> +					  NR_SLAB_UNRECLAIMABLE_B, bytes);
> The int bytes indirection isn't necessary. It's easier to read even
> with the extra lines required to repeat the long stock member names,
> because that is quite a common pattern (if (stuff) frob(stuff)).

OK, I will eliminate the bytes variable.

> __mod_objcg_state() also does rcu_read_lock() toggling and a memcg
> lookup on each call, both of which could be batched, which I think is
> further proof that it should just be inlined here.

I am also thinking that eliminating the unnecessary
rcu_read_lock()/rcu_read_unlock() toggling may help performance a bit.
However, that will be done in another patch after I have done more
performance testing. I am not going to bother with that in this series.

Cheers,
Longman
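P.S. To make the batching point concrete, below is a rough sketch
(untested, assuming the renamed fields) of what a single-RCU-section
flush in drain_obj_stock() could look like, reusing the same
obj_cgroup_memcg()/mem_cgroup_lruvec() lookup that __mod_objcg_state()
performs internally:

	if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
		struct mem_cgroup *memcg;
		struct lruvec *lruvec;

		/* One RCU section and one memcg/lruvec lookup for both counters */
		rcu_read_lock();
		memcg = obj_cgroup_memcg(old);
		lruvec = mem_cgroup_lruvec(memcg, stock->cached_pgdat);
		if (stock->nr_slab_reclaimable_b) {
			/*
			 * mod_memcg_lruvec_state() saves/restores irqs itself;
			 * the __ variant could be used if irqs are already
			 * disabled at this point.
			 */
			mod_memcg_lruvec_state(lruvec, NR_SLAB_RECLAIMABLE_B,
					       stock->nr_slab_reclaimable_b);
			stock->nr_slab_reclaimable_b = 0;
		}
		if (stock->nr_slab_unreclaimable_b) {
			mod_memcg_lruvec_state(lruvec, NR_SLAB_UNRECLAIMABLE_B,
					       stock->nr_slab_unreclaimable_b);
			stock->nr_slab_unreclaimable_b = 0;
		}
		rcu_read_unlock();
	}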