Date: Thu, 9 Apr 2020 17:25:40 +0200
From: Michal Hocko
To: Bruno Prémont
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner, Vladimir Davydov, Chris Down
Subject: Re: Memory CG and 5.1 to 5.6 uprade slows backup
Message-ID: <20200409152540.GP18386@dhcp22.suse.cz>
References: <20200409112505.2e1fc150@hemera.lan.sysophe.eu> <20200409094615.GE18386@dhcp22.suse.cz> <20200409121733.1a5ba17c@hemera.lan.sysophe.eu> <20200409103400.GF18386@dhcp22.suse.cz> <20200409170926.182354c3@hemera.lan.sysophe.eu>
In-Reply-To: <20200409170926.182354c3@hemera.lan.sysophe.eu>

On Thu 09-04-20 17:09:26, Bruno Prémont wrote:
> On Thu, 9 Apr 2020 12:34:00 +0200 Michal Hocko wrote:
>
> > On Thu 09-04-20 12:17:33, Bruno Prémont wrote:
> > > On Thu, 9 Apr 2020 11:46:15 Michal Hocko wrote:
> > > > [Cc Chris]
> > > >
> > > > On Thu 09-04-20 11:25:05, Bruno Prémont wrote:
> > > > > Hi,
> > > > >
> > > > > Upgrading from 5.1 kernel to 5.6 kernel on a production system using
> > > > > cgroups (v2) and having backup process in a memory.high=2G cgroup
> > > > > sees backup being highly throttled (there are about 1.5T to be
> > > > > backuped).
> > > >
> > > > What does /proc/sys/vm/dirty_* say?
> > >
> > > /proc/sys/vm/dirty_background_bytes:0
> > > /proc/sys/vm/dirty_background_ratio:10
> > > /proc/sys/vm/dirty_bytes:0
> > > /proc/sys/vm/dirty_expire_centisecs:3000
> > > /proc/sys/vm/dirty_ratio:20
> > > /proc/sys/vm/dirty_writeback_centisecs:500
> >
> > Sorry, but I forgot to ask for the total amount of memory. But it seems
> > this is 64GB and 10% dirty ratio might mean a lot of dirty memory.
> > Does the same happen if you reduce those knobs to something smaller than
> > 2G? _bytes alternatives should be useful for that purpose.
>
> Well, tuning it to
> /proc/sys/vm/dirty_background_bytes:268435456
> /proc/sys/vm/dirty_background_ratio:0
> /proc/sys/vm/dirty_bytes:536870912
> /proc/sys/vm/dirty_expire_centisecs:3000
> /proc/sys/vm/dirty_ratio:0
> /proc/sys/vm/dirty_writeback_centisecs:500
> does not make any difference.

OK, it was a wild guess because cgroup v2 should be able to throttle
heavy writers and be memcg aware AFAIR. But good to have it confirmed.

[...]

> > > > Is it possible that the reclaim is not making progress on too many
> > > > dirty pages and that triggers the back off mechanism that has been
> > > > implemented recently in 5.4 (have a look at 0e4b01df8659 ("mm,
> > > > memcg: throttle allocators when failing reclaim over memory.high")
> > > > and e26733e0d0ec ("mm, memcg: throttle allocators based on
> > > > ancestral memory.high").
> > >
> > > Could be though in that case it's throttling the wrong task/cgroup
> > > as far as I can see (at least from cgroup's memory stats) or being
> > > blocked by state external to the cgroup.
> > > Will have a look at those patches to get a better idea at what they
> > > change.
> >
> > Could you check where is the task of your interest throttled?
> > /proc//stack should give you a clue.
>
> As guessed by Chris, it's
> [<0>] mem_cgroup_handle_over_high+0x121/0x170
> [<0>] exit_to_usermode_loop+0x67/0xa0
> [<0>] do_syscall_64+0x149/0x170
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> And I know no way to tell kernel "drop all caches" for a specific cgroup
> nor how to list the inactive files assigned to a given cgroup (knowing
> which ones they are and their idle state could help understanding why
> they aren't being reclaimed).
>
>
>
> Could it be that cache is being prevented from being reclaimed by a task
> in another cgroup?
>
> e.g.
>   cgroup/system/backup
>     first reads $files (reads each once)
>   cgroup/workload/bla
>     second&more reads $files
>
> Would $files remain associated to cgroup/system/backup and not
> reclaimed there instead of being reassigned to cgroup/workload/bla?

No, page cache is first-touch-gets-charged. But interference is
certainly possible if the memory is somehow pinned - e.g. mlock - by
a task from another cgroup or internally by the FS.

Your earlier stat snapshot doesn't indicate a big problem with the
reclaim though:

memory.stat:pgscan 47519855
memory.stat:pgsteal 44933838

This tells us the overall reclaim effectiveness (pgsteal / pgscan) was
about 94.6%. Could you try to gather snapshots with a 1s granularity,
starting before you run your backup, to see how those numbers evolve?
Ideally with timestamps to compare with the actual stall information.

Another option would be to enable vmscan tracepoints but let's try with
stats first.
-- 
Michal Hocko
SUSE Labs
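
A minimal sketch of the 1s-granularity snapshots asked for above, assuming
Python 3 is available on the host. The function names and the example path
in the comments are illustrative, not from the thread. The efficiency
figure is just pgsteal / pgscan, computed per interval from counter deltas.
The parser also accepts the grep-style "memory.stat:key value" lines
quoted earlier.

```python
#!/usr/bin/env python3
"""Sample pgscan/pgsteal from a cgroup v2 memory.stat file at a fixed interval."""
import sys
import time
from datetime import datetime


def parse_memory_stat(text):
    """Parse 'key value' lines into a dict of ints.

    Also accepts grep-style 'memory.stat:key value' lines.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip blank or unexpected lines
        key = fields[0].rsplit(":", 1)[-1]  # drop an optional 'file:' prefix
        stats[key] = int(fields[1])
    return stats


def reclaim_efficiency(pgscan, pgsteal):
    """Return pgsteal/pgscan as a percentage, or None if nothing was scanned."""
    if pgscan == 0:
        return None
    return 100.0 * pgsteal / pgscan


def sample(stat_path, interval=1.0, count=None):
    """Print timestamped pgscan/pgsteal deltas every `interval` seconds."""
    prev = None
    taken = 0
    while count is None or taken < count:
        with open(stat_path) as f:
            cur = parse_memory_stat(f.read())
        now = datetime.now().isoformat(timespec="seconds")
        if prev is not None:
            d_scan = cur.get("pgscan", 0) - prev.get("pgscan", 0)
            d_steal = cur.get("pgsteal", 0) - prev.get("pgsteal", 0)
            eff = reclaim_efficiency(d_scan, d_steal)
            eff_s = "n/a" if eff is None else f"{eff:.1f}%"
            print(f"{now} pgscan+{d_scan} pgsteal+{d_steal} efficiency={eff_s}",
                  flush=True)
        prev = cur
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation (path depends on the local cgroup layout):
    #   ./memstat-sample.py /sys/fs/cgroup/system/backup/memory.stat > stats.log
    try:
        sample(sys.argv[1])
    except KeyboardInterrupt:
        pass
```

Started before the backup and stopped with Ctrl-C afterwards, this would
give one timestamped line per second that can be lined up against the
observed stalls.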