From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87E82C3DA63 for ; Wed, 24 Jul 2024 16:11:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BCB196B0088; Wed, 24 Jul 2024 12:11:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B7A676B0089; Wed, 24 Jul 2024 12:11:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A42EA6B008A; Wed, 24 Jul 2024 12:11:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 86D746B0088 for ; Wed, 24 Jul 2024 12:11:58 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2C8EE40422 for ; Wed, 24 Jul 2024 16:11:58 +0000 (UTC) X-FDA: 82375137516.14.6779CC3 Received: from mail-pf1-f182.google.com (mail-pf1-f182.google.com [209.85.210.182]) by imf29.hostedemail.com (Postfix) with ESMTP id 5D383120026 for ; Wed, 24 Jul 2024 16:11:54 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=vimeo.com header.s=google header.b=ZWMF93No; dmarc=pass (policy=reject) header.from=vimeo.com; spf=pass (imf29.hostedemail.com: domain of davidf@vimeo.com designates 209.85.210.182 as permitted sender) smtp.mailfrom=davidf@vimeo.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721837461; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ukA2Zx3b8wZGOkUcJ2HRKI3kn32vcEPpsjBc+UAQhVc=; b=0OgV8v+gvq8FLm+d/C/sXRdltunm7ST3TXssNibN3HZkAM9XnlOq+FrEp8o7k1FDzXrYnR 36ad5rGh8nhl+xtZk+U7j3yjf5Ab7CYUx462bTm+FAubWhM0JgMdaOcZFPg8me5Q2z4dcq K2dRfjcF0eCHL6ngfwgFooSDpzT3IsM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721837461; a=rsa-sha256; cv=none; b=oVOq6oPLIc6UUAlfP8nGXlR/MTmPbPzYxGQn4tPOzpB27GHdtqlMxGs0yFMNe3irupDTHP ctFzVAvs0Mp/8Vz/JVar0LHxvjpk0uge3oqVw4XWlStca3yJLVIcnFpqLqRbmUyO2c6f/H fJ7UVHp6dPOOJ9AnIZbA+UnbWaG3B/Q= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=vimeo.com header.s=google header.b=ZWMF93No; dmarc=pass (policy=reject) header.from=vimeo.com; spf=pass (imf29.hostedemail.com: domain of davidf@vimeo.com designates 209.85.210.182 as permitted sender) smtp.mailfrom=davidf@vimeo.com Received: by mail-pf1-f182.google.com with SMTP id d2e1a72fcca58-70d150e8153so4157b3a.0 for ; Wed, 24 Jul 2024 09:11:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vimeo.com; s=google; t=1721837513; x=1722442313; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ukA2Zx3b8wZGOkUcJ2HRKI3kn32vcEPpsjBc+UAQhVc=; b=ZWMF93NoXJyXGn26Uo1Nf7OFoalix4LpIuEp3nyxJQpitXa9uO72zdBHyih1wvhL1q U5MDZJfm74/jiloAO0jB16Omgo8yicVzBQHQBDGsOBEzDVSAmmTJkM3IN6t2RfQ5MjDR dVGISkbfCSuGYfMdSVCuAq+sMtLh74oNaYs8w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721837513; x=1722442313; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ukA2Zx3b8wZGOkUcJ2HRKI3kn32vcEPpsjBc+UAQhVc=; b=RPzgmfhzKx7wdhM59k9UlwQFlxQv3vk1VHeCSbu0azvGsy0XsGmkmunoWZKj+xVXzD mttnbecv9F1mcz7vu9AKUeKQt6HJf+nRgjYQu8V4kps1kJ0ke2Cm8KVlDh0lBrjtH5Uk zKQwJhIew8SeNe2QOWxyDen3j5FqmqWF91Ue/2LrQoY+sl2L3Ey+pbmi2ZCAEQSb3atK fQp3cevXkEtRwX77/1mgXDuRQuvbawyTF3qeauD/eXdK1NeTq/0aWwc2Pl6Jlfu+ZaiI 9wD8P39RhknAgXeqJmUnrJHrXtZAeWFpjYRlMg+kR/SzHhDVn2p5cwVB/uI0Uu/u7Gg4 IKgQ== X-Forwarded-Encrypted: i=1; AJvYcCXeQka39L2vGxu/pQGL+lbm+M2tq72JIx6ttvPD2+IORGgAXAGqteKSYGlsmCr5xhpvSRKsyxkSyz3bHBVGP7LWpwI= X-Gm-Message-State: AOJu0Yyi4DopvXJdV7sMAUruPiFHrrXAfWrWq+niz3OBYV/zAOPJCW5I 2APw6hOMZvZyGc+umPmmdBI5RfZrdECe1Qxpd0A2nOfUgW840sPbmz072HrVM1QjUJNRDXcveTV 6aYE+903lvsGpdeMfckN3wyfblG+nHoNiNyoP5A== X-Google-Smtp-Source: AGHT+IFft/rd5GJFaDCp1Z+xZGOXG6YGzwJeUwPo+FT5WjcIvrkdnSROQL9L0yZcM6+GlwIV+LphZ/daRpk+t+KK0sM= X-Received: by 2002:a05:6a00:6f60:b0:708:41c4:8849 with SMTP id d2e1a72fcca58-70eaa220707mr165997b3a.9.1721837512611; Wed, 24 Jul 2024 09:11:52 -0700 (PDT) MIME-Version: 1.0 References: <20240723233149.3226636-1-davidf@vimeo.com> <20240723233149.3226636-2-davidf@vimeo.com> <22a95c76-4e9e-482e-b95d-cb0f01971d98@redhat.com> <20240724114905.GB389003@cmpxchg.org> In-Reply-To: <20240724114905.GB389003@cmpxchg.org> From: David Finkel Date: Wed, 24 Jul 2024 12:11:41 -0400 Message-ID: Subject: Re: [PATCH 1/2] mm, memcg: cg2 memory{.swap,}.peak write handlers To: Johannes Weiner Cc: Waiman Long , Muchun Song , Tejun Heo , Roman Gushchin , Andrew Morton , core-services@vimeo.com, Jonathan Corbet , Michal Hocko , Shakeel Butt , Shuah Khan , Zefan Li , cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 5D383120026 X-Stat-Signature: r5s8f9drtnzykoi4w8gkfhuyhi3ext15 X-Rspam-User: X-HE-Tag: 1721837514-398745 X-HE-Meta: U2FsdGVkX1+0ulnD76npHd3XW9kafjW0LIQ5yoHGvMD54g0DTnPMMFaop0bhf0vkCLQbssQHZXlQLip6nHMUWMF0/DnNA9V275oeD/9YOcP4/C1jYduMx/V86SQhwcejrjW+80Oto3M+M80HfNhKaNavDsN0wrBV364GgaQh3vLcYwHhv76H1O6Au2Jw0a9wVvuEWHFwj6+w7e6TkYkblO4B0rbD6HiqK8ZI4TZ+XbipeW/32Z6EKz27NlnSqP+W1De5+6rVm6sN3z1EI76bHobDpsi6njJKrsl2hpkKJp9LjkBYegy+QXzQf2kJF57qBDZhO/OdNdGdrbWz7Ua41H4Rsnf05aOkSpj+/sSCQ4fkysDRZqittT1/mfccYIh3nKmdixn4KOv0iHD0ILUqnRXGbGnRF1jF279dV4FNXsrU8a+GQntS+lMOtlh5nbVA6GVdyVW2+LEl5P2y4q/LzfmBGpo4tagmegyvUGN6KCt8p+2YmZxF0iYpx/JnT/7byNy2j1MDsVDuC8toTdQ7igqWSyHHhqAyC1giw/j0DUU5X4jwZkV58zo6v0RL+KJ2dTaU7JKdhmoY8aZsZBIrO5RALaEjFcICYdf54yQUCAKvOeKESV/+IiSwPIvD+r01U3kr1tjls1hC3WDxVOrIgW2k20E1ntcTqTTvcyxAviMfc2nypBEvVcxT8f1s11mvYAbm/qYRUEjcP2u9HqlGnejzcuXmUw9AAPqvYtEhsfgRxCY/mtsNFneNRD5exgrDi17eTiRuqppZwgpATq0o85URAFtkIgT1gjXoyafM1DpFdcQqbZ6g60rkxjzdEGgGK3yuD/oW7LOioPlvZt+oy6AuI2Ft/TnniPazof5XAWTxEF6/gUk4e/LAZytjQlQ87jE+P6qStDzrUqHKLXhr7dcR6ccAXUk0pNbNd8ongsz3qQiwhu3mIvZncGs/iUIvma/30yReKrM7vJn7+we MeMS5hhj yzLhE8hjrOnOepSrFXi0yl/GWqnVAajYOmctnURE4kaNvY/3aS2OA3UPqtXJLchrmNtWR/AnQ2ZMy8Sj/9zPxAE0uQ66i7qKM+AqY+wtzH0hKEp8QYiUv23GlNvAZdXXToq2G30tnTdhd/NJifpOkD4UFOJs90Ot/NesT1VgwAlBUDjfcON2aVM4NsB6QZV4pGo/nOgFPRhiZr0uOqj/Hhl+j4Xhx4ng50O7FSJJVVba98QBLasKzM6K3BsZ0VFkfM/jl X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jul 24, 2024 at 7:49=E2=80=AFAM Johannes Weiner wrote: > > On Tue, Jul 23, 2024 at 09:55:19PM -0400, Waiman Long wrote: > > Could you use the "-v " option of git-format-patch to add a version > > number to the patch title? Without that, it can be confusing as to > > whether the patch is new or a resend of the previous one. > > +1 Sorry, I forgot that that flag exists. I'll use that with the next patch. (which I'll send out shortly) > > > > @@ -775,6 +775,11 @@ struct cgroup_subsys { > > > > > > extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem; > > > > > > +struct cgroup_of_peak { > > > + long value; > > > + struct list_head list; > > > +}; > > The name "cgroup_of_peak" is kind of confusing. Maybe local_peak? > > It's the peak associated with an 'of' (which is a known concept in > cgroup code), and it pairs up nicely with of_peak(). I'd prefer > keeping that over local_peak. > > > > @@ -26,6 +26,7 @@ struct page_counter { > > > atomic_long_t children_low_usage; > > > > > > unsigned long watermark; > > > + unsigned long local_watermark; /* track min of fd-local resets */ > > track "min"? I thought it is used to track local maximum after a reset. > > Yeah, the comment doesn't sound quite right. Yeah, it's not explicitly the min. At reset-time, it's the current value at reset-time, and all the fd-local watermarks will all be greater than or equal. Which does effectively make it the min of watermarks at the time it's being set by the reset code. However, yeah, the page-charging code will increase it, which makes it not a min. > > However, I think we'd be hard-pressed to explain correctly and > comprehensively what this thing does in <40 characters. > > I'd just remove the comment tbh. Yeah, I definitely didn't think that comment through. Deleting. > > > > @@ -78,7 +79,10 @@ int page_counter_memparse(const char *buf, const c= har *max, > > > > > > static inline void page_counter_reset_watermark(struct page_counter= *counter) > > > { > > > - counter->watermark =3D page_counter_read(counter); > > > + unsigned long usage =3D page_counter_read(counter); > > > + > > > + counter->watermark =3D usage; > > > + counter->local_watermark =3D usage; > > > } > > > > > > > Could you set the local_watermark first before setting watermark? There > > is a very small time window that the invariant "local_watermark <=3D > > watermark" is not true. > > Does it matter? Only cgroup1 supports global resets; only cgroup2 > supports local peaks watching. This doesn't add anything to the race > that already exists between reset and global watermark update on cg1. > Hmm, since the global watermark update is now conditional on both watermark= s being <=3D the current usage, it does make sense. Witht that said, since we're assigning without any barriers, as-is, the CPU and compiler are quite free to re-order them anyway. I've swapped them and added a comment. > > > @@ -3950,12 +3955,90 @@ static u64 memory_current_read(struct cgroup_= subsys_state *css, > > > return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE; > > > } > > > > > > -static u64 memory_peak_read(struct cgroup_subsys_state *css, > > > - struct cftype *cft) > > > +static int peak_show(struct seq_file *sf, void *v, struct page_count= er *pc) > > > { > > > - struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); > > > + struct cgroup_of_peak *ofp =3D of_peak(sf->private); > > > + s64 fd_peak =3D ofp->value, peak; > > > + > > > + /* User wants global or local peak? */ > > > + if (fd_peak =3D=3D -1) > > > + peak =3D pc->watermark; > > > + else > > > + peak =3D max(fd_peak, (s64)pc->local_watermark); > > Should you save the local_watermark value into ofp->value if > > local_watermark is bigger? This will ensure that each successive read o= f > > the fd is monotonically increasing. Otherwise the value may go up or > > down if there are multiple resets in between. > > The reset saves local_watermark into ofp->value if it's bigger..? > > I do see another problem, though. The compiler might issue multiple > reads to ofp->value in arbitrary order. We could print max(-1, ...) > which is nonsense. Saving ofp->value into a local variable is the > right idea, but the compiler might still issue two reads anyway. It > needs a READ_ONCE() to force a single read. Thanks, I didn't realize the compiler had the latitude to decide to read from that struct field a second time when referencing the local variable. Added. > > I'd use unsigned long for fd_peak. This way the "specialness" is on > the -1UL comparison. The max() must be between two positive numbers, > so the (s64) there is confusing. I've switched fd_peak to `u64`. Thanks again, -- David Finkel Senior Principal Software Engineer, Core Services