From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CCE9D3C92A for ; Sun, 20 Oct 2024 16:05:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C1A326B007B; Sun, 20 Oct 2024 12:05:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BCA1F6B0082; Sun, 20 Oct 2024 12:05:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A91D46B0083; Sun, 20 Oct 2024 12:05:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 8A8FC6B007B for ; Sun, 20 Oct 2024 12:05:39 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id F27DD141232 for ; Sun, 20 Oct 2024 16:05:23 +0000 (UTC) X-FDA: 82694455494.25.F330DC7 Received: from mail-wr1-f53.google.com (mail-wr1-f53.google.com [209.85.221.53]) by imf26.hostedemail.com (Postfix) with ESMTP id B483114001A for ; Sun, 20 Oct 2024 16:05:26 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Tc7Ls2AS; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf26.hostedemail.com: domain of alexander.duyck@gmail.com designates 209.85.221.53 as permitted sender) smtp.mailfrom=alexander.duyck@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729440287; a=rsa-sha256; cv=none; b=EI5hZnU0vN1R3h0DSBecFKZ4aortLMCz0hCy1JZOT0SeCdOYylHqNChqd6R/kJ6ZLvpapd 5pda6WoKXJhp/5178qeRbvwdHTxDjg/wjYBXnpjYVSUnY+7eaHk/B2gHj8reMcunJX7Tms ovDrRSOaSpjFQPPqswHmgZFNC3swSis= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Tc7Ls2AS; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf26.hostedemail.com: domain of alexander.duyck@gmail.com designates 209.85.221.53 as permitted sender) smtp.mailfrom=alexander.duyck@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729440287; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zJQ7NzBQmKMeNc7M9zrhC1o48BxNYFY3M9Ix09latD8=; b=PJ501zvxgiuGFvc9zPjK9GPjSTeSqdagsThZBx8vDrfJ93uRna9dJcs5gdIXFYV+cd0EK4 iqtxGQnnLBxpxnjzQF/IdsnfI08lsQXa+DDbjhi7gnhFyBHCZI32MSN6rKUYhHNWKyI0nc 0olMdnKg23L71lMjOUGCNNrqxAha23A= Received: by mail-wr1-f53.google.com with SMTP id ffacd0b85a97d-37d47b38336so2504186f8f.3 for ; Sun, 20 Oct 2024 09:05:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729440335; x=1730045135; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zJQ7NzBQmKMeNc7M9zrhC1o48BxNYFY3M9Ix09latD8=; b=Tc7Ls2ASEgcnN51GP8/6K6mcYg0yfouuhawvxBYthBU4F1eXWrNydnEpgnTMv+tj20 ApzYiaWo67STqTi5daoP59Ew418Kv3B18M581+5k/DRVsaYe9T0Lcg37cOSudu8FTHc9 jOztKxRG+xFa4p6g+p8J8Qw8j15+ejKVZYhJqSChO7R4O3dSJSAf3zeutVrvVSJ3N8JE dwIlBNrxxRtbd0wXTf406kLUlowdyqY2AVaRmAiXvsQsXlR8A3RnhRCVyTo41ZtpgCqh YangaR3VIcDjZAxei28KoK3qCK2WpeZpo0jGwudrCVAXSzoqnjfcuCIuVVxOmDY+NACC qITg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729440335; x=1730045135; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zJQ7NzBQmKMeNc7M9zrhC1o48BxNYFY3M9Ix09latD8=; b=mTjh9pa/hdx2CTdj7DcYNc5XxrkPfP+hRdTZDdvwEzfP+7nujLTK3wmej9QRuggrWw AoxzRL4z1jKAbwUyudDpj0D7ejBE60rjaBZO4mrXP4sbnPyABy6WwqzlEnKkyDzEmfCF De9iIwgAWHvvF0Noz3RMsvvLIsJBiLU0h18EH2PhMOSLzVdtNY1o3meqnRxmXja3XJx2 dFXzWZegmJ3Z9Q1SYP1VjiLvDMpc/Xhd9TQ8kbbfk/NFwoU64XE0cOb7KA/d8KAVFXku JfpcPVrkD3T/aPqDurgdhQmnVM+mCIJUBGS6hE5S7apMwwEZwAOeJP8lsjaSu1wF72Mz 3YXQ== X-Forwarded-Encrypted: i=1; AJvYcCWpM68z5guAZQU0PGCldpf2u6yvhWb+8u6kTnDTH5Ch797gpBxzQQ9048T9Biw76slqc5e2xeokJg==@kvack.org X-Gm-Message-State: AOJu0YzTn4eVZX7WhWdwSsUZdMiZlMP/tHmUWM9jOf8AmSx8e6I/q3pU nKr+YYdy+YjuvGC8ZsiKosMVNZt/3dl7lqsgCt32IOUlp2j7doVcpS99hiKWa+i63ixWTLoVpks eggjBLPliFsgK/+kLMOr2K0ASSa0= X-Google-Smtp-Source: AGHT+IFZ/Fy2Ii2u5SIrL80Yn1WMaw2/2zE2RJB2oGdpgKR21e8zwfzBaVyl3DbRVRkWkeI1wZdNcW9hHLkkRdWvQGo= X-Received: by 2002:adf:f9d0:0:b0:374:c4e2:3ca7 with SMTP id ffacd0b85a97d-37eab4d13b7mr4964756f8f.5.1729440335166; Sun, 20 Oct 2024 09:05:35 -0700 (PDT) MIME-Version: 1.0 References: <20241018105351.1960345-1-linyunsheng@huawei.com> <20241018105351.1960345-11-linyunsheng@huawei.com> In-Reply-To: From: Alexander Duyck Date: Sun, 20 Oct 2024 09:04:58 -0700 Message-ID: Subject: Re: [PATCH net-next v22 10/14] mm: page_frag: introduce prepare/probe/commit API To: Yunsheng Lin Cc: Yunsheng Lin , davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: B483114001A X-Rspamd-Server: rspam01 X-Stat-Signature: wkxxhzsrtcmg8yd7hcc7coaixk4ioww8 X-HE-Tag: 1729440326-852672 X-HE-Meta: U2FsdGVkX1+0ADwqcQ+j9U7LtfauYOulMc7JWLFYSOLKMaSom4oCbU0ZilTwWpY0iJyL+F2FFUv+vDQwGxFEaaCjGPeR9DfGScEzgexZD+TWp+S+b5aZlJHMTj7bno54SQ/7bf8dC/n3UIPRsTm10KxXA63rCmUwIeMfubfPHU3k7ZQrw9EPmr+3q/QSGsTWpUr57CiO9TAPDHMxEHtIglquWww64ZzvtAWstgx46ds9aAy6+d8OcOU48vi6uyJ4SuOdJHs1wouzSTZrTMnOgmMk0elJKcjZj08ZL1lYZSKoL/Q1/I1tD69Spd5B+J9jauv+Rq0W1aWliWU7wjutJEBRobe1rdFzGSH5ePkwB2pFI20jNkRWjsJaSnrzMlmf7XagO7wwzZuRf8bWkcglTbV+Uu5YQC2UMYrf//0SHcwjEEZcPX7ibvViMF36aBAhH8oAcv/Xd4dqPQvSWQZrO0X9XZX8wrsFsxIEqPgRAQo/+AQcFFSXqFNrBT5YSViG4vb9v++J3brwhHBm2urqd0gdXPR7dxwgyyhOOQfIGHbxNkwfkDlzhwMC3LFpCuiOLIYGFi/rKJj47SUXbMDDyNV9cPAjHfKgb5ugJnaGDWj5yK0a5mEtlh4D4RBIieJRWrnTX3IPMD0Xr2Kb3rP8NdUCF6aaJEYTkrHZ/jyutjNNeolAQ/6Bx4eoDYJPPCRJFJJlrAk1/tN+uOcnSPi0Pob3fHCtymO6sZElQsTmUtMpAvSa+4ajbtTb7Ud2LgR/H9rTn7cOh1yuc9xO9JBtx2F1XsXNVpHS9nh/y4dkBSOC3tz6UKTrGpAXM7Nb8gGlHZIlNWmar1hxfMymx6uhXpDJ6RBMv85BHEnfeCrgfmCzdEZxrmrizVPVvjrqv2PgftdUPZp//LYHElB6oN0NYLbZF3Yd/qw3dM8d+I38FqwkFkf3fYGX1A/eJebpRBeuyetGBOYl9uc+v4SbjOv qVUMA59n QaBRMwL0094eqPZKsBp44ysxQane/ExBIjD8mZEhQUEj8SISt4PI8GkYRceXoZdWAYPkj753GxhOMn8bDjgvELcxR/Ea3TpaTIhDXfEt9I5EJvm3V43TWD6KxrA5Kj0fgzAdtQI8jfKuQoq3SPvfo8rgahIsaj5lwvLxg7kAmGh4DP1GxlLEwuS3kotNC7K7wfyGSzMF3Ey/2LsN1XzsP4xc/+s/Xf3mnWDs3AxfiB06WH3caX841yyQSHHHMQl359KYj3lIv5ttqS3kxbUqUf8UPErGA4tyEdZz68tGZ9ux4bA29mmzgn7zUnwjyWPxffXYpvUE1EMidqwvTmRcS6qMIHw+4h8sQLrILw4ZVY4rfZ+k8x3EhTiGOsP1D7riqUWf7K9YdEs3S5ur5YRBUc1f8uIBIPCoXUSeWjkrm+M1/rRg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Oct 19, 2024 at 1:33=E2=80=AFAM Yunsheng Lin wrote: > > On 10/19/2024 2:03 AM, Alexander Duyck wrote: > > > > > Not a huge fan of introducing a ton of new API calls and then having > > to have them all applied at once in the follow-on patches. Ideally the > > functions and the header documentation for them would be introduced in > > the same patch as well as examples on how it would be used. > > > > I really think we should break these up as some are used in one case, > > and others in another and it is a pain to have a pile of abstractions > > that are all using these functions in different ways. > > I am guessing this patch may be split into three parts to make it more > reviewable and easier to discuss here: > 1. Prepare & commit related API, which is still the large one. > 2. Probe API related API. In my mind the first two listed here are much more related to each other than this abort api. > 3. Abort API. I wonder if we couldn't look at introducing this first as it is actually closer to the existing API in terms of how you might use it. The only spot of commonality I can think of in terms of all these is that we would need to be able to verify the VA, offset, and size. I partly wonder if for our page frag API we couldn't get away with passing a virtual address instead of a page for the page frag. It would save us having to do the virt_to_page or page_to_virt when trying to verify a commit or a revert. > And it is worthing mentioning that even if this patch is split into more > patches, it seems impossible to break patch 12 up as almost everything > related to changing "page_frag" to "page_frag_cache" need to be one > patch to avoid compile error. That is partly true. One issue is that there are more changes there than just changing out the page APIs. It seems like you went in performing optimizations as soon as you were changing out the page allocation method used. For example one thing that jumps out at me was the removal of linear_to_page and its replacement with spd_fill_linear_page which seems to take on other pieces of the function as well as you made it a return path of its own when that section wasn't previously. Ideally changing out the APIs used should be more about doing just that and avoiding additional optimization or deviations from the original coded path if possible. > > > >> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc, > >> + unsigned int fragsz) > >> +{ > >> + VM_BUG_ON(fragsz > nc->offset); > >> + > >> + nc->pagecnt_bias++; > >> + nc->offset -=3D fragsz; > >> +} > >> + > > > > We should probably have the same checks here you had on the earlier > > commit. We should not be allowing blind changes. If we are using the > > commit or abort interfaces we should be verifying a page frag with > > them to verify that the request to modify this is legitimate. > > As an example in 'Preparation & committing API' section of patch 13, the > abort API is used to abort the operation of page_frag_alloc_*() related > API, so 'page_frag' is not available for doing those checking like the > commit API. For some case without the needing of complicated prepare & > commit API like tun_build_skb(), the abort API can be used to abort the > operation of page_frag_alloc_*() related API when bpf_prog_run_xdp() > returns XDP_DROP knowing that no one else is taking extra reference to > the just allocated fragment. > > +Allocation & freeing API > +------------------------ > + > +.. code-block:: c > + > + void *va; > + > + va =3D page_frag_alloc_align(nc, size, gfp, align); > + if (!va) > + goto do_error; > + > + err =3D do_something(va, size); > + if (err) { > + page_frag_alloc_abort(nc, size); > + goto do_error; > + } > + > + ... > + > + page_frag_free(va); > > > If there is a need to abort the commit API operation, we probably call > it something like page_frag_commit_abort()? I would argue that using an abort API in such a case is likely not valid then. What we most likely need to be doing is passing the va as a part of the abort request. With that we should be able to work our way backwards to get back to verifying the fragment came from the correct page before we allow stuffing it back on the page. > > > >> void page_frag_free(void *addr); > >> > >> #endif > >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c > >> index f55d34cf7d43..5ea4b663ab8e 100644 > >> --- a/mm/page_frag_cache.c > >> +++ b/mm/page_frag_cache.c > >> @@ -112,6 +112,27 @@ unsigned int __page_frag_cache_commit_noref(struc= t page_frag_cache *nc, > >> } > >> EXPORT_SYMBOL(__page_frag_cache_commit_noref); > >> > >> +void *__page_frag_alloc_refill_probe_align(struct page_frag_cache *nc= , > >> + unsigned int fragsz, > >> + struct page_frag *pfrag, > >> + unsigned int align_mask) > >> +{ > >> + unsigned long encoded_page =3D nc->encoded_page; > >> + unsigned int size, offset; > >> + > >> + size =3D PAGE_SIZE << encoded_page_decode_order(encoded_page); > >> + offset =3D __ALIGN_KERNEL_MASK(nc->offset, ~align_mask); > >> + if (unlikely(!encoded_page || offset + fragsz > size)) > >> + return NULL; > >> + > >> + pfrag->page =3D encoded_page_decode_page(encoded_page); > >> + pfrag->size =3D size - offset; > >> + pfrag->offset =3D offset; > >> + > >> + return encoded_page_decode_virt(encoded_page) + offset; > >> +} > >> +EXPORT_SYMBOL(__page_frag_alloc_refill_probe_align); > >> + > > > > If I am not mistaken this would be the equivalent of allocating a size > > 0 fragment right? The only difference is that you are copying out the > > "remaining" size, but we could get that from the offset if we knew the > > size couldn't we? Would it maybe make sense to look at limiting this > > to PAGE_SIZE instead of passing the size of the actual fragment? > > I am not sure if I understand what does "limiting this to PAGE_SIZE" > mean here. Right now you are returning pfrag->size =3D size - offset. I am wondering if we should be returning something more like "pfrag->size =3D PAGE_SIZE - (offset % PAGE_SIZE)". > I probably should mention the usecase of probe API here. For the usecase > of mptcp_sendmsg(), the minimum size of a fragment can be smaller when > the new fragment can be coalesced to previous fragment as there is an > extra memory needed for some header if the fragment can not be coalesced > to previous fragment. The probe API is mainly used to see if there is > any memory left in the 'page_frag_cache' that can be coalesced to > previous fragment. What is the fragment size we are talking about? In my example above we would basically be looking at rounding the page off to the nearest PAGE_SIZE block before we would have to repeat the call to grab the next PAGE_SIZE block. Since the request size for the page frag alloc API is supposed to be limited to 4K or less it would make sense to limit the probe API similarly.