Date: Thu, 25 Apr 2024 17:42:46 +0100
From: Vincent Donnefort <vdonnefort@google.com>
To: David Hildenbrand
Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	kernel-team@android.com, rdunlap@infradead.org, rppt@kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions
Message-ID:
References: <20240423232728.1492340-1-vdonnefort@google.com>
 <20240423232728.1492340-3-vdonnefort@google.com>
 <04137a08-8918-422c-8512-beb2074a427e@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <04137a08-8918-422c-8512-beb2074a427e@redhat.com>

On Wed, Apr 24, 2024 at 10:55:54PM +0200, David Hildenbrand wrote:
> On 24.04.24 22:31, Vincent Donnefort wrote:
> > Hi David,
> > 
> > Thanks for your quick response.
> > 
> > On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote:
> > > 
> > > I gave it some more thought, and I think we are still missing something (I
> > > wish PFNMAP/MIXEDMAP wouldn't be that hard).
> > > 
> > > > +
> > > > +/*
> > > > + *   +--------------+  pgoff == 0
> > > > + *   |  meta page   |
> > > > + *   +--------------+  pgoff == 1
> > > > + *   | subbuffer 0  |
> > > > + *   |              |
> > > > + *   +--------------+  pgoff == (1 + (1 << subbuf_order))
> > > > + *   | subbuffer 1  |
> > > > + *   |              |
> > > > + *         ...
> > > > + */
> > > > +#ifdef CONFIG_MMU
> > > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > > > +			struct vm_area_struct *vma)
> > > > +{
> > > > +	unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
> > > > +	unsigned int subbuf_pages, subbuf_order;
> > > > +	struct page **pages;
> > > > +	int p = 0, s = 0;
> > > > +	int err;
> > > > +
> > > 
> > > I'd add some comments here like
> > > 
> > > /* Refuse any MAP_PRIVATE or writable mappings. */
> > > 
> > > > +	if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > > > +	    !(vma->vm_flags & VM_MAYSHARE))
> > > > +		return -EPERM;
> > > > +
> > > 
> > > /*
> > >  * Make sure the mapping cannot become writable later. Also, tell the VM
> > >  * to not touch these pages (VM_DONTCOPY | VM_DONTDUMP) and tell
> > >  * GUP to leave them alone as well (VM_IO).
> > >  */
> > > 
> > > > +	vm_flags_mod(vma,
> > > > +		     VM_MIXEDMAP | VM_PFNMAP |
> > > > +		     VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO,
> > > > +		     VM_MAYWRITE);
> > > 
> > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and,
> > > as stated, vm_insert_pages() even complains quite a lot when it would have
> > > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good
> > > reason.
> > > 
> > > Can't we limit ourselves to VM_IO?
> > > 
> > > But then, I wonder if it really helps much regarding GUP: yes, it blocks
> > > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked()
> > > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not
> > > reject it.
> > > 
> > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends are
> > > the way to go, that will set pte_special() such that also GUP-fast will
> > > leave it alone, just like vm_normal_page() would.
> > > 
> > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone
> > > won't stop all of GUP. We really have to mark the PTE as special, which
> > > vm_insert_page() must not do (because it is refcounted!).
> > 
> > Hum, apologies, I am not sure to follow the connection here. Why do you think
> > the recommendation was to prevent GUP?
> 
> Ah, I'm hallucinating! :) "not let people play games with the mapping" to me
> implied "make sure nobody touches it". If GUP is acceptable that makes stuff
> a lot easier. VM_IO will block some GUP, but not all of it.
> 
> > 
> > > Which means: do we really have to stop GUP from grabbing that page?
> > > 
> > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO)
> > > would be better.
> > 
> > Under the assumption we do not want to stop all GUP, why not use VM_IO over
> > VM_MIXEDMAP, which is I believe more restrictive?
> 
> VM_MIXEDMAP will be implicitly set by vm_insert_page(). There is a lengthy comment
> for vm_normal_page() that explains all this madness. VM_MIXEDMAP is primarily
> relevant for COW mappings, which you just forbid completely.
> 
> remap_pfn_range_notrack() documents the semantics of some of the other flags:
> 
>  *   VM_IO tells people not to look at these pages
>  *	(accesses can have side effects).
>  *   VM_PFNMAP tells the core MM that the base pages are just
>  *	raw PFN mappings, and do not have a "struct page" associated
>  *	with them.
>  *   VM_DONTEXPAND
>  *	Disable vma merging and expanding with mremap().
>  *   VM_DONTDUMP
>  *	Omit vma from core dump, even when VM_IO turned off.
> 
> VM_PFNMAP is very likely really not what we want, unless we really perform raw
> PFN mappings ... VM_IO we can set without doing much harm.
> 
> So I would suggest dropping VM_PFNMAP when using vm_insert_pages(), using only VM_IO
> and likely just letting vm_insert_pages() set VM_MIXEDMAP for you.

Sounds good, I will do that in v22.

> 
> [...]
> 
> > > 
> > > vm_insert_pages() documents: "In case of error, we may have mapped a subset
> > > of the provided pages. It is the caller's responsibility to account for this
> > > case."
> > > 
> > > Which could for example happen, when allocating a page table fails.
> > > 
> > > Would we be able to deal with that here?
> > 
> > As we are in the mmap path, on an error, I would expect the vma to be destroyed
> > and those pages whose insertion succeeded to be unmapped?
> 
> Ah, we simply fail ->mmap().
> 
> In mmap_region(), if call_mmap() failed, we "goto unmap_and_free_vma" where we have
> 
>   /* Undo any partial mapping done by a device driver. */
>   unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start,
> 	       vma->vm_end, vma->vm_end, true);
> 
> > But perhaps shall we proactively zap_page_range_single()?
> 
> No, mmap_region() should indeed be handling it correctly already!

Ok, thanks for confirming!

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 