From: Jiri Olsa
Date: Fri, 21 Mar 2025 14:05:01 +0100
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Andrew Morton, Andrii Nakryiko, Matthew Wilcox, Russell King, Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Ian Rogers, Adrian Hunter, "Liang, Kan", Tong Tiangen
Subject: Re: [PATCH v3 3/3] kernel/events/uprobes: uprobe_write_opcode() rewrite
References: <20250321113713.204682-1-david@redhat.com> <20250321113713.204682-4-david@redhat.com>
In-Reply-To: <20250321113713.204682-4-david@redhat.com>
On Fri, Mar 21, 2025 at 12:37:13PM +0100, David Hildenbrand wrote:

SNIP

> +static int __uprobe_write_opcode(struct vm_area_struct *vma,
> +		struct folio_walk *fw, struct folio *folio,
> +		unsigned long opcode_vaddr, uprobe_opcode_t opcode)
> +{
> +	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
> +	const bool is_register = !!is_swbp_insn(&opcode);
> +	bool pmd_mappable;
> +
> +	/* For now, we'll only handle PTE-mapped folios. */
> +	if (fw->level != FW_LEVEL_PTE)
> +		return -EFAULT;
> +
> +	/*
> +	 * See can_follow_write_pte(): we'd actually prefer a writable PTE here,
> +	 * but the VMA might not be writable.
> +	 */
> +	if (!pte_write(fw->pte)) {
> +		if (!PageAnonExclusive(fw->page))
> +			return -EFAULT;
> +		if (unlikely(userfaultfd_pte_wp(vma, fw->pte)))
> +			return -EFAULT;
> +		/* SOFTDIRTY is handled via pte_mkdirty() below. */
> +	}
> +
> +	/*
> +	 * We'll temporarily unmap the page and flush the TLB, such that we can
> +	 * modify the page atomically.
> +	 */
> +	flush_cache_page(vma, vaddr, pte_pfn(fw->pte));
> +	fw->pte = ptep_clear_flush(vma, vaddr, fw->ptep);
> +	copy_to_page(fw->page, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
> +
> +	/*
> +	 * When unregistering, we may only zap a PTE if uffd is disabled and
> +	 * there are no unexpected folio references ...
> +	 */
> +	if (is_register || userfaultfd_missing(vma) ||
> +	    (folio_ref_count(folio) != folio_mapcount(folio) + 1 +
> +	     folio_test_swapcache(folio) * folio_nr_pages(folio)))
> +		goto remap;
> +
> +	/*
> +	 * ... and the mapped page is identical to the original page that
> +	 * would get faulted in on next access.
> +	 */
> +	if (!orig_page_is_identical(vma, vaddr, fw->page, &pmd_mappable))
> +		goto remap;
> +
> +	dec_mm_counter(vma->vm_mm, MM_ANONPAGES);
> +	folio_remove_rmap_pte(folio, fw->page, vma);
> +	if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
> +	    folio_trylock(folio)) {
> +		folio_free_swap(folio);
> +		folio_unlock(folio);
> +	}
> +	folio_put(folio);

hi,
it's probably ok and I'm missing something, but why do we call
folio_put() here? I'd think it's done by the folio_put() call in
uprobe_write_opcode()

thanks,
jirka

> +
> +	return pmd_mappable;
> +remap:
> +	/*
> +	 * Make sure that our copy_to_page() changes become visible before the
> +	 * set_pte_at() write.
> +	 */
> +	smp_wmb();
> +	/* We modified the page. Make sure to mark the PTE dirty. */
> +	set_pte_at(vma->vm_mm, vaddr, fw->ptep, pte_mkdirty(fw->pte));
> +	return 0;
> +}
> +
>  /*
>   * NOTE:
>   * Expect the breakpoint instruction to be the smallest size instruction for
> @@ -475,116 +480,115 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>   * uprobe_write_opcode - write the opcode at a given virtual address.
>   * @auprobe: arch specific probepoint information.
>   * @vma: the probed virtual memory area.
> - * @vaddr: the virtual address to store the opcode.
> - * @opcode: opcode to be written at @vaddr.
> + * @opcode_vaddr: the virtual address to store the opcode.
> + * @opcode: opcode to be written at @opcode_vaddr.
>   *
>   * Called with mm->mmap_lock held for read or write.
>   * Return 0 (success) or a negative errno.
>   */
>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> -		unsigned long vaddr, uprobe_opcode_t opcode)
> +		const unsigned long opcode_vaddr, uprobe_opcode_t opcode)
>  {
> +	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct uprobe *uprobe;
> -	struct page *old_page, *new_page;
>  	int ret, is_register, ref_ctr_updated = 0;
> -	bool orig_page_huge = false;
>  	unsigned int gup_flags = FOLL_FORCE;
> +	struct mmu_notifier_range range;
> +	struct folio_walk fw;
> +	struct folio *folio;
> +	struct page *page;
>  
>  	is_register = is_swbp_insn(&opcode);
>  	uprobe = container_of(auprobe, struct uprobe, arch);
>  
> -retry:
> +	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
> +		return -EINVAL;
> +
> +	/*
> +	 * When registering, we have to break COW to get an exclusive anonymous
> +	 * page that we can safely modify. Use FOLL_WRITE to trigger a write
> +	 * fault if required. When unregistering, we might be lucky and the
> +	 * anon page is already gone. So defer write faults until really
> +	 * required. Use FOLL_SPLIT_PMD, because __uprobe_write_opcode()
> +	 * cannot deal with PMDs yet.
> +	 */
>  	if (is_register)
> -		gup_flags |= FOLL_SPLIT_PMD;
> -	/* Read the page with vaddr into memory */
> -	ret = get_user_pages_remote(mm, vaddr, 1, gup_flags, &old_page, NULL);
> -	if (ret != 1)
> -		return ret;
> +		gup_flags |= FOLL_WRITE | FOLL_SPLIT_PMD;
>  
> -	ret = verify_opcode(old_page, vaddr, &opcode);
> +retry:
> +	ret = get_user_pages_remote(mm, vaddr, 1, gup_flags, &page, NULL);
>  	if (ret <= 0)
> -		goto put_old;
> -
> -	if (is_zero_page(old_page)) {
> -		ret = -EINVAL;
> -		goto put_old;
> -	}
> +		goto out;
> +	folio = page_folio(page);
>  
> -	if (WARN(!is_register && PageCompound(old_page),
> -		 "uprobe unregister should never work on compound page\n")) {
> -		ret = -EINVAL;
> -		goto put_old;
> +	ret = verify_opcode(page, opcode_vaddr, &opcode);
> +	if (ret <= 0) {
> +		folio_put(folio);
> +		goto out;
>  	}
>  
>  	/* We are going to replace instruction, update ref_ctr. */
>  	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
>  		ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
> -		if (ret)
> -			goto put_old;
> +		if (ret) {
> +			folio_put(folio);
> +			goto out;
> +		}
>  
>  		ref_ctr_updated = 1;
>  	}
>  
>  	ret = 0;
> -	if (!is_register && !PageAnon(old_page))
> -		goto put_old;
> -
> -	ret = anon_vma_prepare(vma);
> -	if (ret)
> -		goto put_old;
> -
> -	ret = -ENOMEM;
> -	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
> -	if (!new_page)
> -		goto put_old;
> -
> -	__SetPageUptodate(new_page);
> -	copy_highpage(new_page, old_page);
> -	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
> +	if (unlikely(!folio_test_anon(folio))) {
> +		VM_WARN_ON_ONCE(is_register);
> +		folio_put(folio);
> +		goto out;
> +	}
>  
>  	if (!is_register) {
> -		struct page *orig_page;
> -		pgoff_t index;
> -
> -		VM_BUG_ON_PAGE(!PageAnon(old_page), old_page);
> -
> -		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
> -		orig_page = find_get_page(vma->vm_file->f_inode->i_mapping,
> -					  index);
> -
> -		if (orig_page) {
> -			if (PageUptodate(orig_page) &&
> -			    pages_identical(new_page, orig_page)) {
> -				/* let go new_page */
> -				put_page(new_page);
> -				new_page = NULL;
> -
> -				if (PageCompound(orig_page))
> -					orig_page_huge = true;
> -			}
> -			put_page(orig_page);
> -		}
> +		/*
> +		 * In the common case, we'll be able to zap the page when
> +		 * unregistering. So trigger MMU notifiers now, as we won't
> +		 * be able to do it under PTL.
> +		 */
> +		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
> +					vaddr, vaddr + PAGE_SIZE);
> +		mmu_notifier_invalidate_range_start(&range);
> +	}
> +
> +	ret = -EAGAIN;
> +	/* Walk the page tables again, to perform the actual update. */
> +	if (folio_walk_start(&fw, vma, vaddr, 0)) {
> +		if (fw.page == page)
> +			ret = __uprobe_write_opcode(vma, &fw, folio, opcode_vaddr, opcode);
> +		folio_walk_end(&fw, vma);
>  	}
>  
> -	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
> -	if (new_page)
> -		put_page(new_page);
> -put_old:
> -	put_page(old_page);
> +	if (!is_register)
> +		mmu_notifier_invalidate_range_end(&range);
>  
> -	if (unlikely(ret == -EAGAIN))
> +	folio_put(folio);
> +	switch (ret) {
> +	case -EFAULT:
> +		gup_flags |= FOLL_WRITE | FOLL_SPLIT_PMD;
> +		fallthrough;
> +	case -EAGAIN:
>  		goto retry;
> +	default:
> +		break;
> +	}
>  
> +out:
>  	/* Revert back reference counter if instruction update failed. */
> -	if (ret && is_register && ref_ctr_updated)
> +	if (ret < 0 && is_register && ref_ctr_updated)
>  		update_ref_ctr(uprobe, mm, -1);
>  
>  	/* try collapse pmd for compound page */
> -	if (!ret && orig_page_huge)
> +	if (ret > 0)
>  		collapse_pte_mapped_thp(mm, vaddr, false);
>  
> -	return ret;
> +	return ret < 0 ? ret : 0;
>  }
>  
>  /**
> -- 
> 2.48.1
> 