* Re: [PATCH v4 bpf-next 02/10] lib/buildid: add single folio-based file reader abstraction
[not found] ` <20240807234029.456316-3-andrii@kernel.org>
@ 2024-08-08 18:33 ` Shakeel Butt
0 siblings, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2024-08-08 18:33 UTC
To: Andrii Nakryiko
Cc: bpf, linux-mm, akpm, adobriyan, hannes, ak, osandov, song, jannh,
linux-fsdevel, willy
On Wed, Aug 07, 2024 at 04:40:21PM GMT, Andrii Nakryiko wrote:
> Add freader abstraction that transparently manages fetching and local
> mapping of the underlying file page(s) and provides a simple direct data
> access interface.
>
> freader_fetch() is the only and single interface necessary. It accepts
> file offset and desired number of bytes that should be accessed, and
> will return a kernel mapped pointer that caller can use to dereference
> data up to requested size. Requested size can't be bigger than the size
> of the extra buffer provided during initialization (because, worst case,
> all requested data has to be copied into it, so it's better to flag
> wrongly sized buffer unconditionally, regardless if requested data range
> is crossing page boundaries or not).
>
> If folio is not paged in, or some of the conditions are not satisfied,
> NULL is returned and more detailed error code can be accessed through
> freader->err field. This approach makes the usage of freader_fetch()
> cleaner.
>
> To accommodate accessing file data that crosses folio boundaries, user
> has to provide an extra buffer that will be used to make a local copy,
> if necessary. This is done to maintain a simple linear pointer data
> access interface.
>
> We switch existing build ID parsing logic to it, without changing or
> lifting any of the existing constraints, yet. This will be done
> separately.
>
> Given existing code was written with the assumption that it's always
> working with a single (first) page of the underlying ELF file, logic
> passes direct pointers around, which doesn't really work well with
> freader approach and would be limiting when removing the single page (folio)
> limitation. So we adjust all the logic to work in terms of file offsets.
>
> There is also a memory buffer-based version (freader_init_from_mem())
> for cases when desired data is already available in kernel memory. This
> is used for parsing vmlinux's own build ID note. In this mode assumption
> is that provided data starts at "file offset" zero, which works great
> when parsing ELF notes sections, as all the parsing logic is relative to
> note section's start.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
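A minimal userspace sketch of the fetch contract described in the quoted commit message, assuming a flat in-memory buffer stands in for the page cache. The names `sketch_reader`, `sketch_reader_init()`, `sketch_fetch()`, the 16-byte `SKETCH_PAGE_SIZE`, and the specific error codes are hypothetical and are not the kernel's freader implementation. It only illustrates the three behaviours the message names: a request larger than the extra buffer is rejected unconditionally, a range within one page is returned as a direct pointer, and a range crossing a page boundary is copied into the caller's buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tiny "page" so that boundary crossings are easy to trigger (assumption). */
#define SKETCH_PAGE_SIZE 16u

/* Hypothetical userspace stand-in for a file reader; not the kernel layout. */
struct sketch_reader {
	const uint8_t *data;	/* backing "file" contents */
	size_t data_sz;		/* total size of the backing data */
	uint8_t *buf;		/* caller-provided extra buffer */
	size_t buf_sz;		/* size of the extra buffer */
	int err;		/* detailed error code of the last fetch */
};

static void sketch_reader_init(struct sketch_reader *r, const void *data,
			       size_t data_sz, void *buf, size_t buf_sz)
{
	memset(r, 0, sizeof(*r));
	r->data = data;
	r->data_sz = data_sz;
	r->buf = buf;
	r->buf_sz = buf_sz;
}

/* Return a pointer valid for sz bytes, or NULL with r->err set. */
static const void *sketch_fetch(struct sketch_reader *r, size_t off, size_t sz)
{
	r->err = 0;

	/* Flag an undersized buffer even if no copy would be needed. */
	if (sz > r->buf_sz) {
		r->err = -E2BIG;
		return NULL;
	}
	/* Reject ranges outside the backing data (overflow-safe form). */
	if (off > r->data_sz || sz > r->data_sz - off) {
		r->err = -ERANGE;
		return NULL;
	}
	/* Range stays within one page: hand out a direct pointer. */
	if (sz == 0 || off / SKETCH_PAGE_SIZE == (off + sz - 1) / SKETCH_PAGE_SIZE)
		return r->data + off;

	/* Range crosses a page boundary: make a linear local copy. */
	memcpy(r->buf, r->data + off, sz);
	return r->buf;
}
```

In this sketch the backing data is contiguous, so the copy is not strictly needed; it is kept to mirror the case where neighbouring folios are not adjacent in kernel memory.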
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
[not found] ` <20240807234029.456316-7-andrii@kernel.org>
@ 2024-08-08 18:40 ` Shakeel Butt
2024-08-08 20:15 ` Andrii Nakryiko
0 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2024-08-08 18:40 UTC
To: Andrii Nakryiko
Cc: bpf, linux-mm, akpm, adobriyan, hannes, ak, osandov, song, jannh,
linux-fsdevel, willy, Omar Sandoval
On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> Extend freader with a flag specifying whether it's OK to cause page
> fault to fetch file data that is not already physically present in
> memory. With this, it's now easy to wait for data if the caller is
> running in sleepable (faultable) context.
>
> We utilize read_cache_folio() to bring the desired folio into page
> cache, after which the rest of the logic works just the same at folio level.
>
> Suggested-by: Omar Sandoval <osandov@fb.com>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> 1 file changed, 28 insertions(+), 16 deletions(-)
>
> diff --git a/lib/buildid.c b/lib/buildid.c
> index 5e6f842f56f0..e1c01b23efd8 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -20,6 +20,7 @@ struct freader {
> struct folio *folio;
> void *addr;
> loff_t folio_off;
> + bool may_fault;
> };
> struct {
> const char *data;
> @@ -29,12 +30,13 @@ struct freader {
> };
>
> static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> - struct address_space *mapping)
> + struct address_space *mapping, bool may_fault)
> {
> memset(r, 0, sizeof(*r));
> r->buf = buf;
> r->buf_sz = buf_sz;
> r->mapping = mapping;
> + r->may_fault = may_fault;
> }
>
> static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> freader_put_folio(r);
>
> r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> +
> + /* if sleeping is allowed, wait for the page, if necessary */
> + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
Willy's network fs comment is bugging me. If we pass NULL for filler,
the kernel is going to use the fs's read_folio() callback. I have checked
read_folio() for fuse and nfs and it seems like for at least these two
filesystems the callback is accessing file->private_data. So, if the elf
file is on these filesystems, we might see null accesses.
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-08 18:40 ` [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API Shakeel Butt
@ 2024-08-08 20:15 ` Andrii Nakryiko
2024-08-08 20:57 ` Jann Horn
2024-08-08 21:02 ` Shakeel Butt
0 siblings, 2 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 20:15 UTC
To: Shakeel Butt
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, hannes, ak,
osandov, song, jannh, linux-fsdevel, willy, Omar Sandoval
On Thu, Aug 8, 2024 at 11:40 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> > Extend freader with a flag specifying whether it's OK to cause page
> > fault to fetch file data that is not already physically present in
> > memory. With this, it's now easy to wait for data if the caller is
> > running in sleepable (faultable) context.
> >
> > We utilize read_cache_folio() to bring the desired folio into page
> > cache, after which the rest of the logic works just the same at folio level.
> >
> > Suggested-by: Omar Sandoval <osandov@fb.com>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> > lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> > 1 file changed, 28 insertions(+), 16 deletions(-)
> >
> > diff --git a/lib/buildid.c b/lib/buildid.c
> > index 5e6f842f56f0..e1c01b23efd8 100644
> > --- a/lib/buildid.c
> > +++ b/lib/buildid.c
> > @@ -20,6 +20,7 @@ struct freader {
> > struct folio *folio;
> > void *addr;
> > loff_t folio_off;
> > + bool may_fault;
> > };
> > struct {
> > const char *data;
> > @@ -29,12 +30,13 @@ struct freader {
> > };
> >
> > static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> > - struct address_space *mapping)
> > + struct address_space *mapping, bool may_fault)
> > {
> > memset(r, 0, sizeof(*r));
> > r->buf = buf;
> > r->buf_sz = buf_sz;
> > r->mapping = mapping;
> > + r->may_fault = may_fault;
> > }
> >
> > static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> > @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> > freader_put_folio(r);
> >
> > r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> > +
> > + /* if sleeping is allowed, wait for the page, if necessary */
> > + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> > + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
>
> Willy's network fs comment is bugging me. If we pass NULL for filler,
> the kernel is going to use the fs's read_folio() callback. I have checked
> read_folio() for fuse and nfs and it seems like for at least these two
> filesystems the callback is accessing file->private_data. So, if the elf
> file is on these filesystems, we might see null accesses.
>
Isn't that just a huge problem with the read_cache_folio() interface
then? That file is optional, in general, but for some specific FS
types it's not. How is generic code supposed to know this?
Or maybe it's a bug with the nfs_read_folio() and fuse_read_folio()
implementation that they can't handle NULL file argument?
netfs_read_folio(), for example, seems to be working with file == NULL
just fine.
Matthew, can you please advise what's the right approach here? I can,
of course, always get file refcount, but most of the time it will be
just an unnecessary overhead, so ideally I'd like to avoid that. But
if I have to check each read_folio callback implementation to know
whether it's required or not, then that's not great...
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-08 20:15 ` Andrii Nakryiko
@ 2024-08-08 20:57 ` Jann Horn
2024-08-08 21:23 ` Andrii Nakryiko
2024-08-08 21:02 ` Shakeel Butt
1 sibling, 1 reply; 9+ messages in thread
From: Jann Horn @ 2024-08-08 20:57 UTC
To: Andrii Nakryiko
Cc: Shakeel Butt, Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan,
hannes, ak, osandov, song, linux-fsdevel, willy, Omar Sandoval
On Thu, Aug 8, 2024 at 10:16 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Thu, Aug 8, 2024 at 11:40 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> > > Extend freader with a flag specifying whether it's OK to cause page
> > > fault to fetch file data that is not already physically present in
> > > memory. With this, it's now easy to wait for data if the caller is
> > > running in sleepable (faultable) context.
> > >
> > > We utilize read_cache_folio() to bring the desired folio into page
> > > cache, after which the rest of the logic works just the same at folio level.
> > >
> > > Suggested-by: Omar Sandoval <osandov@fb.com>
> > > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > > lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> > > 1 file changed, 28 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/lib/buildid.c b/lib/buildid.c
> > > index 5e6f842f56f0..e1c01b23efd8 100644
> > > --- a/lib/buildid.c
> > > +++ b/lib/buildid.c
> > > @@ -20,6 +20,7 @@ struct freader {
> > > struct folio *folio;
> > > void *addr;
> > > loff_t folio_off;
> > > + bool may_fault;
> > > };
> > > struct {
> > > const char *data;
> > > @@ -29,12 +30,13 @@ struct freader {
> > > };
> > >
> > > static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> > > - struct address_space *mapping)
> > > + struct address_space *mapping, bool may_fault)
> > > {
> > > memset(r, 0, sizeof(*r));
> > > r->buf = buf;
> > > r->buf_sz = buf_sz;
> > > r->mapping = mapping;
> > > + r->may_fault = may_fault;
> > > }
> > >
> > > static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> > > @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> > > freader_put_folio(r);
> > >
> > > r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> > > +
> > > + /* if sleeping is allowed, wait for the page, if necessary */
> > > + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> > > + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
> >
> > Willy's network fs comment is bugging me. If we pass NULL for filler,
> > the kernel is going to use the fs's read_folio() callback. I have checked
> > read_folio() for fuse and nfs and it seems like for at least these two
> > filesystems the callback is accessing file->private_data. So, if the elf
> > file is on these filesystems, we might see null accesses.
> >
>
> Isn't that just a huge problem with the read_cache_folio() interface
> then? That file is optional, in general, but for some specific FS
> types it's not. How is generic code supposed to know this?
I think you have to think about it the other way around. The file is
required, unless you know the filler function that will be used
doesn't use the file. Which you don't know when you're coming from
generic code, so generic code has to pass in a file.
As far as I can tell, most of the callers of read_cache_folio() (via
read_mapping_folio()) are inside filesystem implementations, not
generic code, so they know what the filler function will do. You're
generic code, so I think you have to pass in a file.
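The contract described above can be illustrated with a small userspace mock, assuming a simplified callback signature. Every type and function here (`mock_file`, `mock_folio`, `mock_mapping`, `mock_read_cache_folio()`, and the two example callbacks) is hypothetical and only mimics the shape of the interface under discussion; none of it is kernel code. The point it shows: when no filler is given, the helper falls back to the mapping's own read callback, and whether a NULL file is acceptable depends entirely on which callback that turns out to be.

```c
#include <errno.h>
#include <stddef.h>

/* All types below are hypothetical userspace mocks, not kernel definitions. */
struct mock_file {
	void *private_data;	/* per-open state some filesystems rely on */
};

struct mock_folio {
	int uptodate;
};

typedef int (*mock_read_folio_t)(struct mock_file *file, struct mock_folio *folio);

struct mock_mapping {
	mock_read_folio_t read_folio;	/* filesystem-specific read callback */
};

/* A callback that never looks at the file, so NULL is harmless. */
static int simplefs_read_folio(struct mock_file *file, struct mock_folio *folio)
{
	(void)file;
	folio->uptodate = 1;
	return 0;
}

/*
 * A callback that needs file->private_data. The mock reports -EINVAL
 * where a callback without this check would dereference a NULL pointer.
 */
static int netlikefs_read_folio(struct mock_file *file, struct mock_folio *folio)
{
	if (!file || !file->private_data)
		return -EINVAL;
	folio->uptodate = 1;
	return 0;
}

/* With no explicit filler, use the mapping's own read callback. */
static int mock_read_cache_folio(struct mock_mapping *mapping,
				 struct mock_folio *folio,
				 mock_read_folio_t filler,
				 struct mock_file *file)
{
	if (!filler)
		filler = mapping->read_folio;
	return filler(file, folio);
}
```

A generic caller cannot tell the two mappings apart, so in this mock the only call that is safe for both is the one that passes a valid file.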
> Or maybe it's a bug with the nfs_read_folio() and fuse_read_folio()
> implementation that they can't handle NULL file argument?
> netfs_read_folio(), for example, seems to be working with file == NULL
> just fine.
>
> Matthew, can you please advise what's the right approach here? I can,
> of course, always get file refcount, but most of the time it will be
> just an unnecessary overhead, so ideally I'd like to avoid that. But
> if I have to check each read_folio callback implementation to know
> whether it's required or not, then that's not great...
Why would you need to increment the file refcount? As far as I can
tell, all your accesses to the file would happen under
__build_id_parse(), which is borrowing the refcounted reference from
vma->vm_file; the file can't go away as long as your caller is holding
the mmap lock.
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-08 20:15 ` Andrii Nakryiko
2024-08-08 20:57 ` Jann Horn
@ 2024-08-08 21:02 ` Shakeel Butt
2024-08-08 21:21 ` Andrii Nakryiko
1 sibling, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2024-08-08 21:02 UTC
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, hannes, ak,
osandov, song, jannh, linux-fsdevel, willy, Omar Sandoval
On Thu, Aug 08, 2024 at 01:15:52PM GMT, Andrii Nakryiko wrote:
> On Thu, Aug 8, 2024 at 11:40 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> > > Extend freader with a flag specifying whether it's OK to cause page
> > > fault to fetch file data that is not already physically present in
> > > memory. With this, it's now easy to wait for data if the caller is
> > > running in sleepable (faultable) context.
> > >
> > > We utilize read_cache_folio() to bring the desired folio into page
> > > cache, after which the rest of the logic works just the same at folio level.
> > >
> > > Suggested-by: Omar Sandoval <osandov@fb.com>
> > > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > > lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> > > 1 file changed, 28 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/lib/buildid.c b/lib/buildid.c
> > > index 5e6f842f56f0..e1c01b23efd8 100644
> > > --- a/lib/buildid.c
> > > +++ b/lib/buildid.c
> > > @@ -20,6 +20,7 @@ struct freader {
> > > struct folio *folio;
> > > void *addr;
> > > loff_t folio_off;
> > > + bool may_fault;
> > > };
> > > struct {
> > > const char *data;
> > > @@ -29,12 +30,13 @@ struct freader {
> > > };
> > >
> > > static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> > > - struct address_space *mapping)
> > > + struct address_space *mapping, bool may_fault)
> > > {
> > > memset(r, 0, sizeof(*r));
> > > r->buf = buf;
> > > r->buf_sz = buf_sz;
> > > r->mapping = mapping;
> > > + r->may_fault = may_fault;
> > > }
> > >
> > > static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> > > @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> > > freader_put_folio(r);
> > >
> > > r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> > > +
> > > + /* if sleeping is allowed, wait for the page, if necessary */
> > > + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> > > + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
> >
> > Willy's network fs comment is bugging me. If we pass NULL for filler,
> > the kernel is going to use the fs's read_folio() callback. I have checked
> > read_folio() for fuse and nfs and it seems like for at least these two
> > filesystems the callback is accessing file->private_data. So, if the elf
> > file is on these filesystems, we might see null accesses.
> >
>
> Isn't that just a huge problem with the read_cache_folio() interface
> then? That file is optional, in general, but for some specific FS
> types it's not. How is generic code supposed to know this?
>
> Or maybe it's a bug with the nfs_read_folio() and fuse_read_folio()
> implementation that they can't handle NULL file argument?
> netfs_read_folio(), for example, seems to be working with file == NULL
> just fine.
If you go a bit down in netfs_alloc_request() there is the following
code:
	if (rreq->netfs_ops->init_request) {
		ret = rreq->netfs_ops->init_request(rreq, file);
		...
	...
I think this init_request is pointing to nfs_netfs_init_request which
calls nfs_file_open_context(file) and accesses filp->private_data.
>
> Matthew, can you please advise what's the right approach here? I can,
> of course, always get file refcount, but most of the time it will be
> just an unnecessary overhead, so ideally I'd like to avoid that. But
> if I have to check each read_folio callback implementation to know
> whether it's required or not, then that's not great...
I don't think we will need file refcnt. We have mmap lock in read mode
in this context because we are accessing vma and this vma has reference
to the file. So, this file can not go away under us here.
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-08 21:02 ` Shakeel Butt
@ 2024-08-08 21:21 ` Andrii Nakryiko
0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 21:21 UTC
To: Shakeel Butt
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, hannes, ak,
osandov, song, jannh, linux-fsdevel, willy, Omar Sandoval
On Thu, Aug 8, 2024 at 2:02 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Aug 08, 2024 at 01:15:52PM GMT, Andrii Nakryiko wrote:
> > On Thu, Aug 8, 2024 at 11:40 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> > > > Extend freader with a flag specifying whether it's OK to cause page
> > > > fault to fetch file data that is not already physically present in
> > > > memory. With this, it's now easy to wait for data if the caller is
> > > > running in sleepable (faultable) context.
> > > >
> > > > We utilize read_cache_folio() to bring the desired folio into page
> > > > cache, after which the rest of the logic works just the same at folio level.
> > > >
> > > > Suggested-by: Omar Sandoval <osandov@fb.com>
> > > > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > > lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> > > > 1 file changed, 28 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/lib/buildid.c b/lib/buildid.c
> > > > index 5e6f842f56f0..e1c01b23efd8 100644
> > > > --- a/lib/buildid.c
> > > > +++ b/lib/buildid.c
> > > > @@ -20,6 +20,7 @@ struct freader {
> > > > struct folio *folio;
> > > > void *addr;
> > > > loff_t folio_off;
> > > > + bool may_fault;
> > > > };
> > > > struct {
> > > > const char *data;
> > > > @@ -29,12 +30,13 @@ struct freader {
> > > > };
> > > >
> > > > static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> > > > - struct address_space *mapping)
> > > > + struct address_space *mapping, bool may_fault)
> > > > {
> > > > memset(r, 0, sizeof(*r));
> > > > r->buf = buf;
> > > > r->buf_sz = buf_sz;
> > > > r->mapping = mapping;
> > > > + r->may_fault = may_fault;
> > > > }
> > > >
> > > > static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> > > > @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> > > > freader_put_folio(r);
> > > >
> > > > r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> > > > +
> > > > + /* if sleeping is allowed, wait for the page, if necessary */
> > > > + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> > > > + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
> > >
> > > Willy's network fs comment is bugging me. If we pass NULL for filler,
> > > the kernel is going to use the fs's read_folio() callback. I have checked
> > > read_folio() for fuse and nfs and it seems like for at least these two
> > > filesystems the callback is accessing file->private_data. So, if the elf
> > > file is on these filesystems, we might see null accesses.
> > >
> >
> > Isn't that just a huge problem with the read_cache_folio() interface
> > then? That file is optional, in general, but for some specific FS
> > types it's not. How is generic code supposed to know this?
> >
> > Or maybe it's a bug with the nfs_read_folio() and fuse_read_folio()
> > implementation that they can't handle NULL file argument?
> > netfs_read_folio(), for example, seems to be working with file == NULL
> > just fine.
>
> If you go a bit down in netfs_alloc_request() there is the following
> code:
>
> if (rreq->netfs_ops->init_request) {
> ret = rreq->netfs_ops->init_request(rreq, file);
> ...
> ...
>
> I think this init_request is pointing to nfs_netfs_init_request which
> calls nfs_file_open_context(file) and accesses filp->private_data.
That's "nfs", which we know requires a file. For netfs implementations
(cifs_init_request() and v9fs_init_request()), they both treat file as
optional consistently.
But regardless, that's just pointless code archeology, I'll just pass
the file reference unconditionally.
>
> >
> > Matthew, can you please advise what's the right approach here? I can,
> > of course, always get file refcount, but most of the time it will be
> > just an unnecessary overhead, so ideally I'd like to avoid that. But
> > if I have to check each read_folio callback implementation to know
> > whether it's required or not, then that's not great...
>
> I don't think we will need file refcnt. We have mmap lock in read mode
> in this context because we are accessing vma and this vma has reference
> to the file. So, this file can not go away under us here.
Yep, good point, then it's not a problem, thanks! Will update.
* Re: [PATCH v4 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-08 20:57 ` Jann Horn
@ 2024-08-08 21:23 ` Andrii Nakryiko
0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 21:23 UTC
To: Jann Horn
Cc: Shakeel Butt, Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan,
hannes, ak, osandov, song, linux-fsdevel, willy, Omar Sandoval
On Thu, Aug 8, 2024 at 1:58 PM Jann Horn <jannh@google.com> wrote:
>
> On Thu, Aug 8, 2024 at 10:16 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Thu, Aug 8, 2024 at 11:40 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Wed, Aug 07, 2024 at 04:40:25PM GMT, Andrii Nakryiko wrote:
> > > > Extend freader with a flag specifying whether it's OK to cause page
> > > > fault to fetch file data that is not already physically present in
> > > > memory. With this, it's now easy to wait for data if the caller is
> > > > running in sleepable (faultable) context.
> > > >
> > > > We utilize read_cache_folio() to bring the desired folio into page
> > > > cache, after which the rest of the logic works just the same at folio level.
> > > >
> > > > Suggested-by: Omar Sandoval <osandov@fb.com>
> > > > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > > lib/buildid.c | 44 ++++++++++++++++++++++++++++----------------
> > > > 1 file changed, 28 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/lib/buildid.c b/lib/buildid.c
> > > > index 5e6f842f56f0..e1c01b23efd8 100644
> > > > --- a/lib/buildid.c
> > > > +++ b/lib/buildid.c
> > > > @@ -20,6 +20,7 @@ struct freader {
> > > > struct folio *folio;
> > > > void *addr;
> > > > loff_t folio_off;
> > > > + bool may_fault;
> > > > };
> > > > struct {
> > > > const char *data;
> > > > @@ -29,12 +30,13 @@ struct freader {
> > > > };
> > > >
> > > > static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
> > > > - struct address_space *mapping)
> > > > + struct address_space *mapping, bool may_fault)
> > > > {
> > > > memset(r, 0, sizeof(*r));
> > > > r->buf = buf;
> > > > r->buf_sz = buf_sz;
> > > > r->mapping = mapping;
> > > > + r->may_fault = may_fault;
> > > > }
> > > >
> > > > static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
> > > > @@ -63,6 +65,11 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
> > > > freader_put_folio(r);
> > > >
> > > > r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
> > > > +
> > > > + /* if sleeping is allowed, wait for the page, if necessary */
> > > > + if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)))
> > > > + r->folio = read_cache_folio(r->mapping, file_off >> PAGE_SHIFT, NULL, NULL);
> > >
> > > Willy's network fs comment is bugging me. If we pass NULL for filler,
> > > the kernel is going to use the fs's read_folio() callback. I have checked
> > > read_folio() for fuse and nfs and it seems like for at least these two
> > > filesystems the callback is accessing file->private_data. So, if the elf
> > > file is on these filesystems, we might see null accesses.
> > >
> >
> > Isn't that just a huge problem with the read_cache_folio() interface
> > then? That file is optional, in general, but for some specific FS
> > types it's not. How is generic code supposed to know this?
>
> I think you have to think about it the other way around. The file is
Fair enough:
> @file: Passed to filler function, may be NULL if not required.
But then you look at mapping_read_folio_gfp() which *always*
unconditionally passes NULL for filler and file, and that makes you
think that file is some special *extra* parameter.
But regardless, as you pointed out, I won't have to take extra ref, so
my concerns about performance are wrong. I'll pass the file.
> required, unless you know the filler function that will be used
> doesn't use the file. Which you don't know when you're coming from
> generic code, so generic code has to pass in a file.
>
> As far as I can tell, most of the callers of read_cache_folio() (via
> read_mapping_folio()) are inside filesystem implementations, not
> generic code, so they know what the filler function will do. You're
> generic code, so I think you have to pass in a file.
>
Yep, I guess this is a bit of a trailblazing use case. I was confused by
some other helpers passing NULL for file unconditionally, which made
me think that NULL is a supported default use case. Clearly I was
wrong.
> > Or maybe it's a bug with the nfs_read_folio() and fuse_read_folio()
> > implementation that they can't handle NULL file argument?
> > netfs_read_folio(), for example, seems to be working with file == NULL
> > just fine.
> >
> > Matthew, can you please advise what's the right approach here? I can,
> > of course, always get file refcount, but most of the time it will be
> > just an unnecessary overhead, so ideally I'd like to avoid that. But
> > if I have to check each read_folio callback implementation to know
> > whether it's required or not, then that's not great...
>
> Why would you need to increment the file refcount? As far as I can
> tell, all your accesses to the file would happen under
> __build_id_parse(), which is borrowing the refcounted reference from
> vma->vm_file; the file can't go away as long as your caller is holding
> the mmap lock.
Yep, agreed.
* Re: [PATCH v4 bpf-next 01/10] lib/buildid: harden build ID parsing logic
[not found] ` <20240807234029.456316-2-andrii@kernel.org>
@ 2024-08-08 22:24 ` Andi Kleen
2024-08-08 22:44 ` Andrii Nakryiko
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2024-08-08 22:24 UTC
To: Andrii Nakryiko
Cc: bpf, linux-mm, akpm, adobriyan, shakeel.butt, hannes, osandov,
song, jannh, linux-fsdevel, willy, stable
> + name_sz = READ_ONCE(nhdr->n_namesz);
> + desc_sz = READ_ONCE(nhdr->n_descsz);
> + new_offs = note_offs + sizeof(Elf32_Nhdr) + ALIGN(name_sz, 4) + ALIGN(desc_sz, 4);
Don't you need to check the name_sz and desc_sz overflows separately?
Otherwise name_sz could be ~0 and desc_sz small (or reversed) and the check
below wouldn't trigger, but still bad things could happen.
> + if (new_offs <= note_offs /* overflow */ || new_offs > note_size)
> + break;
-Andi
* Re: [PATCH v4 bpf-next 01/10] lib/buildid: harden build ID parsing logic
2024-08-08 22:24 ` [PATCH v4 bpf-next 01/10] lib/buildid: harden build ID parsing logic Andi Kleen
@ 2024-08-08 22:44 ` Andrii Nakryiko
0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 22:44 UTC
To: Andi Kleen
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, shakeel.butt,
hannes, osandov, song, jannh, linux-fsdevel, willy, stable
On Thu, Aug 8, 2024 at 3:24 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> > + name_sz = READ_ONCE(nhdr->n_namesz);
> > + desc_sz = READ_ONCE(nhdr->n_descsz);
> > + new_offs = note_offs + sizeof(Elf32_Nhdr) + ALIGN(name_sz, 4) + ALIGN(desc_sz, 4);
>
> Don't you need to check the name_sz and desc_sz overflows separately?
>
> Otherwise name_sz could be ~0 and desc_sz small (or reversed) and the check
> below wouldn't trigger, but still bad things could happen.
Yes, both sizes are full u32 values, so they could technically
overflow and still result in a final new_offs that doesn't look
overflowed. I'll switch the additions to be done step by step.
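One way the step-by-step additions might look, as a userspace sketch that assumes the offsets and sizes are all 32-bit. The helper names `add_u32_checked()`, `align4_checked()`, `next_note_offs()`, and the constant `SKETCH_NHDR_SZ` are hypothetical and are not taken from the patch. Under that assumption, a name_sz of 0xFFFFFFFF rounds up to 0 when aligned to 4, so a single combined sum can come out small and pass both comparisons; checking the alignment and each addition separately catches it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of an ELF note header: three 32-bit words. */
#define SKETCH_NHDR_SZ 12u

/* Add b to *acc; return false and leave *acc untouched if the sum wraps. */
static bool add_u32_checked(uint32_t *acc, uint32_t b)
{
	if (b > UINT32_MAX - *acc)
		return false;
	*acc += b;
	return true;
}

/* Round v up to a multiple of 4; return false if the rounding wraps. */
static bool align4_checked(uint32_t v, uint32_t *out)
{
	if (v > UINT32_MAX - 3u)
		return false;
	*out = (v + 3u) & ~3u;
	return true;
}

/*
 * Compute the offset of the next note, one checked step at a time.
 * Returns false if any step overflows or the result exceeds note_size.
 */
static bool next_note_offs(uint32_t note_offs, uint32_t name_sz,
			   uint32_t desc_sz, uint32_t note_size,
			   uint32_t *new_offs)
{
	uint32_t offs = note_offs;
	uint32_t name_al, desc_al;

	if (!align4_checked(name_sz, &name_al) ||
	    !align4_checked(desc_sz, &desc_al))
		return false;

	if (!add_u32_checked(&offs, SKETCH_NHDR_SZ) ||
	    !add_u32_checked(&offs, name_al) ||
	    !add_u32_checked(&offs, desc_al))
		return false;

	if (offs > note_size)
		return false;

	*new_offs = offs;
	return true;
}
```

Because every intermediate value is validated, no separate "new_offs <= note_offs" comparison is needed in this sketch.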
>
>
> > + if (new_offs <= note_offs /* overflow */ || new_offs > note_size)
> > + break;
>
> -Andi