[PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign()

classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign()

Kirill Tkhai-2

Unlocked access to dst_rq->curr in task_numa_compare() is racy.
If curr task is exiting this may be a reason of use-after-free:

task_numa_compare()                    do_exit()
    ...                                        current->flags |= PF_EXITING;
    ...                                    release_task()
    ...                                        ~~delayed_put_task_struct()~~
    ...                                    schedule()
    rcu_read_lock()                        ...
    cur = ACCESS_ONCE(dst_rq->curr)        ...
        ...                                rq->curr = next;
        ...                                    context_switch()
        ...                                        finish_task_switch()
        ...                                            put_task_struct()
        ...                                                __put_task_struct()
        ...                                                    free_task_struct()
        task_numa_assign()                                     ...
            get_task_struct()                                  ...

As noted by Oleg:

  <<The lockless get_task_struct(tsk) is only safe if tsk == current
    and didn't pass exit_notify(), or if this tsk was found on a rcu
    protected list (say, for_each_process() or find_task_by_vpid()).
    IOW, it is only safe if release_task() was not called before we
    take rcu_read_lock(), in this case we can rely on the fact that
    delayed_put_pid() can not drop the (potentially) last reference
    until rcu_read_unlock().

    And as Kirill pointed out task_numa_compare()->task_numa_assign()
    path does get_task_struct(dst_rq->curr) and this is not safe. The
    task_struct itself can't go away, but rcu_read_lock() can't save
    us from the final put_task_struct() in finish_task_switch(); this
    reference goes away without rcu gp>>

The patch provides simple check of PF_EXITING flag. If it's not set,
this guarantees that call_rcu() of delayed_put_task_struct() callback
hasn't happened yet, so we can safely do get_task_struct() in
task_numa_assign().

Locked dst_rq->lock protects from concurrency with the last schedule().
Reusing or unmapping of cur's memory may happen without it.

Signed-off-by: Kirill Tkhai <[hidden email]>
Suggested-by: Oleg Nesterov <[hidden email]>
---
 kernel/sched/fair.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b069bf..fbc0b82 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1164,9 +1164,19 @@ static void task_numa_compare(struct task_numa_env *env,
  long moveimp = imp;
 
  rcu_read_lock();
- cur = ACCESS_ONCE(dst_rq->curr);
- if (cur->pid == 0) /* idle */
+
+ raw_spin_lock_irq(&dst_rq->lock);
+ cur = dst_rq->curr;
+ /*
+ * No need to move the exiting task, and this ensures that ->curr
+ * wasn't reaped and thus get_task_struct() in task_numa_assign()
+ * is safe under RCU read lock.
+ * Note that rcu_read_lock() itself can't protect from the final
+ * put_task_struct() after the last schedule().
+ */
+ if ((cur->flags & PF_EXITING) || is_idle_task(cur))
  cur = NULL;
+ raw_spin_unlock_irq(&dst_rq->lock);
 
  /*
  * "imp" is the fault differential for the source task between the



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

introduce task_rcu_dereference?

Oleg Nesterov
On 10/22, Kirill Tkhai wrote:
>
> Unlocked access to dst_rq->curr in task_numa_compare() is racy.
> If curr task is exiting this may be a reason of use-after-free:

Thanks.

And as you pointed out, there are other examples of unlocked
foreign_rq->curr usage.

So, Kirill, Peter, do you think that the patch below can help? Can
we change task_numa_group() and ->select_task_rq() to do nothing if
rq_curr_rcu_safe() returns NULL? It seems we can...

task_numa_compare() can use it too, we can make another patch on
top of this one.

        - Obviously just for the early review. Lacks the changelog
          and the comments (at least).

        - Once again, I won't insist on probe_slab_address(). We can
          add SDBR and change task_rcu_dereference() to simply read
          ->sighand.

        - Also, I won't argue if you think that we do not need a
          generic helper. In this case we can move this logic into
          rq_curr_rcu_safe() and it will be a bit simpler.

        - OTOH, I am not sure we need rq_curr_rcu_safe(). The callers
          can just use task_rcu_dereference() and check IS_ERR_OR_NULL,
          I guess retry doesn't buy too much in this case.

Or do you think we need something else?

Oleg.

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 857ba40..0ba420e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2300,6 +2300,7 @@ extern void block_all_signals(int (*notifier)(void *priv), void *priv,
       sigset_t *mask);
 extern void unblock_all_signals(void);
 extern void release_task(struct task_struct * p);
+extern struct task_struct *task_rcu_dereference(struct task_struct **ptask);
 extern int send_sig_info(int, struct siginfo *, struct task_struct *);
 extern int force_sigsegv(int, struct task_struct *);
 extern int force_sig_info(int, struct siginfo *, struct task_struct *);
diff --git a/kernel/exit.c b/kernel/exit.c
index 32c58f7..4aa00c7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -213,6 +213,37 @@ repeat:
  goto repeat;
 }
 
+struct task_struct *task_rcu_dereference(struct task_struct **ptask)
+{
+ struct task_struct *task;
+ struct sighand_struct *sighand;
+
+ task = rcu_dereference(*ptask);
+ if (!task)
+ return NULL;
+
+ /* If it fails the check below must fail too */
+ probe_slab_address(&task->sighand, sighand);
+ /*
+ * Pairs with atomic_dec_and_test() in put_task_struct(task).
+ * If we have read the freed/reused memory, we must see that
+ * the pointer was updated. The caller might want to retry in
+ * this case.
+ */
+ smp_rmb();
+ if (unlikely(task != ACCESS_ONCE(*ptask)))
+ return ERR_PTR(-EAGAIN);
+
+ /*
+ * release_task(task) was already called; potentially before
+ * the caller took rcu_read_lock() and in this case it can be
+ * freed before rcu_read_unlock().
+ */
+ if (!sighand)
+ return ERR_PTR(-EINVAL);
+ return task;
+}
+
 /*
  * This checks not only the pgrp, but falls back on the pid if no
  * satisfactory pgrp is found. I dunno - gdb doesn't work correctly
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 579712f..249c0c1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -655,6 +655,18 @@ DECLARE_PER_CPU(struct rq, runqueues);
 #define cpu_curr(cpu) (cpu_rq(cpu)->curr)
 #define raw_rq() (&__raw_get_cpu_var(runqueues))
 
+static inline struct task_struct *rq_curr_rcu_safe(struct rq *rq)
+{
+ for (;;) {
+ struct task_struct *curr = task_rcu_dereference(&rq->curr);
+ /* NULL is not possible */
+ if (likely(!IS_ERR(curr)))
+ return curr;
+ if (PTR_ERR(curr) != -EAGAIN)
+ return NULL;
+ }
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
  return rq->clock;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: introduce task_rcu_dereference?

Oleg Nesterov
Damn.

On 10/22, Oleg Nesterov wrote:

>
> +struct task_struct *task_rcu_dereference(struct task_struct **ptask)
> +{
> + struct task_struct *task;
> + struct sighand_struct *sighand;
> +
> + task = rcu_dereference(*ptask);
> + if (!task)
> + return NULL;
> +
> + /* If it fails the check below must fail too */
> + probe_slab_address(&task->sighand, sighand);
> + /*
> + * Pairs with atomic_dec_and_test() in put_task_struct(task).
> + * If we have read the freed/reused memory, we must see that
> + * the pointer was updated. The caller might want to retry in
> + * this case.
> + */
> + smp_rmb();
> + if (unlikely(task != ACCESS_ONCE(*ptask)))
> + return ERR_PTR(-EAGAIN);

This is not exactly right. task == *ptask can be false positive.

It can be freed, then resused (so that sighand != NULL can be false
positive), then freed again, and then reused again as task_struct.

This is not that bad, we still can safely use this task_struct, but
the comment should be updated. Plus -EINVAL below can be wrong in
this case although this minor.

Yeees, SLAB_DESTTROY_BY_RCU closes this race. Not sure why I'd like
to avoid it, but I do ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: introduce task_rcu_dereference?

Kirill Tkhai-2
In reply to this post by Oleg Nesterov
В Ср, 22/10/2014 в 23:30 +0200, Oleg Nesterov пишет:

> On 10/22, Kirill Tkhai wrote:
> >
> > Unlocked access to dst_rq->curr in task_numa_compare() is racy.
> > If curr task is exiting this may be a reason of use-after-free:
>
> Thanks.
>
> And as you pointed out, there are other examples of unlocked
> foreign_rq->curr usage.
>
> So, Kirill, Peter, do you think that the patch below can help? Can
> we change task_numa_group() and ->select_task_rq() to do nothing if
> rq_curr_rcu_safe() returns NULL? It seems we can...
>
> task_numa_compare() can use it too, we can make another patch on
> top of this one.
>
> - Obviously just for the early review. Lacks the changelog
>  and the comments (at least).
>
> - Once again, I won't insist on probe_slab_address(). We can
>  add SDBR and change task_rcu_dereference() to simply read
>  ->sighand.
>
> - Also, I won't argue if you think that we do not need a
>  generic helper. In this case we can move this logic into
>  rq_curr_rcu_safe() and it will be a bit simpler.
>
> - OTOH, I am not sure we need rq_curr_rcu_safe(). The callers
>  can just use task_rcu_dereference() and check IS_ERR_OR_NULL,
>  I guess retry doesn't buy too much in this case.
>
> Or do you think we need something else?
>
> Oleg.
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 857ba40..0ba420e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2300,6 +2300,7 @@ extern void block_all_signals(int (*notifier)(void *priv), void *priv,
>        sigset_t *mask);
>  extern void unblock_all_signals(void);
>  extern void release_task(struct task_struct * p);
> +extern struct task_struct *task_rcu_dereference(struct task_struct **ptask);
>  extern int send_sig_info(int, struct siginfo *, struct task_struct *);
>  extern int force_sigsegv(int, struct task_struct *);
>  extern int force_sig_info(int, struct siginfo *, struct task_struct *);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 32c58f7..4aa00c7 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -213,6 +213,37 @@ repeat:
>   goto repeat;
>  }
>  
> +struct task_struct *task_rcu_dereference(struct task_struct **ptask)
> +{
> + struct task_struct *task;
> + struct sighand_struct *sighand;
> +
> + task = rcu_dereference(*ptask);
> + if (!task)
> + return NULL;
> +
> + /* If it fails the check below must fail too */
> + probe_slab_address(&task->sighand, sighand);
> + /*
> + * Pairs with atomic_dec_and_test() in put_task_struct(task).
> + * If we have read the freed/reused memory, we must see that
> + * the pointer was updated. The caller might want to retry in
> + * this case.
> + */
> + smp_rmb();
> + if (unlikely(task != ACCESS_ONCE(*ptask)))
> + return ERR_PTR(-EAGAIN);
> +
> + /*
> + * release_task(task) was already called; potentially before
> + * the caller took rcu_read_lock() and in this case it can be
> + * freed before rcu_read_unlock().
> + */
> + if (!sighand)
> + return ERR_PTR(-EINVAL);
> + return task;
> +}
> +
>  /*
>   * This checks not only the pgrp, but falls back on the pid if no
>   * satisfactory pgrp is found. I dunno - gdb doesn't work correctly
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 579712f..249c0c1 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -655,6 +655,18 @@ DECLARE_PER_CPU(struct rq, runqueues);
>  #define cpu_curr(cpu) (cpu_rq(cpu)->curr)
>  #define raw_rq() (&__raw_get_cpu_var(runqueues))
>  
> +static inline struct task_struct *rq_curr_rcu_safe(struct rq *rq)
> +{
> + for (;;) {
> + struct task_struct *curr = task_rcu_dereference(&rq->curr);
> + /* NULL is not possible */
> + if (likely(!IS_ERR(curr)))
> + return curr;
> + if (PTR_ERR(curr) != -EAGAIN)
> + return NULL;
> + }
> +}
> +
>  static inline u64 rq_clock(struct rq *rq)
>  {
>   return rq->clock;
>

I'm agree generic helper is better. But probe_slab_address() has a sence
if we know that SDBR is worse in our subject area. Less of code is
easier to support :) probe_slab_address() it's not a trivial logic.
Also, if we use mm primitives this increases kernel modularity.

Kirill

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: introduce task_rcu_dereference?

Oleg Nesterov
In reply to this post by Oleg Nesterov
On 10/23, Oleg Nesterov wrote:
>
> Damn.

Yes.

> On 10/22, Oleg Nesterov wrote:
> >
> > +struct task_struct *task_rcu_dereference(struct task_struct **ptask)
> > +{
> > + struct task_struct *task;
> > + struct sighand_struct *sighand;
> > +
> > + task = rcu_dereference(*ptask);
> > + if (!task)
> > + return NULL;
> > +
> > + /* If it fails the check below must fail too */
> > + probe_slab_address(&task->sighand, sighand);
> > + /*
> > + * Pairs with atomic_dec_and_test() in put_task_struct(task).
> > + * If we have read the freed/reused memory, we must see that
> > + * the pointer was updated. The caller might want to retry in
> > + * this case.
> > + */
> > + smp_rmb();
> > + if (unlikely(task != ACCESS_ONCE(*ptask)))
> > + return ERR_PTR(-EAGAIN);
>
> This is not exactly right. task == *ptask can be false positive.
>
> It can be freed, then resused (so that sighand != NULL can be false
> positive), then freed again, and then reused again as task_struct.
>
> This is not that bad, we still can safely use this task_struct, but
> the comment should be updated. Plus -EINVAL below can be wrong in
> this case although this minor.

Yes.

> Yeees, SLAB_DESTTROY_BY_RCU closes this race. Not sure why I'd like
> to avoid it, but I do ;)

Argh. I only meant that SLAB_DESTTROY_BY_RCU can make the comments
simpler. "closes this race" applies too "check below must fail too"
too. Sorry if I confused you.

"task == *ptask can be false positive" is true with or without
SLAB_DESTTROY_BY_RCU, and this needs a good comment. Yes, it can't
be reused twice, but still we can't 100% trust the "sighand != NULL"
check.

So let me repeat, SDBR can only turn probe_slab_address() into a plain
load.

But I can't think properly today, will try to recheck tomorrow and send
v2.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: introduce task_rcu_dereference?

Oleg Nesterov
In reply to this post by Kirill Tkhai-2
On 10/23, Kirill Tkhai wrote:
>
> I'm agree generic helper is better. But probe_slab_address() has a sence
> if we know that SDBR is worse in our subject area.

And I still think it is worse.

> Less of code is
> easier to support :)

Sure, but ignoring the comments, SDBR needs the same code in
task_rcu_dereference() ? Except, of course

        - probe_slab_address(&task->sighand, sighand);
        + sighand = task->sighand;

or how do you think we can simplify it?


> probe_slab_address() it's not a trivial logic.

But it already has a user. And probably it can have more.

To me the usage of SDBR is not trivial (and confusing) in this case.
Once again, ignoring the CONFIG_DEBUG_PAGEALLOC problems it does not
help at all.

With or without SDBR rq->curr can be reused and we need to avoid this
race. The fact that with SDBR it can be reused only as another instance
of task_struct is absolutely immaterial imo.

Not to mention that SDBR still adds some overhead while probe_slab()
is free unless CONFIG_DEBUG_PAGEALLOC, but this option adds a large
slowdown anyway.


But again, I can't really work today, perhaps I missed something.
Perhaps you can show a better code which relies on SDBR?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: introduce task_rcu_dereference?

Kirill Tkhai-2
В Чт, 23/10/2014 в 20:18 +0200, Oleg Nesterov пишет:

> On 10/23, Kirill Tkhai wrote:
> >
> > I'm agree generic helper is better. But probe_slab_address() has a sence
> > if we know that SDBR is worse in our subject area.
>
> And I still think it is worse.
>
> > Less of code is
> > easier to support :)
>
> Sure, but ignoring the comments, SDBR needs the same code in
> task_rcu_dereference() ? Except, of course
>
> - probe_slab_address(&task->sighand, sighand);
> + sighand = task->sighand;
>
> or how do you think we can simplify it?

Ok, really, not big simplification there. Your variant is good.

> > probe_slab_address() it's not a trivial logic.
>
> But it already has a user. And probably it can have more.
>
> To me the usage of SDBR is not trivial (and confusing) in this case.
> Once again, ignoring the CONFIG_DEBUG_PAGEALLOC problems it does not
> help at all.
>
> With or without SDBR rq->curr can be reused and we need to avoid this
> race. The fact that with SDBR it can be reused only as another instance
> of task_struct is absolutely immaterial imo.
>
> Not to mention that SDBR still adds some overhead while probe_slab()
> is free unless CONFIG_DEBUG_PAGEALLOC, but this option adds a large
> slowdown anyway.
>
>
> But again, I can't really work today, perhaps I missed something.
> Perhaps you can show a better code which relies on SDBR?

No, it would be the same except probe_slab_address(). So, let's stay
on probe_slab_address() variant and fix the bug this way.

Kirill

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/3] introduce task_rcu_dereference()

Oleg Nesterov
In reply to this post by Kirill Tkhai-2
Peter, let me repeat once again, if you still prefer to avoid
probe_slab_address() and use SLAB_DESTROY_BY_RCU I won't argue.

I do not like SLAB_DESTROY_BY_RCU in this particular case. With
or without SDBR rq->curr can be reused and we need to avoid this
race. The fact that with SDBR it can be reused only as another
instance of task_struct is absolutely immaterial imo.

Not to mention that SDBR still adds some overhead while probe_slab()
is free unless CONFIG_DEBUG_PAGEALLOC, but this option adds a large
slowdown anyway.

However, my arguments against SDBR are not strictly technical, and
I think that this falls into "maintainer is always right" category.

So please tell me if you prefer v2 with SDBR. In this case 2/3 is
not needed, and 3/3 can simply read ->sighand. Otherwise the code
(and even the comments) will be the same.

Compared to the draft patch I sent before

        - update the comments

        - do not use ERR_PTR(), just return the task or NULL, so
          kernel/sched/ doesn't need another helper.

          This means that task_rcu_dereference() does retry itself.
          We can add __task_rcu_dereference() if we have another
          which do not need/want to retry.

Oleg.

 include/linux/sched.h   |    1 +
 include/linux/uaccess.h |   30 +++++++++++++++-------------
 kernel/exit.c           |   49 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/slub.c               |    6 +----
 4 files changed, 67 insertions(+), 19 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/3] introduce probe_slab_address()

Oleg Nesterov
Extract the ifdef(CONFIG_DEBUG_PAGEALLOC) code from get_freepointer_safe()
into the new generic helper, probe_slab_address(). The next patch will add
another user.

Signed-off-by: Oleg Nesterov <[hidden email]>
---
 include/linux/uaccess.h |   15 +++++++++++++++
 mm/slub.c               |    6 +-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index effb637..3367396 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -71,6 +71,21 @@ static inline unsigned long __copy_from_user_nocache(void *to,
  __probe_kernel_read(&(retval), (__force void *)(addr), sizeof(retval))
 
 /*
+ * Same as probe_kernel_address(), but @addr must be the valid pointer
+ * to a slab object, potentially freed/reused/unmapped.
+ */
+#ifdef CONFIG_DEBUG_PAGEALLOC
+#define probe_slab_address(addr, retval) \
+ probe_kernel_address(addr, retval)
+#else
+#define probe_slab_address(addr, retval) \
+ ({ \
+ (retval) = *(typeof(retval) *)(addr); \
+ 0; \
+ })
+#endif
+
+/*
  * probe_kernel_read(): safely attempt to read from a location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
diff --git a/mm/slub.c b/mm/slub.c
index 3e8afcc..0467d22 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -265,11 +265,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
 {
  void *p;
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
- probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p));
-#else
- p = get_freepointer(s, object);
-#endif
+ probe_slab_address(object + s->offset, p);
  return p;
 }
 
--
1.5.5.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

[PATCH 3/3] introduce task_rcu_dereference()

Oleg Nesterov
In reply to this post by Oleg Nesterov
task_struct is only protected by RCU if it was found on a RCU protected
list (say, for_each_process() or find_task_by_vpid()).

And as Kirill pointed out rq->curr isn't protected by RCU, the scheduler
drops the (potentially) last reference without RCU gp, this means that we
need to fix the code which uses foreign_rq->curr under rcu_read_lock().

Add a new helper which can be used to dereferene rq->curr or any other
pointer to task_struct assuming that it should be cleared or updated
before the final put_task_struct(). It returns non-NULL only if this
task can't go away before rcu_read_unlock().

Suggested-by: Kirill Tkhai <[hidden email]>
Signed-off-by: Oleg Nesterov <[hidden email]>
---
 include/linux/sched.h |    1 +
 kernel/exit.c         |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 857ba40..0ba420e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2300,6 +2300,7 @@ extern void block_all_signals(int (*notifier)(void *priv), void *priv,
       sigset_t *mask);
 extern void unblock_all_signals(void);
 extern void release_task(struct task_struct * p);
+extern struct task_struct *task_rcu_dereference(struct task_struct **ptask);
 extern int send_sig_info(int, struct siginfo *, struct task_struct *);
 extern int force_sigsegv(int, struct task_struct *);
 extern int force_sig_info(int, struct siginfo *, struct task_struct *);
diff --git a/kernel/exit.c b/kernel/exit.c
index 32c58f7..d8b95c2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -213,6 +213,55 @@ repeat:
  goto repeat;
 }
 
+struct task_struct *task_rcu_dereference(struct task_struct **ptask)
+{
+ struct task_struct *task;
+ struct sighand_struct *sighand;
+
+ /*
+ * We need to verify that release_task() was not called and thus
+ * delayed_put_task_struct() can't run and drop the last reference
+ * before rcu_read_unlock(). We check task->sighand != NULL, but
+ * we can read the already freed and reused memory.
+ */
+ retry:
+ task = rcu_dereference(*ptask);
+ if (!task)
+ return NULL;
+
+ probe_slab_address(&task->sighand, sighand);
+ /*
+ * Pairs with atomic_dec_and_test(usage) in put_task_struct(task).
+ * If this task was already freed we can not miss the preceding
+ * update of this pointer.
+ */
+ smp_rmb();
+ if (unlikely(task != ACCESS_ONCE(*ptask)))
+ goto retry;
+
+ /*
+ * Either this is the same task and we can trust sighand != NULL, or
+ * its memory was re-instantiated as another instance of task_struct.
+ * In the latter case the new task can not go away until another rcu
+ * gp pass, so the only problem is that sighand == NULL can be false
+ * positive but we can pretend we got this NULL before it was freed.
+ */
+ if (sighand)
+ return task;
+
+ /*
+ * We could even eliminate the false positive mentioned above:
+ *
+ * probe_slab_address(&task->sighand, sighand);
+ * if (sighand)
+ * goto retry;
+ *
+ * if sighand != NULL because we read the freed memory we should
+ * see the new pointer, otherwise we will likely return this task.
+ */
+ return NULL;
+}
+
 /*
  * This checks not only the pgrp, but falls back on the pid if no
  * satisfactory pgrp is found. I dunno - gdb doesn't work correctly
--
1.5.5.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/3] probe_kernel_address() can use __probe_kernel_read()

Oleg Nesterov
In reply to this post by Oleg Nesterov
probe_kernel_address() can just do __probe_kernel_read(sizeof(retval)),
no need to open-code its implementation.

Signed-off-by: Oleg Nesterov <[hidden email]>
---
 include/linux/uaccess.h |   17 ++---------------
 1 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ecd3319..effb637 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -66,22 +66,9 @@ static inline unsigned long __copy_from_user_nocache(void *to,
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  * This must be a macro because __get_user() needs to know the types of the
  * args.
- *
- * We don't include enough header files to be able to do the set_fs().  We
- * require that the probe_kernel_address() caller will do that.
  */
-#define probe_kernel_address(addr, retval) \
- ({ \
- long ret; \
- mm_segment_t old_fs = get_fs(); \
- \
- set_fs(KERNEL_DS); \
- pagefault_disable(); \
- ret = __copy_from_user_inatomic(&(retval), (__force typeof(retval) __user *)(addr), sizeof(retval)); \
- pagefault_enable(); \
- set_fs(old_fs); \
- ret; \
- })
+#define probe_kernel_address(addr, retval) \
+ __probe_kernel_read(&(retval), (__force void *)(addr), sizeof(retval))
 
 /*
  * probe_kernel_read(): safely attempt to read from a location
--
1.5.5.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Christoph Lameter-5
In reply to this post by Oleg Nesterov
On Mon, 27 Oct 2014, Oleg Nesterov wrote:

> Extract the ifdef(CONFIG_DEBUG_PAGEALLOC) code from get_freepointer_safe()
> into the new generic helper, probe_slab_address(). The next patch will add
> another user.

Acked-by: Christoph Lameter <[hidden email]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Kirill Tkhai-2
In reply to this post by Oleg Nesterov
В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:

> Extract the ifdef(CONFIG_DEBUG_PAGEALLOC) code from get_freepointer_safe()
> into the new generic helper, probe_slab_address(). The next patch will add
> another user.
>
> Signed-off-by: Oleg Nesterov <[hidden email]>
> ---
>  include/linux/uaccess.h |   15 +++++++++++++++
>  mm/slub.c               |    6 +-----
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index effb637..3367396 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -71,6 +71,21 @@ static inline unsigned long __copy_from_user_nocache(void *to,
>   __probe_kernel_read(&(retval), (__force void *)(addr), sizeof(retval))
>  
>  /*
> + * Same as probe_kernel_address(), but @addr must be the valid pointer
> + * to a slab object, potentially freed/reused/unmapped.
> + */
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +#define probe_slab_address(addr, retval) \
> + probe_kernel_address(addr, retval)
> +#else
> +#define probe_slab_address(addr, retval) \
> + ({ \
> + (retval) = *(typeof(retval) *)(addr); \
> + 0; \
> + })
> +#endif
> +
> +/*
>   * probe_kernel_read(): safely attempt to read from a location
>   * @dst: pointer to the buffer that shall take the data
>   * @src: address to read from
> diff --git a/mm/slub.c b/mm/slub.c
> index 3e8afcc..0467d22 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -265,11 +265,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
>  {
>   void *p;
>  
> -#ifdef CONFIG_DEBUG_PAGEALLOC
> - probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p));
> -#else
> - p = get_freepointer(s, object);
> -#endif
> + probe_slab_address(object + s->offset, p);
>   return p;
>  }
>  

probe_kernel_read() was arch-dependent on tree platforms:

arch/blackfin/mm/maccess.c
arch/parisc/lib/memcpy.c
arch/um/kernel/maccess.c

But now we skip these arch-dependent implementations. Is there no a problem?

Kirill

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Kirill Tkhai-2
В Вт, 28/10/2014 в 08:44 +0300, Kirill Tkhai пишет:

> В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:
> > Extract the ifdef(CONFIG_DEBUG_PAGEALLOC) code from get_freepointer_safe()
> > into the new generic helper, probe_slab_address(). The next patch will add
> > another user.
> >
> > Signed-off-by: Oleg Nesterov <[hidden email]>
> > ---
> >  include/linux/uaccess.h |   15 +++++++++++++++
> >  mm/slub.c               |    6 +-----
> >  2 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > index effb637..3367396 100644
> > --- a/include/linux/uaccess.h
> > +++ b/include/linux/uaccess.h
> > @@ -71,6 +71,21 @@ static inline unsigned long __copy_from_user_nocache(void *to,
> >   __probe_kernel_read(&(retval), (__force void *)(addr), sizeof(retval))
> >  
> >  /*
> > + * Same as probe_kernel_address(), but @addr must be the valid pointer
> > + * to a slab object, potentially freed/reused/unmapped.
> > + */
> > +#ifdef CONFIG_DEBUG_PAGEALLOC
> > +#define probe_slab_address(addr, retval) \
> > + probe_kernel_address(addr, retval)
> > +#else
> > +#define probe_slab_address(addr, retval) \
> > + ({ \
> > + (retval) = *(typeof(retval) *)(addr); \
> > + 0; \
> > + })
> > +#endif
> > +
> > +/*
> >   * probe_kernel_read(): safely attempt to read from a location
> >   * @dst: pointer to the buffer that shall take the data
> >   * @src: address to read from
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 3e8afcc..0467d22 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -265,11 +265,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> >  {
> >   void *p;
> >  
> > -#ifdef CONFIG_DEBUG_PAGEALLOC
> > - probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p));
> > -#else
> > - p = get_freepointer(s, object);
> > -#endif
> > + probe_slab_address(object + s->offset, p);
> >   return p;
> >  }
> >  
>
> probe_kernel_read() was arch-dependent on tree platforms:

Of course, I mean get_freepointer_safe() used to use arch-dependent
probe_kernel_read() on blackfin, parisc and um.

> arch/blackfin/mm/maccess.c
> arch/parisc/lib/memcpy.c
> arch/um/kernel/maccess.c
>
> But now we skip these arch-dependent implementations. Is there no a problem?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 3/3] introduce task_rcu_dereference()

Kirill Tkhai-2
In reply to this post by Oleg Nesterov
В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:

> task_struct is only protected by RCU if it was found on a RCU protected
> list (say, for_each_process() or find_task_by_vpid()).
>
> And as Kirill pointed out rq->curr isn't protected by RCU, the scheduler
> drops the (potentially) last reference without RCU gp, this means that we
> need to fix the code which uses foreign_rq->curr under rcu_read_lock().
>
> Add a new helper which can be used to dereferene rq->curr or any other
> pointer to task_struct assuming that it should be cleared or updated
> before the final put_task_struct(). It returns non-NULL only if this
> task can't go away before rcu_read_unlock().
>
> Suggested-by: Kirill Tkhai <[hidden email]>
> Signed-off-by: Oleg Nesterov <[hidden email]>
> ---
>  include/linux/sched.h |    1 +
>  kernel/exit.c         |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 857ba40..0ba420e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2300,6 +2300,7 @@ extern void block_all_signals(int (*notifier)(void *priv), void *priv,
>        sigset_t *mask);
>  extern void unblock_all_signals(void);
>  extern void release_task(struct task_struct * p);
> +extern struct task_struct *task_rcu_dereference(struct task_struct **ptask);
>  extern int send_sig_info(int, struct siginfo *, struct task_struct *);
>  extern int force_sigsegv(int, struct task_struct *);
>  extern int force_sig_info(int, struct siginfo *, struct task_struct *);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 32c58f7..d8b95c2 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -213,6 +213,55 @@ repeat:
>   goto repeat;
>  }
>  
> +struct task_struct *task_rcu_dereference(struct task_struct **ptask)
> +{
> + struct task_struct *task;
> + struct sighand_struct *sighand;
> +
> + /*
> + * We need to verify that release_task() was not called and thus
> + * delayed_put_task_struct() can't run and drop the last reference
> + * before rcu_read_unlock(). We check task->sighand != NULL, but
> + * we can read the already freed and reused memory.
> + */
> + retry:
> + task = rcu_dereference(*ptask);
> + if (!task)
> + return NULL;
> +
> + probe_slab_address(&task->sighand, sighand);
> + /*
> + * Pairs with atomic_dec_and_test(usage) in put_task_struct(task).
> + * If this task was already freed we can not miss the preceding
> + * update of this pointer.
> + */
> + smp_rmb();
> + if (unlikely(task != ACCESS_ONCE(*ptask)))
> + goto retry;
> +
> + /*
> + * Either this is the same task and we can trust sighand != NULL, or
> + * its memory was re-instantiated as another instance of task_struct.
> + * In the latter case the new task can not go away until another rcu
> + * gp pass, so the only problem is that sighand == NULL can be false
> + * positive but we can pretend we got this NULL before it was freed.
> + */
> + if (sighand)
> + return task;
> +
> + /*
> + * We could even eliminate the false positive mentioned above:
> + *
> + * probe_slab_address(&task->sighand, sighand);
> + * if (sighand)
> + * goto retry;
> + *
> + * if sighand != NULL because we read the freed memory we should
> + * see the new pointer, otherwise we will likely return this task.
> + */
> + return NULL;
> +}
> +
>  /*
>   * This checks not only the pgrp, but falls back on the pid if no
>   * satisfactory pgrp is found. I dunno - gdb doesn't work correctly

Reviewed-by: Kirill Tkhai <[hidden email]>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

[tip:sched/core] sched/numa: Fix unsafe get_task_struct() in task_numa_assign()

tip-bot for Peter Zijlstra
In reply to this post by Kirill Tkhai-2
Commit-ID:  1effd9f19324efb05fccc7421530e11a52db0278
Gitweb:     http://git.kernel.org/tip/1effd9f19324efb05fccc7421530e11a52db0278
Author:     Kirill Tkhai <[hidden email]>
AuthorDate: Wed, 22 Oct 2014 11:17:11 +0400
Committer:  Ingo Molnar <[hidden email]>
CommitDate: Tue, 28 Oct 2014 10:46:02 +0100

sched/numa: Fix unsafe get_task_struct() in task_numa_assign()

Unlocked access to dst_rq->curr in task_numa_compare() is racy.
If curr task is exiting this may be a reason of use-after-free:

task_numa_compare()                    do_exit()
    ...                                        current->flags |= PF_EXITING;
    ...                                    release_task()
    ...                                        ~~delayed_put_task_struct()~~
    ...                                    schedule()
    rcu_read_lock()                        ...
    cur = ACCESS_ONCE(dst_rq->curr)        ...
        ...                                rq->curr = next;
        ...                                    context_switch()
        ...                                        finish_task_switch()
        ...                                            put_task_struct()
        ...                                                __put_task_struct()
        ...                                                    free_task_struct()
        task_numa_assign()                                     ...
            get_task_struct()                                  ...

As noted by Oleg:

  <<The lockless get_task_struct(tsk) is only safe if tsk == current
    and didn't pass exit_notify(), or if this tsk was found on a rcu
    protected list (say, for_each_process() or find_task_by_vpid()).
    IOW, it is only safe if release_task() was not called before we
    take rcu_read_lock(), in this case we can rely on the fact that
    delayed_put_pid() can not drop the (potentially) last reference
    until rcu_read_unlock().

    And as Kirill pointed out task_numa_compare()->task_numa_assign()
    path does get_task_struct(dst_rq->curr) and this is not safe. The
    task_struct itself can't go away, but rcu_read_lock() can't save
    us from the final put_task_struct() in finish_task_switch(); this
    reference goes away without rcu gp>>

The patch provides simple check of PF_EXITING flag. If it's not set,
this guarantees that call_rcu() of delayed_put_task_struct() callback
hasn't happened yet, so we can safely do get_task_struct() in
task_numa_assign().

Locked dst_rq->lock protects from concurrency with the last schedule().
Reusing or unmapping of cur's memory may happen without it.

Suggested-by: Oleg Nesterov <[hidden email]>
Signed-off-by: Kirill Tkhai <[hidden email]>
Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
Cc: Linus Torvalds <[hidden email]>
Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai
Signed-off-by: Ingo Molnar <[hidden email]>
---
 kernel/sched/fair.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b069bf..fbc0b82 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1164,9 +1164,19 @@ static void task_numa_compare(struct task_numa_env *env,
  long moveimp = imp;
 
  rcu_read_lock();
- cur = ACCESS_ONCE(dst_rq->curr);
- if (cur->pid == 0) /* idle */
+
+ raw_spin_lock_irq(&dst_rq->lock);
+ cur = dst_rq->curr;
+ /*
+ * No need to move the exiting task, and this ensures that ->curr
+ * wasn't reaped and thus get_task_struct() in task_numa_assign()
+ * is safe under RCU read lock.
+ * Note that rcu_read_lock() itself can't protect from the final
+ * put_task_struct() after the last schedule().
+ */
+ if ((cur->flags & PF_EXITING) || is_idle_task(cur))
  cur = NULL;
+ raw_spin_unlock_irq(&dst_rq->lock);
 
  /*
  * "imp" is the fault differential for the source task between the
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Peter Zijlstra-5
In reply to this post by Kirill Tkhai-2
On Tue, Oct 28, 2014 at 08:44:51AM +0300, Kirill Tkhai wrote:
> В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:

> > +#define probe_slab_address(addr, retval) \
> > + probe_kernel_address(addr, retval)
>
> probe_kernel_read() was arch-dependent on tree platforms:
>
> arch/blackfin/mm/maccess.c
> arch/parisc/lib/memcpy.c
> arch/um/kernel/maccess.c
>
> But now we skip these arch-dependent implementations. Is there no a problem?

Nope, see the first patch, it makes probe_kernel_address use
__probe_kernel_read().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Kirill Tkhai
On 28.10.2014 18:01, Peter Zijlstra wrote:

> On Tue, Oct 28, 2014 at 08:44:51AM +0300, Kirill Tkhai wrote:
>> В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:
>
>>> +#define probe_slab_address(addr, retval) \
>>> + probe_kernel_address(addr, retval)
>>
>> probe_kernel_read() was arch-dependent on tree platforms:
>>
>> arch/blackfin/mm/maccess.c
>> arch/parisc/lib/memcpy.c
>> arch/um/kernel/maccess.c
>>
>> But now we skip these arch-dependent implementations. Is there no a problem?
>
> Nope, see the first patch, it makes probe_kernel_address use
> __probe_kernel_read().
>

Yes, probe_kernel_read() is in [1/3], but it's not the same as
__probe_kernel_read() for blackfin, for example.

It's defined as

long __weak probe_kernel_read(void *dst, const void *src, size_t size)
    __attribute__((alias("__probe_kernel_read")));

But blackfin's probe_kernel_read() redefines this __weak function,
isn't it? Didn't get_freepointer_safe() use to call architecture's
probe_kernel_read() before?

I don't see how it is called now...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Kirill Tkhai
On 28.10.2014 20:56, Kirill Tkhai wrote:

> On 28.10.2014 18:01, Peter Zijlstra wrote:
>> On Tue, Oct 28, 2014 at 08:44:51AM +0300, Kirill Tkhai wrote:
>>> В Пн, 27/10/2014 в 20:54 +0100, Oleg Nesterov пишет:
>>
>>>> +#define probe_slab_address(addr, retval) \
>>>> + probe_kernel_address(addr, retval)
>>>
>>> probe_kernel_read() was arch-dependent on tree platforms:
>>>
>>> arch/blackfin/mm/maccess.c
>>> arch/parisc/lib/memcpy.c
>>> arch/um/kernel/maccess.c
>>>
>>> But now we skip these arch-dependent implementations. Is there no a problem?
>>
>> Nope, see the first patch, it makes probe_kernel_address use
>> __probe_kernel_read().
>>
>
> Yes, probe_kernel_read() is in [1/3], but it's not the same as
> __probe_kernel_read() for blackfin, for example.

Vise versa, I mean __probe_kernel_read() is in [1/3].

> It's defined as
>
> long __weak probe_kernel_read(void *dst, const void *src, size_t size)
>     __attribute__((alias("__probe_kernel_read")));
>
> But blackfin's probe_kernel_read() redefines this __weak function,
> isn't it? Didn't get_freepointer_safe() use to call architecture's
> probe_kernel_read() before?
>
> I don't see how it is called now...
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/3] introduce probe_slab_address()

Oleg Nesterov
In reply to this post by Kirill Tkhai
On 10/28, Kirill Tkhai wrote:

>
> Yes, probe_kernel_read() is in [1/3], but it's not the same as
> __probe_kernel_read() for blackfin, for example.
>
> It's defined as
>
> long __weak probe_kernel_read(void *dst, const void *src, size_t size)
>     __attribute__((alias("__probe_kernel_read")));
>
> But blackfin's probe_kernel_read() redefines this __weak function,
> isn't it? Didn't get_freepointer_safe() use to call architecture's
> probe_kernel_read() before?

I _think_ that __probe_kernel_read(slab_ddr) should be fine.

Yes, an architecture may want to reimplement probe_kernel_read() to
allow to safely access the special areas, or special addresses.

But again, in this case we know that this address points to the
"normal" kernel memory, __copy_from_user_inatomic() should work fine.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
123