You need to understand that a thread/process context has multiple parts, one, directly associated with execution and is held in the CPU and certain system tables in memory that the CPU uses (e.g. page tables), and the other, which is needed for the OS, for bookkeeping (think of the various IDs, handles, special OS-specific permissions, network connections and such).
A full context switch would involve swapping both of these, the old current thread/process goes away for a while and the new current thread/process comes in for a while. That's the essence of thread/process scheduling.
Now, system calls are very different w.r.t. each other.
Consider something simple, for example, the system call for requesting the current date and time. The CPU switches from the user to kernel mode, preserving the user-mode register values, executes some kernel code to get the necessary data, stores it either in the memory or registers that the caller can access, restores the user-mode register values and returns. There's not much of context switch in here, only what's needed for the transition between the modes, user and kernel.
Consider now a system call that involves blocking of the caller until some event or availability of data. Manipulating mutexes and reading files would be examples of such system calls. In this case the kernel is forced to save the full context of the caller, mark it as blocked so the scheduler can't run it until that event or data arrives, and load the context of another ready thread/process, so it can run.
That's how system calls are related to context switches.
Kernel executing in the context of a user or a process means that whenever the kernel does work on behalf of a certain process or user it has to take into consideration that user's/process's context, e.g. the current process/thread/user ID, the current directory, locale, access permissions for various resources (e.g. files), all that stuff, that can be different between different processes/threads/users.
If processes have individual address spaces, the address spaces is also part of the process context. So, when the kernel needs to access memory of a process (to read/write file data or network packets), it has to have access to the process' address space, IOW, it has to be in its context (it doesn't mean, however, that the kernel has to load the full context just to access memory in a specific address space).
Is that helpful?
To select rows whose column value equals a scalar, some_value
, use ==
:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values
, use isin
:
df.loc[df['column_name'].isin(some_values)]
Combine multiple conditions with &
:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, &
binds more tightly than <=
and >=
. Thus, the parentheses in the last example are necessary. Without the parentheses
df['column_name'] >= A & df['column_name'] <= B
is parsed as
df['column_name'] >= (A & df['column_name']) <= B
which results in a Truth value of a Series is ambiguous error.
To select rows whose column value does not equal some_value
, use !=
:
df.loc[df['column_name'] != some_value]
isin
returns a boolean Series, so to select rows whose value is not in some_values
, negate the boolean Series using ~
:
df.loc[~df['column_name'].isin(some_values)]
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
'B': 'one one two three two two one three'.split(),
'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
# A B C D
# 0 foo one 0 0
# 1 bar one 1 2
# 2 foo two 2 4
# 3 bar three 3 6
# 4 foo two 4 8
# 5 bar two 5 10
# 6 foo one 6 12
# 7 foo three 7 14
print(df.loc[df['A'] == 'foo'])
yields
A B C D
0 foo one 0 0
2 foo two 2 4
4 foo two 4 8
6 foo one 6 12
7 foo three 7 14
If you have multiple values you want to include, put them in a
list (or more generally, any iterable) and use isin
:
print(df.loc[df['B'].isin(['one','three'])])
yields
A B C D
0 foo one 0 0
1 bar one 1 2
3 bar three 3 6
6 foo one 6 12
7 foo three 7 14
Note, however, that if you wish to do this many times, it is more efficient to
make an index first, and then use df.loc
:
df = df.set_index(['B'])
print(df.loc['one'])
yields
A C D
B
one foo 0 0
one bar 1 2
one foo 6 12
or, to include multiple values from the index use df.index.isin
:
df.loc[df.index.isin(['one','two'])]
yields
A C D
B
one foo 0 0
one bar 1 2
two foo 2 4
two foo 4 8
two bar 5 10
one foo 6 12
Best Answer
This is what I have done:
However, I am annoyed that predict chooses a threshold corresponding to 0.4% of true positives (false positives are zero). The ROC curve shows a threshold I like better for my problem where the true positives are approximately 20% (false positive around 4%). I then scan the predict_probabilities to find what probability value corresponds to my favourite ROC point. In my case this probability is 0.21. Then I create my own predict array:
and there you go:
returns what I wanted: