To summarize the contents of other (already good!) answers, isinstance
caters for inheritance (an instance of a derived class is an instance of a base class, too), while checking for equality of type
does not (it demands identity of types and rejects instances of subtypes, AKA subclasses).
Normally, in Python, you want your code to support inheritance, of course (since inheritance is so handy, it would be bad to stop code using yours from using it!), so isinstance
is less bad than checking identity of type
s because it seamlessly supports inheritance.
It's not that isinstance
is good, mind you—it's just less bad than checking equality of types. The normal, Pythonic, preferred solution is almost invariably "duck typing": try using the argument as if it was of a certain desired type, do it in a try
/except
statement catching all exceptions that could arise if the argument was not in fact of that type (or any other type nicely duck-mimicking it;-), and in the except
clause, try something else (using the argument "as if" it was of some other type).
basestring
is, however, quite a special case—a builtin type that exists only to let you use isinstance
(both str
and unicode
subclass basestring
). Strings are sequences (you could loop over them, index them, slice them, ...), but you generally want to treat them as "scalar" types—it's somewhat incovenient (but a reasonably frequent use case) to treat all kinds of strings (and maybe other scalar types, i.e., ones you can't loop on) one way, all containers (lists, sets, dicts, ...) in another way, and basestring
plus isinstance
helps you do that—the overall structure of this idiom is something like:
if isinstance(x, basestring)
return treatasscalar(x)
try:
return treatasiter(iter(x))
except TypeError:
return treatasscalar(x)
You could say that basestring
is an Abstract Base Class ("ABC")—it offers no concrete functionality to subclasses, but rather exists as a "marker", mainly for use with isinstance
. The concept is obviously a growing one in Python, since PEP 3119, which introduces a generalization of it, was accepted and has been implemented starting with Python 2.6 and 3.0.
The PEP makes it clear that, while ABCs can often substitute for duck typing, there is generally no big pressure to do that (see here). ABCs as implemented in recent Python versions do however offer extra goodies: isinstance
(and issubclass
) can now mean more than just "[an instance of] a derived class" (in particular, any class can be "registered" with an ABC so that it will show as a subclass, and its instances as instances of the ABC); and ABCs can also offer extra convenience to actual subclasses in a very natural way via Template Method design pattern applications (see here and here [[part II]] for more on the TM DP, in general and specifically in Python, independent of ABCs).
For the underlying mechanics of ABC support as offered in Python 2.6, see here; for their 3.1 version, very similar, see here. In both versions, standard library module collections (that's the 3.1 version—for the very similar 2.6 version, see here) offers several useful ABCs.
For the purpose of this answer, the key thing to retain about ABCs (beyond an arguably more natural placement for TM DP functionality, compared to the classic Python alternative of mixin classes such as UserDict.DictMixin) is that they make isinstance
(and issubclass
) much more attractive and pervasive (in Python 2.6 and going forward) than they used to be (in 2.5 and before), and therefore, by contrast, make checking type equality an even worse practice in recent Python versions than it already used to be.
I know it's been said already, but I'd highly recommend the requests
Python package.
If you've used languages other than python, you're probably thinking urllib
and urllib2
are easy to use, not much code, and highly capable, that's how I used to think. But the requests
package is so unbelievably useful and short that everyone should be using it.
First, it supports a fully restful API, and is as easy as:
import requests
resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')
Regardless of whether GET / POST, you never have to encode parameters again, it simply takes a dictionary as an argument and is good to go:
userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)
Plus it even has a built in JSON decoder (again, I know json.loads()
isn't a lot more to write, but this sure is convenient):
resp.json()
Or if your response data is just text, use:
resp.text
This is just the tip of the iceberg. This is the list of features from the requests site:
- International Domains and URLs
- Keep-Alive & Connection Pooling
- Sessions with Cookie Persistence
- Browser-style SSL Verification
- Basic/Digest Authentication
- Elegant Key/Value Cookies
- Automatic Decompression
- Unicode Response Bodies
- Multipart File Uploads
- Connection Timeouts
- .netrc support
- List item
- Python 2.7, 3.6—3.9
- Thread-safe.
Best Answer
Is this the correct process for feature selection? This is ONE of the many ways of feature selection. Recursive feature elimination is an automated approach to this, others are listed in scikit.learn documentation. They have different pros and cons, and usually feature selection is best achieved by also involving common sense and trying models with different features. RFE is a quick way of selecting a good set of features, but does not necessarily give you the ultimately best. By the way, you don't need to build your StratifiedKFold separately. If you just set the
cv
parameter tocv=3
, bothRFECV
andGridSearchCV
will automatically use StratifiedKFold if the y values are binary or multiclass, which I'm assuming is most likely the case since you are usingLogisticRegression
. You can also combineinto
Is this the correct process for hyper-parameter selection? GridSearchCV is basically an automated way of systematically trying a whole set of combinations of model parameters and picking the best among these according to some performance metric. It's a good way of finding well-suited parameters, yes.
Is this the correct process for fitting? Yes, this is a valid way of fitting the model. When you call
grid.fit(X_new, y)
, it makes a grid ofLogisticRegression
estimators (each with a set of parameters that are tried) and fits each of them. It will keep the one with the best performance undergrid.best_estimator_
, the parameters of this estimator ingrid.best_params_
and the performance score for this estimator undergrid.best_score_
. It will return itself, and not the best estimator. Remember that with incoming new X values that you will use the model to predict on, you have to apply the transform with the fitted RFECV model. So, you can actually add this step to the pipeline as well.Where can I find the fitted coefficients for the selected features? The
grid.best_estimator_
attribute is aLogisticRegression
object with all this information, sogrid.best_estimator_.coef_
has all the coefficients (andgrid.best_estimator_.intercept_
is the intercept). Note that to be able to get thisgrid.best_estimator_
, therefit
parameter onGridSearchCV
needs to be set toTrue
, but this is the default anyway.