Younes Charfaoui

Hands-on with Feature Selection Techniques: Hybrid Methods

This article is part 5 of a series centered on hands-on approaches to feature selection techniques. If you’ve missed any of the other posts, I’d recommend checking them out:

Hands-on with Feature Selection Techniques: An Introduction.
Hands-on with Feature Selection Techniques: Filter Methods.
Hands-on with Feature Selection Techniques: Wrapper Methods.
Hands-on with Feature Selection Techniques: Embedded Methods.
Hands-on with Feature Selection Techniques: Hybrid Methods.
Hands-on with Feature Selection Techniques: More Advanced Methods.

Note: As the title of this part in the series suggests, we’ll be combining multiple methods from parts 1–4. I strongly suggest reading those parts before working through this post.


In the previous articles in this series, we saw that there are a variety of different methods for selecting features, ranging from filters to embedded methods.

We also saw that each method has its own advantages and drawbacks. Therefore, machine learning engineers should consider ways to combine the advantages of these methods in order to build the best feature selection process possible. Here, we’ll explore a few hybrid methods that attempt to do just that.

Hybrid Methods: Definition

The definition here is quite simple—rather than using a single approach to select feature subsets as the previous methods do, hybrid methods combine the different approaches to get the best possible feature subset.

The way to combine these approaches is up to the engineer, given that you have a lot of methods in your toolbox. You can, for example, start with filter methods by eliminating constant, quasi-constant, and duplicated features. Then, in a second step, you could use wrapper methods to select the best feature subset from what remains. This is just one simple, high-level approach, sketched below.
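Here’s a minimal sketch of that pipeline, assuming a pandas DataFrame x_train with target y_train (as in the code later in this post) and a scikit-learn version recent enough to include SequentialFeatureSelector; the variance threshold and the number of features to select are placeholders you’d tune for your data:

import pandas as pd
from sklearn.feature_selection import VarianceThreshold, SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

# filter step: drop constant and quasi-constant features (variance below 1%).
var_filter = VarianceThreshold(threshold=0.01)
var_filter.fit(x_train)
x_train_filtered = x_train[x_train.columns[var_filter.get_support()]]

# filter step: drop duplicated columns (identical values in every row).
x_train_filtered = x_train_filtered.T.drop_duplicates().T

# wrapper step: step forward selection on the reduced feature set.
sfs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=100),
    n_features_to_select=10,
    direction='forward',
    scoring='accuracy',
    cv=5,
)
sfs.fit(x_train_filtered, y_train)

# print the final feature subset.
print(x_train_filtered.columns[sfs.get_support()])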

We’ll explore a few different hybrid methods in this article, but the idea is to combine weaker methods to end up with a more powerful one.

Hybrid Methods: Advantages

The big advantage that hybrid methods offer is that they combine the strengths of other feature selection methods and, as such, can reduce their disadvantages. This can (and hopefully will) result in:

  • High performance and accuracy
  • Better computational complexity than wrapper methods
  • Models that are more flexible and robust against high-dimensional data

Hybrid Methods: Process

The process of creating hybrid feature selection methods depends on what you choose to combine. The main priority is to select the methods you’re going to use, then to follow their processes.

In the following sections, you’ll see how we can combine methods, starting with wrapper approaches since they provide the best feature subset most of the time.

Using Filter & Wrapper methods

In part two of our series, which covered filter methods, we saw ranking methods like mutual information and the chi-squared score, which rank features independently, without involving any learning algorithm. From there, the best features are selected from the ranking list.

The idea here is to use these ranking methods to generate a feature ranking list in the first step, then use the top k features from this list to perform wrapper methods (like SFS or SBS).

With that, we can reduce the feature space of our dataset using these filter-based rankers in order to improve the time complexity of the wrapper methods.

You can use the code from the previous articles (filter methods and wrapper methods) and combine them in ways that work for your use case.
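Here’s a minimal sketch of that idea, again assuming x_train/y_train as in the code later in this post and a recent scikit-learn with SequentialFeatureSelector; the value of k and the other hyperparameters are placeholders:

from sklearn.feature_selection import SelectKBest, mutual_info_classif, SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

# filter step: rank features by mutual information and keep the top 20.
ranker = SelectKBest(score_func=mutual_info_classif, k=20)
ranker.fit(x_train, y_train)
top_k_features = x_train.columns[ranker.get_support()]

# wrapper step: step backward selection, but only over the top-k features.
sbs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=100),
    n_features_to_select=10,
    direction='backward',
    scoring='accuracy',
    cv=5,
)
sbs.fit(x_train[top_k_features], y_train)

# print the final feature subset.
print(top_k_features[sbs.get_support()])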

Using Embedded & Wrapper Methods

The idea here is similar: first acquire a ranking of features, then use wrapper methods to search for the best possible feature subset.

If you’ll recall, embedded methods offer a way to establish feature importance. This can be used to select the top features, after which a wrapper method search is performed.

You can use the code from the previous articles (embedded methods and wrapper methods) and combine them in ways that work for your use case.
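A minimal sketch of this combination might look as follows, assuming the same x_train/y_train and a recent scikit-learn; the 'mean' importance threshold and the number of features to select are illustrative only:

from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

# embedded step: keep features whose importance is above the mean importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100), threshold='mean')
selector.fit(x_train, y_train)
important_features = x_train.columns[selector.get_support()]

# wrapper step: step forward selection restricted to the pre-selected features.
sfs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=100),
    n_features_to_select=5,
    direction='forward',
    scoring='roc_auc',
    cv=5,
)
sfs.fit(x_train[important_features], y_train)

# print the final feature subset.
print(important_features[sfs.get_support()])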

Another way to use embedded methods is by using what’s called recursive feature elimination and recursive feature addition, which are illustrated in more detail below.

Recursive Feature Elimination

This is just a fancy name for a simple method that works as follows:

  • Train a model on all the data features. This model can be a tree-based model, lasso, logistic regression, or others that can offer feature importance. Evaluate its performance on a suitable metric of your choice.
  • Derive the feature importance to rank features accordingly.
  • Delete the least important feature and re-train the model on the remaining ones.
  • Use the previous evaluation metric to calculate the performance of the resulting model.
  • Now test whether the evaluation metric decreases by an arbitrary threshold (you should define this as well). If it does, that means this feature is important. Otherwise, you can remove it.
  • Repeat steps 3–5 until all features are removed (i.e. evaluated).

You might be thinking that this is just like the step backward feature selection (SBS) we covered in our post on wrapper methods, but it isn’t. The difference is that, at each step, SBS removes each remaining feature in turn and retrains a model to figure out which one is the least important. Here, we get that information directly from the model’s derived feature importance, so we remove a single feature per step instead of trying out every feature at every step.

That's why this approach is faster than pure wrapper methods and better than pure embedded methods. The main drawback is that we have to use an arbitrary threshold value to decide whether to keep a feature or not.

As a consequence, the smaller this threshold value, the more features will be included in the subset, and vice versa.

Here’s some sample code that works with a RandomForestClassifier to select the best features:

from sklearn.feature_selection import RFECV

# use any other model you want here.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=411)

# build the RFE with the CV option.
rfe = RFECV(model, min_features_to_select=3, step=1, cv=5, scoring='accuracy')

# fit the RFE to our data.
selection = rfe.fit(x_train, y_train)

# print the selected features.
print(x_train.columns[selection.support_])

We’re using RFECV here, which performs recursive feature elimination with cross-validation. Here’s a description of the different parameters:

  • min_features_to_select: As its name suggests, this parameter sets the minimum number of features to select.
  • step: How many features we remove at each step.
  • cv: An integer, generator, or an iterable that describes the cross-validation splitting strategy.
  • scoring: The evaluation metric we use.

You can also use the different code snippets from the previous articles to create your own implementation of RFE. Here’s an example for your reference:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score 

# list to hold the features to be removed.
features_to_remove = []

# set this threshold according to your needs.
threshold=0.003

# create your preferred model and fit it to the training data.
model_all_features = RandomForestClassifier(n_estimators=221)
model_all_features.fit(x_train, y_train)

# get the baseline score with all the features (you can use your own metric).
y_pred_test = model_all_features.predict(x_test)
auc_score_all = roc_auc_score(y_test, y_pred_test)

# loop over all the features to do recursive feature elimination.
for feature in x_train.columns:

    model = RandomForestClassifier(n_estimators=221)

    # delete the current feature.
    x_train_rfe = x_train.drop(features_to_remove + [feature], axis=1)
    x_test_rfe = x_test.drop(features_to_remove + [feature], axis=1)

    # fit model with all variables minus the removed features and the feature to be evaluated.
    model.fit(x_train_rfe, y_train)
    y_pred_test = model.predict(x_test_rfe)
    auc_score_int = roc_auc_score(y_test, y_pred_test)

    # determine the drop in the roc-auc
    diff_auc = auc_score_all - auc_score_int

    # compare the drop in roc-auc with the threshold
    if diff_auc < threshold:
        # if the drop in the roc is small and we remove the
        # feature, we need to set the new roc to the one based on
        # the remaining features
        auc_score_all = auc_score_int

        # and append the feature to remove to the list
        features_to_remove.append(feature)

# print the features that need removing.
print(features_to_remove)  
features_to_keep = [x for x in x_train.columns if x not in features_to_remove]

# print the number of features to keep.
print('total features to keep: ', len(features_to_keep))

Recursive Feature Addition

With the previous method, we started from all the features and removed them one at a time. Now it’s the opposite: we start with no features and add one feature at a time. Here are the steps:

  • Train a model on all the data and derive the feature importance to rank the features accordingly. This model can be a tree-based model, lasso, logistic regression, or others that can offer feature importance.
  • From that initial model, create another with the most important feature and evaluate it with an evaluation metric of your choice.
  • Add the next most important feature and re-train the model with it, along with the features kept from the previous steps.
  • Use the previous evaluation metric to calculate the performance of the resulting model.
  • Now test whether the evaluation metric increases by an arbitrarily-set threshold (you should define this as well). If that’s the case, it means that this feature is important; otherwise, we can remove it.
  • Repeat steps 3–5 until all features are added (i.e. evaluated).

The difference between this method and step forward feature selection is similar to what we discussed for RFE: instead of trying every remaining feature at each step to decide which one to add next, it simply follows the importance ranking, so it’s faster than pure wrapper methods.

sklearn doesn’t provide a RecursiveFeatureAddition algorithm, so you’ll need to implement it on your own:

# you can use any other algorithm.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score 

# list to hold the features to be kept, starting with the most important feature
# (this assumes the columns of x_train are already ordered by importance).
features_to_keep = [x_train.columns[0]]

# set this threshold according to your needs.
threshold = 0.002

# create your preferred model and fit it to the first (most important) feature.
model_one_feature = RandomForestClassifier(n_estimators=332)
model_one_feature.fit(x_train[[x_train.columns[0]]], y_train)

# evaluate against your metric.
y_pred_test = model_one_feature.predict(x_test[[x_train.columns[0]]])
auc_score_all = roc_auc_score(y_test, y_pred_test)

# iterate over the remaining features.
for feature in x_train.columns[1:]:

    model = RandomForestClassifier(n_estimators=332)

    # fit model with the selected features and the feature to be evaluated
    model.fit(x_train[features_to_keep + [feature]], y_train)
    y_pred_test = model.predict(x_test[features_to_keep + [feature]])
    auc_score_int = roc_auc_score(y_test, y_pred_test)

    # determine the increase in the roc-auc
    diff_auc = auc_score_int - auc_score_all

    # compare the increase in roc-auc with the threshold
    if diff_auc >= threshold:

        # if the increase in the roc is bigger than the threshold
        # we keep the feature and re-adjust the roc-auc to the new value
        # considering the added feature
        auc_score_all = auc_score_int
        features_to_keep.append(feature)

# print the features to keep.
print(features_to_keep)

There’s also this repository that provides this functionality, which you may want to check out.

Conclusion

Hybrid methods offer a great way to combine weak feature selection methods into more robust and powerful ways to select variables. The methods described in this article are just examples of what you can combine, so feel free to test other combinations and see what gives you better results for your use case.

That’s the beauty of machine learning: you’re never guaranteed to find the optimal solution just by copying and pasting code; it’s about testing and seeing what best fits your problem.

To see a full example of using hybrid feature selection methods, check out this GitHub repository.
