Welcome!
rare event encoding for categorical features in machine learning in a pandas dataframe
If a categorical feature has too many values, it will generate too many features after encoding, such as one-hot encoding
2021-06-23
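As a rough illustration of the rare event encoding idea in the entry above, the sketch below groups infrequent categories under a single label before one-hot encoding. The helper name `encode_rare`, the `min_freq` threshold, the `"rare"` label, and the toy `city` column are all assumptions made for this example and are not taken from the original post.

```python
import pandas as pd


def encode_rare(series: pd.Series, min_freq: float = 0.05, label: str = "rare") -> pd.Series:
    """Replace categories whose relative frequency is below min_freq with one label.

    Hypothetical helper for illustration only.
    """
    freq = series.value_counts(normalize=True)   # share of rows per category
    rare_values = freq[freq < min_freq].index    # categories considered rare
    # Keep frequent values as they are, overwrite rare ones with the label.
    return series.where(~series.isin(rare_values), label)


# Toy data (made up): A and B are frequent, C and D each appear once.
df = pd.DataFrame({"city": ["A"] * 10 + ["B"] * 8 + ["C", "D"]})
df["city_encoded"] = encode_rare(df["city"], min_freq=0.1)

# One-hot encoding now yields one column per frequent category plus one for "rare".
one_hot = pd.get_dummies(df["city_encoded"], prefix="city")
```

In a production setting, the set of rare values would presumably be learned on the training data only and then reused unchanged for new data.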
handle feature order in training and online production stages to avoid inconsistency errors
When applying machine learning models in the production stage, such as a LightGBM model or any other model. While we all know the order o
2021-06-18
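One common way to keep feature order consistent between training and serving, sketched here under assumptions, is to store the training column order and reindex incoming data to it. The column names and values are invented for the example, and this is not necessarily the approach the original post takes.

```python
import pandas as pd

# Training data (made-up columns); the column order here is what the model sees.
train = pd.DataFrame({"age": [25, 40], "income": [3.0, 5.5], "clicks": [1, 7]})
feature_order = list(train.columns)  # persist this list alongside the model

# Online data may arrive with the same features in a different order.
online = pd.DataFrame({"clicks": [3], "age": [31], "income": [4.2]})

# Reorder columns to match training; columns missing online would become NaN.
online_aligned = online.reindex(columns=feature_order)
```

Because `reindex` fills missing columns with NaN instead of raising, an explicit check for missing features may be worth adding before prediction.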
some handy functions to group continuous variables and impute missing values in a dataframe
The following example shows how to group the age variable into groups, and some simple missing value imputation procedures. There
2021-06-15
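A minimal sketch of the two steps named in the entry above, assuming median imputation and hand-picked age bins. The bin edges, labels, and sample ages are assumptions for illustration and may differ from the original post.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with one missing age.
df = pd.DataFrame({"age": [5, 17, 30, np.nan, 64, 80]})

# Simple imputation: fill missing ages with the median of the observed ages.
df["age_filled"] = df["age"].fillna(df["age"].median())

# Group the continuous variable into left-closed intervals [0, 18), [18, 40), ...
bins = [0, 18, 40, 65, np.inf]
labels = ["0-17", "18-39", "40-64", "65+"]
df["age_group"] = pd.cut(df["age_filled"], bins=bins, labels=labels, right=False)
```

With `right=False`, each bin includes its left edge and excludes its right edge, so an age of exactly 18 falls into "18-39".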
explode and expand rows to multiple rows or columns to multiple columns using pandas dataframe
import pandas as pd; generate some example data: data = [ [['python','C'],'John'],[[
2021-06-14
aggregate features from different rows into one row in a pandas dataframe
In many use cases, different features of the same event are stored in a table across multiple rows. Multiple columns will ind
2021-06-12
time series feature engineering using tsfresh, training vs test
During the test stage, i.e., once the model is in production, for any new data, tsfresh feature generation does not depen
2021-06-10
word tokenization and sentence tokenization in python using the NLTK package
What is tokenization? Tokenization is the process by which a large quantity of text is divided into smaller parts called
2021-06-09
missing value or null value processing in pandas dataframe
Obtain null or missing values of a dataframe. Suppose the dataframe has the following format, with 10 rows and 5 columns:
2021-06-08
how to convert a timestamp column of a pandas dataframe into hour and day features using a transformer
from sklearn.base import BaseEstimator, TransformerMixin; import pandas as pd; class dayandhour_Transformer(BaseEstimator, T
2021-06-04