mrmr
- mrmr(relevance_table: pandas.core.frame.DataFrame, regressors: pandas.core.frame.DataFrame, top_k: int, fast_redundancy: bool = False, relevance_aggregation_mode: str = AggregationMode.mean, redundancy_aggregation_mode: str = AggregationMode.mean, atol: float = 1e-10) → List[str]
Maximum Relevance and Minimum Redundancy feature selection method.
Here the relevance of each regressor is calculated as the per-segment aggregation of its relevance values in relevance_table. The redundancy term for a regressor is calculated as the mean absolute correlation between this regressor and the other ones. The correlation between two regressors is an aggregated pairwise correlation of the regressors' values in each segment.
- Parameters
relevance_table (pandas.core.frame.DataFrame) – dataframe of shape n_segment x n_exog_series with relevance table, where relevance_table[i][j] contains relevance of j-th df_exog series to i-th df series
regressors (pandas.core.frame.DataFrame) – dataframe with regressors in etna format
top_k (int) – number of regressors to select; if there are fewer than top_k regressors, all of them will be selected
fast_redundancy (bool) –
True: compute redundancy only inside the segments, time complexity \(O(top\_k * n\_segments * n\_features * history\_len)\)
False: compute redundancy for all the pairs of segments, time complexity \(O(top\_k * n\_segments^2 * n\_features * history\_len)\)
relevance_aggregation_mode (str) – the method for relevance values per-segment aggregation
redundancy_aggregation_mode (str) – the method for redundancy values per-segment aggregation
atol (float) – the absolute tolerance to compare the float values
- Returns
selected_features – list of top_k selected regressors, sorted by their importance
- Return type
List[str]
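The greedy selection described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the names `pearson` and `mrmr_sketch`, the plain-dict inputs, and the quotient form of the score (relevance divided by redundancy) are assumptions made for illustration. Correlations are computed only inside each segment, which loosely mirrors the fast_redundancy=True setting, and mean aggregation is used throughout.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_sketch(relevance, values, top_k, atol=1e-10):
    """Hypothetical greedy mRMR selection (illustrative only).

    relevance: {feature: [relevance of the feature in each segment]}
    values:    {segment: {feature: [feature values over time]}}
    """
    # Relevance: aggregate (mean) the relevance values of each feature.
    agg_relevance = {f: mean(r) for f, r in relevance.items()}

    def correlation(f, g):
        # Mean over segments of the absolute within-segment correlation.
        return mean(abs(pearson(seg[f], seg[g])) for seg in values.values())

    def score(f, selected):
        if not selected:
            return agg_relevance[f]  # first pick: relevance only
        # Redundancy: mean correlation with the already selected features.
        redundancy = mean(correlation(f, s) for s in selected)
        # Assumed quotient form; atol guards against division by zero.
        return agg_relevance[f] / max(redundancy, atol)

    selected = []
    candidates = list(agg_relevance)
    while candidates and len(selected) < top_k:
        best = max(candidates, key=lambda f: score(f, selected))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: "b" is almost a copy of "a", "c" is only weakly correlated with "a".
values = {
    "s1": {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [1, -1, 1, -1]},
    "s2": {"a": [2, 4, 6, 8], "b": [1, 2, 3, 4.5], "c": [-1, 1, -1, 1]},
}
relevance = {"a": [0.9, 0.9], "b": [0.8, 0.8], "c": [0.5, 0.5]}

top2 = mrmr_sketch(relevance, values, top_k=2)
print(top2)  # ['a', 'c']
```

In this toy example "b" has higher relevance than "c", but it is nearly redundant with the already selected "a", so "c" is chosen second. Asking for more features than exist returns all of them, matching the documented behaviour of top_k.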