duplicate_data
- duplicate_data(df: pandas.core.frame.DataFrame, segments: Sequence[str], format: str = DataFrameFormat.wide) → pandas.core.frame.DataFrame
Duplicate a dataframe for each of the given segments.
- Parameters
df (pandas.core.frame.DataFrame) – dataframe to duplicate; it must contain a “timestamp” column
segments (Sequence[str]) – segments for which the dataframe is duplicated
format (str) – format of the result: TSDataset inner format (“wide”) or flattened format (“long”)
- Returns
result – result of duplication for all the segments
- Return type
pd.DataFrame
- Raises
ValueError – if the segments list is empty
ValueError – if an incorrect format is given
ValueError – if the dataframe doesn’t contain a “timestamp” column
Examples
>>> import pandas as pd
>>> from etna.datasets import generate_const_df
>>> from etna.datasets import duplicate_data
>>> from etna.datasets import TSDataset
>>> df = generate_const_df(
...    periods=50, start_time="2020-03-10",
...    n_segments=2, scale=1
... )
>>> timestamp = pd.date_range("2020-03-10", periods=100, freq="D")
>>> is_friday_13 = (timestamp.weekday == 4) & (timestamp.day == 13)
>>> df_exog_raw = pd.DataFrame({"timestamp": timestamp, "is_friday_13": is_friday_13})
>>> df_exog = duplicate_data(df_exog_raw, segments=["segment_0", "segment_1"], format="wide")
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df=df_ts_format, df_exog=df_exog, freq="D", known_future="all")
>>> ts.head()
segment       segment_0           segment_1
feature    is_friday_13 target is_friday_13 target
timestamp
2020-03-10        False   1.00        False   1.00
2020-03-11        False   1.00        False   1.00
2020-03-12        False   1.00        False   1.00
2020-03-13         True   1.00         True   1.00
2020-03-14        False   1.00        False   1.00
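To make the difference between the “long” and “wide” results concrete, the following is a minimal pandas-only sketch of what duplicating a dataframe across segments can look like. The helper `duplicate_sketch` and its `wide` flag are hypothetical names used for illustration only; this is not etna's implementation, and the library's exact column order, level names, and error messages may differ.

```python
from typing import Sequence

import pandas as pd


def duplicate_sketch(df: pd.DataFrame, segments: Sequence[str], wide: bool = True) -> pd.DataFrame:
    """Hypothetical illustration of per-segment duplication (not etna's code)."""
    if len(segments) == 0:
        raise ValueError("segments list is empty")
    if "timestamp" not in df.columns:
        raise ValueError("dataframe does not contain a 'timestamp' column")

    # Long (flattened) layout: one copy of df per segment, stacked vertically,
    # with a "segment" column telling the copies apart.
    long_df = pd.concat([df.assign(segment=s) for s in segments], ignore_index=True)
    if not wide:
        return long_df

    # Wide layout: timestamps as the index and a two-level column index
    # (segment, feature), similar in shape to the output shown above.
    wide_df = long_df.set_index(["timestamp", "segment"]).unstack("segment")
    wide_df = wide_df.swaplevel(axis=1).sort_index(axis=1)
    wide_df.columns.names = ["segment", "feature"]
    return wide_df


timestamp = pd.date_range("2020-03-10", periods=5, freq="D")
raw = pd.DataFrame(
    {
        "timestamp": timestamp,
        "is_friday_13": (timestamp.weekday == 4) & (timestamp.day == 13),
    }
)

long_df = duplicate_sketch(raw, ["segment_0", "segment_1"], wide=False)
wide_df = duplicate_sketch(raw, ["segment_0", "segment_1"], wide=True)
print(long_df)
print(wide_df)
```

In this sketch the long result has one row per (timestamp, segment) pair, while the wide result has one row per timestamp and one column per (segment, feature) pair, which is the shape expected for `df_exog` in the example above.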