针对一个新数据集可能需要用的一些简单操作(pandas, seaborn)
定义
1
| df = pd.DataFrame(values, index, columns=['A', 'B'])
|
查看
1 2 3 4 5
| df.dtypes
df.isnull().sum()
df.loc[index, columns]
|
删除冗余
1
| df.drop_duplicates(keep='first')
|
时间
1 2 3 4
| pd.to_datetime(df['time'])
dates = pd.date_range("1 1 2016", periods=24*4, freq="15min")
|
记录残缺
1 2 3
| import numpy as np tag = np.isnan(df.values) tag = tag.astype('float32')
|
插值
1
| df.interpolate(method='linear', limit_direction='forward', axis=0, inplace=True)
|
简单可视化
1 2 3 4
| import seaborn as sns sns.set_theme(style="whitegrid") show = df.loc['2017-01-01 14:00:00':'2017-01-02 14:00:00', stations[:3]] sns.lineplot(data=show, palette="tab10", linewidth=2.5)
|