Some simple operations you may need when working with a new dataset (pandas, seaborn)
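Definition
```python
import pandas as pd

# values: 2-D array-like; index: row labels (both defined elsewhere)
df = pd.DataFrame(values, index=index, columns=['A', 'B'])
```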
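Inspection
```python
df.dtypes            # data type of each column
df.isnull().sum()    # number of missing values per column

df.loc[index, columns]  # select rows and columns by label
```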
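As a small illustration of these inspection calls, the sketch below builds a hypothetical toy frame (the column names, values, and 15-minute index are made up for the example) and applies each call to it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two columns, one missing reading in column A.
index = pd.date_range("2016-01-01", periods=4, freq="15min")
df = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0, 4.0], "B": [10, 20, 30, 40]},
    index=index,
)

dtypes = df.dtypes                      # A is float64, B is an integer type
missing_per_column = df.isnull().sum()  # count of NaN values in each column
subset = df.loc[index[:2], ["A"]]       # label-based selection: first two rows, column A
```

Passing a list of column names to `loc` keeps the result a DataFrame; passing a single name would return a Series instead.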
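Removing duplicates
```python
# drop_duplicates returns a new DataFrame, so assign the result back
df = df.drop_duplicates(keep='first')  # keep the first of each set of identical rows
```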
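A minimal sketch of what `keep='first'` does, using a made-up three-row frame: the second copy of a repeated row is dropped, and the original frame is left unchanged because the method returns a new object.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical.
df = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "x", "y"]})

# Keep the first occurrence of each duplicated row; df itself is not modified.
deduped = df.drop_duplicates(keep="first")
```

The surviving rows keep their original index labels, so call `reset_index(drop=True)` afterwards if a contiguous index is needed.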
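Time
```python
# to_datetime returns the converted values, so assign them back to the column
df['time'] = pd.to_datetime(df['time'])
# one day of timestamps at 15-minute intervals (24*4 = 96 points)
dates = pd.date_range("1 1 2016", periods=24*4, freq="15min")
```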
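One possible way to combine the two calls, shown here as a sketch under assumptions: the `time` column is parsed, used as the index, and the frame is then reindexed onto the regular 15-minute grid so that absent timestamps show up as NaN rows. The `reindex` step and the toy data are illustrative and not part of the original notes.

```python
import pandas as pd

# Hypothetical raw data with timestamps stored as strings.
raw = pd.DataFrame({
    "time": ["2016-01-01 00:00:00", "2016-01-01 00:15:00", "2016-01-01 00:30:00"],
    "A": [1.0, 2.0, 3.0],
})

raw["time"] = pd.to_datetime(raw["time"])  # strings -> datetime64 values
df = raw.set_index("time")                 # use the timestamps as row labels

# Regular grid: one day at 15-minute steps (96 timestamps).
dates = pd.date_range("2016-01-01", periods=24 * 4, freq="15min")

# Align to the grid; timestamps with no record become NaN rows.
aligned = df.reindex(dates)
```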
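Flagging missing records
```python
import numpy as np

tag = np.isnan(df.values)    # True where a value is missing (needs numeric columns)
tag = tag.astype('float32')  # 1.0 = missing, 0.0 = present
```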
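The following sketch shows the resulting mask on a made-up 2x2 frame. It also includes `df.isna()` as an alternative that gives the same mask here; `np.isnan` raises a `TypeError` on object-dtype data, whereas `isna` also handles non-numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})

# Mask from the notes above: 1.0 where a value is missing, 0.0 otherwise.
tag = np.isnan(df.values).astype("float32")

# Alternative using pandas, which also works when some columns are not numeric.
tag_alt = df.isna().to_numpy().astype("float32")
```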
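Interpolation
```python
# fill NaN values column by column with linear interpolation, modifying df in place
df.interpolate(method='linear', limit_direction='forward', axis=0, inplace=True)
```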
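A small sketch of how `limit_direction='forward'` behaves, on a made-up column: gaps are filled going forward from the previous valid value, so a NaN at the very start has nothing before it and stays missing. The non-inplace form is used here so the input can be compared with the result.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading, an interior, and a trailing gap.
df = pd.DataFrame({"A": [np.nan, 1.0, np.nan, 3.0, np.nan]})

# Linear interpolation down the rows, filling in the forward direction only.
filled = df.interpolate(method="linear", limit_direction="forward", axis=0)
```

If leading gaps should be filled too, `limit_direction='both'` is one option.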
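Simple visualization
```python
import seaborn as sns

sns.set_theme(style="whitegrid")
# stations: list of column names (defined elsewhere); df must have a DatetimeIndex
show = df.loc['2017-01-01 14:00:00':'2017-01-02 14:00:00', stations[:3]]
sns.lineplot(data=show, palette="tab10", linewidth=2.5)  # one line per column
```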
