Kaggle (3): Predict CO2 Emissions in Rwanda

1. Introduction

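In this competition, the task is to predict 2022 CO2 emissions at 497 different locations in Africa. The training data contains the CO2 emissions for 2019-2021.

This notebook covers:

1. Remove the one-off 2020 COVID-19 effect by smoothing. Imputing 2020 with the mean of 2019 and 2021 is also a valid approach, but it is not implemented here.
2. Observe that locations close to the highest-emission location also tend to have high emissions. Run K-Means clustering to group the data points by location, so that points with similar emissions end up in the same group.
3. Train on 2019 and 2020, experiment with several ensemble models, and check their CV score on the 2021 data.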
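
Step 3 amounts to a time-based holdout: fit on the earlier years and score on the last one. A minimal sketch on synthetic data, assuming the `year`, `week_no`, and `emission` column names from the competition files. The `time_holdout` helper and the per-week mean "model" are illustrative placeholders, not the models used later in this notebook.

```python
import numpy as np
import pandas as pd


def rmse(y_true, y_pred):
    """Root mean squared error, computed with NumPy only."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def time_holdout(df, holdout_year):
    """Rows before holdout_year go to training, rows in holdout_year to validation."""
    return df[df["year"] < holdout_year], df[df["year"] == holdout_year]


# Synthetic stand-in for the real data: 3 years x 53 weeks at one location.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "year": np.repeat([2019, 2020, 2021], 53),
    "week_no": np.tile(np.arange(53), 3),
})
demo["emission"] = 10 + 0.1 * demo["week_no"] + rng.normal(0, 0.5, len(demo))

fit_part, val_part = time_holdout(demo, holdout_year=2021)

# Placeholder "model": the mean emission of each week_no over the training years.
weekly_mean = fit_part.groupby("week_no")["emission"].mean()
val_pred = val_part["week_no"].map(weekly_mean)

print(f"holdout RMSE: {rmse(val_part['emission'], val_pred):.3f}")
```

Splitting by year rather than at random keeps 2021 information out of the training rows, which mirrors how the 2022 test set relates to the training data.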

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math
from tqdm import tqdm
from sklearn.preprocessing import SplineTransformer
from holidays import CountryHoliday
from tqdm.notebook import tqdm
from typing import List
from category_encoders import OneHotEncoder, MEstimateEncoder, GLMMEncoder, OrdinalEncoder
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold, KFold, RepeatedKFold, TimeSeriesSplit, train_test_split, cross_val_score
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.ensemble import HistGradientBoostingRegressor, VotingRegressor, StackingRegressor
from sklearn.svm import SVR, LinearSVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, SGDRegressor, LogisticRegression
from sklearn.linear_model import PassiveAggressiveRegressor, ARDRegression
from sklearn.linear_model import TheilSenRegressor, RANSACRegressor, HuberRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, roc_auc_score, roc_curve
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import FunctionTransformer, StandardScaler, MinMaxScaler, LabelEncoder, SplineTransformer
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer, KNNImputer
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_selection import RFECV
from sklearn.decomposition import PCA
from xgboost import XGBRegressor, XGBClassifier
import lightgbm as lgbm
from lightgbm import LGBMRegressor, LGBMClassifier
from lightgbm import log_evaluation, early_stopping, record_evaluation
from catboost import CatBoostRegressor, CatBoostClassifier, Pool
from sklearn import set_config
from sklearn.multioutput import MultiOutputClassifier
from datetime import datetime, timedelta
import gc
import warnings

warnings.filterwarnings('ignore')
set_config(transform_output = 'pandas')
pal = sns.color_palette('viridis')
pd.set_option('display.max_rows', 100)
M = 1.07

2. Examine Data

2.1

Here we try to smooth the 2020 data to remove the COVID-19 trend:

1. Use the imported, already smoothed dataset
2. Use the mean of the 2019 and 2021 values [https://www.kaggle.com/code/kacperrabczewski/rwanda-co2-step-by-step-guide]
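
Option 2 is not implemented in this notebook. A minimal sketch of what it could look like, assuming the `latitude`, `longitude`, `year`, `week_no`, and `emission` columns of the competition data. The helper name `impute_2020_with_neighbors` and the toy frame are hypothetical.

```python
import pandas as pd


def impute_2020_with_neighbors(df):
    """Return a copy where each 2020 emission is the mean of the 2019 and 2021
    emissions at the same location and week. Rows lacking either neighbour
    year keep their original 2020 value."""
    keys = ["latitude", "longitude", "week_no"]
    neighbors = (
        df[df["year"].isin([2019, 2021])]
        .groupby(keys)["emission"]
        .agg(["mean", "count"])
    )
    # Only trust the mean when both 2019 and 2021 are present.
    neighbors = (
        neighbors.loc[neighbors["count"] == 2, "mean"]
        .rename("neighbor_mean")
        .reset_index()
    )
    out = df.merge(neighbors, on=keys, how="left")
    use = (out["year"] == 2020) & out["neighbor_mean"].notna()
    out.loc[use, "emission"] = out.loc[use, "neighbor_mean"]
    return out.drop(columns="neighbor_mean")


# Toy data: one location, two weeks, three years.
demo = pd.DataFrame({
    "latitude": [-0.51] * 6,
    "longitude": [29.29] * 6,
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
    "week_no": [0, 1, 0, 1, 0, 1],
    "emission": [4.0, 6.0, 1.0, 1.0, 6.0, 10.0],
})
smoothed = impute_2020_with_neighbors(demo)
print(smoothed[smoothed["year"] == 2020])
```

In the toy frame the two 2020 values become 5.0 and 8.0, the midpoints of their 2019 and 2021 neighbours.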

extrp = pd.read_csv("./data/PS3E20_train_covid_updated")
extrp = extrp[(extrp["year"] == 2020)]
extrp
index | ID_LAT_LON_YEAR_WEEK | latitude | longitude | year | week_no | SulphurDioxide_SO2_column_number_density | SulphurDioxide_SO2_column_number_density_amf | SulphurDioxide_SO2_slant_column_number_density | SulphurDioxide_cloud_fraction | SulphurDioxide_sensor_azimuth_angle | ... | Cloud_cloud_top_height | Cloud_cloud_base_pressure | Cloud_cloud_base_height | Cloud_cloud_optical_depth | Cloud_surface_albedo | Cloud_sensor_azimuth_angle | Cloud_sensor_zenith_angle | Cloud_solar_azimuth_angle | Cloud_solar_zenith_angle | emission
53 | ID_-0.510_29.290_2020_00 | -0.510 | 29.290 | 2020 | 0 | 0.000064 | 0.970290 | 0.000073 | 0.163462 | -100.602665 | ... | 5388.602054 | 60747.063530 | 4638.602176 | 6.287709 | 0.283116 | -13.291375 | 33.679610 | -140.309173 | 30.053447 | 3.753601
54 | ID_-0.510_29.290_2020_01 | -0.510 | 29.290 | 2020 | 1 | NaN | NaN | NaN | NaN | NaN | ... | 6361.488754 | 53750.174162 | 5361.488754 | 19.167269 | 0.317732 | -30.474972 | 48.119754 | -139.437777 | 30.391957 | 4.051966
55 | ID_-0.510_29.290_2020_02 | -0.510 | 29.290 | 2020 | 2 | -0.000361 | 0.668526 | -0.000231 | 0.086199 | 73.269733 | ... | 5320.715902 | 61012.625000 | 4320.715861 | 48.203733 | 0.265554 | -12.461150 | 35.809728 | -137.854449 | 29.100415 | 4.154116
56 | ID_-0.510_29.290_2020_03 | -0.510 | 29.290 | 2020 | 3 | 0.000597 | 0.553729 | 0.000331 | 0.149257 | 73.522247 | ... | 6219.319294 | 55704.782998 | 5219.319269 | 12.809350 | 0.267030 | 16.381079 | 35.836898 | -139.017754 | 26.265561 | 4.165751
57 | ID_-0.510_29.290_2020_04 | -0.510 | 29.290 | 2020 | 4 | 0.000107 | 1.045238 | 0.000112 | 0.224283 | 77.588455 | ... | 6348.560006 | 54829.331776 | 5348.560014 | 35.283981 | 0.268983 | -12.193650 | 47.092968 | -134.474279 | 27.061321 | 4.233635
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
78965 | ID_-3.299_30.301_2020_48 | -3.299 | 30.301 | 2020 | 48 | 0.000114 | 1.123935 | 0.000125 | 0.149885 | 74.376836 | ... | 6092.323722 | 57479.397776 | 5169.185142 | 15.331296 | 0.261608 | 16.309625 | 39.924967 | -132.258700 | 30.393604 | 26.929207
78966 | ID_-3.299_30.301_2020_49 | -3.299 | 30.301 | 2020 | 49 | 0.000051 | 0.617927 | 0.000031 | 0.213135 | 72.364738 | ... | 5992.053006 | 57739.300155 | 4992.053006 | 27.214085 | 0.276616 | -0.287656 | 45.624810 | -134.460418 | 30.911741 | 26.606790
78967 | ID_-3.299_30.301_2020_50 | -3.299 | 30.301 | 2020 | 50 | -0.000235 | 0.633192 | -0.000149 | 0.257000 | -99.141518 | ... | 6104.231241 | 56954.517231 | 5181.570213 | 26.270365 | 0.260574 | -50.411241 | 37.645974 | -132.193161 | 32.516685 | 27.256273
78968 | ID_-3.299_30.301_2020_51 | -3.299 | 30.301 | 2020 | 51 | NaN | NaN | NaN | NaN | NaN | ... | 4855.537585 | 64839.955718 | 3858.187453 | 14.519789 | 0.248484 | 30.840922 | 39.529722 | -138.964016 | 28.574091 | 25.591976
78969 | ID_-3.299_30.301_2020_52 | -3.299 | 30.301 | 2020 | 52 | 0.000025 | 1.103025 | 0.000028 | 0.265622 | -99.811790 | ... | 5345.679464 | 62098.716546 | 4345.679397 | 13.082162 | 0.283677 | -13.002957 | 38.243055 | -136.660958 | 29.584058 | 25.559870

26341 rows × 76 columns

DATA_DIR = "./data/"
train = pd.read_csv(DATA_DIR + "train.csv")
test = pd.read_csv(DATA_DIR + "test.csv")

def add_features(df):
    #df["week"] = df["year"].astype(str) + "-" + df["week_no"].astype(str)
    #df["date"] = df["week"].apply(lambda x: get_date_from_week_string(x))
    #df = df.drop(columns = ["week"])
    df["week"] = (df["year"] - 2019) * 53 + df["week_no"]
    #df["lat_long"] = df["latitude"].astype(str) + "#" + df["longitude"].astype(str)
    return df

train = add_features(train)
test = add_features(test)
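
The `week` feature built by `add_features` is a running week counter that starts at 0 in the first week of 2019 and reserves 53 slots per year (`week_no` runs from 0 to 52). A small standalone sketch of the same arithmetic, where the function name `week_index` and its keyword arguments are illustrative:

```python
def week_index(year, week_no, base_year=2019, weeks_per_year=53):
    """Running week counter: 2019 maps to 0..52, 2020 starts at 53, and so on."""
    return (year - base_year) * weeks_per_year + week_no


print(week_index(2019, 0))    # 0
print(week_index(2021, 48))   # 154
print(week_index(2022, 0))    # 159
```

These values agree with the `week` column in the train and test previews in section 2.2, where week 48 of 2021 carries 154 and the first 2022 row carries 159.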

2.2

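Apply some risky post-processing to the predictions.

For each data point, let MAX = max(2019 emission, 2020 emission, 2021 emission).

If the 2021 emission is greater than the 2019 emission, we assign MAX * 1.07 as the prediction; otherwise we assign just MAX. Reference: https://www.kaggle.com/competitions/playground-series-s3e20/discussion/430152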
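
The rule can be written in vectorized form. This is a sketch on toy numbers, not the notebook's own code: the function name `lazy_prediction` and the columns `e2019`, `e2020`, `e2021` are made up for the illustration, and it assumes non-negative emissions.

```python
import numpy as np
import pandas as pd


def lazy_prediction(e2019, e2020, e2021, growth=1.07):
    """MAX of the three years, scaled by `growth` where 2021 exceeded 2019."""
    years = pd.concat([e2019, e2020, e2021], axis=1)
    peak = years.max(axis=1)
    factor = np.where(e2021 > e2019, growth, 1.0)
    return peak * factor


# Toy data: a growing location, a shrinking one, and an all-zero one.
demo = pd.DataFrame({
    "e2019": [10.0, 20.0, 0.0],
    "e2020": [9.0, 18.0, 0.0],
    "e2021": [12.0, 19.0, 0.0],
})
demo["lazy_pred"] = lazy_prediction(demo["e2019"], demo["e2020"], demo["e2021"])
print(demo)
```

The first row grows from 10 to 12, so its prediction is 12 * 1.07 = 12.84; the other two rows keep their plain maximum.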

# Collect the unique (latitude, longitude) pairs
vals = set()
for x in train[["latitude", "longitude"]].values:
    vals.add(tuple(x))
vals = list(vals)

# Locations whose emission is 0 in every row
zeros = []
for lat, long in vals:
    subset = train[(train["latitude"] == lat) & (train["longitude"] == long)]
    em_vals = subset["emission"].values
    if all(x == 0 for x in em_vals):
        zeros.append([lat, long])

test["2021_emission"] = test["week_no"]
test["2020_emission"] = test["week_no"]
test["2019_emission"] = test["week_no"]

# Copy each location's 2019-2021 emissions (weeks 0-48) next to its 2022 test rows
for lat, long in vals:
    test.loc[(test.latitude == lat) & (test.longitude == long), "2021_emission"] = train.loc[(train.latitude == lat) & (train.longitude == long) & (train.year == 2021) & (train.week_no <= 48), "emission"].values
    test.loc[(test.latitude == lat) & (test.longitude == long), "2020_emission"] = train.loc[(train.latitude == lat) & (train.longitude == long) & (train.year == 2020) & (train.week_no <= 48), "emission"].values
    test.loc[(test.latitude == lat) & (test.longitude == long), "2019_emission"] = train.loc[(train.latitude == lat) & (train.longitude == long) & (train.year == 2019) & (train.week_no <= 48), "emission"].values
    #print(train.loc[(train.latitude == lat) & (train.longitude == long) & (train.year == 2021), "emission"])

test["ratio"] = (test["2021_emission"] / test["2019_emission"]).replace(np.nan, 0)
test["pos_ratio"] = test["ratio"].apply(lambda x: max(x, 1))
test["pos_ratio"] = test["pos_ratio"].apply(lambda x: 1.07 if x > 1 else x)
test["max"] = test[["2019_emission", "2020_emission", "2021_emission"]].max(axis=1)
test["lazy_pred"] = test["max"] * test["pos_ratio"]
test = test.drop(columns = ["ratio", "pos_ratio", "max", "2019_emission", "2020_emission", "2021_emission"])

# Overwrite the 2020 emissions with the smoothed values (aligned on the row index).
# The original assigned the whole `extrp` frame here; only its emission column is needed.
train.loc[train.year == 2020, "emission"] = extrp["emission"]
train
index | ID_LAT_LON_YEAR_WEEK | latitude | longitude | year | week_no | SulphurDioxide_SO2_column_number_density | SulphurDioxide_SO2_column_number_density_amf | SulphurDioxide_SO2_slant_column_number_density | SulphurDioxide_cloud_fraction | SulphurDioxide_sensor_azimuth_angle | ... | Cloud_cloud_base_pressure | Cloud_cloud_base_height | Cloud_cloud_optical_depth | Cloud_surface_albedo | Cloud_sensor_azimuth_angle | Cloud_sensor_zenith_angle | Cloud_solar_azimuth_angle | Cloud_solar_zenith_angle | emission | week
0 | ID_-0.510_29.290_2019_00 | -0.510 | 29.290 | 2019 | 0 | -0.000108 | 0.603019 | -0.000065 | 0.255668 | -98.593887 | ... | 61085.809570 | 2615.120483 | 15.568533 | 0.272292 | -12.628986 | 35.632416 | -138.786423 | 30.752140 | 3.750994 | 0
1 | ID_-0.510_29.290_2019_01 | -0.510 | 29.290 | 2019 | 1 | 0.000021 | 0.728214 | 0.000014 | 0.130988 | 16.592861 | ... | 66969.478735 | 3174.572424 | 8.690601 | 0.256830 | 30.359375 | 39.557633 | -145.183930 | 27.251779 | 4.025176 | 1
2 | ID_-0.510_29.290_2019_02 | -0.510 | 29.290 | 2019 | 2 | 0.000514 | 0.748199 | 0.000385 | 0.110018 | 72.795837 | ... | 60068.894448 | 3516.282669 | 21.103410 | 0.251101 | 15.377883 | 30.401823 | -142.519545 | 26.193296 | 4.231381 | 2
3 | ID_-0.510_29.290_2019_03 | -0.510 | 29.290 | 2019 | 3 | NaN | NaN | NaN | NaN | NaN | ... | 51064.547339 | 4180.973322 | 15.386899 | 0.262043 | -11.293399 | 24.380357 | -132.665828 | 28.829155 | 4.305286 | 3
4 | ID_-0.510_29.290_2019_04 | -0.510 | 29.290 | 2019 | 4 | -0.000079 | 0.676296 | -0.000048 | 0.121164 | 4.121269 | ... | 63751.125781 | 3355.710107 | 8.114694 | 0.235847 | 38.532263 | 37.392979 | -141.509805 | 22.204612 | 4.347317 | 4
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
79018 | ID_-3.299_30.301_2021_48 | -3.299 | 30.301 | 2021 | 48 | 0.000284 | 1.195643 | 0.000340 | 0.191313 | 72.820518 | ... | 60657.101913 | 4590.879504 | 20.245954 | 0.304797 | -35.140368 | 40.113533 | -129.935508 | 32.095214 | 29.404171 | 154
79019 | ID_-3.299_30.301_2021_49 | -3.299 | 30.301 | 2021 | 49 | 0.000083 | 1.130868 | 0.000063 | 0.177222 | -12.856753 | ... | 60168.191528 | 4659.130378 | 6.104610 | 0.314015 | 4.667058 | 47.528435 | -134.252871 | 30.771469 | 29.186497 | 155
79020 | ID_-3.299_30.301_2021_50 | -3.299 | 30.301 | 2021 | 50 | NaN | NaN | NaN | NaN | NaN | ... | 56596.027209 | 5222.646823 | 14.817885 | 0.288058 | -0.340922 | 35.328098 | -134.731723 | 30.716166 | 29.131205 | 156
79021 | ID_-3.299_30.301_2021_51 | -3.299 | 30.301 | 2021 | 51 | -0.000034 | 0.879397 | -0.000028 | 0.184209 | -100.344827 | ... | 46533.348194 | 6946.858022 | 32.594768 | 0.274047 | 8.427699 | 48.295652 | -139.447849 | 29.112868 | 28.125792 | 157
79022 | ID_-3.299_30.301_2021_52 | -3.299 | 30.301 | 2021 | 52 | -0.000091 | 0.871951 | -0.000079 | 0.000000 | 76.825638 | ... | 47771.681887 | 6553.295018 | 19.464032 | 0.226276 | -12.808528 | 47.923441 | -136.299984 | 30.246387 | 27.239302 | 158

79023 rows × 77 columns

test
index | ID_LAT_LON_YEAR_WEEK | latitude | longitude | year | week_no | SulphurDioxide_SO2_column_number_density | SulphurDioxide_SO2_column_number_density_amf | SulphurDioxide_SO2_slant_column_number_density | SulphurDioxide_cloud_fraction | SulphurDioxide_sensor_azimuth_angle | ... | Cloud_cloud_base_pressure | Cloud_cloud_base_height | Cloud_cloud_optical_depth | Cloud_surface_albedo | Cloud_sensor_azimuth_angle | Cloud_sensor_zenith_angle | Cloud_solar_azimuth_angle | Cloud_solar_zenith_angle | week | lazy_pred
0 | ID_-0.510_29.290_2022_00 | -0.510 | 29.290 | 2022 | 0 | NaN | NaN | NaN | NaN | NaN | ... | 41047.937500 | 7472.313477 | 7.935617 | 0.240773 | -100.113792 | 33.697044 | -133.047546 | 33.779583 | 159 | 3.753601
1 | ID_-0.510_29.290_2022_01 | -0.510 | 29.290 | 2022 | 1 | 0.000456 | 0.691164 | 0.000316 | 0.000000 | 76.239196 | ... | 54915.708579 | 5476.147161 | 11.448437 | 0.293119 | -30.510319 | 42.402593 | -138.632822 | 31.012380 | 160 | 4.051966
2 | ID_-0.510_29.290_2022_02 | -0.510 | 29.290 | 2022 | 2 | 0.000161 | 0.605107 | 0.000106 | 0.079870 | -42.055341 | ... | 39006.093750 | 7984.795703 | 10.753179 | 0.267130 | 39.087361 | 45.936480 | -144.784988 | 26.743361 | 161 | 4.231381
3 | ID_-0.510_29.290_2022_03 | -0.510 | 29.290 | 2022 | 3 | 0.000350 | 0.696917 | 0.000243 | 0.201028 | 72.169566 | ... | 57646.368368 | 5014.724115 | 11.764556 | 0.304679 | -24.465127 | 42.140419 | -135.027891 | 29.604774 | 162 | 4.305286
4 | ID_-0.510_29.290_2022_04 | -0.510 | 29.290 | 2022 | 4 | -0.000317 | 0.580527 | -0.000184 | 0.204352 | 76.190865 | ... | 52896.541873 | 5849.280394 | 13.065317 | 0.284221 | -12.907850 | 30.122641 | -135.500119 | 26.276807 | 163 | 4.347317
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
24348 | ID_-3.299_30.301_2022_44 | -3.299 | 30.301 | 2022 | 44 | -0.000618 | 0.745549 | -0.000461 | 0.234492 | 72.306198 | ... | 55483.459980 | 5260.120056 | 30.398508 | 0.180046 | -25.528588 | 45.284576 | -116.521412 | 29.992562 | 203 | 30.327420
24349 | ID_-3.299_30.301_2022_45 | -3.299 | 30.301 | 2022 | 45 | NaN | NaN | NaN | NaN | NaN | ... | 53589.917383 | 5678.951521 | 19.223844 | 0.177833 | -13.380005 | 43.770351 | -122.405759 | 29.017975 | 204 | 30.811167
24350 | ID_-3.299_30.301_2022_46 | -3.299 | 30.301 | 2022 | 46 | NaN | NaN | NaN | NaN | NaN | ... | 62646.761340 | 4336.282491 | 13.801194 | 0.219471 | -5.072065 | 33.226455 | -124.530639 | 30.187472 | 205 | 31.162886
24351 | ID_-3.299_30.301_2022_47 | -3.299 | 30.301 | 2022 | 47 | 0.000071 | 1.003805 | 0.000077 | 0.205077 | 74.327427 | ... | 50728.313991 | 6188.578464 | 27.887489 | 0.247275 | -0.668714 | 45.885617 | -129.006797 | 30.427455 | 206 | 31.439606
24352 | ID_-3.299_30.301_2022_48 | -3.299 | 30.301 | 2022 | 48 | NaN | NaN | NaN | NaN | NaN | ... | 46260.039092 | 6777.863819 | 23.771269 | 0.239684 | -40.826139 | 30.680056 | -124.895473 | 34.457720 | 207 | 29.944366

24353 rows × 77 columns

Insights

The training set has 79023 observations and the test set has 24353. As the previews show, some columns contain null values.
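
To see which columns have nulls and how many, a per-column summary is usually enough. A minimal sketch on a toy frame, where the helper name `missing_summary` and the toy values are made up; on the real data the call would be `missing_summary(train)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Per-column count and share of missing values, largest first.
    Columns without missing values are omitted."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"n_missing": counts, "share": counts / len(df)})
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


# Toy frame reusing two column names from the competition data; values are invented.
demo = pd.DataFrame({
    "week_no": [0, 1, 2, 3],
    "SulphurDioxide_SO2_column_number_density": [0.1, np.nan, 0.3, np.nan],
    "Cloud_surface_albedo": [0.2, 0.3, np.nan, 0.25],
})
print(missing_summary(demo))
```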

3. EDA and Data Distribution

def plot_emission(train):
    plt.figure(figsize=(15, 6))
    sns.lineplot(data=train, x="week", y="emission", label="Emission", alpha=0.7, color='blue')
    plt.xlabel('Week')
    plt.ylabel('Emission')
    plt.title('Emission over time')
    plt.legend()
    plt.tight_layout()
    plt.show()

plot_emission(train)

sns.histplot(train["emission"])

4. Data Transformation

print(len(vals))
497

Insights

There are 497 unique latitude/longitude combinations.
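
The same count, and the list of locations that never record an emission, can also be obtained without explicit Python loops. A sketch on toy coordinates (the values below are invented; only the column names come from the data):

```python
import pandas as pd

demo = pd.DataFrame({
    "latitude":  [-1.0, -1.0, -2.0, -2.0, -2.0],
    "longitude": [30.0, 30.0, 29.5, 29.5, 29.5],
    "emission":  [3.75, 4.03, 0.0, 0.0, 0.0],
})

# One row per distinct (latitude, longitude) pair.
locations = demo[["latitude", "longitude"]].drop_duplicates().reset_index(drop=True)
print(len(locations))

# Locations whose emission is zero in every row.
all_zero = demo.groupby(["latitude", "longitude"])["emission"].apply(lambda s: (s == 0).all())
print(all_zero[all_zero].index.tolist())
```

On the real training frame, `len(locations)` should reproduce the 497 printed above.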

4.1

Most of the features are just noise, so we can drop them. (Reference: multiple discussion posts)

#train = train.drop(columns = ["ID_LAT_LON_YEAR_WEEK", "lat_long"])
#test = test.drop(columns = ["ID_LAT_LON_YEAR_WEEK", "lat_long"])

train = train[["latitude", "longitude", "year", "week_no", "emission"]]
test = test[["latitude", "longitude", "year", "week_no", "lazy_pred"]]

4.2

K-Means clustering + distance to the highest-emission location
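
The distance feature is a great-circle distance between two latitude/longitude points. The notebook code uses the `haversine` package for this; the sketch below writes the same formula out in NumPy so the computation is visible. The function name `haversine_km` and the Earth-radius constant are choices made for this illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.
    Accepts scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Distance between the first and last locations shown in the data previews.
d = haversine_km(-0.510, 29.290, -3.299, 30.301)
print(f"{d:.1f} km")
```

Because the function is vectorized, it can be applied to whole `latitude` and `longitude` columns at once instead of row by row.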

#https://www.kaggle.com/code/lucasboesen/simple-catboost-6-features-cv-21-7
from sklearn.cluster import KMeans
import haversine as hs

km_train = train.groupby(by=['latitude', 'longitude'], as_index=False)['emission'].mean()
model = KMeans(n_clusters = 7, random_state = 42)
model.fit(km_train)
yhat_train = model.predict(km_train)
km_train['kmeans_group'] = yhat_train

""" Own Groups """
# Some locations have emission == 0
km_train['is_zero'] = km_train['emission'].apply(lambda x: 'no_emission_recorded' if x==0 else 'emission_recorded')

# Distance to the highest emission location
max_lat_lon_emission = km_train.loc[km_train['emission']==km_train['emission'].max(), ['latitude', 'longitude']]
km_train['distance_to_max_emission'] = km_train.apply(lambda x: hs.haversine((x['latitude'], x['longitude']), (max_lat_lon_emission['latitude'].values[0], max_lat_lon_emission['longitude'].values[0])), axis=1)

train = train.merge(km_train[['latitude', 'longitude', 'kmeans_group', 'distance_to_max_emission']], on=['latitude', 'longitude'])
test = test.merge(km_train[['latitude', 'longitude', 'kmeans_group', 'distance_to_max_emission']], on=['latitude', 'longitude'])
#train = train.drop(columns = ["latitude", "longitude"])
#test = test.drop(columns = ["latitude", "longitude"])
train
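index | latitude | longitude | year | week_no | emission | kmeans_group | distance_to_max_emission
0 | -0.510 | 29.290 | 2019 | 0 | 3.750994 | 6 | 207.849890
1 | -0.510 | 29.290 | 2019 | 1 | 4.025176 | 6 | 207.849890
2 | -0.510 | 29.290 | 2019 | 2 | 4.231381 | 6 | 207.849890
3 | -0.510 | 29.290 | 2019 | 3 | 4.305286 | 6 | 207.849890
4 | -0.510 | 29.290 | 2019 | 4 | 4.347317 | 6 | 207.849890
... | ... | ... | ... | ... | ... | ... | ...
79018 | -3.299 | 30.301 | 2021 | 48 | 29.404171 | 6 | 157.630611
79019 | -3.299 | 30.301 | 2021 | 49 | 29.186497 | 6 | 157.630611
79020 | -3.299 | 30.301 | 2021 | 50 | 29.131205 | 6 | 157.630611
79021 | -3.299 | 30.301 | 2021 | 51 | 28.125792 | 6 | 157.630611
79022 | -3.299 | 30.301 | 2021 | 52 | 27.239302 | 6 | 157.630611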

79023 rows × 7 columns

test
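index | latitude | longitude | year | week_no | lazy_pred | kmeans_group | distance_to_max_emission
0 | -0.510 | 29.290 | 2022 | 0 | 3.753601 | 6 | 207.849890
1 | -0.510 | 29.290 | 2022 | 1 | 4.051966 | 6 | 207.849890
2 | -0.510 | 29.290 | 2022 | 2 | 4.231381 | 6 | 207.849890
3 | -0.510 | 29.290 | 2022 | 3 | 4.305286 | 6 | 207.849890
4 | -0.510 | 29.290 | 2022 | 4 | 4.347317 | 6 | 207.849890
... | ... | ... | ... | ... | ... | ... | ...
24348 | -3.299 | 30.301 | 2022 | 44 | 30.327420 | 6 | 157.630611
24349 | -3.299 | 30.301 | 2022 | 45 | 30.811167 | 6 | 157.630611
24350 | -3.299 | 30.301 | 2022 | 46 | 31.162886 | 6 | 157.630611
24351 | -3.299 | 30.301 | 2022 | 47 | 31.439606 | 6 | 157.630611
24352 | -3.299 | 30.301 | 2022 | 48 | 29.944366 | 6 | 157.630611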

24353 rows × 7 columns

cat_params = {
    'n_estimators': 799,
    'learning_rate': 0.09180872710592884,
    'depth': 8,
    'l2_leaf_reg': 1.0242996861886846,
    'subsample': 0.38227256755249117,
    'colsample_bylevel': 0.7183481537623551,
    'random_state': 42,
    "silent": True,
}

lgb_params = {
    'n_estimators': 835,
    'max_depth': 12,
    'reg_alpha': 3.849279869880706,
    'reg_lambda': 0.6840221712299135,
    'min_child_samples': 10,
    'subsample': 0.6810493885301987,
    'learning_rate': 0.0916362259866008,
    'colsample_bytree': 0.3133780298325982,
    'colsample_bynode': 0.7966712089198238,
    "random_state": 42,
}

xgb_params = {
    "random_state": 42,
}

rf_params = {
    'n_estimators': 263,
    'max_depth': 41,
    'min_samples_split': 10,
    'min_samples_leaf': 3,
    "random_state": 42,
    "verbose": 0
}

et_params = {
    "random_state": 42,
    "verbose": 0
}

5. Validate Performance on 2021 data

def rmse(a, b):
    # Square root of the MSE; the original `squared=False` argument is deprecated in recent scikit-learn releases
    return np.sqrt(mean_squared_error(a, b))

# .copy() so the per-cluster predictions are written to an independent frame, not a view of train
validation = train[train.year == 2021].copy()
clusters = train["kmeans_group"].unique()

for i in range(len(clusters)):
    cluster = clusters[i]
    print("==============================================")
    print(f" Cluster {cluster} ")
    train_c = train[train["kmeans_group"] == cluster]
    # Fit on 2019-2020, validate on 2021
    X_train = train_c[train_c.year < 2021].drop(columns = ["emission", "kmeans_group"])
    y_train = train_c[train_c.year < 2021]["emission"].copy()
    X_val = train_c[train_c.year >= 2021].drop(columns = ["emission", "kmeans_group"])
    y_val = train_c[train_c.year >= 2021]["emission"].copy()
    #=======================================================================================
    catboost_reg = CatBoostRegressor(**cat_params)
    catboost_reg.fit(X_train, y_train, eval_set=(X_val, y_val))
    catboost_pred = catboost_reg.predict(X_val) * M
    print(f"RMSE of CatBoost: {rmse(catboost_pred, y_val)}")
    #=======================================================================================
    lightgbm_reg = LGBMRegressor(**lgb_params,verbose=-1)
    lightgbm_reg.fit(X_train, y_train, eval_set=(X_val, y_val))
    lightgbm_pred = lightgbm_reg.predict(X_val) * M
    print(f"RMSE of LightGBM: {rmse(lightgbm_pred, y_val)}")
    #=======================================================================================
    xgb_reg = XGBRegressor(**xgb_params)
    xgb_reg.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose = False)
    xgb_pred = xgb_reg.predict(X_val) * M
    print(f"RMSE of XGBoost: {rmse(xgb_pred, y_val)}")
    #=======================================================================================
    rf_reg = RandomForestRegressor(**rf_params)
    rf_reg.fit(X_train, y_train)
    rf_pred = rf_reg.predict(X_val) * M
    print(f"RMSE of Random Forest: {rmse(rf_pred, y_val)}")
    #=======================================================================================
    et_reg = ExtraTreesRegressor(**et_params)
    et_reg.fit(X_train, y_train)
    et_pred = et_reg.predict(X_val) * M
    print(f"RMSE of Extra Trees: {rmse(et_pred, y_val)}")

    overall_pred = lightgbm_pred #(catboost_pred + lightgbm_pred) / 2
    validation.loc[validation["kmeans_group"] == cluster, "emission"] = overall_pred
    print(f"RMSE Overall: {rmse(overall_pred, y_val)}")

print("==============================================")
print(f"[DONE] RMSE of all clusters: {rmse(validation['emission'], train[train.year == 2021]['emission'])}")
print(f"[DONE] RMSE of all clusters Week 1-20: {rmse(validation[validation.week_no < 21]['emission'], train[(train.year == 2021) & (train.week_no < 21)]['emission'])}")
print(f"[DONE] RMSE of all clusters Week 21+: {rmse(validation[validation.week_no >= 21]['emission'], train[(train.year == 2021) & (train.week_no  >= 21)]['emission'])}")
==============================================
Cluster 6
RMSE of CatBoost: 2.3575606902299895
RMSE of LightGBM: 2.2103640167714094
RMSE of XGBoost: 2.5018849673349863
RMSE of Random Forest: 2.6335510523545556
RMSE of Extra Trees: 3.0029623116826776
RMSE Overall: 2.2103640167714094
==============================================
Cluster 5
RMSE of CatBoost: 19.175306730779514
RMSE of LightGBM: 17.910821889134688
RMSE of XGBoost: 19.6677120674706
RMSE of Random Forest: 18.856743714624777
RMSE of Extra Trees: 20.70417439300032
RMSE Overall: 17.910821889134688
==============================================
Cluster 1
RMSE of CatBoost: 9.26195004601851
RMSE of LightGBM: 8.513309514506675
RMSE of XGBoost: 10.137965612920658
RMSE of Random Forest: 9.838001199034126
RMSE of Extra Trees: 11.043246766709913
RMSE Overall: 8.513309514506675
==============================================
Cluster 4
RMSE of CatBoost: 44.564695183442716
RMSE of LightGBM: 43.946690922308754
RMSE of XGBoost: 50.18811358270916
RMSE of Random Forest: 46.39201148051631
RMSE of Extra Trees: 50.58999576441371
RMSE Overall: 43.946690922308754
==============================================
Cluster 0
RMSE of CatBoost: 28.408461784012662
RMSE of LightGBM: 26.872533954605416
RMSE of XGBoost: 30.622689084145943
RMSE of Random Forest: 28.46657485784377
RMSE of Extra Trees: 31.733046766544884
RMSE Overall: 26.872533954605416
==============================================
Cluster 3
RMSE of CatBoost: 263.29528869714665
RMSE of LightGBM: 326.12883397111284
RMSE of XGBoost: 336.5771065570381
RMSE of Random Forest: 303.9321016178147
RMSE of Extra Trees: 336.67756932119914
RMSE Overall: 326.12883397111284
==============================================
Cluster 2
RMSE of CatBoost: 206.96165808156715
RMSE of LightGBM: 222.40891682146665
RMSE of XGBoost: 281.12604107718465
RMSE of Random Forest: 232.11332438348992
RMSE of Extra Trees: 281.29392713471816
RMSE Overall: 222.40891682146665
==============================================
[DONE] RMSE of all clusters: 23.275548123498453
[DONE] RMSE of all clusters Week 1-20: 31.92891146501802
[DONE] RMSE of all clusters Week 21+: 15.108200701163458
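
The overall figure is not an average of the per-cluster RMSEs. It pools the squared errors of all rows, so each cluster contributes in proportion to its row count. This would explain how an overall RMSE near 23 can coexist with clusters above 200, provided those clusters contain few rows. The notebook does not print the cluster sizes, so the numbers in the sketch below are hypothetical, as is the helper name `pooled_rmse`.

```python
import numpy as np


def pooled_rmse(cluster_rmses, cluster_sizes):
    """RMSE over all rows, recovered from per-cluster RMSEs and row counts."""
    rmses = np.asarray(cluster_rmses, dtype=float)
    sizes = np.asarray(cluster_sizes, dtype=float)
    return float(np.sqrt(np.sum(sizes * rmses ** 2) / np.sum(sizes)))


# Hypothetical numbers, for illustration only: a large low-error cluster
# and a small high-error cluster.
print(pooled_rmse([2.0, 300.0], [990, 10]))
```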

6. Predicting 2022 result

clusters = train["kmeans_group"].unique()

for i in tqdm(range(len(clusters))):
    cluster = clusters[i]
    train_c = train[train["kmeans_group"] == cluster]
    if "emission" in test.columns:
        test_c = test[test["kmeans_group"] == cluster].drop(columns = ["emission", "kmeans_group", "lazy_pred"])
    else:
        test_c = test[test["kmeans_group"] == cluster].drop(columns = ["kmeans_group", "lazy_pred"])
    X = train_c.drop(columns = ["emission", "kmeans_group"])
    y = train_c["emission"].copy()
    #=======================================================================================
    catboost_reg = CatBoostRegressor(**cat_params)
    catboost_reg.fit(X, y)
    #print(test_c)
    catboost_pred = catboost_reg.predict(test_c)
    #=======================================================================================
    lightgbm_reg = LGBMRegressor(**lgb_params,verbose=-1)
    lightgbm_reg.fit(X, y)
    #print(test_c)
    lightgbm_pred = lightgbm_reg.predict(test_c)
    #=======================================================================================
    #xgb_reg = XGBRegressor(**xgb_params)
    #xgb_reg.fit(X, y, verbose = False)
    #xgb_pred = xgb_reg.predict(test)
    #=======================================================================================
    rf_reg = RandomForestRegressor(**rf_params)
    rf_reg.fit(X, y)
    rf_pred = rf_reg.predict(test_c)
    #=======================================================================================
    #et_reg = ExtraTreesRegressor(**et_params)
    #et_reg.fit(X, y)
    #et_pred = et_reg.predict(test)

    overall_pred = lightgbm_pred #(catboost_pred + lightgbm_pred) / 2
    test.loc[test["kmeans_group"] == cluster, "emission"] = overall_pred
test["emission"] = test["emission"] * 1.07
test.to_csv('submission.csv', index=False)
