Pandas - reset_index()
Functions like sklearn.model_selection.train_test_split can be used on dataframes, and they keep each row's original index after shuffling.
| | Fruit |
|---|---|
| 2 | Apple |
| 3 | Banana |
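A minimal sketch of how such indexes come about. The four-row dataframe and the `test_size` and `random_state` values are made up for illustration; which rows land in which split depends on the shuffle, so the exact labels may differ from the table above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: four rows with the default 0..3 index
df = pd.DataFrame({"Fruit": ["Cherry", "Durian", "Apple", "Banana"]})

# Shuffle and split; each row keeps the index label it had in df
train, test = train_test_split(df, test_size=0.5, random_state=0)

print(test)        # two rows, still carrying their original labels
print(test.index)  # not necessarily 0 and 1
```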
I have a tendency to use reset_index() to clean up my dataframe:
| | Fruit |
|---|---|
| 0 | Apple |
| 1 | Banana |
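As a rough sketch of that cleanup: the table above has no leftover column for the old labels, which suggests `drop=True` was used, but that is an assumption. By default, `reset_index()` keeps the old labels in a new column named `index`.

```python
import pandas as pd

# The dataframe from the first table, with its post-shuffle labels
fruit = pd.DataFrame({"Fruit": ["Apple", "Banana"]}, index=[2, 3])

# drop=True throws the old labels away and renumbers from 0
clean = fruit.reset_index(drop=True)
print(clean)

# The default (drop=False) moves the old labels into an "index" column,
# so the link back to the original rows is still available
kept = fruit.reset_index()
print(kept)
```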
The result looks much cleaner, but any operation that relies on index alignment, such as pandas.concat, can now cause issues. Sharing code with other developers becomes painful because anything that happens after the reset_index() loses its reference to the original row. Take a second dataframe indexed from 0:
| | Weight (Grams) |
|---|---|
| 0 | 100 |
| 1 | 120 |
The joined table should be:
| | Fruit | Weight (Grams) |
|---|---|---|
| 0 | Apple | 100 |
| 1 | Banana | 120 |
But instead, because one dataframe still carries its original indexes, the result is:
| | Fruit | Weight (Grams) |
|---|---|---|
| 2 | Apple | NaN |
| 3 | Banana | NaN |
| 0 | NaN | 100 |
| 1 | NaN | 120 |
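The mismatch can be reproduced with a short sketch. The variable names are made up, and the row order of the misaligned result may vary between pandas versions. Resetting both indexes, as done at the end, only gives the right pairing if both dataframes are already in the same row order, which is assumed here.

```python
import pandas as pd

# One dataframe still has its original labels, the other was reset
fruit = pd.DataFrame({"Fruit": ["Apple", "Banana"]}, index=[2, 3])
weight = pd.DataFrame({"Weight (Grams)": [100, 120]}, index=[0, 1])

# concat along columns aligns rows by index label, not by position,
# so labels 2, 3 and 0, 1 never match and the gaps are filled with NaN
misaligned = pd.concat([fruit, weight], axis=1)
print(misaligned)

# Making the labels agree on both sides restores the expected table
aligned = pd.concat([fruit.reset_index(drop=True), weight], axis=1)
print(aligned)
```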