comments | difficulty |
---|---|
true | Easy |
DataFrame customers
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| customer_id | int    |
| name        | object |
| email       | object |
+-------------+--------+
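The DataFrame contains some duplicate rows based on the email column.
Write a solution that removes these duplicate rows and keeps only the first occurrence of each email.
The result format is shown in the following example.
Example 1: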
Input:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | [email protected]   |
| 2           | David   | [email protected]   |
| 3           | Zachary | [email protected]   |
| 4           | Alice   | [email protected]   |
| 5           | Finn    | [email protected]   |
| 6           | Violet  | [email protected]   |
+-------------+---------+---------------------+
Output:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | [email protected]   |
| 2           | David   | [email protected]   |
| 3           | Zachary | [email protected]   |
| 4           | Alice   | [email protected]   |
| 6           | Violet  | [email protected]   |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use [email protected], so only the first occurrence of that email address is kept.
import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # drop_duplicates keeps the first row for each email by default (keep='first').
    return customers.drop_duplicates(subset=['email'])
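
As a quick check of the solution above, here is a minimal sketch that runs the same `drop_duplicates` call on data shaped like Example 1. The function name `drop_duplicate_emails` and the `@example.com` addresses are placeholders made up for this illustration (the addresses in the example are not reproduced here). `keep="first"` is the pandas default and is written out only for clarity.

```python
import pandas as pd


def drop_duplicate_emails(customers: pd.DataFrame) -> pd.DataFrame:
    # keep="first" is the pandas default; it is spelled out here for clarity.
    return customers.drop_duplicates(subset=["email"], keep="first")


# Hypothetical sample data: the email addresses are placeholders.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "ella@example.com",
            "david@example.com",
            "zachary@example.com",
            "shared@example.com",  # first occurrence, kept
            "shared@example.com",  # duplicate email, dropped
            "violet@example.com",
        ],
    }
)

result = drop_duplicate_emails(customers)
print(result["customer_id"].tolist())  # [1, 2, 3, 4, 6]
```

Note that `drop_duplicates` returns a new DataFrame and preserves the original index labels, so the result here is indexed 0, 1, 2, 3, 5. If a contiguous index is needed, chaining `.reset_index(drop=True)` is one option.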