comments: true
difficulty: Easy
tags: Pandas


Problem Description

DataFrame customers
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| customer_id | int    |
| name        | object |
| email       | object |
+-------------+--------+

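The DataFrame contains some duplicate rows based on the email column.

Write a solution to remove these duplicate rows, keeping only the first occurrence of each email.

The result format is shown in the following example.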

 

Example 1:

Input:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 5           | Finn    | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Output:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email address is kept.
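
To see which rows count as repeats before anything is removed, the example can be rebuilt in pandas and inspected with `duplicated`. This is only a sketch: the email addresses below are placeholders assumed for illustration, and the variable names `is_repeat` and `repeated_ids` are hypothetical.

```python
import pandas as pd

# Placeholder addresses, assumed for illustration only.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "emily@example.com",
            "michael@example.com",
            "sarah@example.com",
            "john@example.com",
            "john@example.com",
            "alice@example.com",
        ],
    }
)

# duplicated() flags every repeat of an email after its first occurrence.
is_repeat = customers.duplicated(subset="email", keep="first")
repeated_ids = customers.loc[is_repeat, "customer_id"].tolist()
print(repeated_ids)  # [5]
```

Only Finn's row is flagged, because Alice's row is the first one carrying that address.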

Solutions

Solution 1

Python3

import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # drop_duplicates keeps the first row for each email by default (keep='first').
    return customers.drop_duplicates(subset=['email'])
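
A possible variant, shown here as a sketch rather than as part of the original solution, spells out the default `keep="first"` and renumbers the rows afterwards. `drop_duplicates` keeps the original index labels of the surviving rows, so the `reset_index(drop=True)` step is an optional assumption about what the caller wants. The function name `drop_duplicate_emails_explicit` and the sample data are hypothetical.

```python
import pandas as pd


def drop_duplicate_emails_explicit(customers: pd.DataFrame) -> pd.DataFrame:
    # keep="first" is the pandas default; spelling it out documents the intent.
    deduped = customers.drop_duplicates(subset=["email"], keep="first")
    # Optional: renumber rows 0..n-1 instead of keeping the original labels.
    return deduped.reset_index(drop=True)


# Placeholder data, assumed for illustration only.
sample = pd.DataFrame(
    {
        "customer_id": [4, 5, 6],
        "name": ["Alice", "Finn", "Violet"],
        "email": ["john@example.com", "john@example.com", "alice@example.com"],
    }
)

result = drop_duplicate_emails_explicit(sample)
print(result["customer_id"].tolist())  # [4, 6]
print(result.index.tolist())           # [0, 1]
```

Without the `reset_index` call the surviving rows would keep their original labels 0 and 2, which matters only if later code relies on positional row labels.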