{ "data": { "question": { "questionId": "3071", "questionFrontendId": "2882", "categoryTitle": "pandas", "boundTopicId": 2467488, "title": "Drop Duplicate Rows", "titleSlug": "drop-duplicate-rows", "content": "
\nDataFrame customers\n+-------------+--------+\n| Column Name | Type |\n+-------------+--------+\n| customer_id | int |\n| name | object |\n| email | object |\n+-------------+--------+\n\n\n
There are some duplicate rows in the DataFrame based on the email
column.
Write a solution to remove these duplicate rows and keep only the first occurrence.
\n\nThe result format is in the following example.
\n\n\n
\nExample 1:\nInput:\n+-------------+---------+---------------------+\n| customer_id | name | email |\n+-------------+---------+---------------------+\n| 1 | Ella | emily@example.com |\n| 2 | David | michael@example.com |\n| 3 | Zachary | sarah@example.com |\n| 4 | Alice | john@example.com |\n| 5 | Finn | john@example.com |\n| 6 | Violet | alice@example.com |\n+-------------+---------+---------------------+\nOutput: \n+-------------+---------+---------------------+\n| customer_id | name | email |\n+-------------+---------+---------------------+\n| 1 | Ella | emily@example.com |\n| 2 | David | michael@example.com |\n| 3 | Zachary | sarah@example.com |\n| 4 | Alice | john@example.com |\n| 6 | Violet | alice@example.com |\n+-------------+---------+---------------------+\nExplanation:\nAlic (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email is retained.\n\n", "translatedTitle": "删去重复的行", "translatedContent": "
\nDataFrame customers\n+-------------+--------+\n| Column Name | Type |\n+-------------+--------+\n| customer_id | int |\n| name | object |\n| email | object |\n+-------------+--------+\n\n\n
在 DataFrame 中基于 email
列存在一些重复行。
编写一个解决方案,删除这些重复行,仅保留第一次出现的行。
\n\n返回结果格式如下例所示。
\n\n\n\n
示例 1:
\n\n\n输入:\n+-------------+---------+---------------------+\n| customer_id | name | email |\n+-------------+---------+---------------------+\n| 1 | Ella | emily@example.com |\n| 2 | David | michael@example.com |\n| 3 | Zachary | sarah@example.com |\n| 4 | Alice | john@example.com |\n| 5 | Finn | john@example.com |\n| 6 | Violet | alice@example.com |\n+-------------+---------+---------------------+\n输出:\n+-------------+---------+---------------------+\n| customer_id | name | email |\n+-------------+---------+---------------------+\n| 1 | Ella | emily@example.com |\n| 2 | David | michael@example.com |\n| 3 | Zachary | sarah@example.com |\n| 4 | Alice | john@example.com |\n| 6 | Violet | alice@example.com |\n+-------------+---------+---------------------+\n解释:\nAlice (customer_id = 4) 和 Finn (customer_id = 5) 都使用 john@example.com,因此只保留该邮箱地址的第一次出现。\n\n", "isPaidOnly": false, "difficulty": "Easy", "likes": 1, "dislikes": 0, "isLiked": null, "similarQuestions": "[]", "contributors": [], "langToValidPlayground": "{\"cpp\": false, \"java\": false, \"python\": false, \"python3\": false, \"mysql\": false, \"mssql\": false, \"oraclesql\": false, \"c\": false, \"csharp\": false, \"javascript\": false, \"typescript\": false, \"bash\": false, \"php\": false, \"swift\": false, \"kotlin\": false, \"dart\": false, \"golang\": false, \"ruby\": false, \"scala\": false, \"html\": false, \"pythonml\": false, \"rust\": false, \"racket\": false, \"erlang\": false, \"elixir\": false, \"pythondata\": false, \"react\": false, \"vanillajs\": false, \"postgresql\": false}", "topicTags": [], "companyTagStats": null, "codeSnippets": [ { "lang": "Pandas", "langSlug": "pythondata", "code": "import pandas as pd\n\ndef dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:\n ", "__typename": "CodeSnippetNode" } ], "stats": "{\"totalAccepted\": \"1.5K\", \"totalSubmission\": \"1.9K\", \"totalAcceptedRaw\": 1535, \"totalSubmissionRaw\": 1924, \"acRate\": \"79.8%\"}", "hints": [ "Consider using a build-in function in pandas library to remove the duplicate rows based on specified data." ], "solution": null, "status": null, "sampleTestCase": "{\"headers\":{\"customers\":[\"customer_id\",\"name\",\"email\"]},\"rows\":{\"customers\":[[1,\"Ella\",\"emily@example.com\"],[2,\"David\",\"michael@example.com\"],[3,\"Zachary\",\"sarah@example.com\"],[4,\"Alice\",\"john@example.com\"],[5,\"Finn\",\"john@example.com\"],[6,\"Violet\",\"alice@example.com\"]]}}", "metaData": "{\n \"pythondata\": [\n \"customers = pd.DataFrame([], columns=['customer_id', 'name', 'email']).astype({'customer_id':'Int64', 'name':'object', 'email':'object'})\"\n ],\n \"database\": true,\n \"name\": \"dropDuplicateEmails\",\n \"languages\": [\n \"pythondata\"\n ]\n}", "judgerAvailable": true, "judgeType": "large", "mysqlSchemas": [], "enableRunCode": true, "envInfo": "{\"pythondata\":[\"Pandas\",\"
Python 3.10 with Pandas 2.0.2 and NumPy 1.25.0<\\/p>\"]}", "book": null, "isSubscribed": false, "isDailyQuestion": false, "dailyRecordStatus": null, "editorType": "CKEDITOR", "ugcQuestionId": null, "style": "LEETCODE", "exampleTestcases": "{\"headers\":{\"customers\":[\"customer_id\",\"name\",\"email\"]},\"rows\":{\"customers\":[[1,\"Ella\",\"emily@example.com\"],[2,\"David\",\"michael@example.com\"],[3,\"Zachary\",\"sarah@example.com\"],[4,\"Alice\",\"john@example.com\"],[5,\"Finn\",\"john@example.com\"],[6,\"Violet\",\"alice@example.com\"]]}}", "__typename": "QuestionNode" } } }