mirror of
https://gitee.com/coder-xiaomo/leetcode-problemset
synced 2025-01-10 18:48:13 +08:00
184 lines
18 KiB
JSON
184 lines
18 KiB
JSON
{
|
|
"data": {
|
|
"question": {
|
|
"questionId": "609",
|
|
"questionFrontendId": "609",
|
|
"boundTopicId": null,
|
|
"title": "Find Duplicate File in System",
|
|
"titleSlug": "find-duplicate-file-in-system",
|
|
"content": "<p>Given a list <code>paths</code> of directory info, including the directory path, and all the files with contents in this directory, return <em>all the duplicate files in the file system in terms of their paths</em>. You may return the answer in <strong>any order</strong>.</p>\n\n<p>A group of duplicate files consists of at least two files that have the same content.</p>\n\n<p>A single directory info string in the input list has the following format:</p>\n\n<ul>\n\t<li><code>"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"</code></li>\n</ul>\n\n<p>It means there are <code>n</code> files <code>(f1.txt, f2.txt ... fn.txt)</code> with content <code>(f1_content, f2_content ... fn_content)</code> respectively in the directory "<code>root/d1/d2/.../dm"</code>. Note that <code>n >= 1</code> and <code>m >= 0</code>. If <code>m = 0</code>, it means the directory is just the root directory.</p>\n\n<p>The output is a list of groups of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:</p>\n\n<ul>\n\t<li><code>"directory_path/file_name.txt"</code></li>\n</ul>\n\n<p> </p>\n<p><strong class=\"example\">Example 1:</strong></p>\n<pre><strong>Input:</strong> paths = [\"root/a 1.txt(abcd) 2.txt(efgh)\",\"root/c 3.txt(abcd)\",\"root/c/d 4.txt(efgh)\",\"root 4.txt(efgh)\"]\n<strong>Output:</strong> [[\"root/a/2.txt\",\"root/c/d/4.txt\",\"root/4.txt\"],[\"root/a/1.txt\",\"root/c/3.txt\"]]\n</pre><p><strong class=\"example\">Example 2:</strong></p>\n<pre><strong>Input:</strong> paths = [\"root/a 1.txt(abcd) 2.txt(efgh)\",\"root/c 3.txt(abcd)\",\"root/c/d 4.txt(efgh)\"]\n<strong>Output:</strong> [[\"root/a/2.txt\",\"root/c/d/4.txt\"],[\"root/a/1.txt\",\"root/c/3.txt\"]]\n</pre>\n<p> </p>\n<p><strong>Constraints:</strong></p>\n\n<ul>\n\t<li><code>1 <= paths.length <= 2 * 10<sup>4</sup></code></li>\n\t<li><code>1 <= paths[i].length <= 3000</code></li>\n\t<li><code>1 <= sum(paths[i].length) <= 5 * 10<sup>5</sup></code></li>\n\t<li><code>paths[i]</code> consist of English letters, digits, <code>'/'</code>, <code>'.'</code>, <code>'('</code>, <code>')'</code>, and <code>' '</code>.</li>\n\t<li>You may assume no files or directories share the same name in the same directory.</li>\n\t<li>You may assume each given directory info represents a unique directory. A single blank space separates the directory path and file info.</li>\n</ul>\n\n<p> </p>\n<p><strong>Follow up:</strong></p>\n\n<ul>\n\t<li>Imagine you are given a real file system, how will you search files? DFS or BFS?</li>\n\t<li>If the file content is very large (GB level), how will you modify your solution?</li>\n\t<li>If you can only read the file by 1kb each time, how will you modify your solution?</li>\n\t<li>What is the time complexity of your modified solution? What is the most time-consuming part and memory-consuming part of it? How to optimize?</li>\n\t<li>How to make sure the duplicated files you find are not false positive?</li>\n</ul>\n",
|
|
"translatedTitle": null,
|
|
"translatedContent": null,
|
|
"isPaidOnly": false,
|
|
"difficulty": "Medium",
|
|
"likes": 1479,
|
|
"dislikes": 1641,
|
|
"isLiked": null,
|
|
"similarQuestions": "[{\"title\": \"Delete Duplicate Folders in System\", \"titleSlug\": \"delete-duplicate-folders-in-system\", \"difficulty\": \"Hard\", \"translatedTitle\": null}]",
|
|
"exampleTestcases": "[\"root/a 1.txt(abcd) 2.txt(efgh)\",\"root/c 3.txt(abcd)\",\"root/c/d 4.txt(efgh)\",\"root 4.txt(efgh)\"]\n[\"root/a 1.txt(abcd) 2.txt(efgh)\",\"root/c 3.txt(abcd)\",\"root/c/d 4.txt(efgh)\"]",
|
|
"categoryTitle": "Algorithms",
|
|
"contributors": [],
|
|
"topicTags": [
|
|
{
|
|
"name": "Array",
|
|
"slug": "array",
|
|
"translatedName": null,
|
|
"__typename": "TopicTagNode"
|
|
},
|
|
{
|
|
"name": "Hash Table",
|
|
"slug": "hash-table",
|
|
"translatedName": null,
|
|
"__typename": "TopicTagNode"
|
|
},
|
|
{
|
|
"name": "String",
|
|
"slug": "string",
|
|
"translatedName": null,
|
|
"__typename": "TopicTagNode"
|
|
}
|
|
],
|
|
"companyTagStats": null,
|
|
"codeSnippets": [
|
|
{
|
|
"lang": "C++",
|
|
"langSlug": "cpp",
|
|
"code": "class Solution {\npublic:\n vector<vector<string>> findDuplicate(vector<string>& paths) {\n \n }\n};",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Java",
|
|
"langSlug": "java",
|
|
"code": "class Solution {\n public List<List<String>> findDuplicate(String[] paths) {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Python",
|
|
"langSlug": "python",
|
|
"code": "class Solution(object):\n def findDuplicate(self, paths):\n \"\"\"\n :type paths: List[str]\n :rtype: List[List[str]]\n \"\"\"\n ",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Python3",
|
|
"langSlug": "python3",
|
|
"code": "class Solution:\n def findDuplicate(self, paths: List[str]) -> List[List[str]]:\n ",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "C",
|
|
"langSlug": "c",
|
|
"code": "/**\n * Return an array of arrays of size *returnSize.\n * The sizes of the arrays are returned as *returnColumnSizes array.\n * Note: Both returned array and *columnSizes array must be malloced, assume caller calls free().\n */\nchar*** findDuplicate(char** paths, int pathsSize, int* returnSize, int** returnColumnSizes) {\n \n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "C#",
|
|
"langSlug": "csharp",
|
|
"code": "public class Solution {\n public IList<IList<string>> FindDuplicate(string[] paths) {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "JavaScript",
|
|
"langSlug": "javascript",
|
|
"code": "/**\n * @param {string[]} paths\n * @return {string[][]}\n */\nvar findDuplicate = function(paths) {\n \n};",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "TypeScript",
|
|
"langSlug": "typescript",
|
|
"code": "function findDuplicate(paths: string[]): string[][] {\n \n};",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "PHP",
|
|
"langSlug": "php",
|
|
"code": "class Solution {\n\n /**\n * @param String[] $paths\n * @return String[][]\n */\n function findDuplicate($paths) {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Swift",
|
|
"langSlug": "swift",
|
|
"code": "class Solution {\n func findDuplicate(_ paths: [String]) -> [[String]] {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Kotlin",
|
|
"langSlug": "kotlin",
|
|
"code": "class Solution {\n fun findDuplicate(paths: Array<String>): List<List<String>> {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Dart",
|
|
"langSlug": "dart",
|
|
"code": "class Solution {\n List<List<String>> findDuplicate(List<String> paths) {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Go",
|
|
"langSlug": "golang",
|
|
"code": "func findDuplicate(paths []string) [][]string {\n \n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Ruby",
|
|
"langSlug": "ruby",
|
|
"code": "# @param {String[]} paths\n# @return {String[][]}\ndef find_duplicate(paths)\n \nend",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Scala",
|
|
"langSlug": "scala",
|
|
"code": "object Solution {\n def findDuplicate(paths: Array[String]): List[List[String]] = {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Rust",
|
|
"langSlug": "rust",
|
|
"code": "impl Solution {\n pub fn find_duplicate(paths: Vec<String>) -> Vec<Vec<String>> {\n \n }\n}",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Racket",
|
|
"langSlug": "racket",
|
|
"code": "(define/contract (find-duplicate paths)\n (-> (listof string?) (listof (listof string?)))\n )",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Erlang",
|
|
"langSlug": "erlang",
|
|
"code": "-spec find_duplicate(Paths :: [unicode:unicode_binary()]) -> [[unicode:unicode_binary()]].\nfind_duplicate(Paths) ->\n .",
|
|
"__typename": "CodeSnippetNode"
|
|
},
|
|
{
|
|
"lang": "Elixir",
|
|
"langSlug": "elixir",
|
|
"code": "defmodule Solution do\n @spec find_duplicate(paths :: [String.t]) :: [[String.t]]\n def find_duplicate(paths) do\n \n end\nend",
|
|
"__typename": "CodeSnippetNode"
|
|
}
|
|
],
|
|
"stats": "{\"totalAccepted\": \"148.3K\", \"totalSubmission\": \"219.3K\", \"totalAcceptedRaw\": 148299, \"totalSubmissionRaw\": 219345, \"acRate\": \"67.6%\"}",
|
|
"hints": [],
|
|
"solution": {
|
|
"id": "151",
|
|
"canSeeDetail": true,
|
|
"paidOnly": false,
|
|
"hasVideoSolution": false,
|
|
"paidOnlyVideo": true,
|
|
"__typename": "ArticleNode"
|
|
},
|
|
"status": null,
|
|
"sampleTestCase": "[\"root/a 1.txt(abcd) 2.txt(efgh)\",\"root/c 3.txt(abcd)\",\"root/c/d 4.txt(efgh)\",\"root 4.txt(efgh)\"]",
|
|
"metaData": "{\r\n \"name\": \"findDuplicate\",\r\n \"params\": [\r\n {\r\n \"name\": \"paths\",\r\n \"type\": \"string[]\"\r\n }\r\n ],\r\n \"return\": {\r\n \"type\": \"list<list<string>>\"\r\n }\r\n}\r\n",
|
|
"judgerAvailable": true,
|
|
"judgeType": "large",
|
|
"mysqlSchemas": [],
|
|
"enableRunCode": true,
|
|
"enableTestMode": false,
|
|
"enableDebugger": true,
|
|
"envInfo": "{\"cpp\": [\"C++\", \"<p>Compiled with <code> clang 11 </code> using the latest C++ 20 standard.</p>\\r\\n\\r\\n<p>Your code is compiled with level two optimization (<code>-O2</code>). <a href=\\\"https://github.com/google/sanitizers/wiki/AddressSanitizer\\\" target=\\\"_blank\\\">AddressSanitizer</a> is also enabled to help detect out-of-bounds and use-after-free bugs.</p>\\r\\n\\r\\n<p>Most standard library headers are already included automatically for your convenience.</p>\"], \"java\": [\"Java\", \"<p><code>OpenJDK 17</code>. Java 8 features such as lambda expressions and stream API can be used. </p>\\r\\n\\r\\n<p>Most standard library headers are already included automatically for your convenience.</p>\\r\\n<p>Includes <code>Pair</code> class from https://docs.oracle.com/javase/8/javafx/api/javafx/util/Pair.html.</p>\"], \"python\": [\"Python\", \"<p><code>Python 2.7.12</code>.</p>\\r\\n\\r\\n<p>Most libraries are already imported automatically for your convenience, such as <a href=\\\"https://docs.python.org/2/library/array.html\\\" target=\\\"_blank\\\">array</a>, <a href=\\\"https://docs.python.org/2/library/bisect.html\\\" target=\\\"_blank\\\">bisect</a>, <a href=\\\"https://docs.python.org/2/library/collections.html\\\" target=\\\"_blank\\\">collections</a>. If you need more libraries, you can import it yourself.</p>\\r\\n\\r\\n<p>For Map/TreeMap data structure, you may use <a href=\\\"http://www.grantjenks.com/docs/sortedcontainers/\\\" target=\\\"_blank\\\">sortedcontainers</a> library.</p>\\r\\n\\r\\n<p>Note that Python 2.7 <a href=\\\"https://www.python.org/dev/peps/pep-0373/\\\" target=\\\"_blank\\\">will not be maintained past 2020</a>. For the latest Python, please choose Python3 instead.</p>\"], \"c\": [\"C\", \"<p>Compiled with <code>gcc 8.2</code> using the gnu11 standard.</p>\\r\\n\\r\\n<p>Your code is compiled with level one optimization (<code>-O1</code>). <a href=\\\"https://github.com/google/sanitizers/wiki/AddressSanitizer\\\" target=\\\"_blank\\\">AddressSanitizer</a> is also enabled to help detect out-of-bounds and use-after-free bugs.</p>\\r\\n\\r\\n<p>Most standard library headers are already included automatically for your convenience.</p>\\r\\n\\r\\n<p>For hash table operations, you may use <a href=\\\"https://troydhanson.github.io/uthash/\\\" target=\\\"_blank\\\">uthash</a>. \\\"uthash.h\\\" is included by default. Below are some examples:</p>\\r\\n\\r\\n<p><b>1. Adding an item to a hash.</b>\\r\\n<pre>\\r\\nstruct hash_entry {\\r\\n int id; /* we'll use this field as the key */\\r\\n char name[10];\\r\\n UT_hash_handle hh; /* makes this structure hashable */\\r\\n};\\r\\n\\r\\nstruct hash_entry *users = NULL;\\r\\n\\r\\nvoid add_user(struct hash_entry *s) {\\r\\n HASH_ADD_INT(users, id, s);\\r\\n}\\r\\n</pre>\\r\\n</p>\\r\\n\\r\\n<p><b>2. Looking up an item in a hash:</b>\\r\\n<pre>\\r\\nstruct hash_entry *find_user(int user_id) {\\r\\n struct hash_entry *s;\\r\\n HASH_FIND_INT(users, &user_id, s);\\r\\n return s;\\r\\n}\\r\\n</pre>\\r\\n</p>\\r\\n\\r\\n<p><b>3. Deleting an item in a hash:</b>\\r\\n<pre>\\r\\nvoid delete_user(struct hash_entry *user) {\\r\\n HASH_DEL(users, user); \\r\\n}\\r\\n</pre>\\r\\n</p>\"], \"csharp\": [\"C#\", \"<p><a href=\\\"https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-10\\\" target=\\\"_blank\\\">C# 10 with .NET 6 runtime</a></p>\"], \"javascript\": [\"JavaScript\", \"<p><code>Node.js 16.13.2</code>.</p>\\r\\n\\r\\n<p>Your code is run with <code>--harmony</code> flag, enabling <a href=\\\"http://node.green/\\\" target=\\\"_blank\\\">new ES6 features</a>.</p>\\r\\n\\r\\n<p><a href=\\\"https://lodash.com\\\" target=\\\"_blank\\\">lodash.js</a> library is included by default.</p>\\r\\n\\r\\n<p>For Priority Queue / Queue data structures, you may use 5.3.0 version of <a href=\\\"https://github.com/datastructures-js/priority-queue/tree/fb4fdb984834421279aeb081df7af624d17c2a03\\\" target=\\\"_blank\\\">datastructures-js/priority-queue</a> and 4.2.1 version of <a href=\\\"https://github.com/datastructures-js/queue/tree/e63563025a5a805aa16928cb53bcd517bfea9230\\\" target=\\\"_blank\\\">datastructures-js/queue</a>.</p>\"], \"ruby\": [\"Ruby\", \"<p><code>Ruby 3.1</code></p>\\r\\n\\r\\n<p>Some common data structure implementations are provided in the Algorithms module: https://www.rubydoc.info/github/kanwei/algorithms/Algorithms</p>\"], \"swift\": [\"Swift\", \"<p><code>Swift 5.5.2</code>.</p>\"], \"golang\": [\"Go\", \"<p><code>Go 1.21</code></p>\\r\\n<p>Support <a href=\\\"https://github.com/emirpasic/gods/tree/v1.18.1\\\" target=\\\"_blank\\\">https://godoc.org/github.com/emirpasic/gods@v1.18.1</a> library.</p>\"], \"python3\": [\"Python3\", \"<p><code>Python 3.10</code>.</p>\\r\\n\\r\\n<p>Most libraries are already imported automatically for your convenience, such as <a href=\\\"https://docs.python.org/3/library/array.html\\\" target=\\\"_blank\\\">array</a>, <a href=\\\"https://docs.python.org/3/library/bisect.html\\\" target=\\\"_blank\\\">bisect</a>, <a href=\\\"https://docs.python.org/3/library/collections.html\\\" target=\\\"_blank\\\">collections</a>. If you need more libraries, you can import it yourself.</p>\\r\\n\\r\\n<p>For Map/TreeMap data structure, you may use <a href=\\\"http://www.grantjenks.com/docs/sortedcontainers/\\\" target=\\\"_blank\\\">sortedcontainers</a> library.</p>\"], \"scala\": [\"Scala\", \"<p><code>Scala 2.13.7</code>.</p>\"], \"kotlin\": [\"Kotlin\", \"<p><code>Kotlin 1.9.0</code>.</p>\"], \"rust\": [\"Rust\", \"<p><code>Rust 1.58.1</code></p>\\r\\n\\r\\n<p>Supports <a href=\\\"https://crates.io/crates/rand\\\" target=\\\"_blank\\\">rand </a> v0.6\\u00a0from crates.io</p>\"], \"php\": [\"PHP\", \"<p><code>PHP 8.1</code>.</p>\\r\\n<p>With bcmath module</p>\"], \"typescript\": [\"Typescript\", \"<p><code>TypeScript 5.1.6, Node.js 16.13.2</code>.</p>\\r\\n\\r\\n<p>Your code is run with <code>--harmony</code> flag, enabling <a href=\\\"http://node.green/\\\" target=\\\"_blank\\\">new ES2022 features</a>.</p>\\r\\n\\r\\n<p><a href=\\\"https://lodash.com\\\" target=\\\"_blank\\\">lodash.js</a> library is included by default.</p>\"], \"racket\": [\"Racket\", \"<p>Run with <code>Racket 8.3</code>.</p>\"], \"erlang\": [\"Erlang\", \"Erlang/OTP 25.0\"], \"elixir\": [\"Elixir\", \"Elixir 1.13.4 with Erlang/OTP 25.0\"], \"dart\": [\"Dart\", \"<p>Dart 2.17.3</p>\\r\\n\\r\\n<p>Your code will be run directly without compiling</p>\"]}",
|
|
"libraryUrl": null,
|
|
"adminUrl": null,
|
|
"challengeQuestion": null,
|
|
"__typename": "QuestionNode"
|
|
}
|
|
}
|
|
} |