{ "data": { "question": { "questionId": "1000051", "questionFrontendId": "面试题 17.26", "categoryTitle": "LCCI", "boundTopicId": 103156, "title": "Sparse Similarity LCCI", "titleSlug": "sparse-similarity-lcci", "content": "
The similarity of two documents (each with distinct words) is defined to be the size of the intersection divided by the size of the union. For example, if the documents consist of integers, the similarity of {1, 5, 3} and {1, 7, 2, 3} is 0.4, because the intersection has size 2 and the union has size 5. We have a long list of documents (with distinct values and each with an associated ID) where the similarity is believed to be "sparse". That is, any two arbitrarily selected documents are very likely to have similarity 0. Design an algorithm that returns a list of pairs of document IDs and the associated similarity.
\r\n\r\nInput is a 2D array docs
, where docs[i]
is the document with id i
. Return an array of strings, where each string represents a pair of documents with similarity greater than 0. The string should be formatted as {id1},{id2}: {similarity}
, where id1
is the smaller id in the two documents, and similarity
is the similarity rounded to four decimal places. You can return the array in any order.
Example:
\r\n\r\n\r\nInput: \r\n[\r\n [14, 15, 100, 9, 3],\r\n [32, 1, 9, 3, 5],\r\n [15, 29, 2, 6, 8, 7],\r\n [7, 10]\r\n]
\r\nOutput:\r\n[\r\n "0,1: 0.2500",\r\n "0,2: 0.1000",\r\n "2,3: 0.1429"\r\n]
\r\n\r\nNote:
\r\n\r\ndocs.length <= 500
docs[i].length <= 500
两个(具有不同单词的)文档的交集(intersection)中元素的个数除以并集(union)中元素的个数,就是这两个文档的相似度。例如,{1, 5, 3} 和 {1, 7, 2, 3} 的相似度是 0.4,其中,交集的元素有 2 个,并集的元素有 5 个。给定一系列的长篇文档,每个文档元素各不相同,并与一个 ID 相关联。它们的相似度非常“稀疏”,也就是说任选 2 个文档,相似度都很接近 0。请设计一个算法返回每对文档的 ID 及其相似度。只需输出相似度大于 0 的组合。请忽略空文档。为简单起见,可以假定每个文档由一个含有不同整数的数组表示。
\n\n输入为一个二维数组 docs
,docs[i]
表示 id 为 i
的文档。返回一个数组,其中每个元素是一个字符串,代表每对相似度大于 0 的文档,其格式为 {id1},{id2}: {similarity}
,其中 id1
为两个文档中较小的 id,similarity
为相似度,精确到小数点后 4 位。以任意顺序返回数组均可。
示例:
\n\n输入: \n[\n [14, 15, 100, 9, 3],\n [32, 1, 9, 3, 5],\n [15, 29, 2, 6, 8, 7],\n [7, 10]\n]
\n输出:\n[\n "0,1: 0.2500",\n "0,2: 0.1000",\n "2,3: 0.1429"\n]
\n\n提示:
\n\ndocs.length <= 500
docs[i].length <= 500
\\u7248\\u672c\\uff1a \\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528 \\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\"],\"java\":[\"Java\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u88ab\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5305\\u542b Pair \\u7c7b: https:\\/\\/docs.oracle.com\\/javase\\/8\\/javafx\\/api\\/javafx\\/util\\/Pair.html <\\/p>\"],\"python\":[\"Python\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982\\uff1aarray<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u6ce8\\u610f Python 2.7 \\u5c06\\u57282020\\u5e74\\u540e\\u4e0d\\u518d\\u7ef4\\u62a4<\\/a>\\u3002 \\u5982\\u60f3\\u4f7f\\u7528\\u6700\\u65b0\\u7248\\u7684Python\\uff0c\\u8bf7\\u9009\\u62e9Python 3\\u3002<\\/p>\"],\"c\":[\"C\",\" \\u7248\\u672c\\uff1a \\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528 \\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u60f3\\u4f7f\\u7528\\u54c8\\u5e0c\\u8868\\u8fd0\\u7b97, \\u60a8\\u53ef\\u4ee5\\u4f7f\\u7528 uthash<\\/a>\\u3002 \\\"uthash.h\\\"\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5bfc\\u5165\\u3002\\u8bf7\\u770b\\u5982\\u4e0b\\u793a\\u4f8b:<\\/p>\\r\\n\\r\\n 1. \\u5f80\\u54c8\\u5e0c\\u8868\\u4e2d\\u6dfb\\u52a0\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n 2. \\u5728\\u54c8\\u5e0c\\u8868\\u4e2d\\u67e5\\u627e\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n 3. \\u4ece\\u54c8\\u5e0c\\u8868\\u4e2d\\u5220\\u9664\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n C# 10<\\/a> \\u8fd0\\u884c\\u5728 .NET 6 \\u4e0a<\\/p>\"],\"javascript\":[\"JavaScript\",\" \\u7248\\u672c\\uff1a \\u60a8\\u7684\\u4ee3\\u7801\\u5728\\u6267\\u884c\\u65f6\\u5c06\\u5e26\\u4e0a lodash.js<\\/a> \\u5e93\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5305\\u542b\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u9700\\u4f7f\\u7528\\u961f\\u5217\\/\\u4f18\\u5148\\u961f\\u5217\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 datastructures-js\\/priority-queue@5.3.0<\\/a> \\u548c datastructures-js\\/queue@4.2.1<\\/a>\\u3002<\\/p>\"],\"ruby\":[\"Ruby\",\" \\u4f7f\\u7528 \\u4e00\\u4e9b\\u5e38\\u7528\\u7684\\u6570\\u636e\\u7ed3\\u6784\\u5df2\\u5728 Algorithms \\u6a21\\u5757\\u4e2d\\u63d0\\u4f9b\\uff1ahttps:\\/\\/www.rubydoc.info\\/github\\/kanwei\\/algorithms\\/Algorithms<\\/p>\"],\"swift\":[\"Swift\",\" \\u7248\\u672c\\uff1a \\u6211\\u4eec\\u901a\\u5e38\\u4fdd\\u8bc1\\u66f4\\u65b0\\u5230 Apple\\u653e\\u51fa\\u7684\\u6700\\u65b0\\u7248Swift<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u53d1\\u73b0Swift\\u4e0d\\u662f\\u6700\\u65b0\\u7248\\u7684\\uff0c\\u8bf7\\u8054\\u7cfb\\u6211\\u4eec\\uff01\\u6211\\u4eec\\u5c06\\u5c3d\\u5feb\\u66f4\\u65b0\\u3002<\\/p>\"],\"golang\":[\"Go\",\" \\u7248\\u672c\\uff1a \\u652f\\u6301 https:\\/\\/godoc.org\\/github.com\\/emirpasic\\/gods@v1.18.1<\\/a> \\u7b2c\\u4e09\\u65b9\\u5e93\\u3002<\\/p>\"],\"python3\":[\"Python3\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982array<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002 \\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u9700\\u4f7f\\u7528 Map\\/TreeMap \\u6570\\u636e\\u7ed3\\u6784\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 sortedcontainers<\\/a> \\u5e93\\u3002<\\/p>\"],\"scala\":[\"Scala\",\" \\u7248\\u672c\\uff1a \\u7248\\u672c\\uff1a \\u6211\\u4eec\\u4f7f\\u7528\\u7684\\u662f JetBrains \\u63d0\\u4f9b\\u7684 experimental compiler\\u3002\\u5982\\u679c\\u60a8\\u8ba4\\u4e3a\\u60a8\\u9047\\u5230\\u4e86\\u7f16\\u8bd1\\u5668\\u76f8\\u5173\\u7684\\u95ee\\u9898\\uff0c\\u8bf7\\u5411\\u6211\\u4eec\\u53cd\\u9988<\\/p>\"],\"rust\":[\"Rust\",\" \\u7248\\u672c\\uff1a \\u652f\\u6301 crates.io \\u7684 rand<\\/a><\\/p>\"],\"php\":[\"PHP\",\" With bcmath module.<\\/p>\"],\"typescript\":[\"TypeScript\",\" TypeScript 5.1.6<\\/p>\\r\\n\\r\\n Compile Options: --alwaysStrict --strictBindCallApply --strictFunctionTypes --target ES2022<\\/p>\\r\\n\\r\\n lodash.js<\\/a> \\u5e93\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5305\\u542b\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u9700\\u4f7f\\u7528\\u961f\\u5217\\/\\u4f18\\u5148\\u961f\\u5217\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 datastructures-js\\/priority-queue@5.3.0<\\/a> \\u548c datastructures-js\\/queue@4.2.1<\\/a>\\u3002<\\/p>\"],\"racket\":[\"Racket\",\"clang 11<\\/code> \\u91c7\\u7528\\u6700\\u65b0C++ 20\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n
-O2<\\/code>\\u7ea7\\u4f18\\u5316\\u3002AddressSanitizer<\\/a> \\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4b
out-of-bounds<\\/code>\\u548c
use-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n
OpenJDK 17<\\/code>\\u3002\\u53ef\\u4ee5\\u4f7f\\u7528Java 8\\u7684\\u7279\\u6027\\u4f8b\\u5982\\uff0clambda expressions \\u548c stream API\\u3002<\\/p>\\r\\n\\r\\n
Python 2.7.12<\\/code><\\/p>\\r\\n\\r\\n
GCC 8.2<\\/code>\\uff0c\\u91c7\\u7528GNU11\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n
-O1<\\/code>\\u7ea7\\u4f18\\u5316\\u3002 AddressSanitizer<\\/a>\\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4b
out-of-bounds<\\/code>\\u548c
use-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n
\\r\\nstruct hash_entry {\\r\\n int id; \\/* we'll use this field as the key *\\/\\r\\n char name[10];\\r\\n UT_hash_handle hh; \\/* makes this structure hashable *\\/\\r\\n};\\r\\n\\r\\nstruct hash_entry *users = NULL;\\r\\n\\r\\nvoid add_user(struct hash_entry *s) {\\r\\n HASH_ADD_INT(users, id, s);\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n
\\r\\nstruct hash_entry *find_user(int user_id) {\\r\\n struct hash_entry *s;\\r\\n HASH_FIND_INT(users, &user_id, s);\\r\\n return s;\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n
\\r\\nvoid delete_user(struct hash_entry *user) {\\r\\n HASH_DEL(users, user); \\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\"],\"csharp\":[\"C#\",\"
Node.js 16.13.2<\\/code><\\/p>\\r\\n\\r\\n
--harmony<\\/code> \\u6807\\u8bb0\\u6765\\u5f00\\u542f \\u65b0\\u7248ES6\\u7279\\u6027<\\/a>\\u3002<\\/p>\\r\\n\\r\\n
Ruby 3.1<\\/code>\\u6267\\u884c<\\/p>\\r\\n\\r\\n
Swift 5.5.2<\\/code><\\/p>\\r\\n\\r\\n
Go 1.21<\\/code><\\/p>\\r\\n\\r\\n
Python 3.10<\\/code><\\/p>\\r\\n\\r\\n
Scala 2.13<\\/code><\\/p>\"],\"kotlin\":[\"Kotlin\",\"
Kotlin 1.9.0<\\/code><\\/p>\\r\\n\\r\\n
rust 1.58.1<\\/code><\\/p>\\r\\n\\r\\n
PHP 8.1<\\/code>.<\\/p>\\r\\n\\r\\n