{ "data": { "question": { "questionId": "393", "questionFrontendId": "393", "categoryTitle": "Algorithms", "boundTopicId": 1862, "title": "UTF-8 Validation", "titleSlug": "utf-8-validation", "content": "
Given an integer array data
representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).
A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:
\n\n0
, followed by its Unicode code.n
bits are all one's, the n + 1
bit is 0
, followed by n - 1
bytes with the most significant 2
bits being 10
.This is how the UTF-8 encoding would work:
\n\n\n Number of Bytes | UTF-8 Octet Sequence\n | (binary)\n --------------------+-----------------------------------------\n 1 | 0xxxxxxx\n 2 | 110xxxxx 10xxxxxx\n 3 | 1110xxxx 10xxxxxx 10xxxxxx\n 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n\n\n
x
denotes a bit in the binary form of a byte that may be either 0
or 1
.
Note: The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.
\n\n\n
Example 1:
\n\n\nInput: data = [197,130,1]\nOutput: true\nExplanation: data represents the octet sequence: 11000101 10000010 00000001.\nIt is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.\n\n\n
Example 2:
\n\n\nInput: data = [235,140,4]\nOutput: false\nExplanation: data represented the octet sequence: 11101011 10001100 00000100.\nThe first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.\nThe next byte is a continuation byte which starts with 10 and that's correct.\nBut the second continuation byte does not start with 10, so it is invalid.\n\n\n
\n
Constraints:
\n\n1 <= data.length <= 2 * 104
0 <= data[i] <= 255
给定一个表示数据的整数数组 data
,返回它是否为有效的 UTF-8 编码。
UTF-8 中的一个字符可能的长度为 1 到 4 字节,遵循以下的规则:
\n\n这是 UTF-8 编码的工作方式:
\n\n\n Char. number range | UTF-8 octet sequence\n (hexadecimal) | (binary)\n --------------------+---------------------------------------------\n 0000 0000-0000 007F | 0xxxxxxx\n 0000 0080-0000 07FF | 110xxxxx 10xxxxxx\n 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx\n 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n
\n\n注意:输入是整数数组。只有每个整数的 最低 8 个有效位 用来存储数据。这意味着每个整数只表示 1 字节的数据。
\n\n\n\n
示例 1:
\n\n\n输入:data = [197,130,1]\n输出:true\n解释:数据表示字节序列:11000101 10000010 00000001。\n这是有效的 utf-8 编码,为一个 2 字节字符,跟着一个 1 字节字符。\n\n\n
示例 2:
\n\n\n输入:data = [235,140,4]\n输出:false\n解释:数据表示 8 位的序列: 11101011 10001100 00000100.\n前 3 位都是 1 ,第 4 位为 0 表示它是一个 3 字节字符。\n下一个字节是开头为 10 的延续字节,这是正确的。\n但第二个延续字节不以 10 开头,所以是不符合规则的。\n\n\n
\n\n
提示:
\n\n1 <= data.length <= 2 * 104
0 <= data[i] <= 255
\r\nmask = 1 << 7\r\nwhile mask & num:\r\n n_bytes += 1\r\n mask = mask >> 1\r\n\r\n\r\nCan you use bit-masking to perform the second validation as well i.e. checking if the most significant bit is 1 and the second most significant bit a 0?", "To check if the most significant bit is a 1 and the second most significant bit is a 0, we can make use of the following two masks.\r\n\r\n
\r\nmask1 = 1 << 7\r\nmask2 = 1 << 6\r\n\r\nif not (num & mask1 and not (num & mask2)):\r\n return False\r\n" ], "solution": null, "status": null, "sampleTestCase": "[197,130,1]", "metaData": "{\r\n \"name\": \"validUtf8\",\r\n \"params\": [\r\n {\r\n \"name\": \"data\",\r\n \"type\": \"integer[]\"\r\n }\r\n ],\r\n \"return\": {\r\n \"type\": \"boolean\"\r\n }\r\n}", "judgerAvailable": true, "judgeType": "large", "mysqlSchemas": [], "enableRunCode": true, "envInfo": "{\"cpp\":[\"C++\",\"
\\u7248\\u672c\\uff1a \\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528 \\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\"],\"java\":[\"Java\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u88ab\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5305\\u542b Pair \\u7c7b: https:\\/\\/docs.oracle.com\\/javase\\/8\\/javafx\\/api\\/javafx\\/util\\/Pair.html <\\/p>\"],\"python\":[\"Python\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982\\uff1aarray<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u6ce8\\u610f Python 2.7 \\u5c06\\u57282020\\u5e74\\u540e\\u4e0d\\u518d\\u7ef4\\u62a4<\\/a>\\u3002 \\u5982\\u60f3\\u4f7f\\u7528\\u6700\\u65b0\\u7248\\u7684Python\\uff0c\\u8bf7\\u9009\\u62e9Python 3\\u3002<\\/p>\"],\"c\":[\"C\",\" \\u7248\\u672c\\uff1a \\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528 \\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u60f3\\u4f7f\\u7528\\u54c8\\u5e0c\\u8868\\u8fd0\\u7b97, \\u60a8\\u53ef\\u4ee5\\u4f7f\\u7528 uthash<\\/a>\\u3002 \\\"uthash.h\\\"\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5bfc\\u5165\\u3002\\u8bf7\\u770b\\u5982\\u4e0b\\u793a\\u4f8b:<\\/p>\\r\\n\\r\\n 1. \\u5f80\\u54c8\\u5e0c\\u8868\\u4e2d\\u6dfb\\u52a0\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n 2. \\u5728\\u54c8\\u5e0c\\u8868\\u4e2d\\u67e5\\u627e\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n 3. \\u4ece\\u54c8\\u5e0c\\u8868\\u4e2d\\u5220\\u9664\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n C# 10<\\/a> \\u8fd0\\u884c\\u5728 .NET 6 \\u4e0a<\\/p>\\r\\n\\r\\n \\u60a8\\u7684\\u4ee3\\u7801\\u5728\\u7f16\\u8bd1\\u65f6\\u9ed8\\u8ba4\\u5f00\\u542f\\u4e86debug\\u6807\\u8bb0( \\u7248\\u672c\\uff1a \\u60a8\\u7684\\u4ee3\\u7801\\u5728\\u6267\\u884c\\u65f6\\u5c06\\u5e26\\u4e0a lodash.js<\\/a> \\u5e93\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5305\\u542b\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u9700\\u4f7f\\u7528\\u961f\\u5217\\/\\u4f18\\u5148\\u961f\\u5217\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 datastructures-js\\/priority-queue<\\/a> \\u548c datastructures-js\\/queue<\\/a>\\u3002<\\/p>\"],\"ruby\":[\"Ruby\",\" \\u4f7f\\u7528 \\u4e00\\u4e9b\\u5e38\\u7528\\u7684\\u6570\\u636e\\u7ed3\\u6784\\u5df2\\u5728 Algorithms \\u6a21\\u5757\\u4e2d\\u63d0\\u4f9b\\uff1ahttps:\\/\\/www.rubydoc.info\\/github\\/kanwei\\/algorithms\\/Algorithms<\\/p>\"],\"swift\":[\"Swift\",\" \\u7248\\u672c\\uff1a \\u6211\\u4eec\\u901a\\u5e38\\u4fdd\\u8bc1\\u66f4\\u65b0\\u5230 Apple\\u653e\\u51fa\\u7684\\u6700\\u65b0\\u7248Swift<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u53d1\\u73b0Swift\\u4e0d\\u662f\\u6700\\u65b0\\u7248\\u7684\\uff0c\\u8bf7\\u8054\\u7cfb\\u6211\\u4eec\\uff01\\u6211\\u4eec\\u5c06\\u5c3d\\u5feb\\u66f4\\u65b0\\u3002<\\/p>\"],\"golang\":[\"Go\",\" \\u7248\\u672c\\uff1a \\u652f\\u6301 https:\\/\\/godoc.org\\/github.com\\/emirpasic\\/gods<\\/a> \\u7b2c\\u4e09\\u65b9\\u5e93\\u3002<\\/p>\"],\"python3\":[\"Python3\",\" \\u7248\\u672c\\uff1a \\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982array<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002 \\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n \\u5982\\u9700\\u4f7f\\u7528 Map\\/TreeMap \\u6570\\u636e\\u7ed3\\u6784\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 sortedcontainers<\\/a> \\u5e93\\u3002<\\/p>\"],\"scala\":[\"Scala\",\" \\u7248\\u672c\\uff1a \\u7248\\u672c\\uff1a \\u7248\\u672c\\uff1a \\u652f\\u6301 crates.io \\u7684 rand<\\/a><\\/p>\"],\"php\":[\"PHP\",\" With bcmath module.<\\/p>\"],\"typescript\":[\"TypeScript\",\" TypeScript 4.5.4<\\/p>\\r\\n\\r\\n Compile Options: --alwaysStrict --strictBindCallApply --strictFunctionTypes --target ES2020<\\/p>\"],\"racket\":[\"Racket\",\"clang 11<\\/code> \\u91c7\\u7528\\u6700\\u65b0C++ 17\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n
-O2<\\/code>\\u7ea7\\u4f18\\u5316\\u3002AddressSanitizer<\\/a> \\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4b
out-of-bounds<\\/code>\\u548c
use-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n
OpenJDK 17<\\/code>\\u3002\\u53ef\\u4ee5\\u4f7f\\u7528Java 8\\u7684\\u7279\\u6027\\u4f8b\\u5982\\uff0clambda expressions \\u548c stream API\\u3002<\\/p>\\r\\n\\r\\n
Python 2.7.12<\\/code><\\/p>\\r\\n\\r\\n
GCC 8.2<\\/code>\\uff0c\\u91c7\\u7528GNU99\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n
-O1<\\/code>\\u7ea7\\u4f18\\u5316\\u3002 AddressSanitizer<\\/a>\\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4b
out-of-bounds<\\/code>\\u548c
use-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n
\\r\\nstruct hash_entry {\\r\\n int id; \\/* we'll use this field as the key *\\/\\r\\n char name[10];\\r\\n UT_hash_handle hh; \\/* makes this structure hashable *\\/\\r\\n};\\r\\n\\r\\nstruct hash_entry *users = NULL;\\r\\n\\r\\nvoid add_user(struct hash_entry *s) {\\r\\n HASH_ADD_INT(users, id, s);\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n
\\r\\nstruct hash_entry *find_user(int user_id) {\\r\\n struct hash_entry *s;\\r\\n HASH_FIND_INT(users, &user_id, s);\\r\\n return s;\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n
\\r\\nvoid delete_user(struct hash_entry *user) {\\r\\n HASH_DEL(users, user); \\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\"],\"csharp\":[\"C#\",\"
\\/debug:pdbonly<\\/code>)\\u3002<\\/p>\"],\"javascript\":[\"JavaScript\",\"
Node.js 16.13.2<\\/code><\\/p>\\r\\n\\r\\n
--harmony<\\/code> \\u6807\\u8bb0\\u6765\\u5f00\\u542f \\u65b0\\u7248ES6\\u7279\\u6027<\\/a>\\u3002<\\/p>\\r\\n\\r\\n
Ruby 3.1<\\/code>\\u6267\\u884c<\\/p>\\r\\n\\r\\n
Swift 5.5.2<\\/code><\\/p>\\r\\n\\r\\n
Go 1.17<\\/code><\\/p>\\r\\n\\r\\n
Python 3.10<\\/code><\\/p>\\r\\n\\r\\n
Scala 2.13<\\/code><\\/p>\"],\"kotlin\":[\"Kotlin\",\"
Kotlin 1.3.10<\\/code><\\/p>\"],\"rust\":[\"Rust\",\"
rust 1.58.1<\\/code><\\/p>\\r\\n\\r\\n
PHP 8.1<\\/code>.<\\/p>\\r\\n\\r\\n