{ "data": { "question": { "questionId": "393", "questionFrontendId": "393", "categoryTitle": "Algorithms", "boundTopicId": 1862, "title": "UTF-8 Validation", "titleSlug": "utf-8-validation", "content": "

Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).

\n\n

A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

\n\n
    \n\t
  1. For a 1-byte character, the first bit is a 0, followed by its Unicode code.
  2. \n\t
  3. For an n-bytes character, the first n bits are all one's, the n + 1 bit is 0, followed by n - 1 bytes with the most significant 2 bits being 10.
  4. \n
\n\n

This is how the UTF-8 encoding would work:

\n\n
\n     Number of Bytes   |        UTF-8 Octet Sequence\n                       |              (binary)\n   --------------------+-----------------------------------------\n            1          |   0xxxxxxx\n            2          |   110xxxxx 10xxxxxx\n            3          |   1110xxxx 10xxxxxx 10xxxxxx\n            4          |   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n
\n\n

x denotes a bit in the binary form of a byte that may be either 0 or 1.

\n\n

Note: The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.

\n\n

 

\n

Example 1:

\n\n
\nInput: data = [197,130,1]\nOutput: true\nExplanation: data represents the octet sequence: 11000101 10000010 00000001.\nIt is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.\n
\n\n

Example 2:

\n\n
\nInput: data = [235,140,4]\nOutput: false\nExplanation: data represented the octet sequence: 11101011 10001100 00000100.\nThe first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.\nThe next byte is a continuation byte which starts with 10 and that's correct.\nBut the second continuation byte does not start with 10, so it is invalid.\n
\n\n

 

\n

Constraints:

\n\n\n", "translatedTitle": "UTF-8 编码验证", "translatedContent": "

给定一个表示数据的整数数组 data ,返回它是否为有效的 UTF-8 编码。

\n\n

UTF-8 中的一个字符可能的长度为 1 到 4 字节,遵循以下的规则:

\n\n
    \n\t
  1. 对于 1 字节 的字符,字节的第一位设为 0 ,后面 7 位为这个符号的 unicode 码。
  2. \n\t
  3. 对于 n 字节 的字符 (n > 1),第一个字节的前 n 位都设为1,第 n+1 位设为 0 ,后面字节的前两位一律设为 10 。剩下的没有提及的二进制位,全部为这个符号的 unicode 码。
  4. \n
\n\n

这是 UTF-8 编码的工作方式:

\n\n
\n      Number of Bytes  |        UTF-8 octet sequence\n                       |              (binary)\n   --------------------+---------------------------------------------\n            1          | 0xxxxxxx\n            2          | 110xxxxx 10xxxxxx\n            3          | 1110xxxx 10xxxxxx 10xxxxxx\n            4          | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n
\n\n

x 表示二进制形式的一位,可以是 0 或 1

\n\n

注意:输入是整数数组。只有每个整数的 最低 8 个有效位 用来存储数据。这意味着每个整数只表示 1 字节的数据。

\n\n

 

\n\n

示例 1:

\n\n
\n输入:data = [197,130,1]\n输出:true\n解释:数据表示字节序列:11000101 10000010 00000001。\n这是有效的 utf-8 编码,为一个 2 字节字符,跟着一个 1 字节字符。\n
\n\n

示例 2:

\n\n
\n输入:data = [235,140,4]\n输出:false\n解释:数据表示 8 位的序列: 11101011 10001100 00000100.\n前 3 位都是 1 ,第 4 位为 0 表示它是一个 3 字节字符。\n下一个字节是开头为 10 的延续字节,这是正确的。\n但第二个延续字节不以 10 开头,所以是不符合规则的。\n
\n\n

 

\n\n

提示:

\n\n\n", "isPaidOnly": false, "difficulty": "Medium", "likes": 194, "dislikes": 0, "isLiked": null, "similarQuestions": "[]", "contributors": [], "langToValidPlayground": "{\"cpp\": true, \"java\": true, \"python\": true, \"python3\": true, \"mysql\": false, \"mssql\": false, \"oraclesql\": false, \"c\": false, \"csharp\": false, \"javascript\": false, \"typescript\": false, \"bash\": false, \"php\": false, \"swift\": false, \"kotlin\": false, \"dart\": false, \"golang\": false, \"ruby\": false, \"scala\": false, \"html\": false, \"pythonml\": false, \"rust\": false, \"racket\": false, \"erlang\": false, \"elixir\": false, \"pythondata\": false, \"react\": false, \"vanillajs\": false, \"postgresql\": false}", "topicTags": [ { "name": "Bit Manipulation", "slug": "bit-manipulation", "translatedName": "位运算", "__typename": "TopicTagNode" }, { "name": "Array", "slug": "array", "translatedName": "数组", "__typename": "TopicTagNode" } ], "companyTagStats": null, "codeSnippets": [ { "lang": "C++", "langSlug": "cpp", "code": "class Solution {\npublic:\n bool validUtf8(vector& data) {\n\n }\n};", "__typename": "CodeSnippetNode" }, { "lang": "Java", "langSlug": "java", "code": "class Solution {\n public boolean validUtf8(int[] data) {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Python", "langSlug": "python", "code": "class Solution(object):\n def validUtf8(self, data):\n \"\"\"\n :type data: List[int]\n :rtype: bool\n \"\"\"", "__typename": "CodeSnippetNode" }, { "lang": "Python3", "langSlug": "python3", "code": "class Solution:\n def validUtf8(self, data: List[int]) -> bool:", "__typename": "CodeSnippetNode" }, { "lang": "C", "langSlug": "c", "code": "bool validUtf8(int* data, int dataSize) {\n \n}", "__typename": "CodeSnippetNode" }, { "lang": "C#", "langSlug": "csharp", "code": "public class Solution {\n public bool ValidUtf8(int[] data) {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "JavaScript", "langSlug": "javascript", "code": "/**\n * @param {number[]} data\n * @return {boolean}\n */\nvar validUtf8 = function(data) {\n\n};", "__typename": "CodeSnippetNode" }, { "lang": "TypeScript", "langSlug": "typescript", "code": "function validUtf8(data: number[]): boolean {\n \n};", "__typename": "CodeSnippetNode" }, { "lang": "PHP", "langSlug": "php", "code": "class Solution {\n\n /**\n * @param Integer[] $data\n * @return Boolean\n */\n function validUtf8($data) {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Swift", "langSlug": "swift", "code": "class Solution {\n func validUtf8(_ data: [Int]) -> Bool {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Kotlin", "langSlug": "kotlin", "code": "class Solution {\n fun validUtf8(data: IntArray): Boolean {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Dart", "langSlug": "dart", "code": "class Solution {\n bool validUtf8(List data) {\n \n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Go", "langSlug": "golang", "code": "func validUtf8(data []int) bool {\n\n}", "__typename": "CodeSnippetNode" }, { "lang": "Ruby", "langSlug": "ruby", "code": "# @param {Integer[]} data\n# @return {Boolean}\ndef valid_utf8(data)\n\nend", "__typename": "CodeSnippetNode" }, { "lang": "Scala", "langSlug": "scala", "code": "object Solution {\n def validUtf8(data: Array[Int]): Boolean = {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Rust", "langSlug": "rust", "code": "impl Solution {\n pub fn valid_utf8(data: Vec) -> bool {\n\n }\n}", "__typename": "CodeSnippetNode" }, { "lang": "Racket", "langSlug": "racket", "code": "(define/contract (valid-utf8 data)\n (-> (listof exact-integer?) boolean?)\n )", "__typename": "CodeSnippetNode" }, { "lang": "Erlang", "langSlug": "erlang", "code": "-spec valid_utf8(Data :: [integer()]) -> boolean().\nvalid_utf8(Data) ->\n .", "__typename": "CodeSnippetNode" }, { "lang": "Elixir", "langSlug": "elixir", "code": "defmodule Solution do\n @spec valid_utf8(data :: [integer]) :: boolean\n def valid_utf8(data) do\n \n end\nend", "__typename": "CodeSnippetNode" } ], "stats": "{\"totalAccepted\": \"40.2K\", \"totalSubmission\": \"92K\", \"totalAcceptedRaw\": 40208, \"totalSubmissionRaw\": 92043, \"acRate\": \"43.7%\"}", "hints": [ "Read the data integer by integer. When you read it, process the least significant 8 bits of it.", "Assume the next encoding is 1-byte data. If it is not 1-byte data, read the next integer and assume it is 2-bytes data.", "Similarly, if it is not 2-bytes data, try 3-bytes then 4-bytes. If you read four integers and it still does not match any pattern, return false." ], "solution": null, "status": null, "sampleTestCase": "[197,130,1]", "metaData": "{\r\n \"name\": \"validUtf8\",\r\n \"params\": [\r\n {\r\n \"name\": \"data\",\r\n \"type\": \"integer[]\"\r\n }\r\n ],\r\n \"return\": {\r\n \"type\": \"boolean\"\r\n }\r\n}", "judgerAvailable": true, "judgeType": "large", "mysqlSchemas": [], "enableRunCode": true, "envInfo": "{\"cpp\":[\"C++\",\"

\\u7248\\u672c\\uff1aclang 11<\\/code> \\u91c7\\u7528\\u6700\\u65b0C++ 20\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n

\\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528-O2<\\/code>\\u7ea7\\u4f18\\u5316\\u3002AddressSanitizer<\\/a> \\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4bout-of-bounds<\\/code>\\u548cuse-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n

\\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\"],\"java\":[\"Java\",\"

\\u7248\\u672c\\uff1aOpenJDK 17<\\/code>\\u3002\\u53ef\\u4ee5\\u4f7f\\u7528Java 8\\u7684\\u7279\\u6027\\u4f8b\\u5982\\uff0clambda expressions \\u548c stream API\\u3002<\\/p>\\r\\n\\r\\n

\\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u88ab\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n

\\u5305\\u542b Pair \\u7c7b: https:\\/\\/docs.oracle.com\\/javase\\/8\\/javafx\\/api\\/javafx\\/util\\/Pair.html <\\/p>\"],\"python\":[\"Python\",\"

\\u7248\\u672c\\uff1a Python 2.7.12<\\/code><\\/p>\\r\\n\\r\\n

\\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982\\uff1aarray<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n

\\u6ce8\\u610f Python 2.7 \\u5c06\\u57282020\\u5e74\\u540e\\u4e0d\\u518d\\u7ef4\\u62a4<\\/a>\\u3002 \\u5982\\u60f3\\u4f7f\\u7528\\u6700\\u65b0\\u7248\\u7684Python\\uff0c\\u8bf7\\u9009\\u62e9Python 3\\u3002<\\/p>\"],\"c\":[\"C\",\"

\\u7248\\u672c\\uff1aGCC 8.2<\\/code>\\uff0c\\u91c7\\u7528GNU11\\u6807\\u51c6\\u3002<\\/p>\\r\\n\\r\\n

\\u7f16\\u8bd1\\u65f6\\uff0c\\u5c06\\u4f1a\\u91c7\\u7528-O1<\\/code>\\u7ea7\\u4f18\\u5316\\u3002 AddressSanitizer<\\/a>\\u4e5f\\u88ab\\u5f00\\u542f\\u6765\\u68c0\\u6d4bout-of-bounds<\\/code>\\u548cuse-after-free<\\/code>\\u9519\\u8bef\\u3002<\\/p>\\r\\n\\r\\n

\\u4e3a\\u4e86\\u4f7f\\u7528\\u65b9\\u4fbf\\uff0c\\u5927\\u90e8\\u5206\\u6807\\u51c6\\u5e93\\u7684\\u5934\\u6587\\u4ef6\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n

\\u5982\\u60f3\\u4f7f\\u7528\\u54c8\\u5e0c\\u8868\\u8fd0\\u7b97, \\u60a8\\u53ef\\u4ee5\\u4f7f\\u7528 uthash<\\/a>\\u3002 \\\"uthash.h\\\"\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5bfc\\u5165\\u3002\\u8bf7\\u770b\\u5982\\u4e0b\\u793a\\u4f8b:<\\/p>\\r\\n\\r\\n

1. \\u5f80\\u54c8\\u5e0c\\u8868\\u4e2d\\u6dfb\\u52a0\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n

\\r\\nstruct hash_entry {\\r\\n    int id;            \\/* we'll use this field as the key *\\/\\r\\n    char name[10];\\r\\n    UT_hash_handle hh; \\/* makes this structure hashable *\\/\\r\\n};\\r\\n\\r\\nstruct hash_entry *users = NULL;\\r\\n\\r\\nvoid add_user(struct hash_entry *s) {\\r\\n    HASH_ADD_INT(users, id, s);\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n

2. \\u5728\\u54c8\\u5e0c\\u8868\\u4e2d\\u67e5\\u627e\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n

\\r\\nstruct hash_entry *find_user(int user_id) {\\r\\n    struct hash_entry *s;\\r\\n    HASH_FIND_INT(users, &user_id, s);\\r\\n    return s;\\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\\r\\n\\r\\n

3. \\u4ece\\u54c8\\u5e0c\\u8868\\u4e2d\\u5220\\u9664\\u4e00\\u4e2a\\u5bf9\\u8c61\\uff1a<\\/b>\\r\\n

\\r\\nvoid delete_user(struct hash_entry *user) {\\r\\n    HASH_DEL(users, user);  \\r\\n}\\r\\n<\\/pre>\\r\\n<\\/p>\"],\"csharp\":[\"C#\",\"

C# 10<\\/a> \\u8fd0\\u884c\\u5728 .NET 6 \\u4e0a<\\/p>\"],\"javascript\":[\"JavaScript\",\"

\\u7248\\u672c\\uff1aNode.js 16.13.2<\\/code><\\/p>\\r\\n\\r\\n

\\u60a8\\u7684\\u4ee3\\u7801\\u5728\\u6267\\u884c\\u65f6\\u5c06\\u5e26\\u4e0a --harmony<\\/code> \\u6807\\u8bb0\\u6765\\u5f00\\u542f \\u65b0\\u7248ES6\\u7279\\u6027<\\/a>\\u3002<\\/p>\\r\\n\\r\\n

lodash.js<\\/a> \\u5e93\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5305\\u542b\\u3002<\\/p>\\r\\n\\r\\n

\\u5982\\u9700\\u4f7f\\u7528\\u961f\\u5217\\/\\u4f18\\u5148\\u961f\\u5217\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 datastructures-js\\/priority-queue@5.3.0<\\/a> \\u548c datastructures-js\\/queue@4.2.1<\\/a>\\u3002<\\/p>\"],\"ruby\":[\"Ruby\",\"

\\u4f7f\\u7528Ruby 3.1<\\/code>\\u6267\\u884c<\\/p>\\r\\n\\r\\n

\\u4e00\\u4e9b\\u5e38\\u7528\\u7684\\u6570\\u636e\\u7ed3\\u6784\\u5df2\\u5728 Algorithms \\u6a21\\u5757\\u4e2d\\u63d0\\u4f9b\\uff1ahttps:\\/\\/www.rubydoc.info\\/github\\/kanwei\\/algorithms\\/Algorithms<\\/p>\"],\"swift\":[\"Swift\",\"

\\u7248\\u672c\\uff1aSwift 5.5.2<\\/code><\\/p>\\r\\n\\r\\n

\\u6211\\u4eec\\u901a\\u5e38\\u4fdd\\u8bc1\\u66f4\\u65b0\\u5230 Apple\\u653e\\u51fa\\u7684\\u6700\\u65b0\\u7248Swift<\\/a>\\u3002\\u5982\\u679c\\u60a8\\u53d1\\u73b0Swift\\u4e0d\\u662f\\u6700\\u65b0\\u7248\\u7684\\uff0c\\u8bf7\\u8054\\u7cfb\\u6211\\u4eec\\uff01\\u6211\\u4eec\\u5c06\\u5c3d\\u5feb\\u66f4\\u65b0\\u3002<\\/p>\"],\"golang\":[\"Go\",\"

\\u7248\\u672c\\uff1aGo 1.21<\\/code><\\/p>\\r\\n\\r\\n

\\u652f\\u6301 https:\\/\\/godoc.org\\/github.com\\/emirpasic\\/gods@v1.18.1<\\/a> \\u7b2c\\u4e09\\u65b9\\u5e93\\u3002<\\/p>\"],\"python3\":[\"Python3\",\"

\\u7248\\u672c\\uff1aPython 3.10<\\/code><\\/p>\\r\\n\\r\\n

\\u4e3a\\u4e86\\u65b9\\u4fbf\\u8d77\\u89c1\\uff0c\\u5927\\u90e8\\u5206\\u5e38\\u7528\\u5e93\\u5df2\\u7ecf\\u88ab\\u81ea\\u52a8 \\u5bfc\\u5165\\uff0c\\u5982array<\\/a>, bisect<\\/a>, collections<\\/a>\\u3002 \\u5982\\u679c\\u60a8\\u9700\\u8981\\u4f7f\\u7528\\u5176\\u4ed6\\u5e93\\u51fd\\u6570\\uff0c\\u8bf7\\u81ea\\u884c\\u5bfc\\u5165\\u3002<\\/p>\\r\\n\\r\\n

\\u5982\\u9700\\u4f7f\\u7528 Map\\/TreeMap \\u6570\\u636e\\u7ed3\\u6784\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 sortedcontainers<\\/a> \\u5e93\\u3002<\\/p>\"],\"scala\":[\"Scala\",\"

\\u7248\\u672c\\uff1aScala 2.13<\\/code><\\/p>\"],\"kotlin\":[\"Kotlin\",\"

\\u7248\\u672c\\uff1aKotlin 1.9.0<\\/code><\\/p>\\r\\n\\r\\n

\\u6211\\u4eec\\u4f7f\\u7528\\u7684\\u662f JetBrains \\u63d0\\u4f9b\\u7684 experimental compiler\\u3002\\u5982\\u679c\\u60a8\\u8ba4\\u4e3a\\u60a8\\u9047\\u5230\\u4e86\\u7f16\\u8bd1\\u5668\\u76f8\\u5173\\u7684\\u95ee\\u9898\\uff0c\\u8bf7\\u5411\\u6211\\u4eec\\u53cd\\u9988<\\/p>\"],\"rust\":[\"Rust\",\"

\\u7248\\u672c\\uff1arust 1.58.1<\\/code><\\/p>\\r\\n\\r\\n

\\u652f\\u6301 crates.io \\u7684 rand<\\/a><\\/p>\"],\"php\":[\"PHP\",\"

PHP 8.1<\\/code>.<\\/p>\\r\\n\\r\\n

With bcmath module.<\\/p>\"],\"typescript\":[\"TypeScript\",\"

TypeScript 5.1.6<\\/p>\\r\\n\\r\\n

Compile Options: --alwaysStrict --strictBindCallApply --strictFunctionTypes --target ES2022<\\/p>\\r\\n\\r\\n

lodash.js<\\/a> \\u5e93\\u5df2\\u7ecf\\u9ed8\\u8ba4\\u88ab\\u5305\\u542b\\u3002<\\/p>\\r\\n\\r\\n

\\u5982\\u9700\\u4f7f\\u7528\\u961f\\u5217\\/\\u4f18\\u5148\\u961f\\u5217\\uff0c\\u60a8\\u53ef\\u4f7f\\u7528 datastructures-js\\/priority-queue@5.3.0<\\/a> \\u548c datastructures-js\\/queue@4.2.1<\\/a>\\u3002<\\/p>\"],\"racket\":[\"Racket\",\"

Racket CS<\\/a> v8.3<\\/p>\\r\\n\\r\\n

\\u4f7f\\u7528 #lang racket<\\/p>\\r\\n\\r\\n

\\u5df2\\u9884\\u5148 (require data\\/gvector data\\/queue data\\/order data\\/heap). \\u82e5\\u9700\\u4f7f\\u7528\\u5176\\u5b83\\u6570\\u636e\\u7ed3\\u6784\\uff0c\\u53ef\\u81ea\\u884c require\\u3002<\\/p>\"],\"erlang\":[\"Erlang\",\"Erlang\\/OTP 24.2\"],\"elixir\":[\"Elixir\",\"Elixir 1.13.0 with Erlang\\/OTP 24.2\"],\"dart\":[\"Dart\",\"

Dart 2.17.3<\\/p>\\r\\n\\r\\n

\\u60a8\\u7684\\u4ee3\\u7801\\u5c06\\u4f1a\\u88ab\\u4e0d\\u7f16\\u8bd1\\u76f4\\u63a5\\u8fd0\\u884c<\\/p>\"]}", "book": null, "isSubscribed": false, "isDailyQuestion": false, "dailyRecordStatus": null, "editorType": "CKEDITOR", "ugcQuestionId": null, "style": "LEETCODE", "exampleTestcases": "[197,130,1]\n[235,140,4]", "__typename": "QuestionNode" } } }