mirror of
https://gitee.com/coder-xiaomo/leetcode-problemset
synced 2025-09-02 14:12:17 +08:00
update
@@ -7,12 +7,12 @@
"boundTopicId": 1862,
"title": "UTF-8 Validation",
"titleSlug": "utf-8-validation",
-"content": "<p>Given an integer array <code>data</code> representing the data, return whether it is a valid <strong>UTF-8</strong> encoding.</p>\n\n<p>A character in <strong>UTF8</strong> can be from <b>1 to 4 bytes</b> long, subjected to the following rules:</p>\n\n<ol>\n\t<li>For a <strong>1-byte</strong> character, the first bit is a <code>0</code>, followed by its Unicode code.</li>\n\t<li>For an <strong>n-bytes</strong> character, the first <code>n</code> bits are all one's, the <code>n + 1</code> bit is <code>0</code>, followed by <code>n - 1</code> bytes with the most significant <code>2</code> bits being <code>10</code>.</li>\n</ol>\n\n<p>This is how the UTF-8 encoding would work:</p>\n\n<pre>\n<code> Char. number range | UTF-8 octet sequence\n (hexadecimal) | (binary)\n --------------------+---------------------------------------------\n 0000 0000-0000 007F | 0xxxxxxx\n 0000 0080-0000 07FF | 110xxxxx 10xxxxxx\n 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx\n 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</code>\n</pre>\n\n<p><b>Note: </b>The input is an array of integers. Only the <b>least significant 8 bits</b> of each integer is used to store the data. This means each integer represents only 1 byte of data.</p>\n\n<p> </p>\n<p><strong>Example 1:</strong></p>\n\n<pre>\n<strong>Input:</strong> data = [197,130,1]\n<strong>Output:</strong> true\n<strong>Explanation:</strong> data represents the octet sequence: 11000101 10000010 00000001.\nIt is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.\n</pre>\n\n<p><strong>Example 2:</strong></p>\n\n<pre>\n<strong>Input:</strong> data = [235,140,4]\n<strong>Output:</strong> false\n<strong>Explanation:</strong> data represented the octet sequence: 11101011 10001100 00000100.\nThe first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.\nThe next byte is a continuation byte which starts with 10 and that's correct.\nBut the second continuation byte does not start with 10, so it is invalid.\n</pre>\n\n<p> </p>\n<p><strong>Constraints:</strong></p>\n\n<ul>\n\t<li><code>1 <= data.length <= 2 * 10<sup>4</sup></code></li>\n\t<li><code>0 <= data[i] <= 255</code></li>\n</ul>\n",
+"content": "<p>Given an integer array <code>data</code> representing the data, return whether it is a valid <strong>UTF-8</strong> encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).</p>\n\n<p>A character in <strong>UTF8</strong> can be from <strong>1 to 4 bytes</strong> long, subjected to the following rules:</p>\n\n<ol>\n\t<li>For a <strong>1-byte</strong> character, the first bit is a <code>0</code>, followed by its Unicode code.</li>\n\t<li>For an <strong>n-bytes</strong> character, the first <code>n</code> bits are all one's, the <code>n + 1</code> bit is <code>0</code>, followed by <code>n - 1</code> bytes with the most significant <code>2</code> bits being <code>10</code>.</li>\n</ol>\n\n<p>This is how the UTF-8 encoding would work:</p>\n\n<pre>\n Number of Bytes | UTF-8 Octet Sequence\n | (binary)\n --------------------+-----------------------------------------\n 1 | 0xxxxxxx\n 2 | 110xxxxx 10xxxxxx\n 3 | 1110xxxx 10xxxxxx 10xxxxxx\n 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n</pre>\n\n<p><code>x</code> denotes a bit in the binary form of a byte that may be either <code>0</code> or <code>1</code>.</p>\n\n<p><strong>Note: </strong>The input is an array of integers. Only the <strong>least significant 8 bits</strong> of each integer is used to store the data. This means each integer represents only 1 byte of data.</p>\n\n<p> </p>\n<p><strong>Example 1:</strong></p>\n\n<pre>\n<strong>Input:</strong> data = [197,130,1]\n<strong>Output:</strong> true\n<strong>Explanation:</strong> data represents the octet sequence: 11000101 10000010 00000001.\nIt is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.\n</pre>\n\n<p><strong>Example 2:</strong></p>\n\n<pre>\n<strong>Input:</strong> data = [235,140,4]\n<strong>Output:</strong> false\n<strong>Explanation:</strong> data represented the octet sequence: 11101011 10001100 00000100.\nThe first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.\nThe next byte is a continuation byte which starts with 10 and that's correct.\nBut the second continuation byte does not start with 10, so it is invalid.\n</pre>\n\n<p> </p>\n<p><strong>Constraints:</strong></p>\n\n<ul>\n\t<li><code>1 <= data.length <= 2 * 10<sup>4</sup></code></li>\n\t<li><code>0 <= data[i] <= 255</code></li>\n</ul>\n",
"translatedTitle": "UTF-8 Encoding Validation",
"translatedContent": "<p>Given an integer array <code>data</code> representing the data, return whether it is a valid <strong>UTF-8</strong> encoding.</p>\n\n<p>A character in <strong>UTF-8</strong> can be <strong>1 to 4 bytes</strong> long, subject to the following rules:</p>\n\n<ol>\n\t<li>For a <strong>1-byte</strong> character, the first bit of the byte is set to 0, and the following 7 bits are the character's Unicode code.</li>\n\t<li>For an <strong>n-byte</strong> character (n > 1), the first n bits of the first byte are all set to 1, bit n+1 is set to 0, and the first two bits of each following byte are always set to 10. All remaining bits not mentioned hold the character's Unicode code.</li>\n</ol>\n\n<p>This is how UTF-8 encoding works:</p>\n\n<pre>\n<code> Char. number range | UTF-8 octet sequence\n (hexadecimal) | (binary)\n --------------------+---------------------------------------------\n 0000 0000-0000 007F | 0xxxxxxx\n 0000 0080-0000 07FF | 110xxxxx 10xxxxxx\n 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx\n 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n</code></pre>\n\n<p><strong>Note: </strong>The input is an array of integers. Only the <strong>least significant 8 bits</strong> of each integer are used to store the data. This means each integer represents only 1 byte of data.</p>\n\n<p> </p>\n\n<p><strong>Example 1:</strong></p>\n\n<pre>\n<strong>Input: </strong>data = [197,130,1]\n<strong>Output: </strong>true\n<strong>Explanation: </strong>The data represents the byte sequence <strong>11000101 10000010 00000001</strong>.\nThis is a valid UTF-8 encoding of a 2-byte character followed by a 1-byte character.\n</pre>\n\n<p><strong>Example 2:</strong></p>\n\n<pre>\n<strong>Input: </strong>data = [235,140,4]\n<strong>Output: </strong>false\n<strong>Explanation: </strong>The data represents the 8-bit sequence <strong>11101011 10001100 00000100</strong>.\nThe first 3 bits are all 1 and the 4th bit is 0, which indicates a 3-byte character.\nThe next byte is a continuation byte starting with 10, which is correct.\nBut the second continuation byte does not start with 10, so it violates the rules.\n</pre>\n\n<p> </p>\n\n<p><strong>Constraints:</strong></p>\n\n<ul>\n\t<li><code>1 <= data.length <= 2 * 10<sup>4</sup></code></li>\n\t<li><code>0 <= data[i] <= 255</code></li>\n</ul>\n",
"isPaidOnly": false,
"difficulty": "Medium",
-"likes": 167,
+"likes": 169,
"dislikes": 0,
"isLiked": null,
"similarQuestions": "[]",
@@ -143,7 +143,7 @@
"__typename": "CodeSnippetNode"
}
],
-"stats": "{\"totalAccepted\": \"33.8K\", \"totalSubmission\": \"76.9K\", \"totalAcceptedRaw\": 33802, \"totalSubmissionRaw\": 76937, \"acRate\": \"43.9%\"}",
+"stats": "{\"totalAccepted\": \"34.2K\", \"totalSubmission\": \"78K\", \"totalAcceptedRaw\": 34245, \"totalSubmissionRaw\": 78008, \"acRate\": \"43.9%\"}",
"hints": [
"All you have to do is follow the rules. For a given integer, obtain its binary representation in the string form and work with the rules given in the problem.",
"An integer can either represent the start of a UTF-8 character, or a part of an existing UTF-8 character. There are two separate rules for these two scenarios in the problem.",
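
The two hints above can be sketched in Python roughly as follows. This is an illustration only and is not part of the commit: the function name `valid_utf8` and the counter `remaining` are hypothetical, and the sketch uses bit shifts where the first hint suggests a binary string. It checks only the byte patterns stated in the problem, not stricter UTF-8 rules such as overlong encodings.

```python
def valid_utf8(data):
    """Return True if data follows the byte-pattern rules from the problem statement."""
    remaining = 0  # continuation bytes still expected for the current character
    for value in data:
        byte = value & 0xFF  # only the least significant 8 bits carry data
        if remaining:
            # Inside a multi-byte character: the byte must look like 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            remaining -= 1
        elif byte >> 7 == 0:
            continue  # 0xxxxxxx: a complete 1-byte character
        elif byte >> 5 == 0b110:
            remaining = 1  # 110xxxxx: start of a 2-byte character
        elif byte >> 4 == 0b1110:
            remaining = 2  # 1110xxxx: start of a 3-byte character
        elif byte >> 3 == 0b11110:
            remaining = 3  # 11110xxx: start of a 4-byte character
        else:
            return False  # stray continuation byte or more than four leading ones
    # A character that is still waiting for continuation bytes is truncated.
    return remaining == 0
```

Calling `valid_utf8([197, 130, 1])` and `valid_utf8([235, 140, 4])` reproduces the two examples from the problem statement. Tracking a single counter keeps the check to one pass with constant extra memory.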