mirror of https://gitee.com/coder-xiaomo/leetcode-problemset synced 2025-09-05 15:31:43 +08:00
leetcode-problemset/leetcode-cn/problem (Chinese)/DNA 模式识别 [dna-pattern-recognition].html
2025-03-14 03:44:12 +08:00

<p>Table: <code>Samples</code></p>
<pre>
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| sample_id      | int     |
| dna_sequence   | varchar |
| species        | varchar |
+----------------+---------+
sample_id is the unique key for this table.
Each row contains a DNA sequence, represented as a string of the characters A, T, G, and C, and the species it was collected from.
</pre>
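<p>Biologists are studying basic patterns in DNA sequences. Write a solution that reports, for each&nbsp;<code>sample_id</code>, whether its sequence has each of the following patterns:</p>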
<ul>
<li>Sequences that&nbsp;<strong>start</strong>&nbsp;with <strong>ATG</strong> (a common <strong>start codon</strong>)</li>
<li>Sequences that&nbsp;<strong>end</strong>&nbsp;with <strong>TAA</strong>, <strong>TAG</strong>, or&nbsp;<strong>TGA</strong> (stop codons)</li>
<li>Sequences that contain the motif <strong>ATAT</strong> (a simple repeated pattern)</li>
<li>Sequences that have <strong>at least</strong>&nbsp;<code>3</code>&nbsp;<strong>consecutive</strong>&nbsp;<strong>G</strong> characters (such as&nbsp;<strong>GGG</strong>&nbsp;or&nbsp;<strong>GGGG</strong>)</li>
</ul>
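<p>The four checks above can be sketched in a few lines of Python. This is only an illustration, not part of the original problem: the function name <code>dna_flags</code> is hypothetical, and the output keys borrow the column names used in the example output. Note that "at least 3 consecutive G" is equivalent to "contains the substring GGG".</p>

```python
def dna_flags(seq: str) -> dict:
    """Return the four 0/1 pattern flags for one DNA sequence.

    The name `dna_flags` is a hypothetical helper for illustration only.
    """
    return {
        # Starts with the start codon ATG.
        "has_start": int(seq.startswith("ATG")),
        # Ends with any one of the three stop codons.
        "has_stop": int(seq.endswith(("TAA", "TAG", "TGA"))),
        # Contains the ATAT motif anywhere in the sequence.
        "has_atat": int("ATAT" in seq),
        # Any run of three or more G characters contains "GGG".
        "has_ggg": int("GGG" in seq),
    }


print(dna_flags("ATGGGGTCATCATAA"))
# {'has_start': 1, 'has_stop': 1, 'has_atat': 0, 'has_ggg': 1}
```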
<p>Return the result table ordered by&nbsp;sample_id in <strong>ascending</strong>&nbsp;order.</p>
<p>The result format is shown in the following example.</p>
<p>&nbsp;</p>
<p><strong class="example">Example:</strong></p>
<div class="example-block">
<p><strong>Input:</strong></p>
<p>Samples table:</p>
<pre class="example-io">
+-----------+------------------+-----------+
| sample_id | dna_sequence     | species   |
+-----------+------------------+-----------+
| 1         | ATGCTAGCTAGCTAA  | Human     |
| 2         | GGGTCAATCATC     | Human     |
| 3         | ATATATCGTAGCTA   | Human     |
| 4         | ATGGGGTCATCATAA  | Mouse     |
| 5         | TCAGTCAGTCAG     | Mouse     |
| 6         | ATATCGCGCTAG     | Zebrafish |
| 7         | CGTATGCGTCGTA    | Zebrafish |
+-----------+------------------+-----------+
</pre>
<p><strong>Output:</strong></p>
<pre class="example-io">
+-----------+------------------+-------------+-------------+------------+------------+------------+
| sample_id | dna_sequence     | species     | has_start   | has_stop   | has_atat   | has_ggg    |
+-----------+------------------+-------------+-------------+------------+------------+------------+
| 1         | ATGCTAGCTAGCTAA  | Human       | 1           | 1          | 0          | 0          |
| 2         | GGGTCAATCATC     | Human       | 0           | 0          | 0          | 1          |
| 3         | ATATATCGTAGCTA   | Human       | 0           | 0          | 1          | 0          |
| 4         | ATGGGGTCATCATAA  | Mouse       | 1           | 1          | 0          | 1          |
| 5         | TCAGTCAGTCAG     | Mouse       | 0           | 0          | 0          | 0          |
| 6         | ATATCGCGCTAG     | Zebrafish   | 0           | 1          | 1          | 0          |
| 7         | CGTATGCGTCGTA    | Zebrafish   | 0           | 0          | 0          | 0          |
+-----------+------------------+-------------+-------------+------------+------------+------------+
</pre>
<p><strong>Explanation:</strong></p>
<ul>
<li>Sample 1 (ATGCTAGCTAGCTAA):
<ul>
<li>Starts with ATG (has_start = 1)</li>
<li>Ends with TAA (has_stop = 1)</li>
<li>Does not contain ATAT (has_atat = 0)</li>
<li>Does not contain at least 3 consecutive G characters (has_ggg = 0)</li>
</ul>
</li>
<li>Sample 2 (GGGTCAATCATC):
<ul>
<li>Does not start with ATG (has_start = 0)</li>
<li>Does not end with TAA, TAG, or TGA (has_stop = 0)</li>
<li>Does not contain ATAT (has_atat = 0)</li>
<li>Contains GGG (has_ggg = 1)</li>
</ul>
</li>
<li>Sample 3 (ATATATCGTAGCTA):
<ul>
<li>Does not start with ATG (has_start = 0)</li>
<li>Does not end with TAA, TAG, or TGA (has_stop = 0)</li>
<li>Contains ATAT (has_atat = 1)</li>
<li>Does not contain at least 3 consecutive G characters (has_ggg = 0)</li>
</ul>
</li>
<li>Sample 4 (ATGGGGTCATCATAA):
<ul>
<li>Starts with ATG (has_start = 1)</li>
<li>Ends with TAA (has_stop = 1)</li>
<li>Does not contain ATAT (has_atat = 0)</li>
<li>Contains GGGG (has_ggg = 1)</li>
</ul>
</li>
<li>Sample 5 (TCAGTCAGTCAG):
<ul>
<li>Matches none of the patterns (all fields = 0)</li>
</ul>
</li>
<li>Sample 6 (ATATCGCGCTAG):
<ul>
<li>Does not start with ATG (has_start = 0)</li>
<li>Ends with TAG (has_stop = 1)</li>
<li>Contains ATAT (has_atat = 1)</li>
<li>Does not contain at least 3 consecutive G characters (has_ggg = 0)</li>
</ul>
</li>
<li>Sample 7 (CGTATGCGTCGTA):
<ul>
<li>Does not start with ATG (has_start = 0)</li>
<li>Does not end with TAA, TAG, or TGA (has_stop = 0)</li>
<li>Does not contain ATAT (has_atat = 0)</li>
<li>Does not contain at least 3 consecutive G characters (has_ggg = 0)</li>
</ul>
</li>
</ul>
<p><strong>Note:</strong></p>
<ul>
<li>The result is ordered by sample_id in ascending order</li>
<li>For each pattern, 1 means the pattern is present and 0 means it is absent</li>
</ul>
</div>
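<p>One possible SQL formulation uses only <code>LIKE</code> patterns. The sketch below is an assumption-laden illustration rather than a reference solution: the problem statement does not name a SQL dialect, so the query is run here against an in-memory SQLite database through Python's standard <code>sqlite3</code> module, and the names <code>ROWS</code>, <code>QUERY</code>, and <code>run_query</code> are hypothetical. In SQLite a <code>LIKE</code> comparison evaluates to the integer 0 or 1, which matches the flag columns directly; other dialects may need an explicit <code>CASE</code> expression.</p>

```python
import sqlite3

# Example input rows copied from the Samples table in the example.
ROWS = [
    (1, "ATGCTAGCTAGCTAA", "Human"),
    (2, "GGGTCAATCATC", "Human"),
    (3, "ATATATCGTAGCTA", "Human"),
    (4, "ATGGGGTCATCATAA", "Mouse"),
    (5, "TCAGTCAGTCAG", "Mouse"),
    (6, "ATATCGCGCTAG", "Zebrafish"),
    (7, "CGTATGCGTCGTA", "Zebrafish"),
]

# Assumption: SQLite dialect, where LIKE yields 0 or 1. SQLite's LIKE is
# case-insensitive for ASCII by default, which is harmless for
# upper-case DNA strings.
QUERY = """
SELECT sample_id,
       dna_sequence,
       species,
       dna_sequence LIKE 'ATG%'    AS has_start,
       (dna_sequence LIKE '%TAA'
        OR dna_sequence LIKE '%TAG'
        OR dna_sequence LIKE '%TGA') AS has_stop,
       dna_sequence LIKE '%ATAT%'  AS has_atat,
       dna_sequence LIKE '%GGG%'   AS has_ggg
FROM Samples
ORDER BY sample_id
"""


def run_query():
    """Load the example rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Samples ("
        "sample_id INTEGER PRIMARY KEY, dna_sequence TEXT, species TEXT)"
    )
    conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_query():
    print(row)
```

<p>Each printed tuple lists sample_id, dna_sequence, species, and the four flags in the same column order as the example output table.</p>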