DynamoDB Table Design Methodology – Date Lab (formerly Shimojo Lab)

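Introduction

The purpose of this article is to walk through the entire process of designing DynamoDB tables.

I recently designed DynamoDB tables as part of my work as an intern at ReviewBank. Compared with RDB table design, DynamoDB table design has no well-established procedure, and most articles online cover only fragments of it, so it took me a while to grasp the overall flow. This article summarizes what I learned so that reading it alone is enough to design a DynamoDB table.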

Intended readers

You can design data models for an RDB
You understand the following DynamoDB concepts:
- table, attribute, item
- Primary Key
- Partition Key
- Sort Key
- Global Secondary Index

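Design flow

Define the entities as you would for an RDB
Write out the access patterns and identify the queries
Decide how many tables to use and which table each model goes in
Decide the Primary Key of each model
Define the access patterns that the Primary Key cannot serve as GSIs
Merge the models into a single table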

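Sample service

This article explains the DynamoDB table design method through a concrete example: a service like Qiita, a Japanese site where engineers share technical articles.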

Features

View the list of posts
Create a post
Comment on a post
Like a post
View the list of posts by a specific user
Assign a genre to a post

ER diagram

[Figure: ER diagram of the sample service (entities Post, Comment, and Like)]
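As a minimal sketch of this first step, the three entities could be written as Python dataclasses. The field names follow the per-model tables later in this article; the class layout, the use of ISO 8601 strings for timestamps, and the sample values are illustrative assumptions rather than part of the original design.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Post:
    id: str
    title: str
    author: str
    content: str
    genre: str
    created_at: str                   # ISO 8601 timestamp string
    updated_at: Optional[str] = None  # empty in the sample data


@dataclass
class Comment:
    id: str
    post_id: str                      # refers to Post.id (one post, many comments)
    author: str
    content: str
    created_at: str
    updated_at: Optional[str] = None


@dataclass
class Like:
    id: str
    post_id: str                      # refers to Post.id (one post, many likes)
    author: str
    created_at: str


post = Post("77fcab6b", "DynamoDB table…", "Tetsuya Wada",
            "The purpose of this article is…", "Database",
            "2021-12-03T10:06:57.650Z")
comment = Comment("81fcab6a", post.id, "Taro Yamada", "Interesting!",
                  "2021-12-03T10:11:44.137Z")
like = Like("6b9b9319", post.id, "Taro Yamada", "2021-12-03T10:12:45.033Z")
print(asdict(like))
```

Note that the "type" attribute is not part of the entities yet; it is introduced later, when the queries are designed.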

Based on this information, the goal is to produce a table definition like the following:

Original model	PK (Partition Key)	SK_GSI1PK (Sort Key)	GSI1SK_GSI2SK_GSI3SK	GSI2PK	GSI3PK	Other attributes
Post	77fcab6b	post	Post_2021-12-03T10:06:57.650Z	Tetsuya Wada	Database	content: The purpose of this article is…; updated_at: (empty); title: DynamoDB table…
Comment	81fcab6a	77fcab6b	Comment_2021-12-03T10:11:44.137Z	-	-	content: Interesting!; updated_at: (empty); comment_author: Taro Yamada
Like	6b9b9319	77fcab6b	Like_2021-12-03T10:12:45.033Z	-	-	like_author: Taro Yamada

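The sections below explain the design procedure step by step.

Write out the access patterns and identify the queries

An access pattern describes which data the application needs to retrieve.

A query specifies, for a given model, which attributes serve as the PK (Partition Key) and SK (Sort Key) when retrieving that data.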

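Example access patterns

Get all posts, newest first
Get the post with a given id
Get all comments on a given post, oldest first
Get all likes on a given post
Get the posts created by a given user, newest first
Get the posts in a given genre, newest first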

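Designing the queries

At this point we notice that the current model cannot retrieve the list of all posts: a DynamoDB Query requires an exact Partition Key value, and no attribute of Post holds the same value for every post. We therefore modify the model as follows.

We add an attribute "type" to the Post model and always store the string "post" in it. This makes the following queries possible: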

Access pattern	Model	Partition Key	Sort Key
Get all posts, newest first	Post	type	created_at
Get the post with a given id	Post	id	-
Get all comments on a given post, oldest first	Comment	post_id	created_at
Get all likes on a given post	Like	post_id	created_at
Get the posts created by a given user, newest first	Post	author	created_at
Get the posts in a given genre, newest first	Post	genre	created_at
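To sanity-check the query table, the sketch below emulates DynamoDB's Query semantics in memory: an exact match on the Partition Key, with results ordered by the Sort Key. The query helper and the sample posts (p1 to p3, genre "AWS") are hypothetical; against a real table the same ordering would come from the Query API's ScanIndexForward parameter (False for newest first). It also shows why "type" is needed: it gives every post the same Partition Key value. Only the Post patterns are shown; the Comment and Like patterns work the same way with post_id as the Partition Key.

```python
def query(items, pk_attr, pk_value, sk_attr=None, newest_first=False):
    """Emulate DynamoDB Query semantics in memory: exact match on the
    partition key, then order by the sort key (ascending by default)."""
    hits = [it for it in items if it.get(pk_attr) == pk_value]
    if sk_attr is not None:
        # ISO 8601 strings in one fixed format sort chronologically as plain strings.
        hits.sort(key=lambda it: it[sk_attr], reverse=newest_first)
    return hits


# Hypothetical sample posts; "type" is always "post".
posts = [
    {"id": "p1", "type": "post", "author": "Tetsuya Wada", "genre": "Database",
     "created_at": "2021-12-01T00:00:00.000Z"},
    {"id": "p2", "type": "post", "author": "Taro Yamada", "genre": "Database",
     "created_at": "2021-12-02T00:00:00.000Z"},
    {"id": "p3", "type": "post", "author": "Tetsuya Wada", "genre": "AWS",
     "created_at": "2021-12-03T00:00:00.000Z"},
]

# Get all posts, newest first: Partition Key = type, Sort Key = created_at
all_posts = query(posts, "type", "post", "created_at", newest_first=True)
# Get the post with a given id: Partition Key = id
one_post = query(posts, "id", "p2")
# Get the posts created by a given user, newest first
by_user = query(posts, "author", "Tetsuya Wada", "created_at", newest_first=True)
# Get the posts in a given genre, newest first
by_genre = query(posts, "genre", "Database", "created_at", newest_first=True)

print([p["id"] for p in all_posts])  # ['p3', 'p2', 'p1']
```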

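Decide how many tables to use and which table each model goes in

Unlike an RDB, DynamoDB lets you store multiple models in a single table.

Normally all models go into one table, but in certain situations you may split them across multiple tables.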

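Examples of splitting into multiple tables

Optimizing capacity with respect to the billing model

This is only worth considering when you use a cloud service with usage-based billing. Putting data with similar access frequency into the same table, and splitting groups with different frequencies into separate tables, can sometimes reduce cost.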

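Optimizing capacity for queries

This is only worth considering when you retrieve multiple items by specifying three or more attributes.

When you retrieve multiple items from an ordinary DynamoDB table by specifying three or more attributes, you have to fetch items by the Partition Key and then filter on the second and third attributes. DynamoDB applies a filter only after the items have been read, so every discarded item still consumes read capacity, which lowers efficiency.

To avoid this, split the data into separate tables in advance by one of the three attributes. The trade-off is that queries spanning multiple tables take extra implementation work.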
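The cost difference can be illustrated with a hypothetical example. Assume posts are fetched by genre (Partition Key) and must also match a third attribute, a made-up lang field. In a single table, lang can only be checked by a filter, which runs after the partition has been read, so discarded items are still read. Splitting the data into one table per lang value in advance means only matching items are read. The helpers below are an in-memory sketch, not DynamoDB calls.

```python
from collections import defaultdict

# Hypothetical posts; "lang" stands in for the third attribute.
posts = [
    {"genre": "Database", "lang": "ja", "created_at": "2021-12-01"},
    {"genre": "Database", "lang": "en", "created_at": "2021-12-02"},
    {"genre": "Database", "lang": "en", "created_at": "2021-12-03"},
    {"genre": "Database", "lang": "ja", "created_at": "2021-12-04"},
    {"genre": "AWS", "lang": "ja", "created_at": "2021-12-05"},
]


def query_then_filter(items, genre, lang):
    """Single table: read the whole partition, then filter on lang."""
    read = [it for it in items if it["genre"] == genre]   # every item here is read
    returned = [it for it in read if it["lang"] == lang]  # filter runs after the read
    return returned, len(read)


# Split in advance: one table per lang value.
tables = defaultdict(list)
for it in posts:
    tables[it["lang"]].append(it)


def query_split(genre, lang):
    """Split tables: choose the table by lang, then read only one partition."""
    read = [it for it in tables[lang] if it["genre"] == genre]
    return read, len(read)


single, single_reads = query_then_filter(posts, "Database", "ja")
split, split_reads = query_split("Database", "ja")
print(single == split, single_reads, split_reads)  # True 4 2
```

Both approaches return the same two posts, but the single-table version reads four items to get them.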

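If such requirements arise at all, the workload is probably not a good fit for DynamoDB in the first place, and you should use Elasticsearch or an RDB instead.

In this article's example, we store everything in a single table.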

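Decide the Primary Key of each model

Follow these rules:

Make the Primary Key unique within the table.
If there are several ways to do this, prefer attributes that appear in the access patterns.
Choose the Partition Key so that data is spread as evenly as possible across partitions, because when access concentrates on one partition, DynamoDB cannot deliver its full performance.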
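The first and third rules can be checked mechanically on sample data. This is a minimal sketch with a hypothetical helper: it reports duplicated (Partition Key, Sort Key) pairs and the share of items that land in the largest partition. Item counts are only a rough proxy for hot partitions, since real load depends on request traffic.

```python
from collections import Counter


def check_primary_key(items, pk_attr, sk_attr):
    """Return (duplicated key pairs, share of items in the largest partition)."""
    pairs = Counter((it[pk_attr], it[sk_attr]) for it in items)
    duplicates = sorted(pair for pair, n in pairs.items() if n > 1)
    per_partition = Counter(it[pk_attr] for it in items)
    largest_share = max(per_partition.values()) / len(items)
    return duplicates, largest_share


posts = [{"id": f"p{i}", "type": "post"} for i in range(4)]
print(check_primary_key(posts, "id", "type"))   # ([], 0.25): unique and spread out
print(check_primary_key(posts, "type", "id"))   # ([], 1.0): unique but one partition

# Hypothetical bad choice: two comments by the same user on the same post collide.
comments = [{"post_id": "p0", "author": "Taro Yamada"},
            {"post_id": "p0", "author": "Taro Yamada"}]
print(check_primary_key(comments, "post_id", "author"))  # ([('p0', 'Taro Yamada')], 1.0)
```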

At this stage, the table definitions look like this:

Post
Attribute	id	type	title	author	content	genre	created_at	updated_at
Key	Partition Key	Sort Key	-	-	-	-	-	-
Example	77fcab6b	post	DynamoDB table…	Tetsuya Wada	The purpose of this article is…	Database	2021-12-03T10:06:57.650Z	(empty)

Comment
Attribute	id	post_id	author	content	created_at	updated_at
Key	Partition Key	Sort Key	-	-	-	-
Example	81fcab6a	77fcab6b	Taro Yamada	Interesting!	2021-12-03T10:11:44.137Z	(empty)

Like
Attribute	id	post_id	author	created_at
Key	Partition Key	Sort Key	-	-
Example	6b9b9319	77fcab6b	Taro Yamada	2021-12-03T10:12:45.033Z

The only access pattern that the Primary Key can serve is:

Get the post with a given id
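For that one pattern, the request could look like the following sketch of GetItem parameters in the low-level client format. GetItem requires the complete Primary Key, so the constant Sort Key value "post" is always included. The table name "Post" and the helper function are assumptions for illustration.

```python
def get_post_request(table_name: str, post_id: str) -> dict:
    """GetItem parameters (low-level client format) for a single post.
    GetItem needs the full Primary Key: Partition Key id and Sort Key type."""
    return {
        "TableName": table_name,
        "Key": {
            "id": {"S": post_id},
            "type": {"S": "post"},  # constant Sort Key value for every post
        },
    }


request = get_post_request("Post", "77fcab6b")  # "Post" is a hypothetical table name
# With boto3 this could be sent as: boto3.client("dynamodb").get_item(**request)
```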

Define the access patterns that the Primary Key cannot serve as GSIs

We implement the remaining access patterns with GSIs.

Post
Attribute	id	type	title	author	content	genre	created_at	updated_at
Key	Partition Key	Sort Key, GSI1PK	-	GSI2PK	-	GSI3PK	GSI1SK, GSI2SK, GSI3SK	-
Example	77fcab6b	post	DynamoDB table…	Tetsuya Wada	The purpose of this article is…	Database	2021-12-03T10:06:57.650Z	(empty)

Comment
Attribute	id	post_id	author	content	created_at	updated_at
Key	Partition Key	Sort Key, GSI1PK	-	-	GSI1SK	-
Example	81fcab6a	77fcab6b	Taro Yamada	Interesting!	2021-12-03T10:11:44.137Z	(empty)

Like
Attribute	id	post_id	author	created_at
Key	Partition Key	Sort Key, GSI1PK	-	GSI1SK
Example	6b9b9319	77fcab6b	Taro Yamada	2021-12-03T10:12:45.033Z
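The GSIs in these tables could be declared as CreateTable request parameters, sketched below as plain Python dictionaries. The table names, the index names GSI1 to GSI3, on-demand billing (PAY_PER_REQUEST), and projecting all attributes are assumptions, not choices made in the original design. Only key attributes are declared in AttributeDefinitions, because DynamoDB does not fix a schema for other attributes. Note that the three per-model tables need five GSIs in total.

```python
def gsi(name, partition_key, sort_key):
    """One GlobalSecondaryIndexes entry in CreateTable's request format."""
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},  # GSI partition key
            {"AttributeName": sort_key, "KeyType": "RANGE"},      # GSI sort key
        ],
        "Projection": {"ProjectionType": "ALL"},  # copy every attribute into the index
    }


def table(name, partition_key, sort_key, indexes):
    """CreateTable parameters; only attributes used as keys are declared."""
    key_attrs = [partition_key, sort_key]
    for index in indexes:
        for key in index["KeySchema"]:
            if key["AttributeName"] not in key_attrs:
                key_attrs.append(key["AttributeName"])
    return {
        "TableName": name,
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [{"AttributeName": a, "AttributeType": "S"} for a in key_attrs],
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": indexes,
    }


post_table = table("Post", "id", "type", [
    gsi("GSI1", "type", "created_at"),    # all posts, newest first
    gsi("GSI2", "author", "created_at"),  # posts by a given user
    gsi("GSI3", "genre", "created_at"),   # posts in a given genre
])
comment_table = table("Comment", "id", "post_id", [
    gsi("GSI1", "post_id", "created_at"),  # comments on a given post
])
like_table = table("Like", "id", "post_id", [
    gsi("GSI1", "post_id", "created_at"),  # likes on a given post
])
# e.g. boto3.client("dynamodb").create_table(**post_table)
```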

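Merge the models into a single table

This takes three steps.

1. Unify attribute names

In the end we merge the models into one table. As preparation, we give the attributes of the different models common names wherever possible.

PK and SK must always be unified. For attributes that GSIs are defined on, unifying the names as far as possible keeps the number of GSIs down: an index has a single partition key attribute and a single sort key attribute, so models can share one GSI only if their key values live in attributes with the same names. Unifying the other attributes is optional.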

First, for readability, reorder the attributes as follows:

Post
Attribute	id	type	created_at	author	content	updated_at	title	genre
Key	Partition Key	Sort Key, GSI1PK	GSI1SK, GSI2SK, GSI3SK	GSI2PK	-	-	-	GSI3PK
Example	77fcab6b	post	2021-12-03T10:06:57.650Z	Tetsuya Wada	The purpose of this article is…	(empty)	DynamoDB table…	Database

Comment
Attribute	id	post_id	created_at	author	content	updated_at
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-	-	-
Example	81fcab6a	77fcab6b	2021-12-03T10:11:44.137Z	Taro Yamada	Interesting!	(empty)

Like
Attribute	id	post_id	created_at	author
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-
Example	6b9b9319	77fcab6b	2021-12-03T10:12:45.033Z	Taro Yamada

After unifying the attribute names, the tables look like this:

Post
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	author	content	updated_at	title	genre
Key	Partition Key	Sort Key, GSI1PK	GSI1SK, GSI2SK, GSI3SK	GSI2PK	-	-	-	GSI3PK
Example	77fcab6b	post	2021-12-03T10:06:57.650Z	Tetsuya Wada	The purpose of this article is…	(empty)	DynamoDB table…	Database

Comment
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	author	content	updated_at
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-	-	-
Example	81fcab6a	77fcab6b	2021-12-03T10:11:44.137Z	Taro Yamada	Interesting!	(empty)

Like
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	author
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-
Example	6b9b9319	77fcab6b	2021-12-03T10:12:45.033Z	Taro Yamada
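The renaming in this step is a per-model mapping, sketched below. RENAMES and unify are hypothetical names; the mapping itself follows the two tables above. After renaming, all three models store their keys under the same three attribute names, which is what lets them share indexes.

```python
SORT = "GSI1SK_GSI2SK_GSI3SK"

# Per-model mapping from the original attribute names to the shared names.
RENAMES = {
    "Post":    {"id": "PK", "type": "SK_GSI1PK", "created_at": SORT},
    "Comment": {"id": "PK", "post_id": "SK_GSI1PK", "created_at": SORT},
    "Like":    {"id": "PK", "post_id": "SK_GSI1PK", "created_at": SORT},
}


def unify(model, item):
    """Rename key attributes to the shared names; other attributes are kept."""
    mapping = RENAMES[model]
    return {mapping.get(name, name): value for name, value in item.items()}


like = {"id": "6b9b9319", "post_id": "77fcab6b",
        "created_at": "2021-12-03T10:12:45.033Z", "author": "Taro Yamada"}
print(unify("Like", like))
```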

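2. Check that no access pattern returns data from other models

If data from another model could get mixed in, take one of the following measures:

Option 1: give the attribute a different name
Option 2: prefix the data with a string that identifies the model

Option 1 usually costs less to implement. If option 1 would increase the number of GSIs, option 2 may be the better choice.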

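In this example, the results of the following access patterns get mixed:

Get all comments on a given post, oldest first (a)
Get all likes on a given post (b)
Get the posts created by a given user, newest first (c)

(a) and (b) both query GSI1 with the post id in SK_GSI1PK, so a query for one post returns its comments and likes together. (c) queries GSI2 on author, an attribute that Comment and Like also have, so the user's comments and likes would be returned along with their posts.

For (a) and (b), option 1 would add GSIs, so we use option 2: the sort key GSI1SK_GSI2SK_GSI3SK gets a model-name prefix (Post_, Comment_, Like_). For (c) we use option 1: Post's author becomes GSI2PK (and genre likewise becomes GSI3PK), while Comment and Like use comment_author and like_author.

After the fix, the tables look like this: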

Post
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	GSI2PK	content	updated_at	title	GSI3PK
Key	Partition Key	Sort Key, GSI1PK	GSI1SK, GSI2SK, GSI3SK	GSI2PK	-	-	-	GSI3PK
Example	77fcab6b	post	Post_2021-12-03T10:06:57.650Z	Tetsuya Wada	The purpose of this article is…	(empty)	DynamoDB table…	Database

Comment
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	content	updated_at	comment_author
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-	-	-
Example	81fcab6a	77fcab6b	Comment_2021-12-03T10:11:44.137Z	Interesting!	(empty)	Taro Yamada

Like
Attribute	PK	SK_GSI1PK	GSI1SK_GSI2SK_GSI3SK	like_author
Key	Partition Key	Sort Key, GSI1PK	GSI1SK	-
Example	6b9b9319	77fcab6b	Like_2021-12-03T10:12:45.033Z	Taro Yamada
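The effect of the prefix can be checked with a small in-memory sketch of a GSI1 query. The third item (id 9a1c0d2e) is a hypothetical extra comment. With an empty prefix, comments and likes come back mixed; with begins_with on the model prefix, each query returns a single model, still in chronological order because the timestamp follows the prefix. The request dictionary shows the corresponding Query parameters in the low-level client format; the table and index names are assumptions.

```python
SORT = "GSI1SK_GSI2SK_GSI3SK"

items = [
    {"PK": "81fcab6a", "SK_GSI1PK": "77fcab6b", SORT: "Comment_2021-12-03T10:11:44.137Z"},
    {"PK": "6b9b9319", "SK_GSI1PK": "77fcab6b", SORT: "Like_2021-12-03T10:12:45.033Z"},
    {"PK": "9a1c0d2e", "SK_GSI1PK": "77fcab6b", SORT: "Comment_2021-12-03T10:15:00.000Z"},  # hypothetical
]


def query_gsi1(items, post_id, prefix):
    """Emulate: SK_GSI1PK = :pid AND begins_with(GSI1SK_GSI2SK_GSI3SK, :prefix),
    returned in ascending sort-key order."""
    hits = [it for it in items
            if it["SK_GSI1PK"] == post_id and it[SORT].startswith(prefix)]
    return sorted(hits, key=lambda it: it[SORT])


mixed = query_gsi1(items, "77fcab6b", "")             # comments and likes together
comments = query_gsi1(items, "77fcab6b", "Comment_")  # oldest first
likes = query_gsi1(items, "77fcab6b", "Like_")

# The corresponding Query request (low-level client format).
comment_request = {
    "TableName": "Table",   # hypothetical table name
    "IndexName": "GSI1",    # hypothetical index name
    "KeyConditionExpression": "SK_GSI1PK = :pid AND begins_with(GSI1SK_GSI2SK_GSI3SK, :prefix)",
    "ExpressionAttributeValues": {":pid": {"S": "77fcab6b"}, ":prefix": {"S": "Comment_"}},
    "ScanIndexForward": True,  # ascending order: oldest first
}
```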

3. Merge the tables

Original model	PK (Partition Key)	SK_GSI1PK (Sort Key)	GSI1SK_GSI2SK_GSI3SK	GSI2PK	GSI3PK	Other attributes
Post	77fcab6b	post	Post_2021-12-03T10:06:57.650Z	Tetsuya Wada	Database	content: The purpose of this article is…; updated_at: (empty); title: DynamoDB table…
Comment	81fcab6a	77fcab6b	Comment_2021-12-03T10:11:44.137Z	-	-	content: Interesting!; updated_at: (empty); comment_author: Taro Yamada
Like	6b9b9319	77fcab6b	Like_2021-12-03T10:12:45.033Z	-	-	like_author: Taro Yamada
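As a final check, the sketch below runs all six access patterns against the merged table in memory. The query and get_post helpers are hypothetical stand-ins for Query on a GSI and GetItem on the base table. Items without a given GSI partition key attribute are left out of that index (DynamoDB GSIs are sparse in this way), which is why Comment and Like never appear in the GSI2 and GSI3 results.

```python
SORT = "GSI1SK_GSI2SK_GSI3SK"

# The merged table from the definition above (empty updated_at omitted).
TABLE = [
    {"PK": "77fcab6b", "SK_GSI1PK": "post", SORT: "Post_2021-12-03T10:06:57.650Z",
     "GSI2PK": "Tetsuya Wada", "GSI3PK": "Database",
     "content": "The purpose of this article is…", "title": "DynamoDB table…"},
    {"PK": "81fcab6a", "SK_GSI1PK": "77fcab6b", SORT: "Comment_2021-12-03T10:11:44.137Z",
     "content": "Interesting!", "comment_author": "Taro Yamada"},
    {"PK": "6b9b9319", "SK_GSI1PK": "77fcab6b", SORT: "Like_2021-12-03T10:12:45.033Z",
     "like_author": "Taro Yamada"},
]


def query(index_pk, value, prefix="", newest_first=False):
    """Emulate Query on GSI1/GSI2/GSI3 (all sorted by SORT).
    Items lacking index_pk are not in that index, like a sparse GSI."""
    hits = [it for it in TABLE
            if it.get(index_pk) == value and it[SORT].startswith(prefix)]
    return sorted(hits, key=lambda it: it[SORT], reverse=newest_first)


def get_post(post_id):
    """Emulate GetItem on the base table with PK = id and SK = "post"."""
    return next((it for it in TABLE
                 if it["PK"] == post_id and it["SK_GSI1PK"] == "post"), None)


all_posts = query("SK_GSI1PK", "post", newest_first=True)        # GSI1
one_post = get_post("77fcab6b")                                   # base table
comments = query("SK_GSI1PK", "77fcab6b", prefix="Comment_")      # GSI1, oldest first
likes = query("SK_GSI1PK", "77fcab6b", prefix="Like_")            # GSI1
user_posts = query("GSI2PK", "Tetsuya Wada", newest_first=True)   # GSI2
genre_posts = query("GSI3PK", "Database", newest_first=True)      # GSI3
```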

This completes the table design. The table above is the DynamoDB table definition.
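The finished definition could be created with CreateTable parameters along the lines of the sketch below. The table name, index names, billing mode, and projection type are assumptions; only the five key attributes are declared, and all three GSIs share GSI1SK_GSI2SK_GSI3SK as their sort key.

```python
SORT = "GSI1SK_GSI2SK_GSI3SK"  # shared sort key of GSI1, GSI2 and GSI3


def index(name, partition_key):
    """A GlobalSecondaryIndexes entry whose sort key is the shared SORT attribute."""
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": SORT, "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }


create_table_params = {
    "TableName": "Table",              # hypothetical table name
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    "AttributeDefinitions": [
        {"AttributeName": name, "AttributeType": "S"}
        for name in ["PK", "SK_GSI1PK", SORT, "GSI2PK", "GSI3PK"]
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},
        {"AttributeName": "SK_GSI1PK", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        index("GSI1", "SK_GSI1PK"),  # all posts / comments and likes of a post
        index("GSI2", "GSI2PK"),     # posts by a given user
        index("GSI3", "GSI3PK"),     # posts in a given genre
    ],
}
# e.g. boto3.client("dynamodb").create_table(**create_table_params)
```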