Bigtable: A Distributed Storage System for Structured Data: Part 2, Data Model

2 Data Model
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. 
The map is indexed by a row key, column key, and a timestamp; 
each value in the map is an uninterpreted array of bytes.
(row:string, column:string, time:int64) → string
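
To make the signature above concrete, here is a minimal in-memory sketch of such a map. It is only an illustration: the class name `TinyTable`, its methods, and the sample rows are hypothetical and are not Bigtable's API. Cells that were never written simply have no entry, which is what "sparse" means here.

```python
from typing import Dict, List, Optional, Tuple

# (row key, column key, timestamp) -> uninterpreted bytes.
Key = Tuple[str, str, int]


class TinyTable:
    """Hypothetical in-memory stand-in for the map; not Bigtable's API."""

    def __init__(self) -> None:
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str, ts: int) -> Optional[bytes]:
        # Absent cells take no space and read back as None.
        return self._cells.get((row, column, ts))

    def sorted_keys(self) -> List[Key]:
        # Rows ascending, columns ascending, newest timestamp first.
        return sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))


table = TinyTable()
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "contents:", 6, b"<html>v6")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
```

The values are stored as `bytes` and never parsed, mirroring the "uninterpreted array of bytes" in the definition.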




Figure 1: A slice of an example table that stores Web pages. 
The row name is a reversed URL. 
The contents column family contains the page contents, 
and the anchor column family contains the text of any anchors that reference the page. 
CNN’s home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca.
Each anchor cell has one version; 
the contents column has three versions, at timestamps t3, t5, and t6.

(Note: the row has two kinds of columns, a contents column and anchor columns. Each anchor cell holds exactly one version.)

We settled on this data model after examining a variety of potential uses of a Bigtable-like system. 
As one concrete example that drove some of our design decisions, suppose we want to keep a copy of a large collection of web pages and related information that could be used by many different projects; let us call this particular table the Webtable. 
In Webtable, we would use URLs as row keys, various aspects of web pages as column names, and store the contents of the web pages in the contents: column under the timestamps when they were fetched, as illustrated in Figure 1.


Rows
The row keys in a table are arbitrary strings (currently up to 64KB in size, although 10-100 bytes is a typical size for most of our users). Every read or write of data under a single row key is atomic (regardless of the number of different columns being read or written in the row), a design decision that makes it easier for clients to reason about the system’s behavior in the presence of concurrent updates to the same row.
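
The text does not say how row-level atomicity is implemented. As a rough sketch of the guarantee only, the toy store below applies every multi-column update under a per-row lock, so a reader never sees half of an update. The class `RowAtomicStore` and the locking strategy are assumptions made for illustration.

```python
import threading
from collections import defaultdict
from typing import Dict


class RowAtomicStore:
    """Toy store: each row mutation is applied under that row's lock."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, row: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[row]

    def mutate_row(self, row: str, updates: Dict[str, bytes]) -> None:
        # All columns in `updates` become visible together.
        with self._lock_for(row):
            self._rows[row].update(updates)

    def read_row(self, row: str) -> Dict[str, bytes]:
        # Return a copy taken while holding the same lock.
        with self._lock_for(row):
            return dict(self._rows[row])
```

Note that the guarantee covers a single row only; nothing here coordinates updates that span several rows.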


Bigtable maintains data in lexicographic order by row key. 
The row range for a table is dynamically partitioned.
Each row range is called a tablet, which is the unit of distribution and load balancing. 
As a result, reads of short row ranges are efficient and typically require communication with only a small number of machines. 
Clients can exploit this property by selecting their row keys so that they get good locality for their data accesses. 
For example, in Webtable, pages in the same domain are grouped together into contiguous rows by reversing the hostname components of the URLs. 
For example, we store data for maps.google.com/index.html under the key com.google.maps/index.html. 
Storing pages from the same domain near each other makes some host and domain analyses more efficient.
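
The key-reversal trick can be sketched in a few lines. The helper names `row_key` and `scan_prefix` and the sample URLs are illustrative assumptions; the point is only that, once hostnames are reversed, pages of one domain sort next to each other and a short range scan finds them all.

```python
import bisect
from typing import List
from urllib.parse import urlsplit


def row_key(url: str) -> str:
    """Reverse the hostname components of a URL and keep the path."""
    parts = urlsplit(url if "://" in url else "//" + url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


def scan_prefix(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return the contiguous run of keys that start with `prefix`."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


urls = [
    "maps.google.com/index.html",
    "www.cnn.com/",
    "mail.google.com/inbox",
    "www.google.com/",
]
keys = sorted(row_key(u) for u in urls)
google_rows = scan_prefix(keys, "com.google.")
```

With the original hostnames as keys, `www.cnn.com/` would sort between the Google pages and the scan would have to skip over unrelated rows.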


Column Families
Column keys are grouped into sets called column families, which form the basic unit of access control. 
All data stored in a column family is usually of the same type (we compress data in the same column family together). 
A column family must be created before data can be stored under any column key in that family; 
after a family has been created, any column key within the family can be used. 
It is our intent that the number of distinct column families in a table be small (in the hundreds at most), and that families rarely change during operation. 
In contrast, a table may have an unbounded number of columns.
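
The create-before-use rule can be illustrated with a toy table that rejects writes to unknown families. This is a sketch under assumptions: the class `FamilyCheckedTable` is hypothetical, and it assumes column keys take the family:qualifier form, with the family being everything before the first colon.

```python
from typing import Dict, List, Set, Tuple


class FamilyCheckedTable:
    """Toy table that rejects writes to families that do not exist yet."""

    def __init__(self) -> None:
        self._families: Set[str] = set()
        self._cells: Dict[Tuple[str, str], bytes] = {}

    def create_family(self, name: str) -> None:
        self._families.add(name)

    def put(self, row: str, column_key: str, value: bytes) -> None:
        family = column_key.partition(":")[0]
        if family not in self._families:
            raise KeyError(f"column family {family!r} has not been created")
        # Any qualifier is accepted once the family exists.
        self._cells[(row, column_key)] = value

    def columns(self, row: str) -> List[str]:
        return sorted(c for r, c in self._cells if r == row)
```

Families form a small, fixed schema, while the set of column keys inside a family can grow without any schema change.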


A column key is named using the following syntax: family:qualifier. 
Column family names must be printable, but qualifiers may be arbitrary strings. 
An example column family for the Webtable is language, which stores the language in which a web page was written. 
We use only one column key in the language family, and it stores each web page’s language ID. 
Another useful column family for this table is anchor; each column key in this family represents a single anchor, as shown in Figure 1. 
The qualifier is the name of the referring site; the cell contents is the link text.
Access control and both disk and memory accounting are performed at the column-family level. 
In our Webtable example, these controls allow us to manage several different types of applications: 
some that add new base data, 
some that read the base data and create derived column families, 
and some that are only allowed to view existing data (and possibly not even to view all of the existing families for privacy reasons).
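
As a hedged illustration of family-level access control, the sketch below splits a column key at its first colon and looks up permissions by family. The names `parse_column_key` and `FamilyAcl`, the principals, and the permission strings are all hypothetical; the text does not describe Bigtable's actual ACL interface.

```python
from typing import Dict, Set, Tuple


def parse_column_key(key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon."""
    family, sep, qualifier = key.partition(":")
    # Family names must be printable; qualifiers may be arbitrary strings.
    if not sep or not family or not family.isprintable():
        raise ValueError(f"bad column key: {key!r}")
    return family, qualifier


class FamilyAcl:
    """Hypothetical permission table keyed by (principal, family)."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, principal: str, family: str, *perms: str) -> None:
        self._grants.setdefault((principal, family), set()).update(perms)

    def allowed(self, principal: str, column_key: str, perm: str) -> bool:
        family, _ = parse_column_key(column_key)
        return perm in self._grants.get((principal, family), set())


acl = FamilyAcl()
acl.grant("crawler", "contents", "read", "write")  # adds new base data
acl.grant("viewer", "contents", "read")            # may only view
```

Because the check uses only the family part of the key, a principal with no grant on the anchor family cannot read any anchor column, whatever its qualifier.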


Timestamps
Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. 
Bigtable timestamps are 64-bit integers. 
They can be assigned by Bigtable, in which case they represent “real time” in microseconds, or be explicitly assigned by client applications. 
Applications that need to avoid collisions must generate unique timestamps themselves. 
Different versions of a cell are stored in decreasing timestamp order, so that the most recent versions can be read first.
To make the management of versioned data less onerous, we support two per-column-family settings that tell Bigtable to garbage-collect cell versions automatically.
The client can specify either that only the last n versions of a cell be kept, or that only new-enough versions be kept 
(e.g., only keep values that were written in the last seven days).
In our Webtable example, we set the timestamps of the crawled pages stored in the contents: column 
to the times at which these page versions were actually crawled. 
The garbage-collection mechanism described above lets us keep only the most recent three versions of every page.
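
The versioning and garbage-collection rules can be sketched for a single cell as follows. The class `VersionedCell` and its method names are hypothetical, and overwriting on a timestamp collision is an assumption of this sketch, since the text only says that applications must avoid collisions themselves.

```python
import bisect
from typing import List, Optional, Tuple


class VersionedCell:
    """Toy cell that keeps its versions newest-first."""

    def __init__(self) -> None:
        # Negated timestamps in ascending order, i.e. newest version first.
        self._neg_ts: List[int] = []
        self._values: List[bytes] = []

    def put(self, ts: int, value: bytes) -> None:
        i = bisect.bisect_left(self._neg_ts, -ts)
        if i < len(self._neg_ts) and self._neg_ts[i] == -ts:
            self._values[i] = value  # assumption: a collision overwrites
        else:
            self._neg_ts.insert(i, -ts)
            self._values.insert(i, value)

    def versions(self) -> List[Tuple[int, bytes]]:
        return [(-n, v) for n, v in zip(self._neg_ts, self._values)]

    def latest(self) -> Optional[bytes]:
        return self._values[0] if self._values else None

    def gc_keep_last(self, n: int) -> None:
        """Keep only the n most recent versions."""
        del self._neg_ts[n:]
        del self._values[n:]

    def gc_keep_since(self, cutoff_ts: int) -> None:
        """Keep only versions with timestamp >= cutoff_ts."""
        keep = bisect.bisect_right(self._neg_ts, -cutoff_ts)
        del self._neg_ts[keep:]
        del self._values[keep:]
```

With Bigtable-assigned timestamps in microseconds, a "last seven days" policy would correspond to a cutoff of the current time minus 7 * 24 * 3600 * 10**6, and the Webtable setting of three versions corresponds to `gc_keep_last(3)`.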
