@lijiang

Sculpting in time

Do one thing and do it well.
Every story has a beginning and an end.

#History

Vitess was created in 2010 to solve the MySQL scalability challenges that the team at YouTube faced. This section briefly summarizes the sequence of events that led to Vitess’ creation:

  1. YouTube’s MySQL database reached a point where peak traffic would soon exceed the database’s serving capacity. To temporarily alleviate the problem, YouTube created a master database for write traffic and a replica database for read traffic.

  2. With demand for cat videos at an all-time high, read-only traffic was still high enough to overload the replica database. So YouTube added more replicas, again providing a temporary solution.

  3. Eventually, write traffic became too high for the master database to handle, requiring YouTube to shard data to handle incoming traffic. As an aside, sharding would have also become necessary if the overall size of the database became too large for a single MySQL instance.

  4. YouTube’s application layer was modified so that before executing any database operation, the code could identify the right database shard to receive that particular query.
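
Step 4 is the kind of logic that ends up living in every application that shards by hand. The following is a minimal sketch of it, not YouTube's actual code: the function name `shardFor`, the hash-modulo scheme and the shard addresses are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a sharding key (here a hypothetical user ID) to one of
// numShards shards by hashing the key and taking the remainder.
// This is an illustrative scheme, not the one YouTube or Vitess uses.
func shardFor(userID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	// Hypothetical shard addresses; a real application would load these
	// from configuration.
	shards := []string{
		"db-shard-0:3306",
		"db-shard-1:3306",
		"db-shard-2:3306",
		"db-shard-3:3306",
	}
	for _, id := range []string{"alice", "bob", "carol"} {
		fmt.Printf("user %s -> %s\n", id, shards[shardFor(id, len(shards))])
	}
}
```

Every query path in the application has to call something like `shardFor` before touching the database, and changing the number of shards changes where keys land. That coupling is the cost that motivated moving this logic out of the application.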

What is Vitess?

Vitess is a database solution for deploying, scaling and managing large clusters of open-source database instances. It currently supports MySQL and Percona Server for MySQL. It’s architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important SQL features with the scalability of a NoSQL database. Vitess can help you with the following problems:

  1. Scaling a SQL database by allowing you to shard it, while keeping application changes to a minimum.
  2. Migrating from bare metal to a private or public cloud.
  3. Deploying and managing a large number of SQL database instances.

Vitess includes compliant JDBC and Go database drivers that use a native query protocol. Additionally, it implements the MySQL server protocol, which makes it compatible with clients written in virtually any other language.

Vitess served all YouTube database traffic for over five years. Many enterprises have now adopted Vitess for their production needs.

Vitess vs MySQL

MySQL -> Vitess (each MySQL limitation below is followed by how Vitess addresses it)

Every MySQL connection has a memory overhead that ranges between 256KB and almost 3MB, depending on which MySQL release you’re using. As your user base grows, you need to add RAM to support additional connections, but the RAM does not contribute to faster queries. In addition, there is a significant CPU cost associated with obtaining the connections.

Vitess creates very lightweight connections. Vitess’ connection pooling feature uses Go’s concurrency support to map these lightweight connections to a small pool of MySQL connections. As such, Vitess can easily handle thousands of connections.
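
To make the idea concrete, here is a minimal sketch of mapping many cheap client sessions onto a few expensive backend connections. It is not Vitess code: `backendConn`, `pool` and `serve` are hypothetical names, and a buffered channel is just one simple way to build such a pool in Go.

```go
package main

import (
	"fmt"
	"sync"
)

// backendConn stands in for an expensive MySQL connection (hypothetical type).
type backendConn struct{ id int }

// pool hands out a fixed number of backend connections through a buffered channel.
type pool struct{ conns chan *backendConn }

func newPool(size int) *pool {
	p := &pool{conns: make(chan *backendConn, size)}
	for i := 0; i < size; i++ {
		p.conns <- &backendConn{id: i}
	}
	return p
}

// with borrows a connection, runs fn, and returns the connection to the pool.
// Callers block while every connection is in use.
func (p *pool) with(fn func(*backendConn)) {
	c := <-p.conns
	defer func() { p.conns <- c }()
	fn(c)
}

// serve runs `clients` lightweight sessions (goroutines) against a pool of
// `poolSize` backend connections. It returns how many queries ran and the
// highest number of backend connections in use at the same time.
func serve(clients, poolSize int) (queries, peak int) {
	p := newPool(poolSize)
	var (
		mu    sync.Mutex
		inUse int
		wg    sync.WaitGroup
	)
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.with(func(c *backendConn) {
				mu.Lock()
				inUse++
				if inUse > peak {
					peak = inUse
				}
				queries++
				mu.Unlock()

				// A real implementation would execute the query on c here.

				mu.Lock()
				inUse--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return queries, peak
}

func main() {
	queries, peak := serve(1000, 10)
	fmt.Printf("ran %d queries; at most %d backend connections in use at once\n", queries, peak)
}
```

The number of client sessions can grow by orders of magnitude while the number of backend connections, and therefore the per-connection memory cost on the MySQL side, stays fixed at the pool size.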

Poorly written queries, such as those that don’t set a LIMIT, can negatively impact database performance for all users.

Vitess employs a SQL parser that uses a configurable set of rules to rewrite queries that might hurt database performance.
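
As a rough illustration of such a rule, the sketch below appends a row limit to SELECT statements that lack one. Vitess works on a parsed query; this example deliberately uses naive string handling instead, and the function name `addLimit` and the limit values are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// addLimit appends "LIMIT maxRows" to a SELECT statement that has no LIMIT.
// It splits the query on whitespace rather than parsing it, so it is only a
// sketch: a LIMIT keyword inside a string literal or subquery would fool it.
func addLimit(query string, maxRows int) string {
	q := strings.TrimSpace(query)
	q = strings.TrimSuffix(q, ";")
	q = strings.TrimSpace(q)

	fields := strings.Fields(strings.ToUpper(q))
	if len(fields) == 0 || fields[0] != "SELECT" {
		return q // only SELECT statements are rewritten
	}
	for _, f := range fields {
		if f == "LIMIT" {
			return q // the query already bounds its result set
		}
	}
	return fmt.Sprintf("%s LIMIT %d", q, maxRows)
}

func main() {
	fmt.Println(addLimit("SELECT * FROM videos", 10000))
	fmt.Println(addLimit("select id from videos limit 5;", 10000))
	fmt.Println(addLimit("UPDATE videos SET views = 0", 10000))
}
```

Applying the rule in the proxy layer protects the database from unbounded result sets without requiring every application query to be audited.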

Sharding is a process of partitioning your data to improve scalability and performance. MySQL lacks native sharding support, requiring you to write sharding code and embed sharding logic in your application.

Vitess supports a variety of sharding schemes. It can also migrate tables into different databases and scale up or down the number of shards. These functions are performed non-intrusively, completing most data transitions with just a few seconds of read-only downtime.

A MySQL cluster using replication for availability has a primary database and a few replicas. If the primary fails, a replica should become the new primary. This requires you to manage the database lifecycle and communicate the current system state to your application.

Vitess helps to manage the lifecycle of your database instances. It automatically handles various scenarios, including primary failover and data backups.

A MySQL cluster can have custom database configurations for different workloads, like a primary database for writes, fast read-only replicas for web clients, slower read-only replicas for batch jobs, and so forth. If the database has horizontal sharding, the setup is repeated for each shard, and the app needs baked-in logic to know how to find the right database.

Vitess uses a topology backed by a consistent data store, like etcd or ZooKeeper. This means the cluster view is always up-to-date and consistent for different clients. Vitess also provides a proxy that routes queries efficiently to the most appropriate MySQL instance.
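
The routing idea can be sketched as a lookup from shard and role to an address. This is an in-memory stand-in, not the Vitess API: the `topology` type, the `route` method, the role strings and the addresses are all hypothetical, and a real deployment would read this information from the topology service rather than a Go map.

```go
package main

import (
	"errors"
	"fmt"
)

// tablet describes one database instance in this sketch.
type tablet struct {
	addr string
	role string // "primary", "replica" or "rdonly" in this sketch
}

// topology maps a shard name to the instances serving it. It stands in for
// the data a proxy would read from a store such as etcd or ZooKeeper.
type topology map[string][]tablet

var errNoTablet = errors.New("no tablet for shard and role")

// route returns the address of the first instance in the shard with the
// requested role.
func (t topology) route(shard, role string) (string, error) {
	for _, tb := range t[shard] {
		if tb.role == role {
			return tb.addr, nil
		}
	}
	return "", errNoTablet
}

func main() {
	// Hypothetical shard names and addresses.
	topo := topology{
		"-80": {
			{addr: "10.0.0.1:3306", role: "primary"},
			{addr: "10.0.0.2:3306", role: "replica"},
		},
		"80-": {
			{addr: "10.0.1.1:3306", role: "primary"},
			{addr: "10.0.1.2:3306", role: "rdonly"},
		},
	}

	// Writes go to the primary; batch reads go to a read-only instance.
	for _, req := range [][2]string{{"-80", "primary"}, {"80-", "rdonly"}, {"-80", "rdonly"}} {
		addr, err := topo.route(req[0], req[1])
		if err != nil {
			fmt.Printf("shard %s, role %s: %v\n", req[0], req[1], err)
			continue
		}
		fmt.Printf("shard %s, role %s -> %s\n", req[0], req[1], addr)
	}
}
```

Because the lookup lives in the proxy, the application no longer needs baked-in knowledge of which instance serves which shard or workload.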

Architecture

Kubernetes demo

Concepts

Cell:

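A cell is roughly an administrative zone. You can use multiple cells to guard against network outages or the failure of a single zone, which provides redundancy.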

  • Execution Plans

  • Keyspace

  • Keyspace ID

  • MoveTables

  • Query Rewriting

  • Replication Graph

  • Shard

  • Tablet

  • Topology service

  • VSchema

  • VStream

  • vtctl

  • vtctld

  • VTGate

Running in production

Planning

Global TopoServer

vtctld

Creating a cell

Keyspaces and Shards

VTTablet and MySQL

vtgate

Backups and Restores

Monitoring

Troubleshooting

Configuration

User Management and Authentication

Authorization

Resharding

Upgrading Vitess
