Randomly Generate Base62 for URL Shortening

Last week I wrote a post about how to design a URL shortening service, like TinyURL, and came up with two strategies to encode a URL, one of which is to randomly generate a base62 sequence.

Read More

How To Design a URL Shortening Service

1. Encoding the Actual URL

The path part of an encoded URL, which we will call the key, can consist of 6 to 8 base62 digits, depending on the desired uniqueness rate.

Randomly Generated

We can randomly generate each of those digits within the range [0, 61].
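
For illustration, here is a minimal Go sketch of this step (my own code, not from the original post): each digit is drawn uniformly from a 62-character alphabet.

    package main

    import (
        "crypto/rand"
        "fmt"
        "math/big"
    )

    // base62Alphabet maps each digit in [0, 61] to a character.
    const base62Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    // randomKey generates a key of n base62 digits, each drawn
    // uniformly from [0, 61] using a cryptographic RNG.
    func randomKey(n int) (string, error) {
        buf := make([]byte, n)
        for i := range buf {
            d, err := rand.Int(rand.Reader, big.NewInt(62))
            if err != nil {
                return "", err
            }
            buf[i] = base62Alphabet[d.Int64()]
        }
        return string(buf), nil
    }

    func main() {
        key, err := randomKey(7) // any length in the 6~8 range works
        if err != nil {
            panic(err)
        }
        fmt.Println(key)
    }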

Then we must check if the generated key is unique by consulting our key storage.

Aside: this checking process can be optimized with data structures like a Bloom filter, provided we have to deal with hundreds of millions of records.
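
To sketch that aside (the post doesn't name a library; github.com/bits-and-blooms/bloom/v3 is my assumption here), a Bloom filter answers "definitely not stored" without touching the key storage, so only a "maybe stored" result forces the expensive lookup:

    package main

    import (
        "fmt"

        "github.com/bits-and-blooms/bloom/v3"
    )

    // Sized for hundreds of millions of keys at a ~1% false-positive rate.
    var keyFilter = bloom.NewWithEstimates(300_000_000, 0.01)

    // mightExist returns false only when the key is definitely not stored,
    // letting us skip the key-storage lookup entirely in that case.
    func mightExist(key string) bool {
        return keyFilter.TestString(key)
    }

    // remember records an issued key so future checks can short-circuit.
    func remember(key string) {
        keyFilter.AddString(key)
    }

    func main() {
        remember("aB3xY9c")
        fmt.Println(mightExist("aB3xY9c")) // true
        fmt.Println(mightExist("zzzzzzz")) // almost certainly false
    }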

If we are unlucky, we may have a key collision; in this case, just regenerate another key until it's unique.

To make encoding more efficient, we can have a standalone Key Generation Service (KGS) that generates random keys beforehand and stores them.
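
A minimal sketch of the idea, reusing randomKey, mightExist, and remember from the snippets above (keyPool, runKGS, and nextKey are hypothetical names of mine): a background goroutine keeps a buffered channel stocked with pre-checked keys, and the encoding path just pulls one off.

    // keyPool is stocked ahead of time by the KGS worker goroutine.
    var keyPool = make(chan string, 10_000)

    // runKGS pre-generates keys so the encoding path never pays for
    // generation or collision retries itself. A real service would also
    // confirm uniqueness against the key storage, not just the filter.
    func runKGS() {
        for {
            key, err := randomKey(7)
            if err != nil {
                continue
            }
            if mightExist(key) {
                continue // possible collision: discard and retry
            }
            remember(key)
            keyPool <- key // blocks once enough keys are stocked
        }
    }

    // nextKey hands a ready-to-use key to the encoder.
    func nextKey() string {
        return <-keyPool
    }

At startup we would launch go runKGS(), and each shorten request then simply calls nextKey().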

Read More

A Beginner's Guide to CMake

This article comes from this answer of mine on Zhihu.

Heaven knows why I spent the time writing such a serious answer back then. Given the level of censorship in Chinese online communities, my account could be gone any day, so I'm keeping a backup here.


As someone who has been tinkering with CMake on and off for a year and a half, starting from the second half of 2018, I have just gone through the journey from total confusion to basic competence with CMake.

Note: although the question was asked in 2017, given CMake's rapid iteration and its evolving best practices, I hope the content below is still helpful.

Why CMake?

To answer the question in the parentheses first: because I was forced to.

And I mean that in all seriousness.

Whether or not CMake is an excellent build tool, and whether or not you approve of it, there is no denying that CMake is currently the de facto build system for C++[^1].

So, under the influence of the community and the ecosystem, more and more projects will adopt CMake as their build tool.

Read More

SYN Queue and Accept Queue

0x00 A Single Queue or Two Queues

A TCP/IP stack has two options for implementing the backlog queue of a socket in the LISTEN state.

Door #1: A Single Queue

A single queue can contain connections in two different states:

  • SYN RECEIVED
  • ESTABLISHED

And only connections in the ESTABLISHED state can be returned to the application when it calls the accept() syscall.

Therefore, the length of this queue is determined by the backlog argument of the listen() syscall.

Door #2: A SYN Queue & An Accept Queue

In this case, connections in the SYN RECEIVED state are added to the SYN queue, and later moved to the accept queue when their state changes to ESTABLISHED.
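
Go's net.Listen does not expose the backlog directly, so as an illustration (my own code, not from the post) here is a raw-syscall sketch for Linux showing where the backlog argument enters the picture. On Linux, which uses the two-queue model, backlog sizes the accept queue (silently capped by net.core.somaxconn), while the SYN queue is governed by net.ipv4.tcp_max_syn_backlog.

    package main

    import "syscall"

    func main() {
        // Create a TCP socket and bind it to port 8080 on all interfaces.
        fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
        if err != nil {
            panic(err)
        }
        defer syscall.Close(fd)

        if err := syscall.Bind(fd, &syscall.SockaddrInet4{Port: 8080}); err != nil {
            panic(err)
        }

        // backlog = 128: under the two-queue model this bounds connections
        // sitting in ESTABLISHED waiting to be accepted, not SYN RECEIVED ones.
        if err := syscall.Listen(fd, 128); err != nil {
            panic(err)
        }

        // accept() only ever returns connections from the accept queue,
        // i.e. those that have completed the three-way handshake.
        nfd, _, err := syscall.Accept(fd)
        if err != nil {
            panic(err)
        }
        syscall.Close(nfd)
    }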

Read More

Toml to Golang Struct

A by-product of something else I was writing; it is essentially the same thing as https://xuri.me/toml-to-go/.

Parsing the TOML still relies on the github.com/BurntSushi/toml library.
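
For a feel of what the library does (the Config struct below is a made-up example), toml.Decode maps a TOML document onto a tagged Go struct:

    package main

    import (
        "fmt"

        "github.com/BurntSushi/toml"
    )

    // Config is a hypothetical target struct; the tags name the TOML keys.
    type Config struct {
        Title string `toml:"title"`
        Port  int    `toml:"port"`
    }

    func main() {
        const doc = `
    title = "demo"
    port = 8080
    `
        var cfg Config
        if _, err := toml.Decode(doc, &cfg); err != nil {
            panic(err)
        }
        fmt.Printf("%+v\n", cfg) // {Title:demo Port:8080}
    }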

Read More

auto-cfs-cores: A C++ Version of automaxprocs

I previously wrote a post analyzing the source code of Uber's automaxprocs, and later found some time to write a C++ version of it.

Partly to deepen my understanding of the original library, and partly to keep my C++ from getting rusty after too long a break, while also getting a taste of C++17 along the way.

The code lives at: https://github.com/kingsamchen/Eureka/tree/master/auto-cfs-cores

Read More

A Source Code Analysis of Uber's automaxprocs

Recently, the internal Golang services of our livestreaming division have all adopted Uber's automaxprocs library.

According to the great @ice (Boss Ma), this library solved a problem that had plagued Bilibili's entire Golang (container) stack for more than a year.

Curious about what magic this library performs to resolve a pain in the ass that had lasted over a year, I spent a bit of time skimming through its source code; my notes follow.

What Problem Does automaxprocs Solve?

Services running in production containers usually have their CPU resources capped, e.g. the default of 4 cores.

However, inside the container, lscpu still shows all the CPU cores of the host machine:

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                48
    On-line CPU(s) list:   0-47
    Thread(s) per core:    2
    Core(s) per socket:    12
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 79
    Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
    Stepping:              1

As a result, a Golang service will by default call runtime.GOMAXPROCS() with the host's CPU core count, so the number of Ps far exceeds the CPU cores actually available; this causes frequent context switching and hurts service performance under high load.
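
For reference, the fix on the library's side is its documented one-line blank import, which at startup derives the CPU limit from the container's CFS settings (roughly cpu.cfs_quota_us / cpu.cfs_period_us) and sets GOMAXPROCS accordingly:

    package main

    import (
        "fmt"
        "runtime"

        // Imported for side effects: on init, automaxprocs reads the
        // container's CPU quota and calls runtime.GOMAXPROCS to match it.
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        // In a container with a 4-core quota on a 48-core host,
        // this now prints 4 instead of 48.
        fmt.Println(runtime.GOMAXPROCS(0))
    }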

Read More