atoi With Overflow Handled

leetcode 上有一题是让你实现一个 atoi(),但是因为函数类型是 32-bit int,所以可以直接内部转换到 64-bit int,避免需要直接处理溢出的情况。

如果要针对 int64 实现一个 atoi() 又该如何做?

我翻了一下 golang 的标准库源码,大致梳理了一下

(1) 如果对象是 int32,并且输入字符串有效数据长度小于10位,则一定不会溢出,所以可以直接处理

(2) 否则统一走 int64 处理;int64 内部会先处理掉符号位 neg,然后转成 uint64

(3) 溢出处理的核心部分:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Cutoff is the smallest number such that cutoff*base > maxUint64.
// Use compile-time constants for common cases.
var cutoff uint64
switch base {
case 10:
cutoff = maxUint64/10 + 1
case 16:
cutoff = maxUint64/16 + 1
default:
cutoff = maxUint64/uint64(base) + 1
}

if n >= cutoff {
// n*base overflows
return maxVal, rangeError(fnParseUint, s0)
}

maxVal := uint64(1)<<uint(bitSize) - 1

n *= uint64(base)

n1 := n + uint64(d)
if n1 < n || n1 > maxVal {
// n+v overflows
return maxVal, rangeError(fnParseUint, s0)
}
n = n1

(4) cutoff 是防止溢出的第一重栅栏

对于 uint64 来说

1
2
cutoff     = 1844674407370955162
UINT64_MAX = 18446744073709551615

parse 的时候如果当前数已经 >= cutoff 那么后面的数字不管是几都绝对溢出。

但是即使 cutoff 这里满足,加上后一位数字之后仍然可能会溢出,所以这里要做溢出处理。

因为类型是 uint64,所以只可能发生上溢出,数值会 wrap-around,所以用 n + uint64(d) < n 来判断

n1 > maxVal 是用来处理 int32 的时候的情况(函数的输入类型已经被转成了 64-bit)

(5) 如果输入类型本身就是无符号,那么到这里就结束了。

如果是有符号,需要判断最终结果是否落在范围里,另外要注意补码体系下两端的值是不对称的。

Allow Connections From LAN for Trojan-qt5

因为众所周知的原因,机场的服务商把协议从 SS 切换到了 Trojan。但是之前的 SS windows 客户端只支持 SS 协议,所以切换后只能换 Trojan-qt5 作为客户端。

但是这个客户端做的实在不怎么样,作者似乎是在原来的 ss-qt5 的基础上做的修改,集成了 libtrojan;以至于之前 SS-Windows 用的好好的 Allow Connections From LAN 功能到这里没法用了。

更改 socks5 监听地址为 INETADDR_ANY 确实能让局域网内的其他主机链接 socks5 代理,但是 HTTP 代理就神奇的不能用了…

试验了一番结果发现要使 HTTP 代理能够正常工作,socks5 代理监听地址必须是 localhost… 🤪

因为我的 Linux 开发环境是搭建在虚拟机里并通过 X11 转发到宿主机的,所以不能使用来自 LAN 的连接会严重影响开发效率。

解决方案

  1. 自己写一个 tcp socket 运行在宿主机上,并监听 INETADDR_ANY,Mint 把转发器作为代理
  2. 用 netcat 实现
  3. 找其他替代品

先考虑 (1),然而发现 netcat 可以实现单次连接转发,但是后续的同一个连接的通信会直接 hang 住,可能哪里姿势不对

再考虑 (2),发现可以用 socat。

WSL 里通过 apt 安装,然后

1
socat tcp-l:10086,fork,reuseaddr tcp:127.0.0.1:10081

Mint 把宿主机的 10086 端口作为代理端口

嗯,发现工作非常顺畅…

这样就不用考虑 (3) 了,不然哪怕用 ASIO 也差不多要百来行代码。

Randomly Generate Base62 for URL Shortening

Last week I wrote a post about how to design an URL shortening service, like TinyURL, and came up with 2 strategies to encode an URL, and one of which is to randomly generate base64 sequence.

Although I knew it works, in theory, I was not so confident deep in mind.

So I wrote some code and made a simulation for generating 10^8 6-digit and 8-digit base62 sequences, respectively.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
struct Alphabet {
std::array<char, 62> arr;
constexpr Alphabet()
: arr()
{
for (auto ch = 'a'; ch <= 'z'; ++ch) {
arr[ch - 'a'] = ch;
}

for (auto ch = 'A'; ch <= 'Z'; ++ch) {
arr[ch - 'A' + 26] = ch;
}

for (auto ch = '0'; ch <= '9'; ++ch) {
arr[ch - '0' + 52] = ch;
}
}

constexpr operator std::array<char, 62>()
{
return arr;
}
};

constexpr std::array<char, 62> kAlphabet = Alphabet();

int main()
{
std::default_random_engine engine(std::random_device{}());
std::uniform_int_distribution<> dist(0, 61);

std::unordered_set<std::string> keys;

constexpr size_t kNum = 100000000; // 10^8

for (auto i = 0; i < kNum; ++i) {
std::string key;
key.reserve(8);
for (auto k = 0; k < 8; ++k) {
key.append(1, kAlphabet[dist(engine)]);
}

keys.insert(std::move(key));
}

std::cout << "keys-size=" << keys.size() << "; total=" << kNum << "\n"
<< "unique ratio=" << static_cast<double>(keys.size()) / kNum;

return 0;
}

The result shows

  • for generating 10^8 6-digit sequences, unique ratio of one-time generation is 0.9991
  • for the 8-digit case, the unique ratio of one-time generation is high as 0.99999975

Even for 6-digit sequences, probability of two consecutive repeated generation is only 0.000001; and we can generate at least 1 million keys in less than 1 second.

Therefore, this strategy is indeed feasible in practice.

Notwithstanding, there is still one concernn we need to pay: the success ratio of one-time generation may drop dramatically after years, when number of generated combinations exceeds a threshould, like half of the entire combination space.

Unfortunately, I don’t know what this threshould is in probability thoery; but if that day came, we can extend our sample space by extend length of base62 sequence, XD.

How To Design a URL Shortening Service

1. Encoding Actual URL

The path part, which we would call it key, of an encoded URL can consist of 6 ~ 8 digits in base62, up to desired uniqueness rate.

Randomly Generated

We can randomly generate those digits each within the range [0, 61].

Then we must check if the generated key is unique by consulting our key storage.

Aside: this checking process can be optimized with data structures like bloom-filter, provided we have to deal hundres of million records.

If we are unlucky, we may have a key collision; in this case, just re-generate another key until its unique.

To make encoding more efficient, we can have a standalone Key Generation Service(KGS) that generates those random key beforehand, and store them.

CMake 入门指南

这篇文章来自于我在知乎上的这个回答

天知道为什么当初会花时间写一个这么严肃的答案,考虑到国内社区的审查力度保不准哪天我的帐号就GG了,这里特意做一个备份。


作为一个从2018年下半年开始到现在断断续续折腾了一年半 CMake 的人,刚好经历了 CMake 从懵逼到入门阶段。

注:虽然问题是17年提的,但是考虑到 CMake 的频繁迭代和最佳实践的变化,希望以下内容仍有帮助。

Why CMake ?

先回答括号中的问题:被逼的

这三个字是认真的。

不管 CMake 是否是一个优秀的构建工具,不管你是否认同 CMake,都无法否认 CMake 目前是 C++ 的 de facto build system[^1]。

所以在社区以及生态的影响下,使用 CMake 作为构建工具的项目会越来越多。

SYN Queue and Accept Queue

0x00 A Single Queue or Two Queues

TCP/IP stack has two options to implement the backlog queue for a socket in the LISTEN state.

Door #1: A Single Queue

A single queue can contain connections in two different states:

  • SYN RECEIVED
  • ESTABLISHED

And only connections in ESTABLISHED state can be returned to the application by calling the accept() syscall.

Therefore, the length of this queue is determined by the backlog argument of the listen() syscall.

Door #2: A SYN Queue & An Accept Queue

In this case, connections in state SYN RECEIVED are added to the SYN queue, and later moved to the accept queue when their state changes to ESTABLISHED.

Toml to Golang Struct

之前写某个东西的副产品,本质上和 https://xuri.me/toml-to-go/ 一样。

解析 Toml 仍然依赖 github.com/BurntSushi/toml 这个库。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Eliminates _ and capitalizes the key. The key needs to be exported.
// e.g hello_world -> HelloWorld
func normalizeStructKey(key string) string {
str := strings.ReplaceAll(key, "_", " ")
str = strings.Title(str)
str = strings.ReplaceAll(str, " ", "")
return str
}

// Generates golang strcut from a given toml file.
// Result needs re-format.
func translate(data map[string]interface{}) string {
def := ""
for key, value := range data {
key = normalizeStructKey(key)
vt := reflect.ValueOf(value)
if vt.Kind() == reflect.Map {
def += key + " struct {\n" + translate(value.(map[string]interface{})) + "\n}\n"
} else if vt.Kind() == reflect.Slice {
s := "[]interface{}"
if vt.Len() > 0 {
s = reflect.ValueOf(vt.Index(0).Interface()).Type().String()
}
def += key + " []" + s + "\n"
} else {
def += key + " " + vt.Type().String() + "\n"
}
}

return def
}

if _, err := toml.DecodeFile(tomlPath, &data); err != nil {
return err
}
inner := translate(data)

支持结构嵌套。

不过生成的文本大概需要 gofmt 一下才符合格式规范。