This week (02/02 ~ 02/08) is the first week of February; next week is the last working week before the holiday.
Life
#1
On Monday Lao Cui pinged me privately to schedule the performance talk for Wednesday afternoon. Since I already knew the initial rating he had given me and his deduction points, I had prepared in advance, planning to rebut them one by one and then "have a chat" with S, the senior manager in the US.
To my surprise, Lao Cui pulled a Trump-style TACO at the last minute: he gave me a normal rating and never once mentioned the points he had attacked me on in front of his +1 and HR.
My guess is the fallout from that earlier RCA was too big, and since I had publicly warned beforehand that the E2E tests could fail for related reasons, forcing that rating on me would have been far too obvious a cover-up.
He was probably also afraid I would actually go to S and call him out.
Still, this bit of good news doesn't change the endgame. As long as S has nobody else on hand and still depends on Lao Cui to get things done, 2026 will be my last year on this team, unless something changes that can rein Lao Cui in.
On the other hand, Commander Cai's transfer date is now set, which is a real plus. He calls it fleeing to the US, but as long as the opening exists, a comeback is still theoretically on the table.
After all, nobody expected Lao Zhang to actually get sacked back then.
#2
Since Qiubao is still too young, we're not going back home for Spring Festival this year. I bought tickets for my family to come over for the holiday instead.
Our side hosts the family gathering on the fourth day of the New Year, so they only need to head back on the afternoon of the third; once the ancestral-hall affairs wrap up on the sixth, my mom will come over again to keep helping with the baby.
My wife finds spending Spring Festival in Hangzhou really boring, though.
We did stay in Hangzhou for a full Spring Festival back in 2021, and it was indeed pretty dull.
#3
Finished Daredevil: Born Again S01 this week, and took my wife to see Avatar 3 on Sunday. She fell asleep halfway through 🤣
- Avatar: Fire and Ash (2025) 3/5. Not as bad as word of mouth suggests; it still holds up as a commercial blockbuster, just far too long. At this length, delivering nothing more than a straight commercial film is beneath someone of Cameron's stature.
- Daredevil: Born Again Season 1 (2025). The production still feels a notch below the Netflix run. Also, just because this isn't the Punisher's solo show doesn't mean he should be written this dumb; the Frank Castle we know would cut straight through this anti-vigilante task force three times over 😑
Work
#1
CppNow 2025 | std::optional — Standardizing Optionals over References - A Case Study - Steve Downey https://www.youtube.com/watch?v=cSOzD78yQV4&list=PL_AKIMJc4roW7umwjjd9Td-rtoqkiyqFl&index=53
- The core content is the backstory of standardizing std::optional<T&>
- tips: the reference version is a specialization that actually stores a pointer internally, but it comes with a huge number of semantics-related corner cases
C++ Weekly - Ep 517 - Tool Spotlight: ClangBuildAnalyzer https://www.youtube.com/watch?v=gEQ5_FjCihA
C++ Weekly - Ep 200 - Christmas Class 2019 - Chapter 1 of 5 - Understanding Lambdas https://www.youtube.com/watch?v=3hGSlUGEXtA
- The first installment of Jason's Christmas specials
- This one covers doing Fibonacci with a mutable lambda
C++ Weekly - Ep 199 - [[nodiscard]] Constructors And Their Uses https://www.youtube.com/watch?v=E_ROB_xUQQQ
- Starting with C++20, [[nodiscard]] can be applied to constructors; the point is to catch the classic std::lock_guard(mutex) bug where nothing actually gets locked
C++ Weekly - Ep 198 - Surprise Uses For explicit Constructors https://www.youtube.com/watch?v=Q4SXFkTzD28
- Any constructor can be marked explicit, including the default constructor
- Once marked, constructions like {} or {42, 42} fail to compile; you have to spell out the class name
C++ Weekly - Ep 197 - 25 Years of Best Practices https://www.youtube.com/watch?v=ayIJ4b6z-7g
- Explores the RVO/NRVO optimizations of Borland C++ on OS/2 from 25 years ago
- Basically an archaeology episode
Determining if a template specialization exists https://www.lukas-barth.net/blog/checking-if-specialized/
Checking whether a function template specialization exists: use SFINAE plus std::declval directly
```cpp
template <class T,
          class Dummy = decltype(std::swap<T>(std::declval<T&>(), std::declval<T&>()))>
```
But a generic implementation is a bit more involved:
```cpp
auto betterL = [](auto& lhs, auto& rhs) -> decltype(std::swap(lhs, rhs)) {
    return std::swap(lhs, rhs);
};
constexpr bool sometype_has_swap =
    std::is_invocable_v<decltype(betterL), SomeType&, SomeType&>;
```
Checking whether a class template specialization exists: the initial approach relies on the default ctor, i.e., simply try to construct an object via decltype(T{})
One pitfall: if the specialization is introduced partway through a TU, or exists only in some TUs, this causes an ODR violation and the whole program is ill-formed
The approach also has issues under MSVC, which always defines the type for other (non-specialized) types, just without providing operator()
#2
This week I carved out some time to benchmark fawkes against gin. The goal is really just to know where we stand, and to have a fairly compelling reason later for not paying the huge cost of rewriting 2 million LoC of business code in golang.
The test centers on two cases:
- returning a short plain string, mainly to push the framework to its limit
- a random wait in [10ms, 50ms], to simulate business logic stalled on network requests; this is clearly not CPU bound and matches what most REST API servers do
Test environment: ubuntu 2404 with 8 CPU cores; taskset pins the server to cores 0-3 and wrk to cores 4-7 so they don't interfere.
```cpp
namespace asio = boost::asio;
namespace http = boost::beast::http;

DEFINE_uint32(port, 7890, "Port number to listen on");

namespace {

int tls_rnd() {
    thread_local std::mt19937 eng{std::random_device{}()};
    thread_local std::uniform_int_distribution<int> dist{10, 50};
    return dist(eng);
}

}  // namespace

int main(int argc, char* argv[]) {
    gflags::ParseCommandLineFlags(&argc, &argv, true);
    spdlog::cfg::load_env_levels();

    try {
        asio::io_context ioc{1};
        fawkes::io_thread_pool io_pool(4);
        fawkes::server server(ioc, io_pool);

        server.do_get("/status",
                      [](const fawkes::request&, fawkes::response& resp) -> asio::awaitable<void> {
                          resp.text(http::status::ok, std::string{"hello world"});
                          co_return;
                      });

        server.do_get("/delayed",
                      [](const fawkes::request&, fawkes::response& resp) -> asio::awaitable<void> {
                          asio::steady_timer timer(co_await asio::this_coro::executor);
                          timer.expires_after(std::chrono::milliseconds(tls_rnd()));
                          co_await timer.async_wait();
                          resp.text(http::status::ok, std::string{"hello world"});
                          co_return;
                      });

        server.listen_and_serve("0.0.0.0", static_cast<std::uint16_t>(FLAGS_port));
        SPDLOG_INFO("Server is listening at {}", FLAGS_port);
        ioc.run();
    } catch (const std::exception& ex) {
        SPDLOG_ERROR("Unexpected error: {}", ex.what());
    }

    return 0;
}
```
The gin counterpart:
```go
package main

import (
	"net/http"
	"runtime"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/valyala/fastrand"
)

func main() {
	runtime.GOMAXPROCS(4)
	r := gin.New()

	r.GET("/status", func(c *gin.Context) {
		c.String(http.StatusOK, "hello world")
	})

	r.GET("/delayed", func(c *gin.Context) {
		delay := time.Duration(fastrand.Uint32n(41)+10) * time.Millisecond
		time.Sleep(delay)
		c.String(http.StatusOK, "hello world")
	})

	r.Run(":9876")
}
```
The fastrand package is used because the standard library's math/rand has global lock contention, though in practice the difference is barely visible in RPS, latency, or throughput.
The /status bench results look roughly like this:
```
#
# fawkes(cpp)
#

$ taskset -c 4-7 wrk -t 4 -c 100 -d 30s http://127.0.0.1:7890/status
Running 30s test @ http://127.0.0.1:7890/status
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   118.49us   88.97us    8.52ms  90.55%
    Req/Sec   195.58k     8.21k   211.33k   80.77%
  23366131 requests in 30.10s, 2.20GB read
Requests/sec: 776297.81
Transfer/sec:     74.77MB

---

$ taskset -c 4-7 wrk -t 4 -c 200 -d 30s http://127.0.0.1:7890/status
Running 30s test @ http://127.0.0.1:7890/status
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   235.08us  121.81us    5.72ms  91.36%
    Req/Sec   202.29k     5.53k   230.88k   77.08%
  24152657 requests in 30.02s, 2.27GB read
Requests/sec: 804512.83
Transfer/sec:     77.49MB

---

$ taskset -c 4-7 wrk -t 4 -c 400 -d 30s http://127.0.0.1:7890/status
Running 30s test @ http://127.0.0.1:7890/status
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   488.18us  153.83us   11.47ms  87.24%
    Req/Sec   197.11k     4.52k   223.46k   79.33%
  23538202 requests in 30.04s, 2.21GB read
Requests/sec: 783437.58
Transfer/sec:     75.46MB

---

$ taskset -c 4-7 wrk -t 4 -c 600 -d 30s http://127.0.0.1:7890/status
Running 30s test @ http://127.0.0.1:7890/status
  4 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   769.46us  201.83us   17.32ms  89.10%
    Req/Sec   189.78k    18.84k   510.99k   98.83%
  22509621 requests in 29.85s, 2.12GB read
Requests/sec: 753979.66
Transfer/sec:     72.62MB

---

$ taskset -c 4-7 wrk -t 4 -c 800 -d 30s http://127.0.0.1:7890/status
Running 30s test @ http://127.0.0.1:7890/status
  4 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.04ms  250.08us   18.75ms  91.22%
    Req/Sec   186.44k     3.85k   205.60k   76.92%
  22260003 requests in 30.05s, 2.09GB read
Requests/sec: 740848.80
Transfer/sec:     71.36MB

#
# gin(golang)
#

$ taskset -c 4-7 wrk -t 4 -c 200 -d 30s http://127.0.0.1:9876/status
Running 30s test @ http://127.0.0.1:9876/status
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00ms    1.15ms   16.07ms  85.53%
    Req/Sec    72.32k     4.67k    86.64k   68.75%
  8634517 requests in 30.03s, 1.03GB read
Requests/sec: 287563.51
Transfer/sec:     35.10MB

---

$ taskset -c 4-7 wrk -t 4 -c 400 -d 30s http://127.0.0.1:9876/status
Running 30s test @ http://127.0.0.1:9876/status
  4 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.45ms    1.49ms   17.41ms  85.75%
    Req/Sec    89.29k     5.88k   109.06k   68.83%
  10660721 requests in 30.04s, 1.27GB read
Requests/sec: 354847.17
Transfer/sec:     43.32MB

---

$ taskset -c 4-7 wrk -t 4 -c 600 -d 30s http://127.0.0.1:9876/status
Running 30s test @ http://127.0.0.1:9876/status
  4 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.88ms    1.67ms   24.00ms  82.51%
    Req/Sec    94.39k     5.66k   110.18k   69.58%
  11269424 requests in 30.06s, 1.34GB read
Requests/sec: 374877.28
Transfer/sec:     45.76MB

---

$ taskset -c 4-7 wrk -t 4 -c 800 -d 30s http://127.0.0.1:9876/status
Running 30s test @ http://127.0.0.1:9876/status
  4 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.50ms    2.00ms   26.56ms  81.53%
    Req/Sec    91.06k     5.54k   105.63k   69.42%
  10874122 requests in 30.06s, 1.30GB read
Requests/sec: 361805.33
Transfer/sec:     44.17MB
```
Under the same load, fawkes delivers roughly twice gin's RPS and throughput, with noticeably more stable latency.
The /delayed bench looks roughly like this:
```
#
# fawkes(cpp)
#

$ taskset -c 4-7 wrk -t 4 -c 4000 -d 30s http://127.0.0.1:7890/delayed
Running 30s test @ http://127.0.0.1:7890/delayed
  4 threads and 4000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.47ms   11.83ms   57.44ms  56.88%
    Req/Sec    32.95k   435.03     33.90k   82.17%
  3933987 requests in 30.08s, 378.93MB read
Requests/sec: 130799.34
Transfer/sec:     12.60MB

$ taskset -c 4-7 wrk -t 4 -c 10000 -d 30s http://127.0.0.1:7890/delayed
Running 30s test @ http://127.0.0.1:7890/delayed
  4 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    38.32ms   12.39ms  255.57ms  60.27%
    Req/Sec    64.70k     3.68k    69.53k   94.23%
  7698731 requests in 30.01s, 741.55MB read
Requests/sec: 256528.10
Transfer/sec:     24.71MB

$ taskset -c 4-7 wrk -t 4 -c 15000 -d 30s http://127.0.0.1:7890/delayed
Running 30s test @ http://127.0.0.1:7890/delayed
  4 threads and 15000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    70.70ms   15.38ms  119.88ms  68.80%
    Req/Sec    52.31k     2.21k    84.44k   88.36%
  6216137 requests in 30.10s, 598.75MB read
Requests/sec: 206526.70
Transfer/sec:     19.89MB

$ taskset -c 4-7 wrk -t 4 -c 20000 -d 30s http://127.0.0.1:7890/delayed
Running 30s test @ http://127.0.0.1:7890/delayed
  4 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   100.47ms   19.71ms  167.23ms  68.67%
    Req/Sec    49.89k     4.17k    83.19k   95.11%
  5788293 requests in 30.09s, 557.53MB read
Requests/sec: 192347.32
Transfer/sec:     18.53MB

#
# gin(golang)
#

$ taskset -c 4-7 wrk -t 4 -c 2000 -d 30s http://127.0.0.1:9876/delayed
Running 30s test @ http://127.0.0.1:9876/delayed
  4 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.34ms   11.85ms   62.91ms  57.53%
    Req/Sec    16.55k   296.03     17.29k   83.08%
  1975861 requests in 30.07s, 241.19MB read
Requests/sec: 65709.10
Transfer/sec:      8.02MB

$ taskset -c 4-7 wrk -t 4 -c 4000 -d 30s http://127.0.0.1:9876/delayed
Running 30s test @ http://127.0.0.1:9876/delayed
  4 threads and 4000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    32.01ms   12.43ms  108.75ms  60.21%
    Req/Sec    31.39k     1.25k    33.94k   64.33%
  3747334 requests in 30.07s, 457.44MB read
Requests/sec: 124615.33
Transfer/sec:     15.21MB

$ taskset -c 4-7 wrk -t 4 -c 10000 -d 30s http://127.0.0.1:9876/delayed
Running 30s test @ http://127.0.0.1:9876/delayed
  4 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    66.93ms   36.72ms  309.80ms  60.65%
    Req/Sec    37.42k     3.87k    45.22k   73.83%
  4454003 requests in 30.04s, 543.70MB read
Requests/sec: 148282.24
Transfer/sec:     18.10MB

$ taskset -c 4-7 wrk -t 4 -c 15000 -d 30s http://127.0.0.1:9876/delayed
Running 30s test @ http://127.0.0.1:9876/delayed
  4 threads and 15000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   100.38ms   65.44ms  261.84ms  53.35%
    Req/Sec    37.58k     4.30k    89.10k   80.80%
  4459974 requests in 30.09s, 544.43MB read
Requests/sec: 148214.09
Transfer/sec:     18.09MB

$ taskset -c 4-7 wrk -t 4 -c 20000 -d 30s http://127.0.0.1:9876/delayed
Running 30s test @ http://127.0.0.1:9876/delayed
  4 threads and 20000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   128.96ms   91.54ms  358.05ms  49.71%
    Req/Sec    38.63k     4.47k    46.42k   82.36%
  4489456 requests in 30.08s, 548.03MB read
Requests/sec: 149267.64
Transfer/sec:     18.22MB
```
Under low load (before saturation) fawkes is only marginally better than gin; once the pressure ramps up, fawkes beats gin by 50% ~ 100% in RPS with only 50% ~ 70% of its latency.
Given these results, as long as nothing too terrible lands in the implementation later, fawkes's performance floor is guaranteed. The goal was never to build the best-performing REST server anyway, but one that's friendly for business development, doesn't hold performance back, and can smoothly replace that pile of httpd-based code.
#3
fawkes now supports Expect: 100-continue, and along the way peer connection-reset errors are handled explicitly, i.e., no longer treated as unhandled/unexpected.
(wrk closes connections with RST after a run finishes, to avoid entering TIME_WAIT.)
See the PR: https://github.com/kingsamchen/fawkes/pull/22
That's it for this week. See you next week.