docs: fix install instructions and qualify ARM64 performance claims#58
Conversation
- Replace luarocks install with luarocks make (package not yet published) - Add note about future luarocks.org publication - Qualify ARM64 NEON/PMULL as correctness-tested, benchmarks x86_64 only - Add platform scope note to docs/benchmarks.md Closes #57
📝 WalkthroughWalkthroughClarifies benchmark platform scope (published numbers are x86_64-only; ARM64 NEON is correctness-tested) and adds an ARM64 LuaJIT benchmark script ( ChangesDocumentation and ARM64 benchmark additions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Updates documentation to ensure installation instructions are accurate before luarocks.org publication, and to scope ARM64 performance claims to correctness-only (with benchmarks explicitly x86_64-only).
Changes:
- Replace
luarocks install lua-qjsonwith a local-install workflow and add a note about future luarocks.org availability. - Qualify ARM64 NEON/PMULL as correctness-tested (no published perf data) and clarify that published benchmark results are x86_64 only.
- Add an explicit platform-scope note to the benchmarks documentation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| README.md | Updates install command and qualifies ARM64 performance claim / benchmark scope. |
| docs/benchmarks.md | Adds an explicit x86_64-only scope note for published benchmark data. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
luarocks.org/modules/membphis/lua-qjson is live with 1,657 downloads. Keep the original install instruction; only the ARM64 qualification is needed.
- Add ARM64 parse+access throughput table (cjson comparison) to docs/benchmarks.md - Add arm_bench.lua reproduction script for macOS ARM64 - Label existing sections as x86_64 for clarity - Update platform scope note: ARM now has performance data - Add observation #9 contrasting ARM64 vs x86_64 speedup ratios - 1.8-13.8x over cjson on Apple M4; absolute qjson.parse throughput competitive with x86_64 Zen 2
- README Status: reconcile "x86_64 only" wording with the ARM64 table
added in docs/benchmarks.md (parse + access on Apple M4, cjson only).
- docs/benchmarks.md: fix repro command path (benches/arm_bench.lua).
- benches/arm_bench.lua:
- header: correct invocation path and add LUA_PATH so the example is
self-contained;
- drop the dead './target/release/lib?.so' package.cpath template
(qjson is loaded via ffi.load, not require);
- make B64_BLOCK / B64_BLOCK_LEN locals, matching benches/lua_bench.lua,
and move the assignment above make_b64 so the closure sees them.
Stale Copilot comments about the install line (luarocks make vs.
luarocks install) are not actionable: the rock IS published at
luarocks.org/modules/membphis/lua-qjson, and 423f463 already restored
the original 'luarocks install lua-qjson' instruction.
|
Thanks for the review. Triage below — pushed f61520b for the actionable items. Stale, declining:
Fixed in f61520b:
Verified locally on Apple M4: |
The reason ARM64 numbers exclude simdjson is not load-bearing for readers — only the fact that the comparison is cjson-only matters.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/benchmarks.md`:
- Around line 234-239: The ordered-list items "8. **Fresh-process isolation**"
and "9. **ARM64 NEON delivers 1.8–13.8× over cjson**" are indented one space too
many; remove the extra leading space so these top-level list lines align with
the other numbered items (no additional indent) to satisfy markdownlint MD005
and ensure consistent rendering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 45c3dd5f-cca0-440e-8c8c-65e60d0fa887
📒 Files selected for processing (3)
README.mdbenches/arm_bench.luadocs/benchmarks.md
✅ Files skipped from review due to trivial changes (1)
- README.md
| 8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache | ||
| interference between payload sizes. Each size now runs in its own | ||
| `resty` process, eliminating the systemic cross-scenario variance | ||
| observed in earlier benchmark runs. | ||
| 9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal | ||
| workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than |
There was a problem hiding this comment.
Fix ordered-list indentation to satisfy markdownlint (MD005).
Line 234 and Line 238 are indented one space more than other top-level list items; this triggers the lint warning and can cause inconsistent rendering.
Suggested fix
- 8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
- interference between payload sizes. Each size now runs in its own
- `resty` process, eliminating the systemic cross-scenario variance
- observed in earlier benchmark runs.
- 9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
- workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
- x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily
- because cjson runs faster on ARM64 hardware (JIT-compiled scalar code
- benefits from wider out-of-order execution on M4). The absolute
- `qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs
- ~84k on the x86_64 Zen 2.
+8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
+ interference between payload sizes. Each size now runs in its own
+ `resty` process, eliminating the systemic cross-scenario variance
+ observed in earlier benchmark runs.
+9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
+ workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
+ x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily
+ because cjson runs faster on ARM64 hardware (JIT-compiled scalar code
+ benefits from wider out-of-order execution on M4). The absolute
+ `qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs
+ ~84k on the x86_64 Zen 2.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache | |
| interference between payload sizes. Each size now runs in its own | |
| `resty` process, eliminating the systemic cross-scenario variance | |
| observed in earlier benchmark runs. | |
| 9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal | |
| workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than | |
| 8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache | |
| interference between payload sizes. Each size now runs in its own | |
| `resty` process, eliminating the systemic cross-scenario variance | |
| observed in earlier benchmark runs. | |
| 9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal | |
| workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than | |
| x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily | |
| because cjson runs faster on ARM64 hardware (JIT-compiled scalar code | |
| benefits from wider out-of-order execution on M4). The absolute | |
| `qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs | |
| ~84k on the x86_64 Zen 2. |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 234-234: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
[warning] 238-238: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1
(MD005, list-indent)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/benchmarks.md` around lines 234 - 239, The ordered-list items "8.
**Fresh-process isolation**" and "9. **ARM64 NEON delivers 1.8–13.8× over
cjson**" are indented one space too many; remove the extra leading space so
these top-level list lines align with the other numbered items (no additional
indent) to satisfy markdownlint MD005 and ensure consistent rendering.
The top "Platform scope" callout still carried the "simdjson is not available on macOS ARM64" rationale that 2660a4b removed from the ARM64 table's introduction. Drop it here too so the document speaks with one voice — readers only need to know the comparison is cjson-only.
关联 Issue
Closes #57
做了什么
Issue #57 假设了两件事,调查后只有一件需要改:
## Installing保留luarocks install lua-qjson。docs/benchmarks.md,README Status 段把 x86_64 与 ARM64 的基准范围分开陈述。具体改动:
README.mdStatus 段重写:明确 x86_64 基准对比 cjson + simdjson;ARM64 NEON/PMULL 由 scanner cross-check 做正确性,Apple M4 上的 parse + access 数据见docs/benchmarks.md。docs/benchmarks.md:(x86_64)限定。benches/arm_bench.lua:macOS ARM64 上的 parse + access 复现脚本(独立于make bench,不进 CI)。如何验证
luarocks install lua-qjson直接走 luarocks.org(已上线)LUA_PATH='./lua/?.lua;;' DYLD_LIBRARY_PATH=./target/release luajit benches/arm_bench.lua在 Apple M4 上跑通,数字与docs/benchmarks.mdARM64 表一致(误差在运行间噪声范围内)Breaking Changes
无。纯文档 + 新增独立基准脚本,无 public API / FFI / 配置变化。