Skip to content

docs: fix install instructions and qualify ARM64 performance claims#58

Merged
membphis merged 6 commits into
mainfrom
docs/fix-install-and-arm64-docs
May 26, 2026
Merged

docs: fix install instructions and qualify ARM64 performance claims#58
membphis merged 6 commits into
mainfrom
docs/fix-install-and-arm64-docs

Conversation

@membphis
Copy link
Copy Markdown
Collaborator

@membphis membphis commented May 25, 2026

关联 Issue

Closes #57

做了什么

Issue #57 假设了两件事,调查后只有一件需要改:

  1. 安装指令 ([Docs]: Fix misleading install instructions and unsubstantiated ARM64 performance claims #57 §1) — 维持原样。 调查发现 luarocks.org/modules/membphis/lua-qjson 已发布且有 1,657 次下载,issue 里关于 "package was never published" 的前提不成立。README ## Installing 保留 luarocks install lua-qjson
  2. ARM64 NEON 性能声明 ([Docs]: Fix misleading install instructions and unsubstantiated ARM64 performance claims #57 §2) — 补数据 + 限定措辞。 在 Apple M4 上跑 parse + access 基准(cjson 对比,simdjson 不参与),把数据落到 docs/benchmarks.md,README Status 段把 x86_64 与 ARM64 的基准范围分开陈述。

具体改动:

  • README.md Status 段重写:明确 x86_64 基准对比 cjson + simdjson;ARM64 NEON/PMULL 由 scanner cross-check 做正确性,Apple M4 上的 parse + access 数据见 docs/benchmarks.md
  • docs/benchmarks.md
    • Environment 段加 Platform scope callout,区分 x86_64 / ARM64 两套数据的覆盖范围。
    • 新增 "Results — throughput (ARM64 NEON, median ops/s)" 表(small / medium / 100k / 1m / 10m,cjson 对比,1.8–13.8× 加速比)。
    • 给原有 x86_64 表 / Speed-up vs. baselines 段加 (x86_64) 限定。
    • Observations 加 perf(scan): AVX-512 + vpclmulqdq scanner backend #9 解读 ARM64 vs. x86_64 加速比差异。
  • 新增 benches/arm_bench.lua:macOS ARM64 上的 parse + access 复现脚本(独立于 make bench,不进 CI)。

如何验证

  • luarocks install lua-qjson 直接走 luarocks.org(已上线)
  • LUA_PATH='./lua/?.lua;;' DYLD_LIBRARY_PATH=./target/release luajit benches/arm_bench.lua 在 Apple M4 上跑通,数字与 docs/benchmarks.md ARM64 表一致(误差在运行间噪声范围内)
  • Manual read-through of README and docs/benchmarks.md — 无内部矛盾

Breaking Changes

无。纯文档 + 新增独立基准脚本,无 public API / FFI / 配置变化。

- Replace luarocks install with luarocks make (package not yet published)
- Add note about future luarocks.org publication
- Qualify ARM64 NEON/PMULL as correctness-tested, benchmarks x86_64 only
- Add platform scope note to docs/benchmarks.md

Closes #57
Copilot AI review requested due to automatic review settings May 25, 2026 01:10
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Walkthrough

Clarifies benchmark platform scope (published numbers are x86_64-only; ARM64 NEON is correctness-tested) and adds an ARM64 LuaJIT benchmark script (benches/arm_bench.lua) that measures parse+access ops/s for cjson vs qjson and prints speedups.

Changes

Documentation and ARM64 benchmark additions

Layer / File(s) Summary
README status clarification
README.md
Status text expanded to state ARM64 NEON/PMULL correctness testing via a scanner cross-check suite and to clarify that published/benchmarked performance is x86_64-focused with Apple M4 reporting scope noted.
Benchmark docs: platform scope & ARM64 results
docs/benchmarks.md
Insert "Platform scope" disclaimer (x86_64-only published figures), relabel x86_64 throughput heading, add ARM64 NEON throughput section with Apple M4 environment and reproduce command, scope speed-up table to x86_64, and extend Observations with fresh-process isolation and ARM64 NEON speedup notes.
ARM64 LuaJIT benchmark script
benches/arm_bench.lua
Add a standalone LuaJIT ARM64 NEON benchmark comparing cjson.decode vs qjson.parse for parse+access workloads; includes deterministic payload generation, timing harness across rounds, and tabular ops/s + speedup output.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • api7/lua-qjson#56: Updates to benchmark documentation and alignment of reported benchmark results.
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning PR lacks E2E tests; benches/arm_bench.lua has unhandled require() errors (criterion 7 violation); Markdown indentation issue in docs/benchmarks.md remains unfixed. Add error handling to require() calls in benches/arm_bench.lua; apply review comment fix to remove extra indentation from items 8-9 in docs/benchmarks.md.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title "docs: fix install instructions and qualify ARM64 performance claims" accurately summarizes the main changes: fixing install instructions and qualifying ARM64 performance claims in documentation.
Linked Issues check ✅ Passed The PR changes fully address issue #57 requirements: qualifying ARM64 NEON as correctness-tested with benchmark data via benches/arm_bench.lua, clarifying x86_64-only benchmarks in docs/benchmarks.md, and restoring working install instructions with luarocks.org publication status.
Out of Scope Changes check ✅ Passed All changes are directly within scope of issue #57: README updates clarifying install and ARM64 status, docs/benchmarks.md platform qualification, and benches/arm_bench.lua as supporting ARM benchmark evidence.
Security Check ✅ Passed PR modifies only documentation and benchmarks. No code containing secrets, credentials, authorization logic, TLS config, or DB operations. Security check not applicable here.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/fix-install-and-arm64-docs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates documentation to ensure installation instructions are accurate before luarocks.org publication, and to scope ARM64 performance claims to correctness-only (with benchmarks explicitly x86_64-only).

Changes:

  • Replace luarocks install lua-qjson with a local-install workflow and add a note about future luarocks.org availability.
  • Qualify ARM64 NEON/PMULL as correctness-tested (no published perf data) and clarify that published benchmark results are x86_64 only.
  • Add an explicit platform-scope note to the benchmarks documentation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
README.md Updates install command and qualifies ARM64 performance claim / benchmark scope.
docs/benchmarks.md Adds an explicit x86_64-only scope note for published benchmark data.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
membphis added 2 commits May 25, 2026 09:53
luarocks.org/modules/membphis/lua-qjson is live with 1,657 downloads.
Keep the original install instruction; only the ARM64 qualification
is needed.
- Add ARM64 parse+access throughput table (cjson comparison) to
  docs/benchmarks.md
- Add arm_bench.lua reproduction script for macOS ARM64
- Label existing sections as x86_64 for clarity
- Update platform scope note: ARM now has performance data
- Add observation #9 contrasting ARM64 vs x86_64 speedup ratios
- 1.8-13.8x over cjson on Apple M4; absolute qjson.parse throughput
  competitive with x86_64 Zen 2
Copilot AI review requested due to automatic review settings May 25, 2026 01:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread docs/benchmarks.md Outdated
Comment thread benches/arm_bench.lua Outdated
Comment thread benches/arm_bench.lua Outdated
Comment thread benches/arm_bench.lua Outdated
- README Status: reconcile "x86_64 only" wording with the ARM64 table
  added in docs/benchmarks.md (parse + access on Apple M4, cjson only).
- docs/benchmarks.md: fix repro command path (benches/arm_bench.lua).
- benches/arm_bench.lua:
  - header: correct invocation path and add LUA_PATH so the example is
    self-contained;
  - drop the dead './target/release/lib?.so' package.cpath template
    (qjson is loaded via ffi.load, not require);
  - make B64_BLOCK / B64_BLOCK_LEN locals, matching benches/lua_bench.lua,
    and move the assignment above make_b64 so the closure sees them.

Stale Copilot comments about the install line (luarocks make vs.
luarocks install) are not actionable: the rock IS published at
luarocks.org/modules/membphis/lua-qjson, and 423f463 already restored
the original 'luarocks install lua-qjson' instruction.
@membphis
Copy link
Copy Markdown
Collaborator Author

Thanks for the review. Triage below — pushed f61520b for the actionable items.

Stale, declining:

  • Comments on the luarocks make / luarocks install lua-qjson lines (the install line review on both de38f67 and b959aec). The rock is in fact published — https://luarocks.org/modules/membphis/lua-qjson is live with 1,657 downloads — so commit 423f463 already restored the original luarocks install lua-qjson instruction. No further change needed.

Fixed in f61520b:

  • README Status sentence claiming "published benchmarks are x86_64 only" — reconciled with the new ARM64 NEON table in docs/benchmarks.md. The new wording scopes simdjson/x86_64 vs. cjson-only/ARM64 explicitly.
  • docs/benchmarks.md repro command: luajit arm_bench.lualuajit benches/arm_bench.lua.
  • benches/arm_bench.lua header comment: same path fix, plus LUA_PATH='./lua/?.lua;;' so the example is self-contained.
  • B64_BLOCK / B64_BLOCK_LEN made local (matches benches/lua_bench.lua); moved the assignment above make_b64 so the closure resolves them lexically.
  • Dropped the dead ./target/release/lib?.so entry from package.cpath — qjson is loaded via ffi.load (honoring DYLD_LIBRARY_PATH / LD_LIBRARY_PATH), not require, so the .so-vs-.dylib extension mismatch on macOS was never actually exercised. Removing it also addresses the platform-extension concern.

Verified locally on Apple M4: LUA_PATH='./lua/?.lua;;' DYLD_LIBRARY_PATH=./target/release luajit benches/arm_bench.lua reproduces the table within run-to-run noise.

The reason ARM64 numbers exclude simdjson is not load-bearing for
readers — only the fact that the comparison is cjson-only matters.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/benchmarks.md`:
- Around line 234-239: The ordered-list items "8. **Fresh-process isolation**"
and "9. **ARM64 NEON delivers 1.8–13.8× over cjson**" are indented one space too
many; remove the extra leading space so these top-level list lines align with
the other numbered items (no additional indent) to satisfy markdownlint MD005
and ensure consistent rendering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 45c3dd5f-cca0-440e-8c8c-65e60d0fa887

📥 Commits

Reviewing files that changed from the base of the PR and between de38f67 and f61520b.

📒 Files selected for processing (3)
  • README.md
  • benches/arm_bench.lua
  • docs/benchmarks.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

Comment thread docs/benchmarks.md
Comment on lines +234 to +239
8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
interference between payload sizes. Each size now runs in its own
`resty` process, eliminating the systemic cross-scenario variance
observed in earlier benchmark runs.
9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix ordered-list indentation to satisfy markdownlint (MD005).

Line 234 and Line 238 are indented one space more than other top-level list items; this triggers the lint warning and can cause inconsistent rendering.

Suggested fix
- 8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
-    interference between payload sizes. Each size now runs in its own
-    `resty` process, eliminating the systemic cross-scenario variance
-    observed in earlier benchmark runs.
- 9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
-    workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
-    x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily
-    because cjson runs faster on ARM64 hardware (JIT-compiled scalar code
-    benefits from wider out-of-order execution on M4). The absolute
-    `qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs
-    ~84k on the x86_64 Zen 2.
+8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
+   interference between payload sizes. Each size now runs in its own
+   `resty` process, eliminating the systemic cross-scenario variance
+   observed in earlier benchmark runs.
+9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
+   workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
+   x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily
+   because cjson runs faster on ARM64 hardware (JIT-compiled scalar code
+   benefits from wider out-of-order execution on M4). The absolute
+   `qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs
+   ~84k on the x86_64 Zen 2.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
interference between payload sizes. Each size now runs in its own
`resty` process, eliminating the systemic cross-scenario variance
observed in earlier benchmark runs.
9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
8. **Fresh-process isolation** removes accumulated GC and JIT trace-cache
interference between payload sizes. Each size now runs in its own
`resty` process, eliminating the systemic cross-scenario variance
observed in earlier benchmark runs.
9. **ARM64 NEON delivers 1.8–13.8× over cjson** on the same multimodal
workload (Apple M4, LuaJIT 2.1 Homebrew). The speedup is lower than
x86_64 at equivalent sizes (~9.5× vs ~30.3× at 100 KB) primarily
because cjson runs faster on ARM64 hardware (JIT-compiled scalar code
benefits from wider out-of-order execution on M4). The absolute
`qjson.parse` throughput is competitive: ~146k ops/s at 100 KB vs
~84k on the x86_64 Zen 2.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 234-234: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)


[warning] 238-238: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 1

(MD005, list-indent)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/benchmarks.md` around lines 234 - 239, The ordered-list items "8.
**Fresh-process isolation**" and "9. **ARM64 NEON delivers 1.8–13.8× over
cjson**" are indented one space too many; remove the extra leading space so
these top-level list lines align with the other numbered items (no additional
indent) to satisfy markdownlint MD005 and ensure consistent rendering.

The top "Platform scope" callout still carried the
"simdjson is not available on macOS ARM64" rationale that 2660a4b
removed from the ARM64 table's introduction. Drop it here too so the
document speaks with one voice — readers only need to know the
comparison is cjson-only.
@membphis membphis merged commit 8205c84 into main May 26, 2026
4 checks passed
@membphis membphis deleted the docs/fix-install-and-arm64-docs branch May 26, 2026 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Docs]: Fix misleading install instructions and unsubstantiated ARM64 performance claims

2 participants