-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add script for initializing data #18
Conversation
This could potentially be used for adding data for test files more efficiently, but most important it will allow us to setup the data when installing from scratch.
WalkthroughThe update introduces new functionality for managing emoji data. A new development dependency, Changes
Sequence Diagram(s)sequenceDiagram
participant Script as copy-emoji-data.ts
participant GitHub as GitHub Repo
participant Worker as Worker Process
participant API as Upload API
Script->>+GitHub: Fetch emoji tarball data
GitHub-->>-Script: Return tarball (or error)
Script->>Worker: Start worker with config
Script->>Script: Create FormData with tarball blob
Script->>+API: POST request with FormData
API-->>-Script: Confirm upload
Script->>Worker: Dispose worker
sequenceDiagram
participant Client as Request Client
participant Server as HTTP Server (Hono)
participant Parser as parseTarGzip
participant R2 as R2 Bucket
Client->>Server: POST /upload with tar.gz file
Server->>Parser: Extract archive contents
Parser-->>Server: Return listing of files
Server->>Server: Filter files by valid prefix
Server->>R2: Upload valid JSON files
R2-->>Server: Acknowledge uploads
Server->>Client: Respond with JSON success message
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
New dependencies detected. Learn more about Socket for GitHub ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
scripts/copy-emoji-data.ts (3)
7-31
: Implement better error handling and response validationThe script successfully fetches emoji data from GitHub and uploads it via a worker, but several improvements would make it more robust:
- Add validation for the worker response
- Include progress reporting for potentially large downloads
- Consider adding a timeout for the fetch operation
async function run() { const res = await fetch("https://github.com/mojisdev/emoji-data/archive/refs/heads/main.tar.gz"); if (!res.ok) { throw new Error(`Failed to fetch emoji-data: ${res.statusText}`); } const blob = await res.blob(); const worker = await unstable_startWorker({ config: path.join(root.toString(), "./wrangler.jsonc"), entrypoint: path.join(root.toString(), "./scripts/emoji-data-setup-app.ts"), }); const formData = new FormData(); formData.append("file", blob, "emoji-data.tar.gz"); - await worker.fetch("https://api.mojis.dev/upload", { + const uploadResponse = await worker.fetch("https://api.mojis.dev/upload", { method: "POST", // @ts-expect-error hmmm body: formData, }); + + if (!uploadResponse.ok) { + throw new Error(`Failed to upload emoji data: ${await uploadResponse.text()}`); + } + + const result = await uploadResponse.json(); + console.log("Upload successful:", result.message); await worker.dispose(); }
16-19
: Consider making worker configuration more flexibleThe worker configuration is hardcoded with specific paths. Consider making these configurable through environment variables or command-line arguments for better flexibility across different environments.
24-24
: Avoid hardcoded URLsThe URL "https://api.mojis.dev/upload" is hardcoded, which could cause issues in different environments (dev, staging, prod).
- await worker.fetch("https://api.mojis.dev/upload", { + const apiUrl = process.env.API_URL || "https://api.mojis.dev"; + await worker.fetch(`${apiUrl}/upload`, {scripts/emoji-data-setup-app.ts (4)
18-22
: Improve validation error messageThe error message for a missing file could be more informative to help users understand what they need to provide.
if (file == null) { throw new HTTPException(400, { - message: "No file uploaded", + message: "No file uploaded. Please provide a tar.gz file containing emoji data", }); }
24-28
: Fix inconsistent error message casingThe error message for invalid file uploads uses lowercase while other messages start with uppercase. Maintain consistency in error messages.
if (typeof file === "string") { throw new HTTPException(400, { - message: "invalid file uploaded", + message: "Invalid file uploaded", }); }
45-52
: Add file size validationThe code doesn't check the size of the files being uploaded to the R2 bucket. Implement size validation to prevent storing excessively large files.
const promises = []; + const MAX_FILE_SIZE = 1024 * 1024; // 1MB limit per file for (const entry of tar) { if (entry.type !== "file") continue; const normalizedEntryName = entry.name.replace("./", ""); + if (entry.text.length > MAX_FILE_SIZE) { + console.warn(`Skipping ${normalizedEntryName}: File exceeds maximum size of ${MAX_FILE_SIZE} bytes`); + continue; + } promises.push(c.env.EMOJI_DATA.put(normalizedEntryName, entry.text)); }
58-60
: Enhance response with upload statisticsThe current response only includes a generic message. Enhance it to include statistics about the uploaded files.
return c.json({ message: "Files uploaded", + count: promises.length, + timestamp: new Date().toISOString() });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (3)
package.json
(1 hunks)scripts/copy-emoji-data.ts
(1 hunks)scripts/emoji-data-setup-app.ts
(1 hunks)
🔇 Additional comments (1)
package.json (1)
31-31
:✅ Verification successful
New dependency added for tar.gz processing
The addition of
nanotar
is appropriate for handling the tar.gz files in the new scripts. This lightweight package seems to be a good choice for the specific task of extracting emoji data from archives.
🌐 Web query:
What is nanotar npm package and is it suitable for production use?
💡 Result:
The nanotar npm package is a minimalistic and efficient utility for handling TAR archives in JavaScript environments. Designed for modern runtimes like Node.js 18+, Bun, Deno, browsers, and Edge Workers, it emphasizes performance and compactness (~1KB minified + gzipped). Here's a breakdown of its features and production viability:
Key Features
- Cross-Platform Compatibility: Works in all major JS runtimes with TypeScript/ESM support[1].
- Streamlined API: Includes utilities for creating and parsing TAR archives:
createTar
/createTarGzip
for archive generation[1].parseTar
/parseTarGzip
for parsing, with options to filter files or skip data extraction for metadata-only use cases[1].- Compression Support: Built-in GZIP compression/decompression via
createTarGzipStream
andparseTarGzip
[1].- Modern Attributes: Supports file metadata like permissions (
mode
), timestamps (mtime
), and ownership details[1].
Production Suitability
Security:
No known vulnerabilities are reported for the latest version (0.2.0) as of March 2025[13].Performance:
Optimized for speed and minimal resource usage, making it ideal for resource-constrained environments like Edge Workers[1].Maintenance:
Actively maintained by theunjs
ecosystem, with recent updates and clear documentation[1][10].Limitations:
- Lacks explicit test suite mentions in its public documentation[1][10].
- Targets modern JS environments, which may require transpilation for legacy systems[1].
Comparison with Alternatives
Feature nanotar Traditional TAR Tools (e.g., tar-fs
)Size ~1KB Larger (e.g., tar-fs
: ~20KB)ESM/CJS Support Native Often requires polyfills Compression Built-in GZIP External dependencies (e.g., zlib
)Runtime Compatibility Universal (Node, Deno, Browser) Node.js-focused
Recommendation
Nanotar is production-ready for lightweight TAR operations where bundle size and modern runtime compatibility are critical. For complex workflows requiring extensive testing or legacy system support, evaluate additional validation or alternatives like
tar-fs
. Always verify compatibility with your deployment environment.Citations:
- 1: https://www.npmjs.com/package/nanotar
- 2: https://www.processunity.com/how-to-create-a-barebones-production-ready-npm-package-with-babel-7/
- 3: https://nodejs.org/en/learn/getting-started/an-introduction-to-the-npm-package-manager
- 4: https://docs.npmjs.com/cli/v8/using-npm/package-spec
- 5: https://www.npmjs.com/package/nano
- 6: https://snyk.io/blog/best-practices-create-modern-npm-package/
- 7: https://www.npmjs.com/package/@types/nanoid-dictionary
- 8: https://www.youtube.com/watch?v=oGO_-DWTmKA
- 9: https://www.youtube.com/watch?v=jaKmqGxDSxQ
- 10: https://github.com/unjs/nanotar/blob/main/package.json
- 11: https://www.digitalocean.com/community/tutorials/how-to-use-node-js-modules-with-npm-and-package-json
- 12: nanoid@3(3.3.6) doesn't work with CommonJS ai/nanoid#422
- 13: https://security.snyk.io/package/npm/nanotar
- 14: https://deploybot.com/blog/guest-post-how-to-set-up-and-deploy-nodejs-express-application-for-production
- 15: https://www.milesweb.com/blog/technology-hub/what-is-npm/
- 16: https://deps.dev/npm/@docus%2Fdocs-theme-edge/3.0.0-b99961f/dependencies/graph?filter=package%3A%22nanotar%22+version%3A%220.1.1%22
- 17: https://dev.to/zhnedyalkow/the-easiest-way-to-patch-your-npm-package-4ece
- 18: https://docs.npmjs.com/files/package.json/
- 19: https://github.com/unjs/nanotar/activity
- 20: https://www.npmjs.com/package/nanoid
Approved: Nanotar Dependency Verified for TAR Processing
The addition of the
nanotar
dependency inpackage.json
(line 31) is well-justified. As verified, nanotar is a lightweight (~1KB minified + gzipped), production-ready package designed for efficiently handling TAR/GZIP archives across modern JS runtimes. This makes it a suitable choice for processing tar.gz files—as required for extracting emoji data in the new scripts.
// @ts-expect-error hmmm | ||
body: formData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix TypeScript error instead of suppressing it
The code has a TypeScript error that's being suppressed with a @ts-expect-error
comment. This should be addressed properly.
A possible solution would be to properly type the FormData:
- // @ts-expect-error hmmm
- body: formData,
+ body: formData as unknown as BodyInit,
Or you could use an alternative approach that's type-safe.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// @ts-expect-error hmmm | |
body: formData, | |
body: formData as unknown as BodyInit, |
scripts/emoji-data-setup-app.ts
Outdated
app.post( | ||
"/upload", | ||
async (c) => { | ||
const body = await c.req.parseBody(); | ||
|
||
const file = Array.isArray(body.file) ? body.file[0] : body.file; | ||
|
||
if (file == null) { | ||
throw new HTTPException(400, { | ||
message: "No file uploaded", | ||
}); | ||
} | ||
|
||
if (typeof file === "string") { | ||
throw new HTTPException(400, { | ||
message: "invalid file uploaded", | ||
}); | ||
} | ||
|
||
const filePrefix = "emoji-data-main/data/"; | ||
|
||
const tar = await parseTarGzip(await file.arrayBuffer(), { | ||
filter(file) { | ||
if (!file.name.startsWith(filePrefix) || !file.name.endsWith(".json")) { | ||
return false; | ||
} | ||
|
||
// remove the file prefix | ||
file.name = file.name.slice(filePrefix.length); | ||
|
||
return true; | ||
}, | ||
}); | ||
|
||
const promises = []; | ||
|
||
for (const entry of tar) { | ||
if (entry.type !== "file") continue; | ||
const normalizedEntryName = entry.name.replace("./", ""); | ||
|
||
promises.push(c.env.EMOJI_DATA.put(normalizedEntryName, entry.text)); | ||
} | ||
|
||
c.executionCtx.waitUntil( | ||
Promise.all(promises), | ||
); | ||
|
||
return c.json({ | ||
message: "Files uploaded", | ||
}); | ||
}, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add comprehensive error handling for file processing
The upload endpoint lacks error handling for processing and uploading files. If any file upload fails, the error won't be properly captured or reported back to the client.
app.post(
"/upload",
async (c) => {
const body = await c.req.parseBody();
const file = Array.isArray(body.file) ? body.file[0] : body.file;
if (file == null) {
throw new HTTPException(400, {
message: "No file uploaded",
});
}
if (typeof file === "string") {
throw new HTTPException(400, {
message: "invalid file uploaded",
});
}
const filePrefix = "emoji-data-main/data/";
+ try {
const tar = await parseTarGzip(await file.arrayBuffer(), {
filter(file) {
if (!file.name.startsWith(filePrefix) || !file.name.endsWith(".json")) {
return false;
}
// remove the file prefix
file.name = file.name.slice(filePrefix.length);
return true;
},
});
const promises = [];
+ const results = { success: [], failed: [] };
for (const entry of tar) {
if (entry.type !== "file") continue;
const normalizedEntryName = entry.name.replace("./", "");
- promises.push(c.env.EMOJI_DATA.put(normalizedEntryName, entry.text));
+ promises.push(
+ c.env.EMOJI_DATA.put(normalizedEntryName, entry.text)
+ .then(() => results.success.push(normalizedEntryName))
+ .catch((error) => results.failed.push({ name: normalizedEntryName, error: error.message }))
+ );
}
c.executionCtx.waitUntil(
Promise.all(promises),
);
return c.json({
message: "Files uploaded",
+ stats: {
+ successful: results.success.length,
+ failed: results.failed.length,
+ },
+ details: results
});
+ } catch (error) {
+ throw new HTTPException(500, {
+ message: `Failed to process uploaded file: ${error instanceof Error ? error.message : String(error)}`,
+ });
+ }
},
);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
app.post( | |
"/upload", | |
async (c) => { | |
const body = await c.req.parseBody(); | |
const file = Array.isArray(body.file) ? body.file[0] : body.file; | |
if (file == null) { | |
throw new HTTPException(400, { | |
message: "No file uploaded", | |
}); | |
} | |
if (typeof file === "string") { | |
throw new HTTPException(400, { | |
message: "invalid file uploaded", | |
}); | |
} | |
const filePrefix = "emoji-data-main/data/"; | |
const tar = await parseTarGzip(await file.arrayBuffer(), { | |
filter(file) { | |
if (!file.name.startsWith(filePrefix) || !file.name.endsWith(".json")) { | |
return false; | |
} | |
// remove the file prefix | |
file.name = file.name.slice(filePrefix.length); | |
return true; | |
}, | |
}); | |
const promises = []; | |
for (const entry of tar) { | |
if (entry.type !== "file") continue; | |
const normalizedEntryName = entry.name.replace("./", ""); | |
promises.push(c.env.EMOJI_DATA.put(normalizedEntryName, entry.text)); | |
} | |
c.executionCtx.waitUntil( | |
Promise.all(promises), | |
); | |
return c.json({ | |
message: "Files uploaded", | |
}); | |
}, | |
); | |
app.post( | |
"/upload", | |
async (c) => { | |
const body = await c.req.parseBody(); | |
const file = Array.isArray(body.file) ? body.file[0] : body.file; | |
if (file == null) { | |
throw new HTTPException(400, { | |
message: "No file uploaded", | |
}); | |
} | |
if (typeof file === "string") { | |
throw new HTTPException(400, { | |
message: "invalid file uploaded", | |
}); | |
} | |
const filePrefix = "emoji-data-main/data/"; | |
try { | |
const tar = await parseTarGzip(await file.arrayBuffer(), { | |
filter(file) { | |
if (!file.name.startsWith(filePrefix) || !file.name.endsWith(".json")) { | |
return false; | |
} | |
// remove the file prefix | |
file.name = file.name.slice(filePrefix.length); | |
return true; | |
}, | |
}); | |
const promises = []; | |
const results = { success: [], failed: [] }; | |
for (const entry of tar) { | |
if (entry.type !== "file") continue; | |
const normalizedEntryName = entry.name.replace("./", ""); | |
promises.push( | |
c.env.EMOJI_DATA.put(normalizedEntryName, entry.text) | |
.then(() => results.success.push(normalizedEntryName)) | |
.catch((error) => | |
results.failed.push({ name: normalizedEntryName, error: error.message }) | |
) | |
); | |
} | |
c.executionCtx.waitUntil(Promise.all(promises)); | |
return c.json({ | |
message: "Files uploaded", | |
stats: { | |
successful: results.success.length, | |
failed: results.failed.length, | |
}, | |
details: results, | |
}); | |
} catch (error) { | |
throw new HTTPException(500, { | |
message: `Failed to process uploaded file: ${ | |
error instanceof Error ? error.message : String(error) | |
}`, | |
}); | |
} | |
}, | |
); |
This could potentially be used for adding data for test files more efficiently, but most important it will allow us to setup the data when installing from scratch.
Summary by CodeRabbit