Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to front-end friendly, remove 'fs' #62

Open
knod opened this issue Dec 11, 2016 · 13 comments
Open

Convert to front-end friendly, remove 'fs' #62

knod opened this issue Dec 11, 2016 · 13 comments

Comments

@knod
Copy link

knod commented Dec 11, 2016

This module could pretty easily be converted to be front end friendly by removing the need for 'fs'. Since the stopwords files are the only things being accessed with 'fs', this could be solved pretty easily.

I created my own solution, but I'm not sure it was particularly elegant. My current thought is to create an object with a key of the language code and a value of the stopwords array. Do you have other thoughts about more elegant solutions?

@ageitgey
Copy link
Owner

Supporting front-end use cases just hasn't ever come up before. I'd be worried that the stopwords data would be too large to be used in most front-end applications if it was bundled in.

@knod
Copy link
Author

knod commented Dec 15, 2016

I'll admit I haven't tested it on a throttled system. It did work for me on a MacBook Pro, OSX 10.11, Chrome 55.0.2883.75 with no problems. Used it in the development of a Chrome extension.

The lists aren't actually that big. What I did at the time was turn those stopwords files into js files/node modules and require them in an object. I don't think a plain old object would be any more of a problem than that. We just need to test it on a slower system and see if it becomes an issue.

@aknobloch
Copy link

aknobloch commented Feb 15, 2018

I also was looking at node-unfluff for front end bro purposes. The solution offered by @knod works, an alternative could also be to utilize local HTML5 File IO when applicable, and fallback to fs when the browser does not support HTML5. This would leave the current structure untouched, and front-end users could work around that limitation in WebPack by setting up their configuration to supply an empty module for fs, bearing in mind that the extension would not work on browsers that don't support HTML5. But the list of users not on HTML5-compatible browsers is shrinking daily.

I'd be happy to code this and submit a PR, but I also wonder if this would be better suited to its own fork and module. A separate module via npm install unfluff-standalone or something similar.

@ageitgey
Copy link
Owner

If it's a feature you want, feel free to either submit a PR or make a fork. I'm happy with whichever route works best for you. I just don't have the need to use this in a browser myself, so it's not something I ever attempted to support.

@aknobloch
Copy link

Makes sense - but I notice there's a few PR's that address this issue while maintaining the robustness of the original design. However, these have been there for a while with little activity, so I'm wondering if there is a reason why they aren't being merged. The ideal solution would be to have this base updated with a solution so that separate maintenance on the stopwords files doesn't have to be done.

Thanks for the speedy reply, by the way!

@ageitgey
Copy link
Owner

so I'm wondering if there is a reason why they aren't being merged.

A fair question! I don't have a good answer beyond I thought the existing PRs made the code slightly worse for me and I didn't need the new feature, so I did a bad job of ever getting around to doing anything with it :)

The ideal solution would be to have this base updated with a solution so that separate maintenance on the stopwords files doesn't have to be done.

Sure, that makes sense.

My issues with the current PRs are just that they required loading all stop words for all languages into memory even if when 99.9% of users would only need them for one language. Loading all languages not only would that make memory usage higher on the server side (100KB per process), but it seemed like a pretty bad solution for delivering the library to users client side. Is it really worth adding 100KB of useless data to to every page load for ever user for no reason?

I imagine there might be a more clever solution where you pre-generate a separate js bundle file per language or something like that. I just never looked into it.

But that's the honest answer - I didn't love the current solutions but I also didn't spend any time coming up with a better answer either. But I'm totally open to discussing it or revisiting my opinion or whatever.

@aknobloch
Copy link

Gotcha, that makes perfect sense. And thanks for the explanation. I'll give the problem some more thought and submit a PR request if I come up with a solution that I think works well.

@ageitgey
Copy link
Owner

Cool, thanks!

@knod
Copy link
Author

knod commented Feb 23, 2018

Oh, cool. I didn't know that that was the problem with the PRs. That's good to know.

@cjanietz
Copy link

cjanietz commented Apr 1, 2018

I have made a solution which uses require for getting the stopword files. When combined with webpack and raw-loader you can make the download of the actual stopword files completely async and dependent on the relevant language of your content. If you would like to check it out, check my fork.

@bhowmikp
Copy link

bhowmikp commented Apr 4, 2018

Hopefully your PR gets accepted because I too would like this feature

@ageitgey
Copy link
Owner

ageitgey commented Apr 4, 2018

@cjanietz I really like your approach. I created a PR here: #84

Let me know if you are OK with me pulling that in.

@justinmchase
Copy link

I just tried to use this in an Electron app, which uses webpack to build the UI and non-UI parts of the app. It doesn't seem to support bundling txt files like along with the modules (as far as I can figure out). So if these were just hardcoded into js modules it would solve it there as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants