-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add download progress bar #697
base: development
Are you sure you want to change the base?
Conversation
I'd imagine for this to have any chance of making it into the main build you'd have to put it behind an option like |
This was declined for specific reasons that haven't changed. The BDFR is designed to be used in scripts and cron jobs. The progress bar and tqdm cause havoc in log files and output when trying to save it. The idea was always to keep this as the most basic and ubiquitous level tool that can be used for everything and then other more user-friendly tools can be used that expand the interface, such as a progress bar. Now, since there has been no effort by anyone I know of to make such an interface, I may have softened a bit but I'm still not sure that it's a good thing to include. Such a small thing doesn't seem worth making an entirely separate tool but I try to subscribe to the Unix Philosophy: do one thing and do it well. The BDFR's thing is to download data from Reddit in any environment or conditions and for any use case. I myself use it as a script, and I worry that this feature would negatively impact those kinds of uses. |
@OMEGARAZER and @Serene-Arc thank you for your remarks - I now understand more philosophy behind In this case, here is a version that provides end-user CLI experience (as for many command-line apps, including many dev tools, such as What do you think about this kind of approach, @Serene-Arc ? |
This should be fine. One change I will ask for is to make the progress bar False by default e.g. Also as a side note, there is a bug in the logging portion of tqdm that means it doesn't handle logging messages. This isn't related to your PR per se but it will mean that it doesn't work as expected since logging messages of all levels will be printed to the screen, which is not the intended behaviour. Despite there being about 5 duplicate PRs to fix this behaviour, nothing has been done to date. |
@Serene-Arc Thank you for the remark on the default behavior of In the current setup (as in this PR) it seems that it works as intended, though. Or do you have some examples of unintended behavior? |
I genuinely don't know so I apologize if this doesn't help but does Alive-Progress have the logging issues you mentioned Serene? This mentions it does handle the logging but not sure if it's in the same situation. It's the library I was looking at for this purpose when I had last looked into it but I didn't want to step on stared's toes as I hadn't even started on it at all. |
@OMEGARAZER I wouldn't know unfortunately. My PR for tqdm is here but there are at least three others that fix the same issue. @stared tqdm does handle the logging but it prints logging messages of every level, not just info level ones, but debugging messages and lower as well. |
We recently underwent a project-wide format to add consistency, if you could please resolve the conflicts that it has introduced, I can merge the PR. |
54f91c0
to
e9022ff
Compare
@Serene-Arc Right now it looks like that: |
I've been thinking and I'm not sure that this is the best approach, specifically that most of the code has been added to the downloader module. For one, this ignores the cloner and archiver modules completely; the downloader module now has an option that applies to only it when there isn't really a reason for that. Additionally, the parsing and reorganisation of logging messages to be more 'user-friendly' definitely doesn't belong in that module. I think another architectural approach is needed, maybe a separate module for a child of Do you think that you are able to make these changes? |
@Serene-Arc Yes, I should add it to other modules as well. Or for now, I can make it an option for the downloader, to be expanded later if needed.
Not sure if I understand. Do you mean something like
in which |
Yes that is what I meant. Something that can wrap the iterator with as little code in those files as possible and handle any logging calls without changing the master log file. Like I said, this is the kind of change that I'd hoped would be handled in different tool built on the BDFR so try to keep as much of the code in a new module as possible. Basically, try to make as much of it self-contained and separate as possible. I'm going to think about making a more modular architecture but that's a problem for another PR. |
@Serene-Arc Thanks for this nice suggestion - refactored code. Now indeed it is more modular and manageable. In particular, now I was able to add it with ease to the archiver. Let me know what you think about that. |
I think I am going to make changes to standardize the creation and handling of loggers. This will make your PR much easier if you wait for that. |
@Serene-Arc Up to you. Let me know when I can start working back on this PR. |
@ Serene-Arc So, what's the status of the loggers? |
I'm working on a solution, should be ready soon. The aim will be to decouple the logging so arbitrary logging handlers can be plugged into the RedditConnector class, which will make your PR and subsequent similar PRs much easier in the future. |
Right now that #750 is merged, you should have an easy way to do it. Make an alternative logging handler interface that can handle the messages however you want. Then, use |
@Serene-Arc Thanks, I will give it a try. After so many changes (quite a few other PRs) resolving conflicts will take me some time, though. |
It's fine, no rush. This will probably hit next release anyway. |
@Serene-Arc Rebasing through the previous PRs was a pain (as they also touched downloader loops), but I managed. Then it comes to the new logging handlers - frankly, I don't understand this pattern, so I don't know how to use it for the progress bar. Do you have any pointers on what to do? |
So the change in the way loggers are handled mean that you don't have to modify the download loop. You should leave that alone; if everyone who wanted a minor change to the loop implemented it there, then it would get very long and crowded. Instead you should make a separate logger, and replace the default console logger with your replacement. To do that, you make a logging handler class of your own that utilises tqdm to display a progress bar and the messages that you want. Here's an example bit of code that I wrote that does this. import logging
from time import sleep
from typing import Optional
from tqdm import tqdm
logger = logging.getLogger()
class TestHandler(logging.StreamHandler):
def __init__(self, num_of_submissions: int):
super().__init__()
self.progress_bar: Optional[tqdm] = None
self.num_of_submissions = num_of_submissions
def emit(self, record: logging.LogRecord) -> bool:
if "test" in record.msg:
if self.progress_bar is None:
self.progress_bar = tqdm(total=self.num_of_submissions, initial=0)
self.progress_bar.update(1)
self.progress_bar.write(record.msg)
self.flush()
return True
if __name__ == "__main__":
logger.setLevel(1)
handler = TestHandler(20)
handler.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("random example message")
logger.info("random example message")
logger.info("random example message")
for i in range(0, 20):
logger.info("testing")
sleep(0.5)
logger.info("done") The only thing that this needs is the maximum number of submissions to download, which can be passed in. So make this logger just before the |
@Serene-Arc Thank you for this example. I guess I would need to start another PR, as this current code is not salvageable for the new architecture. |
@Serene-Arc I made some changes, and now it should work. The intrusion in the loop is minimal - a few other instructions, but no contexts or wraps. That way it is also highly refactorable. If you need to add anything in progress after each subreddit and post (including logs!), it is easy to do so without cluttering the loop. Should I proceed with this approach? |
@Serene-Arc By any chance, did you have some time to look at the last commit? |
Hi, I just looked at it. That isn't what I meant with the design, you've still added a Progress object to the download loop and changed the signatures of some of those methods? Not sure why you've done that. |
Well, if you want to transmit data via text logs, it is a hard no for me. It would be a dirty hack; in the presence of other possibilities - a really, really bad practice. Sure, we can use a library that supports log payloads (How to make context logging in Python less cumbersome), but I don't think it is worth it. If you have a cleaner solution than
I don't think it is cleaner. Do you? Sure, you might want to have something in between, e.g.:
Yet still, I hardly see how it is cleaner than Signature - to get if the download operation succeeds. It is a standard pattern that (in a practical way) a function returns an object or So again:
|
You claim that this is a cleaner solution, but it isn't. You haven't changed either the archiver or the cloner, nor would duplicating the code there make it simpler to maintain going forwards. Wouldn't even have the same code in the case of the cloner. And, again, this PR is neither unique nor special moving forwards; any other PR attempting to change the output of the BDFR will run into the exact same problem and add similar code in the exact same location. The function doesn't change the internal state, nor does it have a valuable return so I'm not sure why that change is good practice here. Attempting to shoehorn the code to fit your specific use case doesn't seem like good practice to me at all. Sure, more context aware logging could work. Could make an intermediate level to abstract that handling away from the higher level classes and deal with the messages in a consistent way. Do I have a better solution? No, but neither have I tried to find one. I have outlined the requirements that I feel are best for the project, but this is an open-source tool that I work on in my spare time. If you aren't willing to change it, or I can't convey the requirements properly and you don't want to waste your own time, then the solution is ultimately to wait until I have time to devise a solution. Up until now, the unofficial stance has been that features like this require a different project built on top of the bdfr's core modules. Moving forwards, that can change, but it may require more work before I'm satisfied with it. Most of the work has gone into broadening the scope of the project, not narrowing it. Let me know how you'd like to proceed. |
@Serene-Arc To be honest, I feel bitter here. While I heard your initial reservations, later it turned into a guessing game, which gave me a lot frustration. While writing code always takes time, a lot of it was needless - especially waiting for quite a few other PR to be merged, so I could go through rebase hell, to have to rewrite the whole code anyway, to well-being in a place where I am. I guess you are aware that to introduce any functionality you need to change some code. If it was your condition that no code in the If you think the loop logging part is the cleanest part of this project - I reassure you, they aren't. There are a lot of "if" statements, which are error-prone, and not transmitting status as data. If it were my library, instead of "log it, and return None" I would return results with status data or at least use this pattern: self.failed_submission_status('filter_score', { 'score': score })
return False Then within Well, what's next? With the said bitterness, I don't wish to develop this PR any further. Regardless - thank you for developing BDFR! I will keep using mine locally. And will create a fork, so others may use it - as it seems I am not the only one, #745. |
I'm sorry if I haven't been clear. I have tried to lay out my requirements clearly, to the point of combing through tqdm's and the logging library's API to provide a MWE that I thought fit your requirements but I don't think I understood and there may have been a miscommunication issue. You are correct that the download loop contains a lot of if statements; when I wrote it, there were much fewer, but there have been several PRs that have added features to filter the downloads further and they have added their code in the same place. That area is definitely due for a rewrite to make it clearer for further additions, perhaps moving it to the If you don't want to continue developing for the BDFR, I really am sorry that this has been so difficult. Keep the PR open and once I have a solution, I'll implement this myself if need be. Thank you for your effort. |
Hi, It's nice to see @stared here again, one of the earliest contributors of the project :) I haven't read the whole discussion but I want to say couple of things. Although I understand Serene's points and we have had multiple discussions about it, I don't think a feature as simple as this (and much needed, imo) should be this controversial or hard to implement. I cannot explain the frustration I have when I cannot determine if the process hanged for some reason or the file it is downloading is huge. There were other feature requests and PRs which were rejected because bdfr itself was no place to implement them. Although, I agree with the rejections, some are requested multiple times and it feels like community actually needs them. I think there should be a common ground which allows people to use bdfr however they like easily. We already provide great flexibility through CLI and sometimes it is not enough, as it was in this case. I want to discuss this matter in the discussion section together with @stared , @Serene-Arc along with the other contributors. I am sure we will find a way. |
My issue here isn't the feature per se but the implementation. I think we need a more comprehensive event logging system so that, in the future, adding similar features is easy and doesn't clutter the codebase in the wrong locations. |
@aliparlakci It is nice to hear you again! Of course, it is up to you to include (or not) the PR. If you look at the code diff, the changes are minimal and encapsulated. In particular, if (and when) there is a different logging system or (what I consider more important) a consistent and well-designed error handling, moving the progress bar will be a breeze. While I appreciate some pointers of @Serene-Arc on how to separate the progress bar code better (and I put quite a few of them in action), I don't understand the cluttering argument. Right now inclusions are minimal and won't be much smaller not matter the way they are implemented. Sure, refactoring logging and error handling will vastly improve code readability and robustness, but it is another project. And the improvement will be mostly in the places not touched by the progress bar at all. |
@aliparlakci @Serene-Arc By any chance, did anything change with the status of this PR? I would like to note that after more than a year of development of BDFR, there is no conflict between this PR and the development branch (except for two trivial ones, as you see above). So, empirically, this progress bar adds an optional feature while it does not affect other functionalities or hamper development. (I am still using the progress bar on my device.) |
Hi, given that, by all means. If you'll resolve the conflicts, I'll merge. If you could add a section on the new option to the README, that would be great too. |
47085f7
to
a18685c
Compare
@Serene-Arc I added README, resolved conflicts, and rebased on the newest development branch. |
Is there a way to just have the tqdm progress bar without it hijacking the log output like the first version of the PR? I process the log output txt file after bdfr runs and with --progress-bar the txt file is blank after a handful of lines. Wondering if there's a place to see the original commit? I'm assuming it's this |
@ymgenesis Yes - the initial approach is in the https://github.com/stared/bulk-downloader-for-reddit/tree/old-progress-bar branch. |
If the log file is empty when this option is used, that's a bug that will have to be fixed before it's merged. |
It’s not totally empty, but ya the txt log isn’t written to after the handful of initial setup lines. Then, the progress bar takes over the downloading log output.
|
Well, it is intended behavior that most errors (except for critical) are suspended in the if self.progress_bar:
logger.setLevel(logging.CRITICAL) Otherwise, the terminal-based progress bar wouldn't work - a visual summary of downloads (with success or error) would be littered with messages not intended for the end user. |
Intended or not is irrelevant, sorry. Nothing you do can interfere with the file log. Change the printed log if you must but the file log must remain complete and unchanged. That's non-negotiable on any PR or change. |
I was dearly missing a progress bar, so with a few lines of code, I added tqdm.
As I checked, previously such a feature request was turned down (#469). So I am leaving it up to maintainers if it is for polishing & potentially merging, or for some reason progress bars are taboo.