Select an article from {{ journal.name }}.
+
+{% if articles %}
+
+{% else %}
+No articles found for {{ journal.name }}.
+
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f59d837
--- /dev/null
+++ b/README.md
@@ -0,0 +1,132 @@
+# Deferred Reports Plugin
+
+Generates CSV reports as background tasks and notifies users by email when their download is ready. Replaces the synchronous reporting pages for reports that are too large or slow to generate within a web request.
+
+## How it works
+
+1. A user requests a report from the manager interface, selecting a report type and any required parameters (date range, journal, etc.).
+2. A `ReportTask` record is created in the database with status `pending`.
+3. The `process_pending_reports` management command (run via cron) picks up pending tasks, generates the CSV file on disk, and marks the task `complete`.
+4. The user receives an email with a download link. If the report fails after three attempts, a failure email is sent instead.
+5. The user can download or re-request their report from the "My Reports" page.
+
+Reports are stored under `files/deferred_reports/` relative to `BASE_DIR`. The `cleanup_old_reports` command deletes reports and their files after 30 days (configurable).
+
+## Installation
+
+### 1. Install the plugin
+
+Clone the plugin into `src/plugins/` and run the following commands:
+
+```bash
+python src/manage.py install_plugins
+python src/manage.py migrate
+```
+
+`install_plugins` calls the plugin's `install()` function, which registers the plugin and loads the email notification settings into the database.
+
+### 2. Install cron jobs
+
+The plugin ships its own `install_cron` command.
Run it once after installation:
+
+```bash
+python src/manage.py install_cron --module deferred_reports
+```
+
+Or, if you manage cron manually, the two jobs that need to be scheduled are:
+
+| Command | Recommended schedule | Purpose |
+|---|---|---|
+| `process_pending_reports` | Every 5 minutes | Picks up and runs pending report tasks |
+| `cleanup_old_reports` | Daily | Deletes reports and files older than 30 days |
+
+Example crontab entries (adjust paths to your environment):
+
+```cron
+*/5 * * * * /path/to/venv/bin/python /path/to/src/manage.py process_pending_reports
+0 3 * * * /path/to/venv/bin/python /path/to/src/manage.py cleanup_old_reports
+```
+
+## Management commands
+
+### `process_pending_reports`
+
+Processes report tasks in the `pending` state. Tasks stuck in `processing` for longer than `--stuck-after` minutes are assumed to have crashed and are reset to `pending`. A task that fails three times is marked permanently `failed` and the user is notified.
+
+```
+python src/manage.py process_pending_reports [--limit N] [--stuck-after MINUTES]
+```
+
+| Option | Default | Description |
+|---|---|---|
+| `--limit` | `5` | Maximum reports to process per run |
+| `--stuck-after` | `10` | Minutes before a processing task is considered crashed |
+
+### `cleanup_old_reports`
+
+Deletes `ReportTask` records and their CSV files older than `--days` days.
+
+```
+python src/manage.py cleanup_old_reports [--days N]
+```
+
+### `install_cron`
+
+Installs the two cron jobs above for the current user. Requires the `python-crontab` package and `/usr/bin/crontab`. Pass `--action test` to print the crontab without writing it.
+
+```
+python src/manage.py install_cron [--action test|quiet]
+```
+
+## Available reports
+
+Reports marked **Instant** run synchronously on request and are available immediately on the My Reports page. Reports marked **Background** are queued and processed by the cron job; the user receives an email when the file is ready.
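The Instant/Background split can be pictured as a small dispatcher: instant report types run their generator immediately against a lightweight stand-in task, everything else becomes a pending `ReportTask` for the cron job. The sketch below is illustrative only: it reuses the `InstantReportContext` dataclass from `execute.py`, but `REPORT_GENERATORS`, `INSTANT_REPORTS`, and `dispatch_report` are hypothetical stand-ins, not the plugin's real registries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class InstantReportContext:
    """Stand-in for ReportTask when a report runs synchronously."""
    parameters: dict = field(default_factory=dict)
    journal: Optional[object] = None
    pk: str = 'instant'

# Hypothetical registries for the sketch; the real plugin maps report
# types to generator functions in generators.py.
REPORT_GENERATORS: Dict[str, Callable] = {
    'journal_citations': lambda task: f'/tmp/{task.pk}_journal_citations.csv',
}
INSTANT_REPORTS = {'journal_citations'}  # assumed set of instant types

def dispatch_report(report_type, parameters, journal=None):
    """Run instant reports synchronously; queue everything else."""
    if report_type in INSTANT_REPORTS:
        task = InstantReportContext(parameters=parameters, journal=journal)
        return 'instant', REPORT_GENERATORS[report_type](task)
    # Background path: a real implementation would create a pending
    # ReportTask row here for the cron job to pick up later.
    return 'queued', None

mode, path = dispatch_report('journal_citations', {})
print(mode, path)  # instant /tmp/instant_journal_citations.csv
```

Because generators only touch `task.parameters`, `task.journal`, and `task.pk`, the same generator function can serve both paths without knowing which one invoked it.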
+
+| Report | Scope | Mode | Access | Filter | Description |
+|---|---|---|---|---|---|
+| Press Report | Press | Background | Editor | Date range | Submissions, publications, rejections, views, and downloads per journal. User counts are a current snapshot, not date-filtered. |
+| Article Metrics | Journal | Background | Editor | Date range (metrics only) | One row per published article regardless of date. The date range filters access metrics only, not the article list. |
+| Journal Usage by Month | Press | Background | Editor | Month range | Combined views and downloads per journal per month. The month range controls both the columns and the access events counted. |
+| Production Times | Press | Background | Editor | Date range (assigned date) | Completed typesetting tasks assigned within the date range. Only tasks with both accepted and completed dates recorded are included. |
+| Geographical Spread | Press/Journal | Background | Editor | Date range (access date) | Access events to published articles, grouped by country. |
+| Peer Review | Press/Journal | Background | Editor | Date range (requested date) | Completed review assignments where the review was requested within the date range. Only assignments with both acceptance and completion dates are included. |
+| Article Citations | Press/Journal | Background | Editor | Year (citing work year) | Citation counts from Crossref data. Year filters by the year of the citing work; set to all-time for all years. Only articles with at least one recorded citation are included. |
+| Journal Citations | Press | Instant | Editor | None | All-time citation totals per journal, aggregated from all Crossref citation records. |
+| Journal Article Citations | Journal | Instant | Editor | None | All-time Crossref citation counts for articles in the selected journal that have at least one recorded citation. |
+| Article Citing Works | Press | Instant | Editor | None | All Crossref citation records for a specific article. Select a journal, then pick an article. |
+| Article Authors | Press/Journal | Background | Editor | Date range (published date) | Authors are included if they have at least one article published within the date range; all their published articles are then listed. |
+| Peer Reviewers Data | Journal | Background | Editor | None | Lifetime review statistics for all reviewers with at least one assignment in the journal. |
+| Author Submission Data | Journal | Background | Editor | None | All-time submission statistics for all accounts with the Author role in the journal. |
+| Workflow Report | Press/Journal | Background | Editor | Month range (published date) | Lead times (submission-to-acceptance, acceptance-to-publication, submission-to-publication) for articles published within the selected months. |
+| Workflow Stage Completion | Journal | Background | Editor | Month range (submitted date) | Time spent in each workflow stage for articles submitted within the selected months that have since been published. |
+| Yearly Statistics | Journal | Instant | Editor | None | All years from the earliest submission on record to the current year. Each year's counts are based on submission date. |
+| Articles Under Review | Journal | Instant | Editor | None | Live snapshot of open review assignments for articles currently in the Under Review stage. |
+| Time to First Decision | Journal | Background | Editor | Date range (submitted date) | First editorial decision for articles submitted within the date range. |
+| Crossref DOI URLs | Press/Journal | Instant | Editor | None | All published articles with a registered DOI. Includes supplementary file DOIs where available. |
+| Crossref CrossCheck URLs | Press/Journal | Instant | Editor | None | All published articles with a registered DOI and a PDF galley. |
+| License Report | Press | Background | Editor | Date range (published date) | Article counts grouped by licence and journal for articles published within the date range. |
+| Preprints Metrics | Press | Background | Repository Manager | Date range (access date) | Views and downloads per preprint. Only preprints with at least one access in the date range appear. |
+| Book Citations | Press | Instant | Staff | None | All-time citation counts per published book from Crossref BookLink data. Requires the Books plugin. |
+| Book Citing Works | Press | Instant | Staff | None | All Crossref BookLink citation records for a specific book. Requires the Books plugin. |
+
+## Email notifications
+
+Two email templates are installed into Janeway's settings framework by `install()`:
+
+| Setting name | Group | Description |
+|---|---|---|
+| `deferred_reports_report_ready` | `email` | Body sent when a report is ready to download |
+| `subject_deferred_reports_report_ready` | `email_subject` | Subject for the ready email |
+| `deferred_reports_report_failed` | `email` | Body sent when a report has permanently failed |
+| `subject_deferred_reports_report_failed` | `email_subject` | Subject for the failure email |
+
+Templates can be customised per journal via the Janeway manager email settings. The following context variables are available in both templates:
+
+| Variable | Description |
+|---|---|
+| `task` | The `ReportTask` instance |
+| `task.report_name` | Human-readable report name |
+| `task.user` | The user who requested the report |
+| `task.error_message` | Error detail (failure email only) |
+| `download_url` | Direct download link (success email only) |
+| `my_reports_url` | Link to the user's reports list |
diff --git a/__init__.py b/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/email.py b/email.py
new file mode 100644
index 0000000..70873dd
--- /dev/null
+++ b/email.py
@@ -0,0 +1,64 @@
+from django.core.mail import EmailMultiAlternatives
+from django.template import Context, Template
+from django.urls import reverse
+
+from utils import setting_handler
+from utils.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+def build_download_url(task):
+    path = reverse(
+        'deferred_reports_download',
+        kwargs={'task_id': task.pk},
+    )
+    return f'{task.site_url.rstrip("/")}{path}'
+
+
+def build_my_reports_url(task):
+    path = reverse('deferred_reports_my_reports')
+    return f'{task.site_url.rstrip("/")}{path}'
+
+
+def send_success_email(task):
+    journal = task.journal
+    context = Context({
+        'task': task,
+        'download_url': build_download_url(task),
+        'my_reports_url': build_my_reports_url(task),
+    })
+    subject = Template(
+        setting_handler.get_setting('email_subject', 'subject_deferred_reports_report_ready', journal).value
+    ).render(context)
+    body = Template(
+        setting_handler.get_setting('email', 'deferred_reports_report_ready', journal).value
+    ).render(context)
+    send_email(task.user.email, subject, body)
+
+
+def send_failure_email(task):
+    journal = task.journal
+    context = Context({
+        'task': task,
+        'my_reports_url': build_my_reports_url(task),
+    })
+    subject = Template(
+        setting_handler.get_setting('email_subject', 'subject_deferred_reports_report_failed', journal).value
+    ).render(context)
+    body = Template(
+        setting_handler.get_setting('email', 'deferred_reports_report_failed', journal).value
+    ).render(context)
+    send_email(task.user.email, subject, body)
+
+
+def send_email(to_address, subject, html_body):
+    try:
+        email = EmailMultiAlternatives(
+            subject=subject,
+            body='',
+            to=[to_address],
+        )
+        email.attach_alternative(html_body, 'text/html')
+        # fail_silently must be False, or delivery errors never reach
+        # the except clause below and are silently dropped.
+        email.send(fail_silently=False)
+    except Exception as e:
+        logger.exception('Failed to send report email to %s: %s', to_address, e)
diff --git a/execute.py b/execute.py
new file mode 100644
index 0000000..f4fb4b7
--- /dev/null
+++ b/execute.py
@@ -0,0 +1,61 @@
+from dataclasses import dataclass, field
+from typing import Optional
+
+from django.utils import timezone
+
+from utils.logger import get_logger
+
+from plugins.deferred_reports import email as report_email
+
+logger = get_logger(__name__)
+
+
+@dataclass
+class InstantReportContext:
+    """
+    A lightweight substitute for ReportTask used when running a report
+    synchronously. Generators access task.parameters, task.journal, and
+    task.pk — this dataclass provides exactly those attributes.
+
+    Note: any generator that accesses other ReportTask attributes will
+    raise an AttributeError on the instant path.
+ """ + parameters: dict = field(default_factory=dict) + journal: Optional[object] = None + pk: str = 'instant' + + +def execute_report(task_id): + """Execute a report task, updating attempt_count and status.""" + from plugins.deferred_reports.models import ReportTask + from plugins.deferred_reports.generators import REPORT_GENERATORS + + task = None + try: + task = ReportTask.objects.get(pk=task_id) + task.status = ReportTask.STATUS_PROCESSING + task.attempt_count += 1 + task.save(update_fields=['status', 'attempt_count']) + + generator = REPORT_GENERATORS.get(task.report_type) + if not generator: + raise ValueError(f'Unknown report type: {task.report_type}') + + filepath = generator(task) + + task.status = ReportTask.STATUS_COMPLETE + task.file_path = filepath + task.completed = timezone.now() + task.save(update_fields=['status', 'file_path', 'completed']) + + report_email.send_success_email(task) + + except Exception as e: + logger.exception('Report task %s failed: %s', task_id, e) + if task is not None: + try: + task.error_message = str(e) + task.completed = timezone.now() + task.save(update_fields=['error_message', 'completed']) + except Exception as e: + logger.exception('Could not update failed task %s: %s', task_id, e) diff --git a/forms.py b/forms.py new file mode 100644 index 0000000..04c93e8 --- /dev/null +++ b/forms.py @@ -0,0 +1,77 @@ +from django import forms +from django.forms import ModelChoiceField + +from journal import models + + +class JournalChoiceField(ModelChoiceField): + def label_from_instance(self, obj): + return obj.name + + +class DateInput(forms.DateInput): + input_type = 'date' + + +class MonthInput(forms.DateInput): + input_type = 'month' + + +class ReportRequestForm(forms.Form): + """Base form for all report requests. 
Subclassed per report type.""" + report_type = forms.CharField(widget=forms.HiddenInput()) + + +class DateRangeReportForm(ReportRequestForm): + start_date = forms.DateField(widget=DateInput()) + end_date = forms.DateField(widget=DateInput()) + + +class MonthRangeReportForm(ReportRequestForm): + start_month = forms.DateField(widget=MonthInput()) + end_month = forms.DateField(widget=MonthInput()) + + +class JournalDateReportForm(ReportRequestForm): + journal = JournalChoiceField( + queryset=models.Journal.objects.all().order_by('code'), + label='Select a Journal', + ) + start_date = forms.DateField(widget=DateInput()) + end_date = forms.DateField(widget=DateInput()) + + +class JournalOnlyReportForm(ReportRequestForm): + """For reports that only need a journal (uses request.journal).""" + pass + + +class YearReportForm(ReportRequestForm): + year = forms.IntegerField() + all_time = forms.BooleanField( + required=False, + help_text='Ignores the year value.', + ) + + +class OptionalJournalReportForm(ReportRequestForm): + """For reports where a journal filter is optional.""" + journal = JournalChoiceField( + queryset=models.Journal.objects.all().order_by('code'), + label='Filter by Journal (optional)', + required=False, + empty_label='All Journals', + ) + + +class ArticleJournalSelectForm(ReportRequestForm): + """Stage 1 for article_citing_works: pick a journal.""" + journal = JournalChoiceField( + queryset=models.Journal.objects.all().order_by('code'), + label='Select a Journal', + ) + + +class BookCitingWorksForm(ReportRequestForm): + """No extra fields — books are listed in a table for selection.""" + pass diff --git a/generators.py b/generators.py new file mode 100644 index 0000000..6c630cf --- /dev/null +++ b/generators.py @@ -0,0 +1,1147 @@ +import csv as csv_module +from datetime import date, timedelta + +from dateutil.relativedelta import relativedelta +from django.db.models import ( + Avg, + Case, + CharField, + Count, + DateTimeField, + DurationField, + 
ExpressionWrapper, + F, + Func, + IntegerField, + Max, + Min, + OuterRef, + Q, + Subquery, + Value, + When, + fields, +) +from django.db.models.functions import TruncMonth +from django.template.defaultfilters import strip_tags +from django.urls import reverse +from django.utils import timezone +from django.utils.text import capfirst + +from core import models as core_models +from journal import models as jm +from metrics import models as mm +from production import models as pm +from review import models as rm +from review import logic as rl +from submission import models as sm + +from plugins.deferred_reports.utils import ( + get_journal, + parse_dates, + parse_months, + report_file_path, + timedelta_average, + timedelta_display, + write_csv, +) + +def generate_press_report(task): + start_date, end_date = parse_dates(task.parameters) + journals = jm.Journal.objects.filter(is_remote=False).order_by('code') + + submissions_subq = sm.Article.objects.filter( + journal=OuterRef('id'), + date_submitted__gte=start_date, + date_submitted__lte=end_date, + ).annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + published_subq = sm.Article.objects.filter( + journal=OuterRef('id'), + date_published__gte=start_date, + date_published__lte=end_date, + ).annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + rejected_subq = sm.Article.objects.filter( + journal=OuterRef('id'), + stage=sm.STAGE_REJECTED, + date_declined__gte=start_date, + date_declined__lte=end_date, + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + views_subq = mm.ArticleAccess.objects.filter( + article__journal=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + type='view', + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + downloads_subq = mm.ArticleAccess.objects.filter( + article__journal=OuterRef('id'), + 
accessed__gte=start_date, + accessed__lte=end_date, + type='download', + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + journals = journals.annotate( + submitted=Subquery(submissions_subq, output_field=IntegerField()), + published=Subquery(published_subq, output_field=IntegerField()), + rejected=Subquery(rejected_subq, output_field=IntegerField()), + total_views=Subquery(views_subq, output_field=IntegerField()), + total_downloads=Subquery(downloads_subq, output_field=IntegerField()), + ) + + rows = [[ + 'Journal', 'Submissions', 'Published Submissions', + 'Rejected Submissions', 'Number of Users', 'Views', 'Downloads', + ]] + for journal in journals: + rows.append([ + journal.name, + journal.submitted, + journal.published, + journal.rejected, + len(journal.journal_users()), + journal.total_views, + journal.total_downloads, + ]) + + filepath = report_file_path(task.pk, 'press_report.csv') + write_csv(filepath, rows) + return filepath + +def generate_articles_report(task): + start_date, end_date = parse_dates(task.parameters) + journal = get_journal(task) + + f_editorial_delta = ExpressionWrapper( + F('date_published') - F('date_submitted'), + output_field=DurationField(), + ) + + articles = sm.Article.objects.filter( + date_published__lte=timezone.now(), + journal=journal, + ).select_related('section').annotate(editorial_delta=f_editorial_delta) + + abstract_views = mm.ArticleAccess.objects.filter( + article=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + galley_type__isnull=True, + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + html_views = mm.ArticleAccess.objects.filter( + article=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + galley_type__in={'html', 'xml'}, + type='view', + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + pdf_views = 
mm.ArticleAccess.objects.filter( + article=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + galley_type='pdf', + type='view', + ).annotate( + count=Func(F('id'), function='Count') + ).values('count') + + pdf_downloads = mm.ArticleAccess.objects.filter( + article=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + galley_type='pdf', + type='download', + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + other_downloads = mm.ArticleAccess.objects.filter( + article=OuterRef('id'), + accessed__gte=start_date, + accessed__lte=end_date, + type='download', + ).exclude( + galley_type__in={'pdf'}, + ).order_by().annotate( + count=Func(F('id'), function='Count') + ).order_by('count').values('count') + + articles = articles.annotate( + abstract_views=Subquery(abstract_views, output_field=IntegerField()), + html_views=Subquery(html_views, output_field=IntegerField()), + pdf_views=Subquery(pdf_views, output_field=IntegerField()), + pdf_downloads=Subquery(pdf_downloads, output_field=IntegerField()), + other_downloads=Subquery(other_downloads, output_field=IntegerField()), + ) + + rows = [[ + 'ID', 'Title', 'Section', 'Date Submitted', 'Date Accepted', + 'Date Published', 'Days to Publication', 'Abstract Views', + 'HTML Views', 'PDF Views', 'PDF Downloads', 'Other Downloads', + ]] + for article in articles: + rows.append([ + article.pk, + strip_tags(article.title), + article.section.name if article.section else 'No Section', + article.date_submitted, + article.date_accepted, + article.date_published, + article.editorial_delta.days if article.editorial_delta else '', + article.abstract_views, + article.html_views, + article.pdf_views, + article.pdf_downloads, + article.other_downloads, + ]) + + filepath = report_file_path(task.pk, 'article_metrics.csv') + write_csv(filepath, rows) + return filepath + +def generate_usage_by_month_report(task): + date_parts = 
parse_months(task.parameters)
+    journals = jm.Journal.objects.filter(is_remote=False, hide_from_press=False)
+    journal_id_map = {j.id: j for j in journals}
+
+    start = timezone.make_aware(timezone.datetime(
+        int(date_parts['start_month_y']),
+        int(date_parts['start_month_m']),
+        1,
+    ))
+    end = timezone.make_aware(timezone.datetime(
+        int(date_parts['end_month_y']),
+        int(date_parts['end_month_m']),
+        1,
+    ) + relativedelta(months=1))
+
+    journal_metrics = mm.ArticleAccess.objects.filter(
+        article__journal__in=journals,
+        type__in=['view', 'download'],
+        accessed__gte=start,
+        accessed__lt=end,
+    ).exclude(
+        galley_type__isnull=True,
+    ).annotate(
+        month=TruncMonth('accessed'),
+    ).values(
+        'article__journal', 'month',
+    ).annotate(
+        total=Count('id'),
+    ).values(
+        'article__journal', 'month', 'total',
+    ).order_by('article__journal', 'month')
+
+    dates = []
+    current = start
+    while current < end:
+        dates.append(current)
+        current += relativedelta(months=1)
+
+    data = {}
+    for row in journal_metrics:
+        journal = journal_id_map.get(row['article__journal'])
+        if not journal:
+            continue
+        # Pre-fill a zero for every month in the range so that months
+        # with no access events keep the columns aligned, then drop
+        # each total into its month's slot.
+        series = data.setdefault(journal, [0] * len(dates))
+        delta = relativedelta(row['month'].date(), start.date())
+        series[(delta.years * 12) + delta.months] = row['total']
+
+    header = ['Journal'] + [d.strftime('%Y-%m') for d in dates]
+    rows = [header]
+    for journal, metrics in data.items():
+        rows.append([journal.name] + metrics)
+
+    filepath = report_file_path(task.pk, 'usage_by_month.csv')
+    write_csv(filepath, rows)
+    return filepath
+
+
+def generate_production_report(task):
+    start_date, end_date = parse_dates(task.parameters)
+    assignments = pm.TypesetTask.objects.filter(
+        completed__isnull=False,
+        accepted__isnull=False,
assigned__gte=start_date, + assigned__lte=end_date, + ) + + rows = [[ + 'Title', 'Journal', 'Typesetter', 'Assigned', 'Accepted', + 'Completed', 'Time to Acceptance', 'Time to Completion', + ]] + for a in assignments: + rows.append([ + strip_tags(a.assignment.article.title), + a.assignment.article.journal.code, + str(a.typesetter), + a.assigned, + a.accepted, + a.completed, + (a.accepted - a.assigned).days, + a.completed - a.accepted, + ]) + + filepath = report_file_path(task.pk, 'production_times.csv') + write_csv(filepath, rows) + return filepath + +def generate_geo_report(task): + start_date, end_date = parse_dates(task.parameters) + journal = get_journal(task) + + metrics = mm.ArticleAccess.objects.filter( + article__stage=sm.STAGE_PUBLISHED, + accessed__gte=start_date, + accessed__lte=end_date, + ).values('country__name').annotate( + country_count=Count('country'), + ) + + if journal: + metrics = metrics.filter(article__journal=journal) + + rows = [['Country', 'Count']] + for row in metrics: + rows.append([row.get('country__name'), row.get('country_count')]) + + filepath = report_file_path(task.pk, 'geographical_spread.csv') + write_csv(filepath, rows) + return filepath + +def generate_review_report(task): + start_date, end_date = parse_dates(task.parameters) + journal = get_journal(task) + + if journal: + articles = sm.Article.objects.filter(journal=journal) + else: + articles = sm.Article.objects.all() + + rows = [[ + 'Reviewer', 'Article', 'Date Requested', 'Date Accepted', + 'Date Due', 'Date Complete', 'Time to Acceptance', + 'Time to Completion', + ]] + for article in articles: + reviews = rm.ReviewAssignment.objects.filter( + article=article, + date_accepted__isnull=False, + date_complete__isnull=False, + date_requested__gte=start_date, + date_requested__lte=end_date, + ) + for review in reviews: + rows.append([ + review.reviewer.full_name(), + strip_tags(article.title), + review.date_requested, + review.date_accepted, + review.date_due, + 
review.date_complete, + review.date_accepted - review.date_requested, + review.date_complete - review.date_accepted, + ]) + + filepath = report_file_path(task.pk, 'peer_review.csv') + write_csv(filepath, rows) + return filepath + +def generate_citations_report(task): + year = task.parameters.get('year', date.today().year) + all_time = task.parameters.get('all_time', False) + journal = get_journal(task) + + all_articles = sm.Article.objects.filter( + articlelink__year__isnull=False, + ).distinct() + + if journal: + all_articles = all_articles.filter(journal=journal) + + data = all_articles if all_time else ( + all_articles.filter(articlelink__year=year).distinct() + ) + + rows = [['Title', 'Publication Date', 'Total Citations']] + for article in data: + count = ( + article.citation_count if all_time + else mm.ArticleLink.objects.filter(article=article, year=year).count() + ) + rows.append([strip_tags(article.title), article.date_published, count]) + + filepath = report_file_path(task.pk, 'article_citations.csv') + write_csv(filepath, rows) + return filepath + +def generate_journal_citations_report(task): + journals = jm.Journal.objects.filter(hide_from_press=False) + + rows = [['Journal', 'Total Citations']] + for journal in journals: + articles = sm.Article.objects.filter( + articlelink__year__isnull=False, + journal=journal, + ).distinct() + total = sum(a.citation_count for a in articles) + rows.append([journal.name, total]) + + filepath = report_file_path(task.pk, 'journal_citations.csv') + write_csv(filepath, rows) + return filepath + +def generate_authors_report(task): + start_date, end_date = parse_dates(task.parameters) + journal = get_journal(task) + + accounts = core_models.Account.objects.filter( + authors__date_published__gte=start_date, + authors__date_published__lte=end_date, + ) + if journal: + accounts = accounts.filter(authors__journal=journal) + + rows = [[ + 'Author Name', 'Author Email', 'Author Affiliation', + 'Article ID', 'Article Title', 'Date 
Published', + ]] + for account in accounts: + for article in account.published_articles(): + rows.append([ + account.full_name(), + account.email, + account.affiliation(), + article.id, + strip_tags(article.title), + article.date_published, + ]) + + filepath = report_file_path(task.pk, 'authors.csv') + write_csv(filepath, rows) + return filepath + +def generate_reviewers_report(task): + journal = get_journal(task) + if not journal: + raise ValueError('Reviewers report requires a journal.') + + reviewers = rm.ReviewAssignment.objects.filter( + article__journal=journal, + ).values( + 'reviewer', + 'reviewer__first_name', + 'reviewer__last_name', + ).annotate( + total_assignments=Count('id'), + accepted_assignments=Count( + 'id', filter=Q(date_accepted__isnull=False), + ), + declined_assignments=Count( + 'id', + filter=Q(date_declined__isnull=False, decision__isnull=True), + ), + withdrawn_assignments=Count( + 'id', filter=Q(decision='withdrawn'), + ), + completed_assignments=Count( + 'id', + filter=Q( + date_declined__isnull=True, + decision__isnull=False, + date_complete__isnull=False, + is_complete=True, + ), + ), + assignments_awaiting_response=Count( + 'id', + filter=Q( + decision__isnull=True, + date_accepted__isnull=True, + date_declined__isnull=True, + ), + ), + earliest_completed_review=Min( + 'date_complete', + filter=Q(is_complete=True, date_declined__isnull=True), + ), + latest_completed_review=Max( + 'date_complete', + filter=Q(is_complete=True, date_declined__isnull=True), + ), + average_rating=Avg('reviewerrating__rating'), + average_time_to_complete=Avg( + Case( + When( + date_requested__lte=F('date_complete'), + then=F('date_complete') - F('date_requested'), + ), + default=None, + output_field=fields.DurationField(), + ), + filter=Q( + date_complete__isnull=False, date_declined__isnull=True, + ), + ), + ) + + rows = [[ + 'ID', 'First Name', 'Last Name', 'Total Requests', + 'Accepted Requests', 'Declined Requests', 'Withdrawn Requests', + 'Completed 
Requests', 'Requests Awaiting Response', + 'Earliest Completed Review', 'Latest Completed Review', + 'Average Time to Completion', 'Average Rating', + ]] + for r in reviewers: + rows.append([ + r.get('reviewer'), + r.get('reviewer__first_name'), + r.get('reviewer__last_name'), + r.get('total_assignments'), + r.get('accepted_assignments'), + r.get('declined_assignments'), + r.get('withdrawn_assignments'), + r.get('completed_assignments'), + r.get('assignments_awaiting_response'), + r.get('earliest_completed_review'), + r.get('latest_completed_review'), + r.get('average_time_to_complete'), + r.get('average_rating'), + ]) + + filepath = report_file_path(task.pk, 'reviewers.csv') + write_csv(filepath, rows) + return filepath + +def generate_author_data_report(task): + journal = get_journal(task) + if not journal: + raise ValueError('Author data report requires a journal.') + + authors = core_models.Account.objects.filter( + accountrole__role__slug='author', + accountrole__journal=journal, + ).values( + 'id', 'username', 'first_name', 'last_name', 'salutation', + ).annotate( + total_articles=Count( + 'authors__id', + filter=Q( + authors__date_submitted__isnull=False, + authors__journal=journal, + ), + ), + accepted_articles=Count( + 'authors__id', + filter=Q( + authors__date_submitted__isnull=False, + authors__date_accepted__isnull=False, + authors__journal=journal, + ), + ), + declined_articles=Count( + 'authors__id', + filter=Q( + authors__date_submitted__isnull=False, + authors__date_declined__isnull=False, + authors__journal=journal, + ), + ), + published_articles=Count( + 'authors__id', + filter=Q( + authors__date_submitted__isnull=False, + authors__date_published__isnull=False, + authors__journal=journal, + ), + ), + ) + + rows = [] + for i, author in enumerate(authors): + if i == 0: + rows.append([capfirst(k) for k in author.keys()]) + rows.append(list(author.values())) + + if not rows: + rows = [['No data']] + + filepath = report_file_path(task.pk, 
'author_data.csv') + write_csv(filepath, rows) + return filepath + +def generate_workflow_report(task): + date_parts = parse_months(task.parameters) + journal = get_journal(task) + + article_list = sm.Article.objects.filter( + date_published__year__gte=date_parts.get('start_month_y'), + date_published__month__gte=date_parts.get('start_month_m'), + date_published__year__lte=date_parts.get('end_month_y'), + date_published__month__lte=date_parts.get('end_month_m'), + ) + + if journal: + article_list = article_list.filter(journal=journal) + + submission_to_accept_days = [] + submission_to_publication_days = [] + accept_to_publication_days = [] + + for article in article_list: + if article.date_accepted and article.date_submitted: + article.submission_to_accept = ( + article.date_accepted - article.date_submitted + ) + submission_to_accept_days.append(article.submission_to_accept) + if article.date_published and article.date_accepted: + article.accept_to_publication = ( + article.date_published - article.date_accepted + ) + accept_to_publication_days.append(article.accept_to_publication) + if article.date_published and article.date_submitted: + article.submission_to_publication = ( + article.date_published - article.date_submitted + ) + submission_to_publication_days.append( + article.submission_to_publication, + ) + + averages = { + 'submission_to_accept_average': timedelta_average( + submission_to_accept_days, + ), + 'accept_to_publication_average': timedelta_average( + accept_to_publication_days, + ), + 'submission_to_publication_average': timedelta_average( + submission_to_publication_days, + ), + } + + rows = [ + [ + 'Submission to Acceptance Average', + 'Acceptance to Publication Average', + 'Submission to Publication Average', + ], + [ + timedelta_display(averages['submission_to_accept_average']), + timedelta_display(averages['accept_to_publication_average']), + timedelta_display(averages['submission_to_publication_average']), + ], + [ + 'ID', 'Title', 'DOI', 
'Date Submitted', 'Date Accepted', + 'Date Published', 'Submission to Acceptance', + 'Acceptance to Publication', 'Submission to Publication', + ], + ] + for article in article_list: + rows.append([ + article.pk, + strip_tags(article.title), + article.get_doi(), + article.date_submitted, + article.date_accepted, + article.date_published, + getattr(article, 'submission_to_accept', ''), + getattr(article, 'accept_to_publication', ''), + getattr(article, 'submission_to_publication', ''), + ]) + + filepath = report_file_path(task.pk, 'workflow.csv') + write_csv(filepath, rows) + return filepath + +def generate_workflow_stages_report(task): + date_parts = parse_months(task.parameters) + journal = get_journal(task) + if not journal: + raise ValueError('Workflow stages report requires a journal.') + + start = timezone.make_aware(timezone.datetime( + int(date_parts['start_month_y']), + int(date_parts['start_month_m']), + 1, + )) + end = timezone.make_aware(timezone.datetime( + int(date_parts['end_month_y']), + int(date_parts['end_month_m']), + 1, + ) + relativedelta(months=1)) + + articles = sm.Article.objects.filter( + journal=journal, + date_submitted__range=[start, end], + date_published__isnull=False, + ) + + workflow_elements = core_models.WorkflowElement.objects.filter( + journal=journal, + element_name__in=core_models.WorkflowLog.objects.filter( + article__journal=journal, + article__date_submitted__range=[start, end], + article__date_published__isnull=False, + ).values('element__element_name'), + ).order_by('order') + + element_names = [e.element_name for e in workflow_elements] + + workflow_logs = core_models.WorkflowLog.objects.filter( + article__journal=journal, + article__date_submitted__range=[start, end], + article__date_published__isnull=False, + ).select_related('article', 'element').order_by('article', 'timestamp') + + workflow_times_dict = {} + for article in articles: + workflow_times_dict[article.id] = {name: None for name in element_names} + 
article_logs = workflow_logs.filter(article=article) + for index, wlog in enumerate(article_logs): + try: + next_log = article_logs[index + 1] + time_in = next_log.timestamp - wlog.timestamp + except IndexError: + time_in = article.date_published - wlog.timestamp + workflow_times_dict[article.pk][wlog.element.element_name] = time_in + + headers = ['Article Title', 'Date Submitted'] + [ + capfirst(n) for n in element_names + ] + rows = [headers] + for article in articles: + row = [strip_tags(article.title), article.date_submitted] + for name in element_names: + row.append(workflow_times_dict[article.id].get(name, '')) + rows.append(row) + + filepath = report_file_path(task.pk, 'workflow_stages.csv') + write_csv(filepath, rows) + return filepath + +def generate_yearly_stats_report(task): + journal = get_journal(task) + if not journal: + raise ValueError('Yearly stats report requires a journal.') + + earliest_year_qs = sm.Article.objects.filter( + journal=journal, + ).order_by('date_submitted').values('date_submitted__year').first() + + if not earliest_year_qs: + filepath = report_file_path(task.pk, 'yearly_stats.csv') + write_csv(filepath, [['No data']]) + return filepath + + earliest_year = earliest_year_qs['date_submitted__year'] + current_year = timezone.now().year + + rows = [[ + 'Year', 'Articles Submitted', 'In Review', 'Articles Accepted', + 'Articles Rejected', 'Articles Published', 'Articles Archived', + ]] + for year in range(earliest_year, current_year + 1): + stats = sm.Article.objects.filter( + journal=journal, date_submitted__year=year, + ).aggregate( + articles_submitted=Count('id'), + articles_in_review=Count( + Case( + When( + stage__in=['Assigned', 'Under Review', 'Under Revision'], + then='id', + ), + default=None, + output_field=IntegerField(), + ), + ), + articles_accepted=Count( + Case( + When(date_accepted__isnull=False, then='id'), + default=None, + output_field=IntegerField(), + ), + ), + articles_rejected=Count( + Case( + 
When(date_declined__isnull=False, then='id'), + default=None, + output_field=IntegerField(), + ), + ), + articles_published=Count( + Case( + When(date_published__isnull=False, then='id'), + default=None, + output_field=IntegerField(), + ), + ), + articles_archived=Count( + Case( + When(stage='Archived', then='id'), + default=None, + output_field=IntegerField(), + ), + ), + ) + rows.append([ + year, + stats['articles_submitted'], + stats['articles_in_review'], + stats['articles_accepted'], + stats['articles_rejected'], + stats['articles_published'], + stats['articles_archived'], + ]) + + filepath = report_file_path(task.pk, 'yearly_stats.csv') + write_csv(filepath, rows) + return filepath + +def generate_under_review_report(task): + journal = get_journal(task) + if not journal: + raise ValueError('Articles under review report requires a journal.') + + assignments = rm.ReviewAssignment.objects.filter( + article__stage=sm.STAGE_UNDER_REVIEW, + article__journal=journal, + ).select_related( + 'article', 'article__journal', 'reviewer', + ).order_by('article__title') + + rows = [[ + 'Title', 'First Name', 'Last Name', 'Email Address', + 'Reviewer Decision', 'Recommendation', 'Access Code', + 'Due Date', 'Date Complete', + ]] + for review in assignments: + rows.append([ + strip_tags(review.article.title), + review.reviewer.first_name, + review.reviewer.last_name, + review.reviewer.email, + review.request_decision_status(), + review.decision, + review.article.journal.site_url( + path=rl.generate_access_code_url( + 'do_review', review, review.access_code, + ), + ), + review.date_due, + review.date_complete, + ]) + + filepath = report_file_path(task.pk, 'articles_under_review.csv') + write_csv(filepath, rows) + return filepath + +def generate_first_decision_report(task): + start_date, end_date = parse_dates(task.parameters) + journal = get_journal(task) + if not journal: + raise ValueError('Time to first decision report requires a journal.') + + articles = 
sm.Article.objects.filter( + journal=journal, + date_submitted__gte=start_date, + date_submitted__lte=end_date, + ).annotate( + first_decision_date=ExpressionWrapper( + Func( + F('date_accepted'), + F('date_declined'), + F('revisionrequest__date_requested'), + function='LEAST', + ), + output_field=DateTimeField(), + ), + decision_type=Case( + When( + date_accepted=F('first_decision_date'), + then=Value('accept'), + ), + When( + date_declined=F('first_decision_date'), + then=Value('decline'), + ), + When( + revisionrequest__date_requested=F('first_decision_date'), + then=Value('revision'), + ), + default=Value('unknown'), + output_field=CharField(), + ), + ) + + rows = [[ + 'ID', 'Title', 'Date Submitted', 'First Decision Date', 'Decision', + ]] + for article in articles: + rows.append([ + article.pk, + strip_tags(article.title), + article.date_submitted, + article.first_decision_date, + article.decision_type, + ]) + + filepath = report_file_path(task.pk, 'time_to_first_decision.csv') + write_csv(filepath, rows) + return filepath + +def generate_journal_citations_detail_report(task): + journal = get_journal(task) + if not journal: + raise ValueError('Journal article citations report requires a journal.') + + articles = sm.Article.objects.filter( + articlelink__year__isnull=False, + journal=journal, + ).distinct() + + rows = [['Title', 'Publication Date', 'Total Citations']] + for article in articles: + rows.append([ + strip_tags(article.title), + article.date_published, + article.citation_count, + ]) + + filepath = report_file_path(task.pk, 'journal_article_citations.csv') + write_csv(filepath, rows) + return filepath + +def generate_article_citing_works_report(task): + article_id = task.parameters.get('article_id') + if not article_id: + raise ValueError('Article citing works report requires an article_id.') + + article = sm.Article.objects.get(pk=article_id) + + rows = [['Title', 'Journal', 'Year', 'DOI']] + for citing_work in article.articlelink_set.all(): + 
rows.append([ + citing_work.article_title, + citing_work.journal_title, + citing_work.year, + citing_work.doi, + ]) + + filepath = report_file_path(task.pk, 'article_citing_works.csv') + write_csv(filepath, rows) + return filepath + +def generate_book_citations_report(task): + from utils import plugins + if not plugins.check_plugin_exists('books'): + raise ValueError('The Books plugin is not installed.') + from plugins.books import models as book_models + + books = book_models.Book.objects.filter(date_published__lte=timezone.now()) + + rows = [['Title', 'DOI', 'Publication Date', 'Citations']] + for book in books: + citation_count = mm.BookLink.objects.filter( + doi=book.doi, + object_type='book', + ).count() + rows.append([book.title, book.doi, book.date_published, citation_count]) + + filepath = report_file_path(task.pk, 'book_citations.csv') + write_csv(filepath, rows) + return filepath + +def generate_book_citing_works_report(task): + from utils import plugins + if not plugins.check_plugin_exists('books'): + raise ValueError('The Books plugin is not installed.') + from plugins.books import models as book_models + + book_id = task.parameters.get('book_id') + if not book_id: + raise ValueError('Book citing works report requires a book_id.') + + book = book_models.Book.objects.get(pk=book_id) + links = mm.BookLink.objects.filter(doi=book.doi, object_type='book') + + rows = [['Title', 'DOI', 'ISBN (Print)', 'ISBN (Electronic)']] + for link in links: + rows.append([link.title, link.doi, link.isbn_print, link.isbn_electronic]) + + filepath = report_file_path(task.pk, 'book_citing_works.csv') + write_csv(filepath, rows) + return filepath + +def write_doi_tsv(filepath, journal=None, crosscheck=False): + from identifiers import models as id_models + + identifiers = id_models.Identifier.objects.filter( + article__isnull=False, + article__stage=sm.STAGE_PUBLISHED, + id_type='doi', + ) + if journal: + identifiers = identifiers.filter(article__journal=journal) + identifiers 
= identifiers.order_by('article__journal', 'id')
+
+    with open(filepath, 'w', newline='', encoding='utf-8') as f:
+        writer = csv_module.writer(f, delimiter='\t', lineterminator='\n')
+        writer.writerow(['DOI', 'URL'])
+        for identifier in identifiers:
+            article = identifier.article
+            if crosscheck and article.pdfs.exists():
+                path = reverse(
+                    'serve_article_pdf',
+                    kwargs={
+                        'identifier_type': 'id',
+                        'identifier': article.id,
+                    },
+                )
+                url = article.journal.site_url(path)
+            else:
+                url = article.url
+            writer.writerow([identifier.identifier, url])
+
+        if not crosscheck:
+            from core import models as core_models_inner
+            # Cover supplementary files for every article in the queryset.
+            # Reusing the loop variable from above only covered the last
+            # article and raised a NameError when there were no identifiers.
+            article_ids = identifiers.values_list('article_id', flat=True)
+            for supp in core_models_inner.SupplementaryFile.objects.filter(
+                file__article_id__in=article_ids,
+                doi__isnull=False,
+            ):
+                writer.writerow([supp.doi, supp.url()])
+
+def generate_crossref_dois_report(task):
+    journal = get_journal(task)
+    filepath = report_file_path(task.pk, 'crossref_doi_urls.tsv')
+    write_doi_tsv(filepath, journal=journal, crosscheck=False)
+    return filepath
+
+def generate_crossref_dois_crosscheck_report(task):
+    journal = get_journal(task)
+    filepath = report_file_path(task.pk, 'crossref_crosscheck_urls.tsv')
+    write_doi_tsv(filepath, journal=journal, crosscheck=True)
+    return filepath
+
+def generate_licenses_report(task):
+    start_date, end_date = parse_dates(task.parameters)
+
+    articles = sm.Article.objects.filter(
+        date_published__lte=end_date,
+        date_published__gte=start_date,
+    ).values(
+        'license', 'license__name', 'license__journal__code',
+    ).annotate(
+        lcount=Count('license'),
+    ).order_by('lcount')
+
+    rows = [['License', 'Journal', 'Count']]
+    for row in articles:
+        rows.append([
+            row.get('license__name'),
+            row.get('license__journal__code'),
+            row.get('lcount'),
+        ])
+
+    filepath = report_file_path(task.pk, 'licenses.csv')
+    write_csv(filepath, rows)
+    return filepath
+
+def generate_preprints_metrics_report(task):
+    start_date, end_date = parse_dates(task.parameters)
+
+    from repository import models as 
repository_models + + preprints = repository_models.Preprint.objects.filter( + preprintaccess__accessed__gte=start_date, + preprintaccess__accessed__lte=end_date, + ).annotate( + total_views=Count( + 'preprintaccess', + filter=Q( + preprintaccess__file=None, + preprintaccess__accessed__date__gte=start_date, + preprintaccess__accessed__date__lte=end_date, + ), + ), + total_downloads=Count( + 'preprintaccess', + filter=Q( + preprintaccess__file__isnull=False, + preprintaccess__accessed__date__gte=start_date, + preprintaccess__accessed__date__lte=end_date, + ), + ), + ) + + rows = [['ID', 'Title', 'Date Published', 'Views', 'Downloads']] + for preprint in preprints: + rows.append([ + preprint.pk, + preprint.title, + preprint.date_published, + preprint.total_views, + preprint.total_downloads, + ]) + + filepath = report_file_path(task.pk, 'preprints_metrics.csv') + write_csv(filepath, rows) + return filepath + +# --------------------------------------------------------------------------- +# Generator registry +# --------------------------------------------------------------------------- + +REPORT_GENERATORS = { + 'press': generate_press_report, + 'articles': generate_articles_report, + 'usage_by_month': generate_usage_by_month_report, + 'production': generate_production_report, + 'geo': generate_geo_report, + 'review': generate_review_report, + 'citations': generate_citations_report, + 'journal_citations': generate_journal_citations_report, + 'journal_citations_detail': generate_journal_citations_detail_report, + 'article_citing_works': generate_article_citing_works_report, + 'book_citations': generate_book_citations_report, + 'book_citing_works': generate_book_citing_works_report, + 'crossref_dois': generate_crossref_dois_report, + 'crossref_dois_crosscheck': generate_crossref_dois_crosscheck_report, + 'licenses': generate_licenses_report, + 'preprints_metrics': generate_preprints_metrics_report, + 'authors': generate_authors_report, + 'reviewers': 
generate_reviewers_report, + 'author_data': generate_author_data_report, + 'workflow': generate_workflow_report, + 'workflow_stages': generate_workflow_stages_report, + 'yearly_stats': generate_yearly_stats_report, + 'under_review': generate_under_review_report, + 'first_decision': generate_first_decision_report, +} diff --git a/install/settings.json b/install/settings.json new file mode 100644 index 0000000..022cc87 --- /dev/null +++ b/install/settings.json @@ -0,0 +1,54 @@ +[ + { + "group": {"name": "email"}, + "setting": { + "name": "deferred_reports_report_ready", + "pretty_name": "Deferred Reports: Report Ready Email", + "description": "Email sent to a user when their async report is ready to download.", + "type": "rich-text", + "is_translatable": true + }, + "value": { + "default": "
Hello {{ task.user.first_name }},
\nYour report \"{{ task.report_name }}\" has been generated and is ready for download.
\n\n\nThis report was requested on {{ task.created|date:\"Y-m-d H:i\" }}.
" + } + }, + { + "group": {"name": "email_subject"}, + "setting": { + "name": "subject_deferred_reports_report_ready", + "pretty_name": "Deferred Reports: Report Ready Subject", + "description": "Subject line for the report ready email.", + "type": "char", + "is_translatable": true + }, + "value": { + "default": "Your report is ready: {{ task.report_name }}" + } + }, + { + "group": {"name": "email"}, + "setting": { + "name": "deferred_reports_report_failed", + "pretty_name": "Deferred Reports: Report Failed Email", + "description": "Email sent to a user when their async report has permanently failed.", + "type": "rich-text", + "is_translatable": true + }, + "value": { + "default": "Hello {{ task.user.first_name }},
\nUnfortunately, your report \"{{ task.report_name }}\" could not be generated.
\nError: {{ task.error_message }}
\n\nPlease try again or contact support if the problem persists.
" + } + }, + { + "group": {"name": "email_subject"}, + "setting": { + "name": "subject_deferred_reports_report_failed", + "pretty_name": "Deferred Reports: Report Failed Subject", + "description": "Subject line for the report failed email.", + "type": "char", + "is_translatable": true + }, + "value": { + "default": "Report generation failed: {{ task.report_name }}" + } + } +] diff --git a/management/__init__.py b/management/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/management/commands/__init__.py b/management/commands/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/management/commands/cleanup_old_reports.py b/management/commands/cleanup_old_reports.py new file mode 100644 index 0000000..2380e1a --- /dev/null +++ b/management/commands/cleanup_old_reports.py @@ -0,0 +1,46 @@ +import os +from datetime import timedelta + +from django.core.management.base import BaseCommand +from django.utils import timezone + +from plugins.deferred_reports.models import ReportTask + + +class Command(BaseCommand): + help = ( + 'Deletes report tasks and their CSV files older than a given ' + 'number of days (default: 30).' 
+ ) + + def add_arguments(self, parser): + parser.add_argument( + '--days', + type=int, + default=30, + help='Delete reports older than this many days (default: 30).', + ) + + def handle(self, *args, **options): + days = options['days'] + cutoff = timezone.now() - timedelta(days=days) + + old_tasks = ReportTask.objects.filter(created__lt=cutoff) + count = old_tasks.count() + + for task in old_tasks: + if task.file_path and os.path.exists(task.file_path): + try: + os.remove(task.file_path) + except OSError as e: + self.stderr.write( + f'Could not delete {task.file_path}: {e}', + ) + + old_tasks.delete() + + self.stdout.write( + self.style.SUCCESS( + f'Deleted {count} report(s) older than {days} days.', + ), + ) diff --git a/management/commands/install_cron.py b/management/commands/install_cron.py new file mode 100644 index 0000000..b87ce51 --- /dev/null +++ b/management/commands/install_cron.py @@ -0,0 +1,101 @@ +import os + +from django.conf import settings +from django.core.management.base import BaseCommand + +try: + import crontab +except (ImportError, ModuleNotFoundError): + crontab = None + + +def find_job(tab, comment): + for job in tab: + if job.comment == comment: + return job + return None + + +class Command(BaseCommand): + """ + Installs cron jobs for the Deferred Reports plugin. + """ + + help = "Installs cron jobs for the Deferred Reports plugin." + + def add_arguments(self, parser): + parser.add_argument("--action", default="") + + def handle(self, *args, **options): + if not os.path.isfile("/usr/bin/crontab"): + self.stdout.write( + self.style.WARNING( + "WARNING: /usr/bin/crontab not found, skipping crontab config." + ) + ) + return + + if not crontab: + self.stdout.write( + self.style.WARNING( + "WARNING: crontab module is not installed, skipping crontab config." 
+ ) + ) + return + + action = options.get("action") + tab = crontab.CronTab(user=True) + virtualenv = os.environ.get("VIRTUAL_ENV", None) + + cwd = settings.PROJECT_DIR.replace("/", "_") + + jobs = [ + { + "name": "{}_deferred_reports_process".format(cwd), + "time": 5, + "task": "process_pending_reports", + "type": "mins", + }, + { + "name": "{}_deferred_reports_cleanup".format(cwd), + "time": 3, + "task": "cleanup_old_reports", + "type": "daily", + }, + ] + + for job in jobs: + current_job = find_job(tab, job["name"]) + + if not current_job: + django_command = "{0}/manage.py {1}".format( + settings.BASE_DIR, job["task"] + ) + if virtualenv: + command = "%s/bin/python3 %s" % (virtualenv, django_command) + else: + command = "%s" % (django_command) + + cron_job = tab.new(command, comment=job["name"]) + + if job.get("type") == "daily": + cron_job.setall("0 {} * * *".format(job["time"])) + else: + cron_job.minute.every(job["time"]) + + self.stdout.write( + self.style.SUCCESS( + "Installed cron job: {name}".format(name=job["name"]) + ) + ) + else: + self.stdout.write( + "{name} cron job already exists.".format(name=job["name"]) + ) + + if action == "test": + self.stdout.write(tab.render()) + elif action == "quiet": + pass + else: + tab.write() diff --git a/management/commands/process_pending_reports.py b/management/commands/process_pending_reports.py new file mode 100644 index 0000000..ac54e7d --- /dev/null +++ b/management/commands/process_pending_reports.py @@ -0,0 +1,124 @@ +from datetime import timedelta + +from django.core.management.base import BaseCommand +from django.utils import timezone + +from plugins.deferred_reports.models import ReportTask +from plugins.deferred_reports import email as report_email +from plugins.deferred_reports.execute import execute_report + + +class Command(BaseCommand): + help = ( + 'Processes any report tasks in pending state. ' + 'Intended to be run regularly via cron. 
' + 'Tasks stuck in processing for longer than --stuck-after minutes ' + 'are assumed to have crashed and are reset to pending.' + ) + + MAX_ATTEMPTS = 3 + + def add_arguments(self, parser): + parser.add_argument( + '--limit', + type=int, + default=5, + help='Maximum number of reports to process per run (default: 5).', + ) + parser.add_argument( + '--stuck-after', + type=int, + default=10, + help=( + 'Minutes a task can be in processing state before it is ' + 'considered crashed and reset to pending (default: 10).' + ), + ) + + def handle(self, *args, **options): + stuck_after = options['stuck_after'] + cutoff = timezone.now() - timedelta(minutes=stuck_after) + + stuck = ReportTask.objects.filter( + status=ReportTask.STATUS_PROCESSING, + created__lt=cutoff, + ) + if stuck.exists(): + count = stuck.count() + stuck.update(status=ReportTask.STATUS_PENDING) + self.stdout.write( + f'Reset {count} task(s) stuck in processing for ' + f'>{stuck_after} minutes.', + ) + + # Mark exhausted tasks as permanently failed before picking up work. 
+        exhausted = ReportTask.objects.filter(
+            status=ReportTask.STATUS_PENDING,
+            attempt_count__gte=self.MAX_ATTEMPTS,
+        )
+        exhausted_pks = list(exhausted.values_list('pk', flat=True))
+        if exhausted_pks:
+            exhausted.update(status=ReportTask.STATUS_FAILED)
+            self.stdout.write(
+                self.style.WARNING(
+                    f'Marked {len(exhausted_pks)} task(s) as failed after '
+                    f'{self.MAX_ATTEMPTS} attempts.',
+                )
+            )
+            # Notify only the tasks that failed in this run; filtering on
+            # all failed tasks would re-send the email on every cron run.
+            for task in ReportTask.objects.filter(pk__in=exhausted_pks):
+                report_email.send_failure_email(task)
+
+        limit = options['limit']
+        pending = ReportTask.objects.filter(
+            status=ReportTask.STATUS_PENDING,
+        ).order_by('created')[:limit]
+        count = len(pending)
+
+        total_pending = ReportTask.objects.filter(
+            status=ReportTask.STATUS_PENDING,
+        ).count()
+
+        if not count:
+            self.stdout.write('No pending report tasks.')
+            return
+
+        self.stdout.write(
+            f'Found {total_pending} pending report task(s) total; '
+            f'processing {count} (limit: {limit}).',
+        )
+
+        for task in pending:
+            journal_label = f' [{task.journal.code}]' if task.journal else ''
+            params_label = (
+                ', '.join(f'{k}={v}' for k, v in task.parameters.items())
+                if task.parameters else 'no parameters'
+            )
+            self.stdout.write(
+                f'  [{task.pk}] {task.report_name}{journal_label} '
+                f'requested by {task.user.email} '
+                f'at {task.created.strftime("%Y-%m-%d %H:%M")} '
+                f'({params_label}) '
+                f'[attempt {task.attempt_count + 1}/{self.MAX_ATTEMPTS}]',
+            )
+            execute_report(task.pk)
+            task.refresh_from_db()
+            if task.status == ReportTask.STATUS_COMPLETE:
+                self.stdout.write(
+                    self.style.SUCCESS(f'  -> complete: {task.file_path}'),
+                )
+            else:
+                self.stdout.write(
+                    self.style.ERROR(
+                        f'  -> failed (attempt {task.attempt_count}/'
+                        f'{self.MAX_ATTEMPTS}): {task.error_message}'
+                    ),
+                )
+                if task.attempt_count < self.MAX_ATTEMPTS:
+                    task.status = ReportTask.STATUS_PENDING
+                    task.save(update_fields=['status'])
+
+        self.stdout.write(
+            self.style.SUCCESS(f'Done. 
Processed {count} report task(s).'), + ) diff --git a/migrations/0001_initial.py b/migrations/0001_initial.py new file mode 100644 index 0000000..a18e836 --- /dev/null +++ b/migrations/0001_initial.py @@ -0,0 +1,73 @@ +from django.conf import settings +from django.db import migrations, models +import django.db.models.deletion +import django.utils.timezone + + +class Migration(migrations.Migration): + + initial = True + + dependencies = [ + migrations.swappable_dependency(settings.AUTH_USER_MODEL), + ('journal', '0001_initial'), + ] + + operations = [ + migrations.CreateModel( + name='ReportTask', + fields=[ + ('id', models.AutoField( + auto_created=True, + primary_key=True, + serialize=False, + verbose_name='ID', + )), + ('report_type', models.CharField(max_length=50)), + ('report_name', models.CharField(max_length=255)), + ('parameters', models.JSONField(blank=True, default=dict)), + ('status', models.CharField( + choices=[ + ('pending', 'Pending'), + ('processing', 'Processing'), + ('complete', 'Complete'), + ('failed', 'Failed'), + ], + default='pending', + max_length=20, + )), + ('created', models.DateTimeField( + default=django.utils.timezone.now, + )), + ('completed', models.DateTimeField( + blank=True, null=True, + )), + ('file_path', models.CharField( + blank=True, default='', max_length=500, + )), + ('error_message', models.TextField( + blank=True, default='', + )), + ('site_url', models.CharField( + blank=True, + default='', + help_text='Base site URL captured at request time for email links.', + max_length=500, + )), + ('journal', models.ForeignKey( + blank=True, + null=True, + on_delete=django.db.models.deletion.CASCADE, + to='journal.journal', + )), + ('user', models.ForeignKey( + on_delete=django.db.models.deletion.CASCADE, + related_name='report_tasks', + to=settings.AUTH_USER_MODEL, + )), + ], + options={ + 'ordering': ['-created'], + }, + ), + ] diff --git a/migrations/0002_reporttask_attempt_count.py b/migrations/0002_reporttask_attempt_count.py 
new file mode 100644 index 0000000..6dcfcc6 --- /dev/null +++ b/migrations/0002_reporttask_attempt_count.py @@ -0,0 +1,16 @@ +from django.db import migrations, models + + +class Migration(migrations.Migration): + + dependencies = [ + ('deferred_reports', '0001_initial'), + ] + + operations = [ + migrations.AddField( + model_name='reporttask', + name='attempt_count', + field=models.PositiveSmallIntegerField(default=0), + ), + ] diff --git a/migrations/__init__.py b/migrations/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/models.py b/models.py new file mode 100644 index 0000000..ce2a4a2 --- /dev/null +++ b/models.py @@ -0,0 +1,62 @@ +from django.db import models +from django.utils import timezone + + +class ReportTask(models.Model): + STATUS_PENDING = 'pending' + STATUS_PROCESSING = 'processing' + STATUS_COMPLETE = 'complete' + STATUS_FAILED = 'failed' + + STATUS_CHOICES = ( + (STATUS_PENDING, 'Pending'), + (STATUS_PROCESSING, 'Processing'), + (STATUS_COMPLETE, 'Complete'), + (STATUS_FAILED, 'Failed'), + ) + + user = models.ForeignKey( + 'core.Account', + on_delete=models.CASCADE, + related_name='report_tasks', + ) + journal = models.ForeignKey( + 'journal.Journal', + on_delete=models.CASCADE, + null=True, + blank=True, + ) + report_type = models.CharField(max_length=50) + report_name = models.CharField(max_length=255) + parameters = models.JSONField(default=dict, blank=True) + status = models.CharField( + max_length=20, + choices=STATUS_CHOICES, + default=STATUS_PENDING, + ) + attempt_count = models.PositiveSmallIntegerField(default=0) + created = models.DateTimeField(default=timezone.now) + completed = models.DateTimeField(null=True, blank=True) + file_path = models.CharField(max_length=500, blank=True, default='') + error_message = models.TextField(blank=True, default='') + site_url = models.CharField( + max_length=500, + blank=True, + default='', + help_text='Base site URL captured at request time for email links.', + ) + + class Meta: + 
ordering = ['-created'] + + def __str__(self): + return f'{self.report_name} ({self.status})' + + @property + def is_downloadable(self): + import os + return ( + self.status == self.STATUS_COMPLETE + and self.file_path + and os.path.exists(self.file_path) + ) diff --git a/plugin_settings.py b/plugin_settings.py new file mode 100644 index 0000000..b06ad51 --- /dev/null +++ b/plugin_settings.py @@ -0,0 +1,376 @@ +from utils import plugins + +# --------------------------------------------------------------------------- +# Permission level constants +# --------------------------------------------------------------------------- + +PERM_EDITOR = 'editor' +PERM_STAFF = 'staff' +PERM_REPOSITORY_MANAGER = 'repository_manager' + +# --------------------------------------------------------------------------- +# Report type registry +# --------------------------------------------------------------------------- + +REPORT_TYPES = { + 'press': { + 'name': 'Press Report', + 'description': ( + 'One row per non-remote journal. The date range filters ' + 'submissions (by submitted date), publications (by published ' + 'date), rejections (by declined date), and views and downloads ' + '(by access date). User counts are a current snapshot and are ' + 'not date-filtered.' + ), + 'form': 'DateRangeReportForm', + 'needs_journal': False, + 'permission': PERM_EDITOR, + }, + 'articles': { + 'name': 'Article Metrics', + 'description': ( + 'One row per published article in the journal, covering all ' + 'currently published articles regardless of date. ' + 'The date range filters the access metrics (views and downloads) ' + 'only; it does not affect which articles are included.' + ), + 'form': 'JournalDateReportForm', + 'needs_journal': True, + 'permission': PERM_EDITOR, + }, + 'usage_by_month': { + 'name': 'Journal Usage by Month', + 'description': ( + 'Pivoted table with one row per journal and one column per ' + 'calendar month. 
The month range determines both which months ' + 'appear as columns and which access events are counted. ' + 'Abstract-only page views (no galley) are excluded. ' + 'Hidden and remote journals are excluded.' + ), + 'form': 'MonthRangeReportForm', + 'needs_journal': False, + 'permission': PERM_EDITOR, + }, + 'production': { + 'name': 'Production Times', + 'description': ( + 'One row per typesetting task. The date range filters on the ' + 'assignment date. Only tasks where both the accepted and ' + 'completed dates are recorded are included.' + ), + 'form': 'DateRangeReportForm', + 'needs_journal': False, + 'permission': PERM_EDITOR, + }, + 'geo': { + 'name': 'Geographical Spread', + 'description': ( + 'Access events (views and downloads) to published articles ' + 'grouped by country. The date range filters the access events. ' + 'Can be scoped to a single journal or run press-wide.' + ), + 'form': 'DateRangeReportForm', + 'needs_journal': False, + 'permission': PERM_EDITOR, + }, + 'review': { + 'name': 'Peer Review', + 'description': ( + 'One row per completed review assignment. The date range filters ' + 'on the date the review was requested. Only assignments where ' + 'both the acceptance and completion dates are recorded are ' + 'included. Can be scoped to a single journal or run press-wide.' + ), + 'form': 'DateRangeReportForm', + 'needs_journal': False, + 'permission': PERM_EDITOR, + }, + 'citations': { + 'name': 'Article Citations', + 'description': ( + 'Citation counts from Crossref data. The year parameter filters ' + 'which citation records are counted by the year of the citing ' + 'work; select all-time to include citations from every year. ' + 'Only articles with at least one recorded citation are included. ' + 'Can be scoped to a single journal or run press-wide.' 
+        ),
+        'form': 'YearReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+    },
+    'journal_citations': {
+        'name': 'Journal Citations',
+        'description': (
+            'All-time Crossref citation totals per journal. No date '
+            'filtering; covers all citation records regardless of year. '
+            'One row per visible journal.'
+        ),
+        'form': 'ReportRequestForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'authors': {
+        'name': 'Article Authors',
+        'description': (
+            'One row per author per published article. The date range '
+            'filters which authors are included: only those with at least '
+            'one article published within the range. All published articles '
+            'for each matched author are then listed, not just those within '
+            'the range. Can be scoped to a single journal or run press-wide.'
+        ),
+        'form': 'DateRangeReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+    },
+    'reviewers': {
+        'name': 'Peer Reviewers Data',
+        'description': (
+            'Lifetime review statistics per reviewer for the selected '
+            'journal. No date filtering; covers all assignments on record. '
+            'Only reviewers with at least one assignment in the journal '
+            'are included.'
+        ),
+        'form': 'JournalOnlyReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+    },
+    'author_data': {
+        'name': 'Author Submission Data',
+        'description': (
+            'Submission statistics per author for the selected journal. '
+            'No date filtering; covers all submissions on record. '
+            'Only accounts with the Author role in the journal are included.'
+        ),
+        'form': 'JournalOnlyReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+    },
+    'workflow': {
+        'name': 'Workflow Report',
+        'description': (
+            'Lead-time analysis: submission-to-acceptance, '
+            'acceptance-to-publication, and submission-to-publication. '
+            'The month range filters on publication date; only articles '
+            'published within the selected months are included. '
+            'Opens with summary averages followed by a per-article '
+            'breakdown. Can be scoped to a single journal or run '
+            'press-wide.'
+        ),
+        'form': 'MonthRangeReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+    },
+    'workflow_stages': {
+        'name': 'Workflow Stage Completion',
+        'description': (
+            'Time spent in each workflow stage per article. The month range '
+            'filters on submission date; only articles submitted within '
+            'the selected months that have since been published are '
+            'included. Stage columns are dynamic, based on the workflow '
+            'elements configured for the journal.'
+        ),
+        'form': 'MonthRangeReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+    },
+    'yearly_stats': {
+        'name': 'Yearly Statistics',
+        'description': (
+            'Year-by-year submission funnel for the selected journal. '
+            'No date parameters; automatically covers all years from the '
+            'earliest submission on record to the current year. Each '
+            'year\'s counts are based on submission date.'
+        ),
+        'form': 'JournalOnlyReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'under_review': {
+        'name': 'Articles Under Review',
+        'description': (
+            'Live snapshot of all open review assignments for articles '
+            'currently in the Under Review stage. No date filtering; '
+            'reflects the current state of the journal at the time of '
+            'the request.'
+        ),
+        'form': 'JournalOnlyReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'first_decision': {
+        'name': 'Time to First Decision',
+        'description': (
+            'First editorial decision (accept, decline, or revision request) '
+            'for each article. The date range filters on submission date; '
+            'only articles submitted within the range are included.'
+        ),
+        'form': 'DateRangeReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+    },
+    'journal_citations_detail': {
+        'name': 'Journal Article Citations',
+        'description': (
+            'All-time Crossref citation counts for articles in the selected '
+            'journal. No date filtering; covers all citation records '
+            'regardless of year. Only articles with at least one recorded '
+            'citation are included.'
+        ),
+        'form': 'JournalOnlyReportForm',
+        'needs_journal': True,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'article_citing_works': {
+        'name': 'Article Citing Works',
+        'description': (
+            'All works recorded in Crossref data as citing a specific '
+            'article. No date filtering; covers all citation records '
+            'on file. Select a journal then pick an article from the list.'
+        ),
+        'form': 'ArticleJournalSelectForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'book_citations': {
+        'name': 'Book Citations',
+        'description': (
+            'All-time citation counts per published book from Crossref '
+            'BookLink data, matched by DOI. No date filtering. '
+            'Requires the Books plugin.'
+        ),
+        'form': 'ReportRequestForm',
+        'needs_journal': False,
+        'permission': PERM_STAFF,
+        'instant': True,
+    },
+    'book_citing_works': {
+        'name': 'Book Citing Works',
+        'description': (
+            'All works recorded in Crossref BookLink data as citing a '
+            'specific book, matched by DOI. No date filtering. '
+            'Select a book from the list. Requires the Books plugin.'
+        ),
+        'form': 'BookCitingWorksForm',
+        'needs_journal': False,
+        'permission': PERM_STAFF,
+        'instant': True,
+    },
+    'crossref_dois': {
+        'name': 'Crossref DOI URLs',
+        'description': (
+            'Tab-separated file mapping every registered DOI to its article '
+            'URL. No date filtering; covers all published articles with a '
+            'registered DOI. Includes supplementary file DOIs where '
+            'available. Can be scoped to a single journal or exported '
+            'press-wide.'
+        ),
+        'form': 'OptionalJournalReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'crossref_dois_crosscheck': {
+        'name': 'Crossref CrossCheck URLs',
+        'description': (
+            'Tab-separated file mapping every registered DOI to the direct '
+            'URL of its full-text PDF galley, for submission to Crossref '
+            'CrossCheck (iThenticate). No date filtering; covers all '
+            'published articles with a registered DOI and a PDF galley. '
+            'Can be scoped to a single journal or exported press-wide.'
+        ),
+        'form': 'OptionalJournalReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+        'instant': True,
+    },
+    'licenses': {
+        'name': 'License Report',
+        'description': (
+            'Article counts grouped by license and journal. The date range '
+            'filters on publication date; only articles published within '
+            'the range are counted.'
+        ),
+        'form': 'DateRangeReportForm',
+        'needs_journal': False,
+        'permission': PERM_EDITOR,
+    },
+    'preprints_metrics': {
+        'name': 'Preprints Metrics',
+        'description': (
+            'Views and downloads per preprint. The date range filters the '
+            'access events; only preprints with at least one access in '
+            'the range appear in the report. Views are page-level accesses '
+            '(no file); downloads are file-level accesses. '
+            'Requires the repository to be active.'
+        ),
+        'form': 'DateRangeReportForm',
+        'needs_journal': False,
+        'permission': PERM_REPOSITORY_MANAGER,
+    },
+}
+
+# ---------------------------------------------------------------------------
+# Parameter display labels (used when rendering report parameters in the UI)
+# ---------------------------------------------------------------------------
+
+PARAMETER_LABELS = {
+    'start_date': 'From',
+    'end_date': 'To',
+    'start_month': 'From',
+    'end_month': 'To',
+    'year': 'Year',
+    'all_time': 'All time',
+    'article_id': 'Article ID',
+    'book_id': 'Book ID',
+}
+
+# ---------------------------------------------------------------------------
+# File storage
+# ---------------------------------------------------------------------------
+
+import os
+from django.conf import settings
+
+REPORTS_DIR = os.path.join(settings.BASE_DIR, 'files', 'deferred_reports')
+
+# ---------------------------------------------------------------------------
+# Plugin metadata
+# ---------------------------------------------------------------------------
+
+PLUGIN_NAME = 'deferred_reports'
+DESCRIPTION = (
+    'Generates reports as background tasks and notifies users by email '
+    'when their CSV download is ready.'
+)
+AUTHOR = 'Andy Byers'
+VERSION = '1.0'
+SHORT_NAME = 'deferred_reports'
+DISPLAY_NAME = 'Deferred Reports'
+MANAGER_URL = 'deferred_reports_index'
+JANEWAY_VERSION = '1.5.1'
+
+
+class ReportingAsyncPlugin(plugins.Plugin):
+    plugin_name = PLUGIN_NAME
+    display_name = DISPLAY_NAME
+    description = DESCRIPTION
+    author = AUTHOR
+    short_name = SHORT_NAME
+    version = VERSION
+    janeway_version = JANEWAY_VERSION
+    manager_url = MANAGER_URL
+
+
+def install():
+    from utils.install import update_settings
+    ReportingAsyncPlugin.install()
+    update_settings(file_path='plugins/deferred_reports/install/settings.json')
+
+
+def hook_registry():
+    return {}
diff --git a/templates/deferred_reports/configure_article_citing_works.html b/templates/deferred_reports/configure_article_citing_works.html
new file mode 100644
index 0000000..6426fa9
--- /dev/null
+++ b/templates/deferred_reports/configure_article_citing_works.html
@@ -0,0 +1,114 @@
+{% extends "admin/core/base.html" %}
+
+{% block title %}Configure Report: {{ info.name }}{% endblock %}
+{% block title-section %}{{ info.name }}{% endblock %}
+{% block title-sub %}{{ info.description }}{% endblock %}
+
+{% block breadcrumbs %}
+    {{ block.super }}
+
+Select an article from {{ journal.name }}.
+
+    {% if articles %}
+
+    {% else %}
+No articles found for {{ journal.name }}.
+No published books found.
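The `REPORTS_DIR` setting in the plugin module above determines where generated CSVs live on disk. A minimal sketch of how a per-task file path might be derived (the filename scheme and the `BASE_DIR` value here are illustrative assumptions, not taken from this diff):

```python
import os

# Stand-in for Django's settings.BASE_DIR (assumed value).
BASE_DIR = '/opt/janeway/src'

# Mirrors REPORTS_DIR from the plugin module in this diff.
REPORTS_DIR = os.path.join(BASE_DIR, 'files', 'deferred_reports')


def report_path(task_pk, report_key):
    """Build a per-task CSV path (hypothetical naming scheme)."""
    return os.path.join(REPORTS_DIR, '{}_{}.csv'.format(report_key, task_pk))
```

Keeping the directory relative to `BASE_DIR` is what lets the `cleanup_old_reports` command delete both the `ReportTask` row and its file from one configured root.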
+View and download your generated reports.
+    {% if tasks %}
+
+| Report | Status | Requested | Completed | Actions |
+|---|---|---|---|---|
+
+    {{ task.report_name }}
+    {% if task.journal %}
+        {{ task.journal.name }}
+    {% endif %}
+    {% if task.parameters %}
+        {% for key, value in task.parameters.items %}
+            {% if key != 'journal_id' %}
+                {{ key }}: {{ value }}
+            {% endif %}
+        {% endfor %}
+    {% endif %}
+
+    {% if task.status == 'pending' %}
+        Pending
+    {% elif task.status == 'processing' %}
+        Processing
+    {% elif task.status == 'complete' %}
+        Complete
+    {% elif task.status == 'failed' %}
+        Failed
+    {% endif %}
+
+    {{ task.created|date:"Y-m-d H:i" }}
+
+    {% if task.completed %}
+        {{ task.completed|date:"Y-m-d H:i" }}
+    {% else %}
+        -
+    {% endif %}
+
+    {% if task.is_downloadable %}
+        View
+        Download CSV
+    {% endif %}
+    {% if task.status == 'failed' %}
+        Error: {{ task.error_message|truncatewords:10 }}
+    {% endif %}
+    {% if task.status == 'pending' or task.status == 'processing' %}
+        In progress...
+    {% endif %}
+    Delete
+You have not requested any reports yet.
+    Request one now.
+    {% endif %}
+
+    Requested {{ task.created|date:"Y-m-d H:i" }}
+    {% if task.journal %} — {{ task.journal.name }}{% endif %}
+    {% if display_parameters %}
+
+        {% for label, value in display_parameters %}
+            {% if not forloop.first %} — {% endif %}
+            {{ label }}: {{ value }}
+        {% endfor %}
+
+    {% endif %}
+
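The `REPORTS` registry in the plugin module drives both permission checks and the instant-versus-background split described in the README. A minimal sketch of a dispatcher over that structure (the permission constant values and the `dispatch` helper are illustrative assumptions; only the two registry entries are copied from the diff):

```python
# Assumed role identifiers; the diff only shows the constant names.
PERM_EDITOR = 'editor'
PERM_STAFF = 'staff'

# Two entries copied from the REPORTS registry in this diff.
REPORTS = {
    'yearly_stats': {
        'name': 'Yearly Statistics',
        'form': 'JournalOnlyReportForm',
        'needs_journal': True,
        'permission': PERM_EDITOR,
        'instant': True,
    },
    'production': {
        'name': 'Production Times',
        'form': 'DateRangeReportForm',
        'needs_journal': False,
        'permission': PERM_EDITOR,
    },
}


def dispatch(report_key, user_roles, journal=None):
    """Decide how a requested report is handled (illustrative helper).

    Returns 'instant' for synchronous reports and 'queued' for reports
    that would become pending ReportTask records.
    """
    info = REPORTS[report_key]
    if info['permission'] not in user_roles:
        raise PermissionError(
            '{} requires the {} role'.format(info['name'], info['permission'])
        )
    if info['needs_journal'] and journal is None:
        raise ValueError('{} must be scoped to a journal'.format(info['name']))
    # Entries without the 'instant' flag default to background processing.
    return 'instant' if info.get('instant') else 'queued'
```

Making `instant` an opt-in flag keeps background processing the safe default: a new report type is deferred unless someone deliberately marks it fast enough to run inside a request.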