backend.maintenance module

This module contains functions that are not normally needed to run the platform, but that can be useful during development or to clean up various things in the database.

These functions are not integrated into the rest of the platform, so running them involves starting a Django shell (“python manage.py shell”), importing this module and calling the function manually.


Runs HTML sanitizing on the abstracts (this is normally done when papers are created, but not for old dumps of the database).


Deletes all the names that are not linked to any researcher


Ensures that all researcher_ids in Papers link to actual researchers


Deletes all the researchers who have not authored any paper.


Runs HTML sanitizing on all the titles of the papers (this is normally done when papers are created, but not for old dumps of the database).

backend.maintenance.enumerate_large_qs(queryset, key=u'pk', batch_size=256)

Enumerates a large queryset (millions of rows) efficiently.
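The sketch below illustrates one common way to enumerate a very large table in bounded batches, by remembering the last key seen instead of using ever-growing offsets. It is an assumption that enumerate_large_qs works along these lines; the names enumerate_in_batches, fetch_after and ROWS are hypothetical and only stand in for a Django queryset.

```python
def enumerate_in_batches(fetch_after, key, batch_size=256):
    """Yield every row, fetching at most `batch_size` rows per call.

    `fetch_after(last_key, limit)` is a hypothetical callable that returns up
    to `limit` rows whose key is greater than `last_key`, sorted by key.
    """
    last_key = None
    while True:
        batch = fetch_after(last_key, batch_size)
        if not batch:
            return
        for row in batch:
            yield row
        # Resume after the last key seen, so no row is fetched twice.
        last_key = key(batch[-1])


# In-memory stand-in for a table with one thousand rows.
ROWS = [{"pk": i} for i in range(1, 1001)]


def fetch_after(last_pk, limit):
    # With Django this would roughly correspond to
    # queryset.filter(pk__gt=last_pk).order_by('pk')[:limit]
    candidates = [r for r in ROWS if last_pk is None or r["pk"] > last_pk]
    return candidates[:limit]


seen = [r["pk"] for r in enumerate_in_batches(fetch_after, key=lambda r: r["pk"])]
```

Because each batch is selected by key rather than by offset, the cost of a query does not grow with the number of rows already visited.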


Recomputes all the fingerprints and reports those that would be merged by recompute_fingerprints(), without merging anything.
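As a rough illustration of such a dry run, the sketch below groups papers by a recomputed fingerprint and reports the groups that contain more than one paper. The function toy_fingerprint is a hypothetical stand-in and not the platform's actual fingerprint; the data layout (pairs of primary key and title) is also an assumption.

```python
import re
from collections import defaultdict


def toy_fingerprint(title):
    # Hypothetical stand-in: lowercase the title and collapse every run of
    # non-alphanumeric characters into a single dash.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def find_fingerprint_collisions(papers):
    """Return {fingerprint: [pk, ...]} for fingerprints shared by several papers."""
    groups = defaultdict(list)
    for pk, title in papers:
        groups[toy_fingerprint(title)].append(pk)
    # Only groups with at least two papers would be merged.
    return {fp: pks for fp, pks in groups.items() if len(pks) > 1}


papers = [(1, "Deep Learning"), (2, "deep  learning!"), (3, "Graph Theory")]
collisions = find_fingerprint_collisions(papers)
```

Nothing is modified here: the result only lists which papers share a fingerprint, which is the kind of report one would inspect before running the actual merge.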

backend.maintenance.merge_names(fro, to)

Merges the name object ‘fro’ into ‘to’.


Recomputes the fingerprints of all papers, merging those that end up having the same fingerprint.


Recomputes the publisher policy according to some possibly new criteria


Tries to assign containers to OaiRecords without containers


Tries to assign publishers to OaiRecords without Journals

backend.maintenance.update_index_for_model(model, batch_size=256, batches_per_commit=10, firstpk=0)

More efficient update of the search index for large models such as Paper.

  • batch_size – the number of instances to retrieve for each query
  • batches_per_commit – the number of batches after which we should commit to the search engine
  • firstpk – the instance to start with.
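The interplay of the three parameters above can be sketched as follows. This is not the module's implementation: update_index_in_batches, index_batch and commit are hypothetical names, the rows are plain dictionaries sorted by pk, and treating firstpk as inclusive is an assumption.

```python
def update_index_in_batches(rows, index_batch, commit,
                            batch_size=256, batches_per_commit=10, firstpk=0):
    """Send rows to the index in batches, committing every few batches."""
    batch = []
    pending = 0  # batches indexed since the last commit
    for row in rows:
        if row["pk"] < firstpk:  # assumption: firstpk itself is included
            continue
        batch.append(row)
        if len(batch) == batch_size:
            index_batch(batch)
            batch = []
            pending += 1
            if pending == batches_per_commit:
                commit()
                pending = 0
    if batch:  # last, partially filled batch
        index_batch(batch)
        pending += 1
    if pending:  # commit whatever has not been committed yet
        commit()


# Stand-ins for a search backend: record what was indexed and when.
indexed = []
commits = []
rows = [{"pk": i} for i in range(1, 26)]
update_index_in_batches(rows, indexed.extend, lambda: commits.append(len(indexed)),
                        batch_size=4, batches_per_commit=3)
```

Committing only once every few batches trades a longer delay before changes become visible for fewer expensive commit operations, and a non-zero firstpk allows an interrupted run to be resumed.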

Should only be run if something went wrong; the backend is supposed to update the fields by itself.