Welcome to Metadator’s documentation!

Code Generator

This tool generates files for different purposes based on the metadata of publications.

At the moment the following formats are supported:

  • BibTeX files and subsequently a formatted suggested citation of publications

  • JSON files to create DOIs on DataCite

  • A PDF page that is prepended to the downloadable chapter files

  • ONIX files for insertion into OJS/OMP (unmaintained as of 2021)

  • XHTML files that show a table of contents (unmaintained as of 2021)

Dependencies

  • texlive-xetex

  • python3-pypdf2

  • python3-lxml

  • python3-bibtexparser

  • pandoc >= 2.11

Config file

The program requires a config file that stores some paths and, if a postgres database is used, the credentials for it. If several publication platforms are used, create a separate config file for each instance.

It should contain at least the following fields:

[output]
output_directory:

[server]
media_dir:
production_url:

Output files are written to output_directory. The entry media_dir in the server section should point to the path of the Django media directory on the server, while production_url points to the root URL of the publication platform.

When using SQLite, you can specify the database file like this:

[sqlite]
database_file:

Insert the postgres database credentials like this:

[postgres]
database_name:
user:
host:
password: ""

If a test repository for DOIs is available, the prefix for testing can be given here:

[doi]
testprefix: "10.80956"
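
How a test prefix relates to a production DOI can be illustrated with a short Python sketch. It assumes that a test DOI keeps the suffix of the production DOI and only swaps the prefix; whether Metadator builds its test DOIs exactly this way is not stated here, and the function name with_test_prefix is hypothetical.

```python
def with_test_prefix(doi: str, testprefix: str) -> str:
    """Return the DOI with its prefix replaced by the test prefix."""
    # A DOI has the form "<prefix>/<suffix>". Split at the first slash only,
    # because the suffix itself may contain further slashes.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return f"{testprefix}/{suffix}"


test_doi = with_test_prefix("10.34663/9783945561577-00", "10.80956")
print(test_doi)
```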

Metadata formats

All subsequent examples assume that an SQLite database is used together with a configuration file called apress.cfg. Following the naming scheme of EOA publications, an exemplary publication, Studies 23, is used for demonstration.

In some cases it will be easier to work locally on a desktop/laptop computer instead of directly on the server. In these cases, the SQLite database can be copied to the local machine.

The config apress.cfg might look like this (fictional values are used here):

[output]
output_directory: "generated_files"

[server]
media_dir: "/var/www/apress/eoapp/media/"
production_url: "http://example.com:9090"

[sqlite]
database_file: "~/apress.db"

[doi]
testprefix: "12.2342"
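
A config file in this layout can be read with Python's standard configparser module. The following is a minimal sketch, not Metadator's actual loading code: the function load_settings and the "section.key" naming are hypothetical, and it assumes that the double quotes around the values are meant to be stripped.

```python
import configparser
import os

EXAMPLE = '''
[output]
output_directory: "generated_files"

[server]
media_dir: "/var/www/apress/eoapp/media/"
production_url: "http://example.com:9090"

[sqlite]
database_file: "~/apress.db"

[doi]
testprefix: "12.2342"
'''


def load_settings(text: str) -> dict:
    """Parse config text into a flat dict keyed by 'section.key'."""
    # Interpolation is switched off so that a '%' in a value (for example
    # in a password) is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # configparser keeps the surrounding double quotes, so strip them.
            settings[f"{section}.{key}"] = value.strip('"')
    return settings


settings = load_settings(EXAMPLE)
# Expand "~" so that the database path can be opened directly.
database_file = os.path.expanduser(settings["sqlite.database_file"])
print(settings["output.output_directory"], database_file)
```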

Suggested Citation

With the configuration in place, formatted citations will be generated like this:

python3 metadator.py --sqlite -f apress.cfg -b studies23

The tool pandoc is used in the background. For further convenience, a BibTeX file is created along the way.

DOI creation

JSON files for the generation of DOIs for DataCite are created like so:

python3 metadator.py --sqlite -f apress.cfg -j -t studies23

The -t option will use the test repository. URLs for DataCite are hardcoded in the program code. Two shell scripts are created along the way: studies23_doiupload_test.sh is for batch uploading the JSON files into DataCite, while studies23_test_deletedraft.sh can be used to delete the test DOIs again after checking.

In both cases, curl is called with the --netrc option, which reads the credentials from a file called ~/.netrc. It contains entries like:

machine api.test.example.com
        login LOGIN
        password *****
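
Before running the generated upload scripts, it can be useful to check that the expected machine entry is present. The sketch below uses Python's standard netrc module; the helper lookup_credentials is hypothetical, and a temporary file stands in for the real ~/.netrc so that no actual credentials are touched.

```python
import netrc
import os
import tempfile


def lookup_credentials(netrc_path: str, machine: str):
    """Return (login, password) for machine, or None if there is no entry."""
    entry = netrc.netrc(netrc_path).authenticators(machine)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demonstrate with a temporary file instead of the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netrc")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(
            "machine api.test.example.com\n"
            "        login LOGIN\n"
            "        password secret\n"
        )
    found = lookup_credentials(path, "api.test.example.com")
    missing = lookup_credentials(path, "api.example.org")

print(found, missing)
```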

If the test DOI entries look good, the proper DOIs can be created with either:

python3 metadator.py --sqlite -f apress.cfg -j studies23

or:

python3 metadator.py --sqlite -f apress.cfg -j -i studies23

The -i option immediately sets the state of the DOIs to publish rather than hide. Without it, the state of each entry has to be changed manually through the DataCite web interface. Published DOIs cannot be deleted. However, all the metadata can be modified at any time. One way to do this is to reuse the JSON files (they should be kept in version control alongside all the other data of the publication): change the relevant piece of information, save the result (as update.json in the example below), and send it with a curl command similar to the following, which uses the URL https://api.datacite.org/dois/10.34663/9783945561577-00 as an example:

curl --netrc --request PUT --header "Content-Type: application/vnd.api+json" --url https://api.datacite.org/dois/10.34663/9783945561577-00 -d @update.json
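
The same request can be prepared in Python, which is handy for inspecting the payload before anything is sent. This is a sketch under assumptions: the payload shape (a "data" object with "type" and "attributes") and the changed field are illustrative, the landing page URL is made up, and build_update_request is a hypothetical helper. Authentication, which curl takes from ~/.netrc, is left out, and the request is only built, not sent.

```python
import json
import urllib.request

DOI_URL = "https://api.datacite.org/dois/10.34663/9783945561577-00"


def build_update_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.api+json"},
    )


# Hypothetical content of update.json: only the landing page URL changes.
payload = {
    "data": {
        "type": "dois",
        "attributes": {"url": "http://example.com:9090/studies/23/"},
    }
}

request = build_update_request(DOI_URL, payload)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would additionally require credentials and a call such as urllib.request.urlopen(request).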

Chapter frontmatter

This script not only creates the frontmatter but also attaches it to the existing chapter PDFs right away. This is a relic of the time when the tool was first created and the complete backlist needed to be handled. As a result, the tool still requires that the chapter PDFs have already been uploaded to the platform. A shell script is then used to exchange the two PDF files.

The simplest command here is:

python3 metadator.py --sqlite -f apress.cfg -p studies23

This checks the database to find out which chapters of that publication have a PDF attached. Each of these PDFs is downloaded and the frontmatter is prepended to it. The PDF is also enriched with meaningful metadata.

Two more options are available:

  • -k keeps the intermediate LaTeX files in case manual intervention is necessary

  • -r removes the first page of the downloaded PDF, for cases where an existing frontmatter is to be updated
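
The page logic behind the plain -p call and the -r option can be sketched on ordinary Python lists, independent of any PDF library. The function assemble_pages and the page labels are hypothetical; the sketch only illustrates the resulting page order and is not the tool's PDF handling.

```python
def assemble_pages(frontmatter, chapter, remove_first=False):
    """Return the page order of the new chapter PDF."""
    if remove_first:
        # -r: the first page is an outdated frontmatter, so drop it.
        chapter = chapter[1:]
    # The frontmatter always goes in front of the chapter pages.
    return list(frontmatter) + list(chapter)


# First run: the downloaded chapter has no frontmatter yet.
fresh = assemble_pages(["front-new"], ["p1", "p2"])

# Later run with -r: the downloaded chapter starts with the old frontmatter.
updated = assemble_pages(["front-new"], ["front-old", "p1", "p2"], remove_first=True)

print(fresh)
print(updated)
```

Both calls yield the same page order, which is why -r is needed when a chapter that already carries a frontmatter is processed again.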

Based on the media_dir key in the server section of the config file, a file called studies23_copycommand.sh is created. It backs up the existing file and copies the new PDF file into place. Access to the server is necessary for this step.
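
The backup-then-copy step can be sketched in Python as follows. The generated file is a shell script, so this is only an illustration of the idea: the helper backup_and_replace, the ".bak" suffix and the file names are assumptions, and a temporary directory stands in for the media directory on the server.

```python
import shutil
import tempfile
from pathlib import Path


def backup_and_replace(new_pdf: Path, target: Path, suffix: str = ".bak") -> Path:
    """Back up target next to itself, then copy new_pdf into its place."""
    backup = target.with_name(target.name + suffix)
    shutil.copy2(target, backup)   # keep the existing file
    shutil.copy2(new_pdf, target)  # put the new PDF into place
    return backup


# Demonstrate in a temporary directory instead of the real media directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chapter.pdf"
    new_pdf = Path(tmp) / "chapter_with_frontmatter.pdf"
    target.write_text("old chapter")
    new_pdf.write_text("frontmatter + chapter")

    backup = backup_and_replace(new_pdf, target)
    backup_name = backup.name
    backup_text = backup.read_text()
    target_text = target.read_text()

print(backup_name, backup_text, target_text, sep=" | ")
```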