Displaying MusicBrainz data in timelines

One feature I would like to be added to the MusicBrainz web interface is the possibility to display timelines, e.g. showing when members arrived or left on a band page, or sorting concerts, albums releases, and so on.

Before suggesting this feature to the MusicBrainz developers (or programing it myself), I want to try and display this kind of timelines in a Jupyter notebook and see how it would look like.

Several JavaScript libraries can do the job; I decided to test the timesheet-advanced.js and visjs libraries.

Setup

In [1]:
%load_ext watermark
%watermark --python -r
CPython 3.7.0b5
IPython 6.4.0
Git repo: git@bitbucket.org:loujine/musicbrainz-dataviz.git
In [2]:
%watermark --date --updated
last updated: 2018-06-10

The setup required to repeat these operations is explained in the introduction notebook. In case graphs do not appear in this page you can refer to the static version.

The next commands may be needed or not depending on your setup (i.e. if you use my docker setup):

In [3]:
import os
from pprint import pprint
import pandas
import sqlalchemy

# your postgres server IP
IP = 'localhost'

def sql(query, **kwargs):
    """helper function for SQL queries using the %(...) syntax
    Parameters defined globally are replaced implicitely"""
    params = globals().copy()
    params.update(kwargs)

    # define DB connection parameters if needed
    PGHOST = os.environ.get('PGHOST', IP)
    PGDATABASE = os.environ.get('PGDATABASE', 'musicbrainz')
    PGUSER = os.environ.get('PGUSER', 'musicbrainz')
    PGPASSWORD = os.environ.get('PGPASSWORD', 'musicbrainz')
    engine = sqlalchemy.create_engine(
       'postgresql+psycopg2://%(PGUSER)s:%(PGPASSWORD)s@%(PGHOST)s/%(PGDATABASE)s' % locals(),
        isolation_level='READ UNCOMMITTED')
    return pandas.read_sql(query, engine, params=params)

# helper functions to generate an HTML link to an entity MusicBrainz URL
def _mb_link(type, mbid):
    return '<a href="https://musicbrainz.org/%(type)s/%(mbid)s">%(mbid)s</a>' % locals()

mb_artist_link = lambda mbid: _mb_link('artist', mbid)

Extraction of band data from the database

Now we can extract the information for the band I want. The SQL query will look for:

  • band name
  • artists linked to this band through the "member of" relationship
  • instrument/vocal role of this relationship

Let's start with some band you probably already know:

In [4]:
band_name = 'The Beatles'

The SQL query is a bit complicated because it uses a lot of different tables, I won't go into details. We store the result in a data structure called a PanDas DataFrame (df).

In [5]:
df = sql("""
SELECT b.name AS band,
       m.name AS member,
       m.gid AS mbid,
       lat.name AS role,
       to_date(to_char(l.begin_date_year, '9999') || '0101', 'YYYYMMDD') AS start,
       to_date(to_char(l.end_date_year, '9999') || '0101', 'YYYYMMDD') AS end
FROM artist              AS b
JOIN l_artist_artist     AS laa ON laa.entity1 = b.id
JOIN artist              AS m   ON laa.entity0 = m.id
JOIN link                AS l   ON l.id = laa.link
JOIN link_attribute      AS la  ON la.link = l.id
JOIN link_attribute_type AS lat ON la.attribute_type = lat.id
JOIN link_type           AS lt  ON l.link_type = lt.id
WHERE lt.name = 'member of band'
  AND b.name = %(band_name)s
  AND lat.name != 'original';
""")
df
Out[5]:
band member mbid role start end
0 The Beatles Pete Best 0d4ab0f9-bbda-4ab1-ae2c-f772ffcfbea9 drums 1960-01-01 1962-01-01
1 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af lead vocals 1957-01-01 1970-01-01
2 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af bass guitar 1957-01-01 1970-01-01
3 The Beatles Ringo Starr 300c4c73-33ac-4255-9d57-4e32627f5e13 drums 1962-01-01 1970-01-01
4 The Beatles Stuart Sutcliffe 49a51491-650e-44b3-8085-2f07ac2986dd bass guitar 1960-01-01 1962-01-01
5 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 lead vocals None 1970-01-01
6 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 guitar None 1970-01-01
7 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 lead vocals 1958-01-01 1970-01-01
8 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 guitar 1958-01-01 1970-01-01

The data is here, we just want to set a start date for Lennon's roles since it is not in the database.

In [6]:
import datetime
df['start'] = df['start'].fillna(datetime.date(1957, 1, 1))
df['mbid'] = df['mbid'].astype(str) # otherwise PanDas uses the UUID data type which will cause problems later.
df
Out[6]:
band member mbid role start end
0 The Beatles Pete Best 0d4ab0f9-bbda-4ab1-ae2c-f772ffcfbea9 drums 1960-01-01 1962-01-01
1 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af lead vocals 1957-01-01 1970-01-01
2 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af bass guitar 1957-01-01 1970-01-01
3 The Beatles Ringo Starr 300c4c73-33ac-4255-9d57-4e32627f5e13 drums 1962-01-01 1970-01-01
4 The Beatles Stuart Sutcliffe 49a51491-650e-44b3-8085-2f07ac2986dd bass guitar 1960-01-01 1962-01-01
5 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 lead vocals 1957-01-01 1970-01-01
6 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 guitar 1957-01-01 1970-01-01
7 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 lead vocals 1958-01-01 1970-01-01
8 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 guitar 1958-01-01 1970-01-01

Display a timeline with timesheet-advanced

The timesheet-advanced package requires the input data for the timeline to be inserted slightly differently from what we have in our dataframe df. Let us first copy our data in a new variable ts and simplify the dates to years.

In [7]:
ts = df.copy()
ts['start'] = ts['start'].apply(lambda date: date.year).astype(str)
ts['end'] = ts['end'].apply(lambda date: date.year).astype(str)

We need a 'label' field (we'll choose the band member name + instrument) and we need a 'type' which is a color. We choose colors to represent all possible roles (vocals, guitar, drums....)

In [8]:
ts['label'] = df['member'] + ' (' + df['role'] + ')'
ts
Out[8]:
band member mbid role start end label
0 The Beatles Pete Best 0d4ab0f9-bbda-4ab1-ae2c-f772ffcfbea9 drums 1960 1962 Pete Best (drums)
1 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af lead vocals 1957 1970 Paul McCartney (lead vocals)
2 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af bass guitar 1957 1970 Paul McCartney (bass guitar)
3 The Beatles Ringo Starr 300c4c73-33ac-4255-9d57-4e32627f5e13 drums 1962 1970 Ringo Starr (drums)
4 The Beatles Stuart Sutcliffe 49a51491-650e-44b3-8085-2f07ac2986dd bass guitar 1960 1962 Stuart Sutcliffe (bass guitar)
5 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 lead vocals 1957 1970 John Lennon (lead vocals)
6 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 guitar 1957 1970 John Lennon (guitar)
7 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 lead vocals 1958 1970 George Harrison (lead vocals)
8 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 guitar 1958 1970 George Harrison (guitar)
In [9]:
colors = dict(zip(sorted(set(ts['role'])), ['red', 'blue', 'yellow', 'green']))
print('Correspondance between colors and roles: {}'.format(colors))
ts['type'] = ts['role'].apply(lambda role: colors[role])
ts
Correspondance between colors and roles: {'bass guitar': 'red', 'drums': 'blue', 'guitar': 'yellow', 'lead vocals': 'green'}
Out[9]:
band member mbid role start end label type
0 The Beatles Pete Best 0d4ab0f9-bbda-4ab1-ae2c-f772ffcfbea9 drums 1960 1962 Pete Best (drums) blue
1 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af lead vocals 1957 1970 Paul McCartney (lead vocals) green
2 The Beatles Paul McCartney ba550d0e-adac-4864-b88b-407cab5e76af bass guitar 1957 1970 Paul McCartney (bass guitar) red
3 The Beatles Ringo Starr 300c4c73-33ac-4255-9d57-4e32627f5e13 drums 1962 1970 Ringo Starr (drums) blue
4 The Beatles Stuart Sutcliffe 49a51491-650e-44b3-8085-2f07ac2986dd bass guitar 1960 1962 Stuart Sutcliffe (bass guitar) red
5 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 lead vocals 1957 1970 John Lennon (lead vocals) green
6 The Beatles John Lennon 4d5447d7-c61c-4120-ba1b-d7f471d385b9 guitar 1957 1970 John Lennon (guitar) yellow
7 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 lead vocals 1958 1970 George Harrison (lead vocals) green
8 The Beatles George Harrison 42a8f507-8412-4611-854f-926571049fa0 guitar 1958 1970 George Harrison (guitar) yellow

We can also add a 'link' columns containing URLs to the MusicBrainz website:

In [10]:
ts['link'] = 'https://musicbrainz.org/artist/' + ts['mbid']
ts.drop('mbid', axis=1, inplace=True)
ts
Out[10]:
band member role start end label type link
0 The Beatles Pete Best drums 1960 1962 Pete Best (drums) blue https://musicbrainz.org/artist/0d4ab0f9-bbda-4...
1 The Beatles Paul McCartney lead vocals 1957 1970 Paul McCartney (lead vocals) green https://musicbrainz.org/artist/ba550d0e-adac-4...
2 The Beatles Paul McCartney bass guitar 1957 1970 Paul McCartney (bass guitar) red https://musicbrainz.org/artist/ba550d0e-adac-4...
3 The Beatles Ringo Starr drums 1962 1970 Ringo Starr (drums) blue https://musicbrainz.org/artist/300c4c73-33ac-4...
4 The Beatles Stuart Sutcliffe bass guitar 1960 1962 Stuart Sutcliffe (bass guitar) red https://musicbrainz.org/artist/49a51491-650e-4...
5 The Beatles John Lennon lead vocals 1957 1970 John Lennon (lead vocals) green https://musicbrainz.org/artist/4d5447d7-c61c-4...
6 The Beatles John Lennon guitar 1957 1970 John Lennon (guitar) yellow https://musicbrainz.org/artist/4d5447d7-c61c-4...
7 The Beatles George Harrison lead vocals 1958 1970 George Harrison (lead vocals) green https://musicbrainz.org/artist/42a8f507-8412-4...
8 The Beatles George Harrison guitar 1958 1970 George Harrison (guitar) yellow https://musicbrainz.org/artist/42a8f507-8412-4...

The last preparation step is to transform this Python data structure into a Javascript one that the timesheet library can read. We're going to use the fact that a Python list and a Javascript array are very close (we could also use JSON format to transform our data into smething JavaScript-compatible).

In [11]:
bubbles = [ts.loc[i].to_dict() for i in range(len(ts))]
print('First bubble:')
pprint(bubbles[0])
First bubble:
{'band': 'The Beatles',
 'end': '1962',
 'label': 'Pete Best (drums)',
 'link': 'https://musicbrainz.org/artist/0d4ab0f9-bbda-4ab1-ae2c-f772ffcfbea9',
 'member': 'Pete Best',
 'role': 'drums',
 'start': '1960',
 'type': 'blue'}

Perfect, bubbles contains our data. Time to do some javascript. The Jupyter notebook can display javascript code in an output cell by using the element.append magic.

To display the timeline inside this notebook we need to load the JS/CSS source of the timesheet-advanced package...

In [12]:
from IPython.display import HTML
HTML("""
<link rel="stylesheet" type="text/css" href="https://cdn.rawgit.com/ntucakovic/timesheet-advanced.js/ea3ee1ad/dist/timesheet.min.css" />
<script type="text/javascript" src="https://cdn.rawgit.com/ntucakovic/timesheet-advanced.js/ea3ee1ad/dist/timesheet-advanced.min.js"></script>
""")
Out[12]:

... and to create an output container for our timeline. This cell be filled when the next cell code (new Timesheet(...)) will be executed.

In [13]:
%%javascript
// this must be executed before the "from IPython.display import Javascript" block
element.append('<div id="timesheet-container" style="width: 100%;height: 100%;"></div>');

Last step: we call the Timesheet javascript command using the CSS/JS libraries loaded above, our input data (bubbles), the cell where we want our graph, and the timeline limit (min and max date). Executing the next cell will fill the output cell just above this block automatically.

In [14]:
from IPython.display import Javascript

Javascript("""
var bubbles = %s;
new Timesheet(bubbles, {
    container: 'timesheet-container',
    type: 'parallel',
    timesheetYearMin: %s,
    timesheetYearMax: %s,
    theme: 'light'
});
""" % (bubbles, ts['start'].min(), ts['end'].max()))
Out[14]:

We have our timeline now! As you can see, the same color is used for the same role consistently. The items on the timeline are clickable links bringing you to the artist page on MusicBrainz. If you can't see the timeline above you can find a static version on github.io

Display a timeline with vis.js

We can try to display the same data with another JavaScript library, vis.js. Again we will need to prepare the data.

In [15]:
v = df.copy()
v['start'] = v['start'].apply(lambda date: date.isoformat())
v['end'] = v['end'].apply(lambda date: date.isoformat())
v.drop('mbid', axis=1, inplace=True)
v['type'] = v['role'].apply(lambda role: colors[role])
v['label'] = v['member'] + ' (' + v['role'] + ')'
v
Out[15]:
band member role start end type label
0 The Beatles Pete Best drums 1960-01-01 1962-01-01 blue Pete Best (drums)
1 The Beatles Paul McCartney lead vocals 1957-01-01 1970-01-01 green Paul McCartney (lead vocals)
2 The Beatles Paul McCartney bass guitar 1957-01-01 1970-01-01 red Paul McCartney (bass guitar)
3 The Beatles Ringo Starr drums 1962-01-01 1970-01-01 blue Ringo Starr (drums)
4 The Beatles Stuart Sutcliffe bass guitar 1960-01-01 1962-01-01 red Stuart Sutcliffe (bass guitar)
5 The Beatles John Lennon lead vocals 1957-01-01 1970-01-01 green John Lennon (lead vocals)
6 The Beatles John Lennon guitar 1957-01-01 1970-01-01 yellow John Lennon (guitar)
7 The Beatles George Harrison lead vocals 1958-01-01 1970-01-01 green George Harrison (lead vocals)
8 The Beatles George Harrison guitar 1958-01-01 1970-01-01 yellow George Harrison (guitar)

This time we are not going to inject the data inside a javascript string executed by the notebook, we are going to attach the data as JSON to the webpage itself (window) so that vis.js can find it.

In [16]:
# Transform into JSON
data = [{'start': line.start,
         'end': line.end,
         'content': line.label,
         'className': line.type
        } for _, line in ts.iterrows()]

# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""window.bandData={};""".format(json.dumps(data, indent=4)))
Out[16]:

We need to load the default CSS (from cdnjs.cloudflare.com) and add our custom CSS on top:

In [17]:
%%html
<link rel="stylesheet" type="text/css" href="https://cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis-timeline-graph2d.min.css" />
In [18]:
%%html
<style type="text/css">
    /* custom styles for individual items, load this after vis.css/vis-timeline-graph2d.min.css */
    .vis-item.red {
      background-color: red;
    }
    .vis-item.blue {
      background-color: blue;
    }
    .vis-item.yellow {
      background-color: yellow;
    }
    .vis-item.green {
      background-color: greenyellow;
    }
    .vis-item.vis-selected {
      background-color: white;
      border-color: black;
      color: black;
      box-shadow: 0 0 10px gray;
    }
</style>

In order to load the JS library itself, we can use the require mechanism inside the notebook:

In [19]:
%%javascript
element.append('<div id="vis-container" style="width: 100%;height: 100%;"></div>');

requirejs.config({
    paths: {
        vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
    }
});

require(['vis'], function(vis){
  var data = new vis.DataSet(window.bandData);
  var options = {
    editable: false
  };
  // create the timeline
  var container = document.getElementById('vis-container');
  var timeline = new vis.Timeline(container, data, options);
})

And we have our timeline. Note that this time we can zoom/unzoom (with the mouse wheel) thanks to the vis.js library. You can also change the custom CSS above and see the timeline updated automatically.

Conclusion

In this notebook we used two different JS libraries to display the same data, extracted from the MusicBrainz DB. I hope I did not make things look too complicated and I convinced some of you to try and play with the MusicBrainz database :)