Visualizing MusicBrainz instruments

One modern library for data visualization is d3js. You can find a gallery of examples online. I would like to use one of these examples, the Radial Tidy Tree, to represent the instruments in MusicBrainz.

Fortunately, Jupyter allows us to call the d3js library from within a notebook, so we can continue using Python 3 as before.

This page exists as an HTML notebook on github.io, and a static copy of the d3 graphs can be found on this page.

Setup

In [1]:
%run startup.ipy
Last notebook update: 2018-06-06
Git repo: git@bitbucket.org:loujine/musicbrainz-dataviz.git
Importing libs
Defining database parameters

Defining *sql* helper function
Last database update: 2018-06-02

Python packages versions:
numpy       1.14.3
pandas      0.23.0
sqlalchemy  1.2.8
CPython 3.7.0b5
IPython 6.4.0

Fetching instruments from the database

There is a table dedicated to instruments in the database and a few relations between instruments.

The list of instruments with their description can be found directly in MusicBrainz. The relations are explained here.

We're going to consider all relationships under the same parent/child umbrella, except the 'related to' and 'hybrid of' relations, which the query below filters out.

In [2]:
df = sql("""
SELECT i0.name AS parent_instrument,
       i1.name AS child_instrument,
       i0.gid  AS parent_mbid,
       i1.gid  AS child_mbid
  FROM link_type               AS lt
  JOIN link                    AS l   ON l.link_type = lt.id
  JOIN l_instrument_instrument AS laa ON laa.link = l.id
  JOIN instrument              AS i0  ON i0.id = laa.entity0
  JOIN instrument              AS i1  ON i1.id = laa.entity1
 WHERE lt.name != 'related to'
   AND lt.name != 'hybrid of'
;""")
In [3]:
df.head()
Out[3]:
parent_instrument child_instrument parent_mbid child_mbid
0 natural brass instruments nabal e5781903-d6ef-4480-a158-60300265577c 4e22ddb3-6908-4a5f-a9ae-b8a7440f6c7c
1 recorder sopranino recorder 3cf4c0c9-160a-4d73-9243-7d0e0df17050 db7a69ea-4cae-44ed-94ab-a112b6bd7a3c
2 recorder subcontrabass recorder 3cf4c0c9-160a-4d73-9243-7d0e0df17050 0385a06d-dbed-4112-bfab-31b78590dd8f
3 recorder tenor recorder 3cf4c0c9-160a-4d73-9243-7d0e0df17050 4a6559f5-cbd3-4f72-8386-af028547ff30
4 double reed crumhorn ee570715-6ded-4cff-ad7e-feef6a5bca44 e1b9fc01-a349-444f-b798-9893b5af83f4

Preparing data for visualization

The data needs a fair amount of preparation before d3js can display it as a radial tree. I will not explain every detail, only the rough idea.

The first step is to create a tree-like structure (with dictionaries) to organize all instruments:

In [4]:
# map each instrument name to a dictionary of its children: {parent: {child: {...}}}
rels = {}
for t in df[['parent_instrument', 'child_instrument']].itertuples():
    # setdefault gives each name a single sub-dictionary, shared wherever the name appears
    rels.setdefault(t.parent_instrument, {})[t.child_instrument] = rels.setdefault(t.child_instrument, {})

# find instruments without parents
s = set(df.parent_instrument.tolist()).difference(df.child_instrument.tolist())

# create a main 'tree' dictionary as a global parent to those instruments
tree = {}
for el in s:
    tree[el] = rels[el]

# display only part of the tree, e.g. the bowed string instruments
from pprint import pprint
pprint(tree['strings']['bowed string instruments'])
{'Blaster Beam': {},
 'Cretan lyra': {},
 'alto violin': {},
 'arpeggione': {},
 'baryton': {},
 'bass violin': {'cello': {'electric cello': {}}},
 'bowed psaltery': {},
 'crwth': {},
 'djoza': {'kamancheh': {},
           'rebab': {'rebec': {}, 'rubab': {'sarod': {}}, 'sarod': {}}},
 'double bass': {'electric upright bass': {}},
 'gudok': {'gadulka': {}},
 'haegeum': {},
 'huqin': {'banhu': {},
           'chuurqin': {'morin khuur': {}},
           'cizhonghu': {},
           'diyingehu': {},
           'erhu': {'gaohu': {}, 'zhonghu': {}},
           'gaohu': {},
           'gehu': {},
           "jing'erhu": {},
           'jinghu': {},
           'yehu': {},
           'zhonghu': {},
           'zhuihu': {}},
 'igil': {},
 'jouhikko': {},
 'lirone': {},
 'nyckelharpa': {},
 'ravanahatha': {},
 'rebec': {},
 'sarangi': {'dilruba': {}, 'esraj': {}},
 'saw sam sai': {},
 'saw u': {'yehu': {}},
 'shichepshin': {},
 'soprano violin': {},
 'talharpa': {},
 'tenor violin': {},
 'treble violin': {},
 'tromba marina': {},
 "viola d'amore": {},
 'viola da gamba': {},
 'viola organista': {},
 'violin family': {'cello': {'electric cello': {}},
                   'double bass': {'electric upright bass': {}},
                   'viola': {'electric viola': {}, 'violoncello piccolo': {}},
                   'violin': {'alto violin': {},
                              'electric violin': {},
                              'soprano violin': {},
                              'tenor violin': {},
                              'treble violin': {},
                              'vielle': {},
                              'violotta': {}},
                   'violoncello piccolo': {}},
 'violino piccolo': {},
 'viololyra': {},
 'violoncello piccolo': {},
 'violone': {'double bass': {'electric upright bass': {}}},
 'yaylı tanbur': {},
 'ģīga': {}}
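
As a side note, here is a minimal sketch of the `setdefault` idea on a toy list of parent/child pairs. The pairs are a hand-picked, hypothetical subset and `toy_tree` is a name I made up for this illustration. It shows why an instrument with two parents (like the cello) shows up in two branches: both branches point to the same sub-dictionary.

```python
# Toy parent/child pairs (hypothetical subset, chosen only for illustration)
pairs = [
    ("strings", "violin family"),
    ("strings", "bass violin"),
    ("violin family", "cello"),
    ("bass violin", "cello"),
    ("cello", "electric cello"),
]

rels = {}
for parent, child in pairs:
    # setdefault returns the existing sub-dictionary for a name or creates it,
    # so every instrument name maps to exactly one dictionary object
    rels.setdefault(parent, {})[child] = rels.setdefault(child, {})

# Roots are the names that appear as a parent but never as a child
parents = {p for p, _ in pairs}
children = {c for _, c in pairs}
toy_tree = {name: rels[name] for name in parents - children}

print(toy_tree)

# 'cello' is reachable through both parents and is the same object in both places
via_family = toy_tree["strings"]["violin family"]["cello"]
via_bass = toy_tree["strings"]["bass violin"]["cello"]
print(via_family is via_bass)
```

Strictly speaking the result is a graph with shared nodes rather than a tree, which is fine here because each path from the root is later written out as its own entry.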

This looks OK. d3 expects a flat list of records formatted like {'id': 'root.violins.violin', 'value': ''}, where the dotted id encodes the path from the root, so we can now prepare the data for conversion to JavaScript. Let's start with only the bowed string instruments, to keep the amount of data to display relatively small:

In [5]:
# the root record comes first; every other id is prefixed with its parent's id
data = [{'id': 'strings', 'value': ''}]

def parse(subdict, prefix):
    """Append one record per instrument to `data`, walking the tree depth first."""
    for instrument_name, children in subdict.items():
        data.append({'id': prefix + instrument_name, 'value': 1000})
        if children:
            parse(children, prefix=prefix + instrument_name + '.')

parse(tree['strings']['bowed string instruments'], prefix='strings.')
pprint(data[:10])
[{'id': 'strings', 'value': ''},
 {'id': 'strings.jouhikko', 'value': 1000},
 {'id': 'strings.rebec', 'value': 1000},
 {'id': 'strings.violoncello piccolo', 'value': 1000},
 {'id': 'strings.saw u', 'value': 1000},
 {'id': 'strings.saw u.yehu', 'value': 1000},
 {'id': 'strings.haegeum', 'value': 1000},
 {'id': 'strings.nyckelharpa', 'value': 1000},
 {'id': 'strings.lirone', 'value': 1000},
 {'id': 'strings.violin family', 'value': 1000}]
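
To make the dotted ids less mysterious, here is a small sketch of how a parent id can be recovered from a record id: everything before the last dot, which is what the `parentId` accessor in the JavaScript cell further down computes. The helper names `flatten` and `parent_id` and the toy tree are my own assumptions for this illustration, not part of d3 or of the notebook.

```python
def flatten(subdict, prefix, out):
    """Append one {'id', 'value'} record per node to `out`, depth first."""
    for name, children in subdict.items():
        out.append({"id": prefix + name, "value": 1000})
        if children:
            flatten(children, prefix + name + ".", out)
    return out


def parent_id(node_id):
    """Everything before the last dot; empty string when there is no dot (the root)."""
    return node_id.rpartition(".")[0]


# Hypothetical toy tree, small enough to check by eye
toy = {"violin family": {"violin": {"electric violin": {}}, "viola": {}}, "rebec": {}}
records = flatten(toy, "strings.", [{"id": "strings", "value": ""}])

# Every non-root record should have a parent that is itself in the list
ids = {r["id"] for r in records}
orphans = [r["id"] for r in records
           if parent_id(r["id"]) and parent_id(r["id"]) not in ids]

for r in records:
    print(repr(parent_id(r["id"])), "<-", r["id"])
print("orphans:", orphans)
```

One caveat of this scheme: a name that itself contains a dot would be split in the wrong place (`parent_id("root.a.b")` is `"root.a"` even if `"a.b"` was meant as a single name), so a check like `orphans` may be worth running on the real data.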

Now we are ready to transfer the data to d3. The easiest way is to use IPython's Javascript display object to store our data structure as a JavaScript array on the global window object.

In [6]:
# serialize our Python list of dicts as JSON, which is also a valid JavaScript array literal
import json
from IPython.display import Javascript
Javascript("window.stringsData=%s;" % json.dumps(data))
Out[6]:
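
A short aside on turning a Python structure into JavaScript source text. The printed form of a Python list happens to be readable by JavaScript when it only holds plain strings and numbers, but it stops working as soon as a value like None appears. A sketch of the difference, assuming nothing beyond the standard `json` module (`window.demoData` and the `None` value are made up for the illustration):

```python
import json

records = [
    {"id": "strings", "value": ""},
    {"id": "strings.viola d'amore", "value": 1000},
    {"id": "strings.ģīga", "value": None},  # None added only to show the difference
]

# Python's own formatting: contains the token None, which JavaScript does not know
as_repr = "window.demoData=%s;" % records

# JSON formatting: None becomes null, and the result is a valid JavaScript literal
as_json = "window.demoData=%s;" % json.dumps(records, ensure_ascii=False)

print(as_repr)
print(as_json)

# JSON also round-trips back to the same Python structure
assert json.loads(json.dumps(records)) == records
```

With the data used in this notebook (strings and integers only) both forms should behave the same, so this is a matter of robustness rather than a required change.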

And now we copy the CSS and JS we need from the Radial Tidy Tree example:

In [7]:
%%html
<style>

.node circle {
  fill: #999;
}

.node text {
  font: 10px sans-serif;
}

.node--internal circle {
  fill: #555;
}

.node--internal text {
  text-shadow: 0 1px 0 #fff, 0 -1px 0 #fff, 1px 0 0 #fff, -1px 0 0 #fff;
}

.link {
  fill: none;
  stroke: #555;
  stroke-opacity: 0.4;
  stroke-width: 1.5px;
}

</style>
In [8]:
%%javascript
element.append('<svg id="radial-string" width="1000" height="800"></svg>');

requirejs.config({
    paths: { 
        'd3': ['//cdnjs.cloudflare.com/ajax/libs/d3/4.7.4/d3.min'], 
    },
});

// from https://bl.ocks.org/mbostock/4063550

require(['d3'], function(d3) {

    var svg = d3.select("svg#radial-string"),
        width = +svg.attr("width"),
        height = +svg.attr("height"),
        g = svg.append("g").attr("transform", "translate(" + (width / 2) + "," + (height / 2) + ")");

    var stratify = d3.stratify()
        .parentId(function(d) { return d.id.substring(0, d.id.lastIndexOf(".")); });

    var tree = d3.tree()
        .size([360, 500])
        .separation(function(a, b) { return (a.parent == b.parent ? 1 : 2) / a.depth; });

    var root = tree(stratify(window.stringsData));

    var link = g.selectAll(".link")
    .data(root.descendants().slice(1))
    .enter().append("path")
      .attr("class", "link")
      .attr("d", function(d) {
        return "M" + project(d.x, d.y)
            + "C" + project(d.x, (d.y + d.parent.y) / 2)
            + " " + project(d.parent.x, (d.y + d.parent.y) / 2)
            + " " + project(d.parent.x, d.parent.y);
      });

    var node = g.selectAll(".node")
    .data(root.descendants())
    .enter().append("g")
      .attr("class", function(d) { return "node" + (d.children ? " node--internal" : " node--leaf"); })
      .attr("transform", function(d) { return "translate(" + project(d.x, d.y) + ")"; });

    node.append("circle")
        .attr("r", 2.5);

    node.append("text")
      .attr("dy", ".31em")
      .attr("x", function(d) { return d.x < 180 === !d.children ? 6 : -6; })
      .style("text-anchor", function(d) { return d.x < 180 === !d.children ? "start" : "end"; })
      .attr("transform", function(d) { return "rotate(" + (d.x < 180 ? d.x - 90 : d.x + 90) + ")"; })
      .text(function(d) { return d.id.substring(d.id.lastIndexOf(".") + 1); });

    function project(x, y) {
        var angle = (x - 90) / 180 * Math.PI, radius = y;
        return [radius * Math.cos(angle), radius * Math.sin(angle)];
    }
    
    return {};
});

Looks nice, doesn't it? If you can't see it, I copied the result to github.io.

All instruments

What if we try the same with all instruments?

In [9]:
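# same approach as before, this time starting from the top of the tree
data = [{'id': 'root', 'value': ''}]

def parse(subdict, prefix):
    """Append one record per instrument to `data`, walking the tree depth first."""
    for instrument_name, children in subdict.items():
        data.append({'id': prefix + instrument_name, 'value': 1000})
        if children:
            parse(children, prefix=prefix + instrument_name + '.')

parse(tree, prefix='root.')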
In [10]:
# serialize our Python list of dicts as JSON, which is also a valid JavaScript array literal
import json
from IPython.display import Javascript
Javascript("window.data={};".format(json.dumps(data)))
Out[10]:
In [11]:
%%javascript
element.append('<svg id="radial" width="1000" height="800"></svg>');

requirejs.config({
    paths: { 
        'd3': ['//cdnjs.cloudflare.com/ajax/libs/d3/4.7.4/d3.min'], 
    },
});

// from https://bl.ocks.org/mbostock/4063550

require(['d3'], function(d3) {

    var svg = d3.select("svg#radial"),
        width = +svg.attr("width"),
        height = +svg.attr("height"),
        g = svg.append("g").attr("transform", "translate(" + (width / 2) + "," + (height / 2) + ")");

    var stratify = d3.stratify()
        .parentId(function(d) { return d.id.substring(0, d.id.lastIndexOf(".")); });

    var tree = d3.tree()
        .size([360, 500])
        .separation(function(a, b) { return (a.parent == b.parent ? 1 : 2) / a.depth; });

    var root = tree(stratify(window.data));

    var link = g.selectAll(".link")
    .data(root.descendants().slice(1))
    .enter().append("path")
      .attr("class", "link")
      .attr("d", function(d) {
        return "M" + project(d.x, d.y)
            + "C" + project(d.x, (d.y + d.parent.y) / 2)
            + " " + project(d.parent.x, (d.y + d.parent.y) / 2)
            + " " + project(d.parent.x, d.parent.y);
      });

    var node = g.selectAll(".node")
    .data(root.descendants())
    .enter().append("g")
      .attr("class", function(d) { return "node" + (d.children ? " node--internal" : " node--leaf"); })
      .attr("transform", function(d) { return "translate(" + project(d.x, d.y) + ")"; });

    node.append("circle")
        .attr("r", 2.5);

    node.append("text")
      .attr("dy", ".31em")
      .attr("x", function(d) { return d.x < 180 === !d.children ? 6 : -6; })
      .style("text-anchor", function(d) { return d.x < 180 === !d.children ? "start" : "end"; })
      .attr("transform", function(d) { return "rotate(" + (d.x < 180 ? d.x - 90 : d.x + 90) + ")"; })
      .text(function(d) { return d.id.substring(d.id.lastIndexOf(".") + 1); });

    function project(x, y) {
        var angle = (x - 90) / 180 * Math.PI, radius = y;
        return [radius * Math.cos(angle), radius * Math.sin(angle)];
    }
    
    return {};
});

Copy on github.io.

This time we probably have too much data to display :)
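
One possible way to thin out the picture, which I have not tried on the real data, would be to stop descending after a given depth. Below is a minimal sketch on a hypothetical toy tree; `flatten_limited` and `max_depth` are names invented for this example.

```python
def flatten_limited(subdict, prefix, max_depth, out=None, depth=1):
    """Like the parse function above, but ignores nodes deeper than max_depth."""
    if out is None:
        out = []
    for name, children in subdict.items():
        out.append({"id": prefix + name, "value": 1000})
        # only descend while we are still above the depth limit
        if children and depth < max_depth:
            flatten_limited(children, prefix + name + ".", max_depth, out, depth + 1)
    return out


# Hypothetical toy tree: 'violin' and 'electric violin' sit below the limit
toy = {
    "strings": {"bowed": {"violin": {"electric violin": {}}}},
    "wind": {"flute": {}},
}

shallow = [{"id": "root", "value": ""}] + flatten_limited(toy, "root.", max_depth=2)
for record in shallow:
    print(record["id"])
```

The records keep the same {'id', 'value'} shape, so the JavaScript cell would not need to change. The trade-off is that the leaves, often the most interesting instruments, are the first thing to disappear.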