Visualizing a piano teaching "family tree"

I would like in this notebook to create the data structure necessary for a "teaching family tree", i.e. a graph of relations "X taught Y".

Setup

The setup required to repeat these operations is explained in the introduction notebook. In case graphs do not appear in this page you can refer to the static version.

In [1]:
%run startup.ipy
Last notebook update: 2018-06-06
Git repo: git@bitbucket.org:loujine/musicbrainz-dataviz.git
Importing libs
Defining database parameters

Defining *sql* helper function
Last database update: 2018-06-02

Python packages versions:
numpy       1.14.3
pandas      0.23.0
sqlalchemy  1.2.8
CPython 3.7.0b5
IPython 6.4.0

Basic example

Let's start simply by finding Heinrich Neuhaus's teachers and students

In [2]:
artist_name = 'Heinrich Neuhaus'
In [3]:
artist = sql("""
SELECT a.id,
       a.gid AS mbid,
       a.name,
       a.sort_name,
       to_date(to_char(a.begin_date_year, '9999') || 
               to_char(a.begin_date_month, '99') || 
               to_char(a.begin_date_day, '99'), 'YYYY MM DD') AS start,
       to_date(to_char(a.end_date_year, '9999') || 
               to_char(a.end_date_month, '99') || 
               to_char(a.end_date_day, '99'), 'YYYY MM DD') AS end
  FROM artist       AS a
  JOIN artist_alias AS aa ON aa.artist=a.id
 WHERE aa.name = %(artist_name)s;
""", artist_name=artist_name)

artist_mbid = str(artist['mbid'][0])
artist['url'] = artist.mbid.apply(mb_artist_link)
artist.drop('mbid', axis=1, inplace=True)

iplot(ff.create_table(artist[['sort_name', 'start', 'end', 'url']]))

Let's explore MusicBrainz relations to find the proper "X taught Y":

In [4]:
relation = sql("""
SELECT id, name, description
  FROM link_type
 WHERE description ILIKE '%%teach%%';
""")
relation_id = relation['id'][0]
print(relation.iloc[0].id, relation.iloc[0].description)
print(relation.iloc[1].id, relation.iloc[1].description)
847 This relationship indicates that a person was another's teacher/student.
893 This relationship indicates that a person was a teacher at an event (e.g. a masterclass).

So the one we need is relation id=847 (the other one is for events).

Looking for students

Let's now find the direct students:

In [5]:
students = sql("""
SELECT student.name,
       student.sort_name,
       student.gid AS mbid
  FROM artist          AS student
  JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
  JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link                       ON link.id = laa.link
 WHERE teacher.gid = %(artist_mbid)s
   AND link.link_type = 847;
""", artist_mbid=artist_mbid)

students['url'] = students.mbid.apply(mb_artist_link)
students.mbid = students.mbid.astype(str)
iplot(ff.create_table(students[['name', 'sort_name', 'url']]))

So we have a few of them in MusicBrainz.

Looking for teachers

In [6]:
teachers = sql("""
SELECT teacher.name,
       teacher.sort_name,
       teacher.gid AS mbid
  FROM artist AS student
  JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
  JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link                       ON link.id = laa.link
 WHERE student.gid = %(artist_mbid)s
   AND link.link_type = 847;
""", artist_mbid=artist_mbid)

teachers['url'] = teachers.mbid.apply(mb_artist_link)
teachers.mbid = teachers.mbid.astype(str)
iplot(ff.create_table(teachers[['name', 'sort_name', 'url']]))

Building a family tree

Now I want to use the visjs library to display a graph. My graph requires two javascript arrays:

  • 'nodes', to store the node labels and identifiers (MBIDs are obvious identifiers)
  • 'edges', to store the links between two MBIDs corresponding to one 'X taught Y' relation

First we create these structures in Python:

In [7]:
from pprint import pprint
nodes = []
nodes.append({'id': artist_mbid, 'label' : artist_name})
nodes.extend([{'id': student.mbid, 'label' : student.sort_name} 
              for student in students.itertuples()])
nodes.extend([{'id': teacher.mbid, 'label' : teacher.sort_name} 
              for teacher in teachers.itertuples()])
print('5 first nodes:')
pprint(nodes[:5])

edges = []
edges.extend([{'from': artist_mbid, 'to' : student.mbid} 
              for student in students.itertuples()])
edges.extend([{'from': teacher.mbid, 'to' : artist_mbid}
              for teacher in teachers.itertuples()])
print('5 first edges:')
pprint(edges[:5])
5 first nodes:
[{'id': '2b075237-6e90-4e78-a4f8-a66170c682fe', 'label': 'Heinrich Neuhaus'},
 {'id': '2014bfbb-c65d-45dc-9973-35bead3833fa',
  'label': 'Richter, Sviatoslav Teofilovich'},
 {'id': '88b4ad33-63ba-4923-947e-26a720631156',
  'label': 'Gilels, Emil Grigoryevich'},
 {'id': 'bec02ec4-34be-43db-9be1-006606174fd2', 'label': 'Lupu, Radu'},
 {'id': '8f3c027c-cd46-4df4-bda2-58c2f3e8ad1b', 'label': 'Khrennikov, Tikhon'}]
5 first edges:
[{'from': '2b075237-6e90-4e78-a4f8-a66170c682fe',
  'to': '2014bfbb-c65d-45dc-9973-35bead3833fa'},
 {'from': '2b075237-6e90-4e78-a4f8-a66170c682fe',
  'to': '88b4ad33-63ba-4923-947e-26a720631156'},
 {'from': '2b075237-6e90-4e78-a4f8-a66170c682fe',
  'to': 'bec02ec4-34be-43db-9be1-006606174fd2'},
 {'from': '2b075237-6e90-4e78-a4f8-a66170c682fe',
  'to': '8f3c027c-cd46-4df4-bda2-58c2f3e8ad1b'},
 {'from': '2b075237-6e90-4e78-a4f8-a66170c682fe',
  'to': '385cb39b-f06c-4635-84fd-ecc8fff56c97'}]

...and we convert them as JavaScript array stored on the global object:

In [8]:
# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""
window.edgeData1=%s;
window.nodeData1=%s;
""" % (json.dumps(edges), json.dumps(nodes)))
Out[8]:
In [9]:
%%javascript
element.append('<div id="vis-container1" style="width: 100%; height: 500px;"></div>');

requirejs.config({
    paths: {
        vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
    }
});

require(['vis'], function(vis) {
  // create the timeline
  var container = document.getElementById('vis-container1');
    var data = {
            nodes: new vis.DataSet(nodeData1),
            edges: new vis.DataSet(edgeData1)
        };
        var options = {
            layout: {
                hierarchical: {
                  sortMethod: 'directed'
                }
            },
            edges: {
                smooth: true,
                arrows: {to: true}
            }
        };
        var network = new vis.Network(container, data, options);
})

If you cannot see the graph above, a static version is available on github.io.

Developing the tree: can we make it recursive?

It would be nice to have our code recursively, to go back to the "original" teacher up to the latest student of this "family" (as far as MusicBrainz knows).

In [10]:
def teachers_from_mbid(mbid):
    df = sql("""
SELECT teacher.name,
       teacher.sort_name,
       teacher.gid        AS mbid
FROM artist          AS student
JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link                     ON link.id = laa.link
 WHERE student.gid = %(mbid)s
   AND link.link_type = 847;
    """, mbid=mbid)
    df.mbid = df.mbid.astype(str)
    return df
In [11]:
def students_from_mbid(mbid):
    df = sql("""
SELECT student.name,
       student.sort_name,
       student.gid        AS mbid
  FROM artist          AS student
  JOIN l_artist_artist AS laa     ON laa.entity1 = student.id
  JOIN artist          AS teacher ON laa.entity0 = teacher.id
  JOIN link                       ON link.id = laa.link
 WHERE teacher.gid = %(mbid)s
   AND link.link_type = 847;
    """, mbid=mbid)
    df.mbid = df.mbid.astype(str)
    return df
In [12]:
artist_mbid = '2b075237-6e90-4e78-a4f8-a66170c682fe' # Neuhaus

We try to apply the 'find new teachers' function on known teachers. We of course avoid calling the function on teachers we already analyzed.

In [13]:
mbids_to_do = [artist_mbid]
nodes = {artist_mbid: {'id': artist_mbid, 'label': artist_name}}
edges = []
while mbids_to_do:
    mbid = mbids_to_do.pop()
    df = teachers_from_mbid(mbid)
    for artist in df.itertuples():
        mbids_to_do.append(artist.mbid)
        if artist.mbid not in nodes:
            nodes[artist.mbid] = {'id': artist.mbid, 'label': artist.name}
            edges.append({'from': artist.mbid, 'to': mbid})
            
print("Nodes (persons): ", len(nodes))
print("Edges (links): ", len(edges))
Nodes (persons):  55
Edges (links):  54
In [14]:
# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""
window.edgeData2=%s;
window.nodeData2=%s;
""" % (json.dumps(edges), json.dumps(list(nodes.values()))))
Out[14]:
In [15]:
%%javascript
element.append('<div id="vis-container2" style="width: 100%; height: 1000px;"></div>');

requirejs.config({
    paths: {
        vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
    }
});

require(['vis'], function(vis) {
  // create the timeline
  var container = document.getElementById('vis-container2');
    var data = {
            nodes: new vis.DataSet(nodeData2),
            edges: new vis.DataSet(edgeData2)
        };
        var options = {
            layout: {
                hierarchical: {
                  sortMethod: 'directed'
                }
            },
            edges: {
                smooth: true,
                arrows: {to: true}
            }
        };
        var network = new vis.Network(container, data, options);
})

So if we look for all indirect teachers of Neuhaus, we reach around 50 persons (nodes), once we remove all possible duplicates. The resulting graph (if you don't see it above) is on github.io.

Neuhaus is of course the last node. Looks like Bach, Beethoven, Mozart, Chopin and Liszt are here...

What about students?

In [16]:
mbids_to_do = [artist_mbid]
nodes = {artist_mbid: {'id': artist_mbid, 'label' : artist_name}}
edges = []
while mbids_to_do:
    mbid = mbids_to_do.pop()
    df = students_from_mbid(mbid)
    for artist in df.itertuples():
        mbids_to_do.append(artist.mbid)
        if artist.mbid not in nodes:
            nodes[artist.mbid] = {'id': artist.mbid, 'label' : artist.name}
            edges.append({'from': mbid, 'to' : artist.mbid})            

print("Nodes (persons): ", len(nodes))
print("Edges (links): ", len(edges))
Nodes (persons):  68
Edges (links):  67
In [17]:
# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""
window.edgeData3=%s;
window.nodeData3=%s;
""" % (json.dumps(edges), json.dumps(list(nodes.values()))))
Out[17]:
In [18]:
%%javascript
element.append('<div id="vis-container3" style="width: 100%; height: 400px;"></div>');

requirejs.config({
    paths: {
        vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
    }
});

require(['vis'], function(vis) {
  // create the timeline
  var container = document.getElementById('vis-container3');
    var data = {
            nodes: new vis.DataSet(nodeData3),
            edges: new vis.DataSet(edgeData3)
        };
        var options = {
            layout: {
                hierarchical: {
                  sortMethod: 'directed'
                }
            },
            edges: {
                smooth: true,
                arrows: {to: true}
            }
        };
        var network = new vis.Network(container, data, options);
})

Also available is on github.io.

We could continue by looking at both teachers and students at the same time... but in this case we have to be very careful about loops that would make our code look for the same artists several times or even infinitely (edit: of course my code had a bug and ran in an infinite loop regarding Martinů and another artist who both taught and were taught by each other).

Another example

Let's look at a very influent french piano teacher:

In [19]:
artist_name = 'Alfred Cortot'
In [20]:
artist = sql("""
SELECT a.id,
       a.gid AS mbid,
       a.name,
       to_date(to_char(a.begin_date_year, '9999') || 
               to_char(a.begin_date_month, '99') || 
               to_char(a.begin_date_day, '99'), 'YYYY MM DD') AS start,
       to_date(to_char(a.end_date_year, '9999') || 
               to_char(a.end_date_month, '99') || 
               to_char(a.end_date_day, '99'), 'YYYY MM DD') AS end
  FROM artist       AS a
 WHERE a.name = %(artist_name)s;
""", artist_name=artist_name)

artist_mbid = str(artist['mbid'][0])
artist['url'] = artist.mbid.apply(mb_artist_link)
artist.drop('mbid', axis=1, inplace=True)

iplot(ff.create_table(artist[['name', 'start', 'end', 'url']]))
In [21]:
mbids_to_do = [artist_mbid]
nodes = {artist_mbid: {'id': artist_mbid, 'label': artist_name}}
edges = []
while mbids_to_do:
    mbid = mbids_to_do.pop()
    df = teachers_from_mbid(mbid)
    for artist in df.itertuples():
        mbids_to_do.append(artist.mbid)
        if artist.mbid not in nodes:
            nodes[artist.mbid] = {'id': artist.mbid, 'label': artist.name}
            edges.append({'from': artist.mbid, 'to': mbid})
mbids_to_do = [artist_mbid]
while mbids_to_do:
    mbid = mbids_to_do.pop()
    df = students_from_mbid(mbid)
    for artist in df.itertuples():
        if artist.mbid not in nodes:
            mbids_to_do.append(artist.mbid)
            nodes[artist.mbid] = {'id': artist.mbid, 'label': artist.name}
            edges.append({'from': mbid, 'to': artist.mbid})            

            
print("Nodes (persons): ", len(nodes))
print("Edges (links): ", len(edges))

# Send to Javascript
import json
from IPython.display import Javascript
Javascript("""
window.edgeData4=%s;
window.nodeData4=%s;
""" % (json.dumps(edges), json.dumps(list(nodes.values()))))
Nodes (persons):  174
Edges (links):  173
Out[21]:
In [22]:
%%javascript
element.append('<div id="vis-container4" style="width: 100%; height: 600px;"></div>');

requirejs.config({
    paths: {
        vis: '//cdnjs.cloudflare.com/ajax/libs/vis/4.20.1/vis'
    }
});

require(['vis'], function(vis) {
  // create the timeline
  var container = document.getElementById('vis-container4');
    var data = {
            nodes: new vis.DataSet(nodeData4),
            edges: new vis.DataSet(edgeData4)
        };
        var options = {
            layout: {
                hierarchical: {
                  sortMethod: 'directed'
                }
            },
            edges: {
                smooth: true,
                arrows: {to: true}
            }
        };
        var network = new vis.Network(container, data, options);
})

Also available is on github.io. Looks like Cortot belongs to the Chopin school of piano...