Displaying a composer MusicBrainz data

In this notebook and the next ones we are going to explore datasets centered around a MusicBrainz entity (an artist, a work, a recording...) and try to display visualisations around certain themes (work composition dates, tonalities, types, languages...). We will organize our exploration around simple questions we want answered; some would be simpler by using directly the MusicBrainz website, others would be impossible to easily there.

Setup

The setup required to reproduce these operations is explained in the introduction notebook. In case graphs do not appear in this page you can refer to the static version of this notebook.

In order to simplify this notebook the required Python commands to set everything up are located in a startup.ipy file in the same repository.

In [1]:
%run startup.ipy
Last notebook update: 2018-06-05
Git repo: git@bitbucket.org:loujine/musicbrainz-dataviz.git
Importing libs
Defining database parameters

Defining *sql* helper function
Last database update: 2018-06-02

Python packages versions:
numpy       1.14.3
pandas      0.23.0
sqlalchemy  1.2.8
CPython 3.7.0b5
IPython 6.4.0

Visualizing data with plotly

Data visualization can be performed directly with the PanDas library we already showed in previous notebooks. For this one I decided to introduce the plot.ly library that provides interactive graphs with a Python or JavaScript interface. We need to import a few commands (these commands are also included in startup.ipy) in order to use this library with our PanDas dataframes:

In [2]:
import plotly.graph_objs as go
import plotly.figure_factory as ff

# offline mode for plot.ly, no need to store the graphs in their cloud
from plotly.offline import iplot

# special mode to display plotly graphs inside Jupyter notebooks
from plotly.offline import init_notebook_mode
init_notebook_mode()

Fetching a composer attributes

For this example I chose the French composer Gabriel Fauré:

In [3]:
composer_name = 'Gabriel Fauré'
In [4]:
query_composer = """
SELECT name,
       to_date(to_char(begin_date_year, '9999') || 
               to_char(begin_date_month, '99') || 
               to_char(begin_date_day, '99'), 'YYYY MM DD') AS date_of_birth,
       to_date(to_char(end_date_year, '9999') || 
               to_char(end_date_month, '99') || 
               to_char(end_date_day, '99'), 'YYYY MM DD') AS date_of_death,
       gid AS mbid
  FROM artist 
 WHERE name = %(composer_name)s;
"""

composer = sql(query_composer, composer_name=composer_name)
composer_mbid = composer.mbid[0]

composer['url'] = composer.mbid.apply(mb_artist_link)
composer.drop('mbid', axis=1, inplace=True)

composer
Out[4]:
name date_of_birth date_of_death url
0 Gabriel Fauré 1845-05-12 1924-11-04 <a href="https://musicbrainz.org/artist/fa19a8...

The output above uses the default view of a PanDas dataframe. plot.ly can also display data as tables with the create_table command... one advantage compared to the method above is that the HTML link defined in column url will be a real link to the composer page on MusicBrainz website.

In [5]:
iplot(ff.create_table(composer))

Note that I extracted the composer unique identifier (composer_mbid) that is needed future queries.

Let's try to answer a few questions about Fauré now.

Q: How many works did Fauré compose?

The answer could be obtained by looking at the work page or the relationships page of the composer, or we can directly query the work and artist tables of the database:

In [6]:
works = sql("""
SELECT w.name,
       w.gid AS mbid
  FROM work          AS w
  JOIN l_artist_work AS law ON w.id = law.entity1
  JOIN artist        AS a   ON a.id = law.entity0
 WHERE a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)
print('Answer: %d works' % len(works))
print('First works (alphabetical order): ')

works['url'] = works.mbid.apply(mb_work_link)
iplot(ff.create_table(works[['name', 'url']].sort_values(by='name').head(15)))
Answer: 456 works
First works (alphabetical order): 

Of course this list contains parent works as well as subworks/movements. If we reduce the selection to the "main" works (i.e. the ones not "part of" another work):

In [7]:
main_works = sql("""
SELECT w.name,
       w.gid AS mbid
  FROM work          AS w
  JOIN l_artist_work AS law ON w.id = law.entity1
  JOIN artist        AS a   ON a.id = law.entity0
LEFT OUTER JOIN l_work_work AS lww ON w.id = lww.entity1
 WHERE a.gid = %(composer_mbid)s
   AND lww.entity1 IS NULL;
""", composer_mbid=composer_mbid)
print('Answer: %d works' % len(main_works))
print('First works (alphabetical order): ')

main_works['url'] = main_works.mbid.apply(mb_work_link)

iplot(ff.create_table(main_works[['name', 'url']].sort_values(by='name').head(10)))
Answer: 168 works
First works (alphabetical order): 

So 168 works in MusicBrainz. Some of them might be transcriptions/arrangements of Fauré's works and we might want to filter them out, but let's leave that question aside for the moment.

Q: How many works did Fauré compose each year?

Same question, but we we want to fetch the composition dates (attribute of the composer-work relation) this time. For simplicity we keep only the year of the composition date:

In [8]:
works_with_dates = sql("""
SELECT w.name, 
       to_date(to_char(l.begin_date_year, '9999') || '0101', 'YYYYMMDD') AS start,
       w.gid                                                             AS mbid
  FROM work                 AS w
  JOIN l_artist_work        AS law ON w.id = law.entity1
  JOIN artist               AS a   ON a.id = law.entity0
  JOIN link                 AS l   ON l.id = law.link
LEFT OUTER JOIN l_work_work AS lww ON w.id = lww.entity1
 WHERE a.gid = %(composer_mbid)s
   AND lww.entity1 IS NULL
ORDER BY start;
""", composer_mbid=composer_mbid)
works_with_dates['url'] = works_with_dates.mbid.apply(mb_work_link)

iplot(ff.create_table(works_with_dates.drop('mbid', 1).sort_values(by='start').head(10)))

We can display in a bar chart how many works were composed each year (works without date do not appear), using PanDas groupby capabilities:

In [9]:
df = works_with_dates.groupby('start').count()['name']
# concatenate work names composed on the same year
text = works_with_dates[['start', 'name']].groupby('start').agg(lambda s: '<br>'.join(s)).name

iplot(go.Figure(
    data=[go.Bar(x=df.index, y=df.values, text=text)],
    layout=go.Layout(title="Fauré's compositions by year")
))

Q: Is Fauré's opus catalogue in MusicBrainz?

MuscBrainz stores composers catalogue using the Series data type, an easy way to see only the main works of a composer (without subworks or arrangements).

In [10]:
catalogue = sql("""
SELECT s.name,
       s.gid AS mbid
  FROM series          AS s
  JOIN series_type     AS st  ON s.type = st.id
  JOIN l_artist_series AS las ON s.id = las.entity1
  JOIN artist          AS a   ON a.id = las.entity0
 WHERE st.name = 'Catalogue'
   AND a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)

catalogue_mbid = catalogue.mbid[0]
catalogue['url'] = catalogue.mbid.apply(mb_series_link)
iplot(ff.create_table(catalogue[['name', 'url']]))

So the answer is yes.

Q: Are there works missing from the catalogue?

In [11]:
works_in_catalogue = sql("""
SELECT lt.text_value AS catalog_no,
       w.name        AS work_name,
       w.gid         AS mbid
  FROM work                      AS w
  JOIN l_series_work             AS lsw ON w.id = lsw.entity1
  JOIN series                    AS s   ON s.id = lsw.entity0
  JOIN link_attribute_text_value AS lt  ON lt.link = lsw.link
  WHERE s.gid = %(catalogue_mbid)s
  ORDER BY lsw.link_order ASC;
""", catalogue_mbid=catalogue_mbid)

print('Answer: %d works' % len(works_in_catalogue))

works_in_catalogue['url'] = works_in_catalogue.mbid.apply(mb_work_link)
iplot(ff.create_table(works_in_catalogue.drop('mbid', 1).head(10)))
Answer: 88 works

So it looks like 88 of the ~121 opus numbers are correctly associated in MusicBrainz. Other works are either missing or not linked properly, so there is still work to do for contributors willing to complete data.

Q: Which languages did Fauré use for vocal works?

We can add to the previous query the work languages (restricted to works present in the catalogue)

In [12]:
works_with_language = sql("""
SELECT w.name AS work,
       l.name AS language
  FROM work                      AS w
  JOIN l_series_work             AS lsw ON w.id = lsw.entity1
  JOIN series                    AS s   ON s.id = lsw.entity0
  JOIN link_attribute_text_value AS lt  ON lt.link = lsw.link
  JOIN work_language             AS wl  ON wl.work = w.id
  JOIN language                  AS l   ON l.id = wl.language
  WHERE s.gid = %(catalogue_mbid)s
  ORDER BY lsw.link_order ASC;
""", catalogue_mbid=catalogue_mbid)
lang = works_with_language.groupby('language').count()

lang
Out[12]:
work
language
French 19
Latin 5
No linguistic content 64

As could be expected, a large majority of songs in french and some (religious works) in latin. We can visualize the results as a pie chart with plotly:

In [13]:
iplot([go.Pie(labels=lang.index, values=lang.work,
              direction="clockwise")])

Q: Which tonalities did Fauré use most often?

Let us move to something more subtle: some composers have a special affinity with certain tonalities (Beethoven and C minor, anyone?). Is it the case for Fauré?

In [14]:
works_with_key = sql("""
SELECT w.name, 
       wat.value
  FROM work           AS w
  JOIN work_attribute AS wa  ON wa.work = w.id
  JOIN work_attribute_type_allowed_value AS wat ON wat.id = wa.work_attribute_type_allowed_value
  JOIN l_artist_work  AS law ON w.id = law.entity1
  JOIN artist         AS a   ON a.id = law.entity0
 WHERE a.gid = %(composer_mbid)s
   AND wa.work_attribute_type = 1;
""", composer_mbid=composer_mbid)
tonalities = works_with_key.groupby('value').count().sort_values(by='name', ascending=False)
tonalities.head(10)
Out[14]:
name
value
D-flat major 8
A-flat major 7
E minor 7
D minor 6
A minor 5
F-sharp minor 4
E-flat major 4
C minor 4
C-sharp minor 4
G minor 3
In [15]:
iplot([go.Pie(labels=tonalities.index,
              values=tonalities.name,
              direction="clockwise")])

D-flat major, A-flat major, D minor and E minor seem to be Fauré's favorite tonalities.

Q: Which work types did Fauré mostly write?

That one is tricky since MuscBrainz doesn't give a lot of choices for work types. Another possibility would be to look at tags (that are not restricted to predefined values).

In [16]:
works_with_type = sql("""
SELECT w.name, 
       wt.name AS type
  FROM work           AS w
  JOIN work_type      AS wt  ON w.type = wt.id
  JOIN l_artist_work  AS law ON w.id = law.entity1
  JOIN artist         AS a   ON a.id = law.entity0
 WHERE a.gid = %(composer_mbid)s;
""", composer_mbid=composer_mbid)
types = works_with_type.groupby('type').count().sort_values(by='name', ascending=False)
types
Out[16]:
name
type
Song 116
Suite 8
Song-cycle 7
Mass 4
Sonata 4
Quartet 3
Cantata 1
Incidental music 1
Madrigal 1
Opera 1
Overture 1
Symphony 1
In [17]:
iplot([go.Pie(labels=types.index,
              values=types.name,
              direction="clockwise")])

So in the standard work types, the Song seems to be a large majority. Note that specific chamber music work types (quartet, quintet...) do not exist in MusicBrainz, so most chamber music will not have a specific type and not appear in this visualization.

Apply to other composers

Of course the same treatment could be applied easily to other composers. I created another ready-for-use notebook called report-composer to combine all queries above into one. If you have the necessary setup to run the notebook, just fill the name of the composer you want to study and run all cells!

I'd love to have some feedback: did I miss something interesting? are some steps missing in the workflow shown above? Tell me!

To be continued...