Extracting data from AcousticBrainz

Let's take the data for Beethoven's "Moonlight" sonata again.

In [1]:
%run startup.ipy
Last notebook update: 2018-06-06
Git repo: git@bitbucket.org:loujine/musicbrainz-dataviz.git
Importing libs
Defining database parameters

Defining *sql* helper function
Last database update: 2018-06-02

Python packages versions:
numpy       1.14.3
pandas      0.23.0
sqlalchemy  1.2.8
CPython 3.7.0b5
IPython 6.4.0

This time we restrict data to recordings with a performer relation and a recording date, so that we have around 100 recordings.

In [2]:
moonlight_mbid = '11e7e520-f430-306c-90b8-183cbf3cc761'

recordings = sql("""
SELECT a.name AS artist,
       to_date(to_char(l.begin_date_year, '9999') || 
               to_char(l.begin_date_month, '99') || 
               to_char(l.begin_date_day, '99'), 'YYYY MM DD') AS start_date,
       to_date(to_char(l.end_date_year, '9999') || 
               to_char(l.end_date_month, '99') || 
               to_char(l.end_date_day, '99'), 'YYYY MM DD') AS end_date,
       r.gid    AS mbid,
       r.length * interval '1ms' AS duration
  FROM work               AS w
  JOIN l_recording_work   AS lrw ON w.id = lrw.entity1
  JOIN recording          AS r   ON r.id = lrw.entity0
  JOIN l_artist_recording AS lar ON r.id = lar.entity1
  JOIN artist             AS a   ON a.id = lar.entity0
  JOIN link               AS l   ON l.id = lar.link
 WHERE w.gid = %(moonlight_mbid)s
   AND l.begin_date_year > 0
ORDER BY start_date;
""", moonlight_mbid=moonlight_mbid)
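
In PostgreSQL, concatenating with NULL gives NULL, so a relation that has a year but no month or day ends up with an empty start_date even though it passes the begin_date_year filter. If approximate dates are acceptable, the year, month and day could be selected as separate columns and assembled in Python. The sketch below is one hypothetical way to do that; partial_date is not part of the notebook, and defaulting a missing month or day to 1 is an assumption.

```python
from datetime import date

def partial_date(year, month=None, day=None):
    """Build a date from possibly incomplete parts.

    A missing month or day defaults to 1 (an arbitrary choice);
    a missing year gives None, like the SQL expression above.
    """
    if year is None:
        return None
    return date(year, month or 1, day or 1)

full = partial_date(1963, 1, 25)   # complete date
year_only = partial_date(1987)     # month and day unknown
```

Dates built this way are only lower bounds within the year, so they are better suited to sorting than to exact comparisons.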

Checking whether AcousticBrainz has data

We can query AcousticBrainz for all the recording MBIDs in a single request.

First we check which recordings have information stored at AcousticBrainz:

In [3]:
import requests
resp = requests.get('https://acousticbrainz.org/api/v1/count?recording_ids=' 
                    + ';'.join(recordings.mbid.astype(str)))
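
The request above puts every MBID into one URL. As a hedged sketch, the same URL can be built by a small helper that also splits long lists into batches. The helper names and the batch size of 25 are assumptions made for illustration, not a limit taken from this notebook.

```python
COUNT_ENDPOINT = 'https://acousticbrainz.org/api/v1/count'  # same endpoint as above

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]

def count_urls(mbids, batch_size=25):
    """Build one count URL per batch of MBIDs (batch_size is an arbitrary choice)."""
    return [COUNT_ENDPOINT + '?recording_ids=' + ';'.join(str(m) for m in batch)
            for batch in batched(mbids, batch_size)]

example_urls = count_urls(['a', 'b', 'c'], batch_size=2)
```

Each URL can then be passed to requests.get and the resulting dictionaries merged with dict.update.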
In [4]:
resp.json()
Out[4]:
{'07506da2-0baf-4dde-9eaa-41a9e49b2cfe': {'count': 2},
 '1accc10e-43fa-44b8-9f0f-4de6e23583e8': {'count': 1},
 '1ebd9e16-22e1-4fc8-9391-d99b21e069ba': {'count': 3},
 '2eef9100-c2e2-4d9f-91d4-935571df9b7c': {'count': 1},
 '305f1386-82a7-4461-ad1a-7eadc12a8e86': {'count': 2},
 '3a214643-9516-493c-bd95-30e8155dca6d': {'count': 1},
 '3fca39d0-7246-4213-bc1e-06598a587472': {'count': 2},
 '41bc961a-5c7d-4e96-a914-92ffeab731bd': {'count': 1},
 '4d92bdaa-f94e-43a7-9e9d-cf9245504db3': {'count': 1},
 '4ea82cee-e79f-4721-8d5e-adeab3f43a2b': {'count': 1},
 '51e84e36-bc8e-46fb-a3af-7772c6fa9b56': {'count': 3},
 '59c5841b-1fb7-481e-9b96-247719756ee2': {'count': 1},
 '5ad9513f-7304-459a-aeed-2fbee0018d29': {'count': 3},
 '5b528b86-f506-49d7-bcb3-31b5a8066e52': {'count': 1},
 '65699400-1167-4d22-bf31-053d1d039a47': {'count': 1},
 '6d324eff-ca09-49bc-80d2-6becb230197e': {'count': 2},
 '742fb064-9d46-4c08-a1ed-41a71b6b759a': {'count': 2},
 '77dd899e-587f-4828-b3a3-aefa63a30c22': {'count': 1},
 '784f91ab-5a33-45b2-925e-9f15354da37a': {'count': 5},
 '789dfa26-24b2-40aa-9dfa-efee324946c5': {'count': 1},
 '7b783dcf-4771-4f3d-8c5d-2a8ab068a846': {'count': 1},
 '7d0ab8ca-acf0-4d0a-b5ed-f37640c42116': {'count': 1},
 '85ab80aa-bcd6-44e8-acdc-76d2c9bc3a68': {'count': 1},
 '880094bf-b308-4923-ae9d-96d1a15f294a': {'count': 3},
 '8812d99e-cce0-4251-9263-f412512f46a2': {'count': 8},
 '88547983-66ab-480b-9b84-8c77cef7d2d9': {'count': 8},
 '9b4bfb50-178d-49e1-8e09-50ae575cb091': {'count': 3},
 'a520a391-72f5-47da-bd87-8d75d0e6c908': {'count': 1},
 'a6b0bbce-4085-46d0-a6bc-6d6c5826fac6': {'count': 1},
 'b11143bf-231e-4017-afb3-b5f38e997bcb': {'count': 6},
 'be7831d9-5020-4c1a-917c-1cd84f6a8d31': {'count': 1},
 'c0156794-dbe2-4402-b175-03b7335d51f9': {'count': 2},
 'c1f4ea30-1f77-470c-91a8-25925865de53': {'count': 2},
 'c8341e1a-6fa6-403f-a020-50451cc8c818': {'count': 1},
 'e1539e24-789b-4f37-9818-d5f595dcf71b': {'count': 9},
 'e53edf62-2996-47c5-8ee2-61e4b044f168': {'count': 1},
 'e87e3aa2-339f-4227-a3a0-9ff7584a530d': {'count': 4},
 'ec927b34-eef2-4229-8051-f505d34e22fe': {'count': 1},
 'f6520aaf-7f21-443c-abec-5e6360d875ab': {'count': 1},
 'f6e74829-30a8-44be-b586-dd26fb327f18': {'count': 1},
 'f9dabfc9-8470-4314-b174-f1b6e5fb044d': {'count': 3}}
In [5]:
counts = resp.json()  # parse the JSON response once instead of once per row
recordings['acousticbrainz'] = recordings.mbid.apply(
    lambda mbid: counts.get(str(mbid), {'count': 0})['count'])
In [6]:
recordings.sort_values('acousticbrainz', ascending=False).head(10)
Out[6]:
     artist             start_date  end_date    mbid                                  duration         acousticbrainz
19   Arthur Rubinstein  1963-01-25  1963-01-30  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
21   Anthony Salvatore  1963-01-25  1963-01-30  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
22   Anthony Salvatore  1963-01-25  1963-01-30  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
16   Arthur Rubinstein  1962-04-06  1962-04-06  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
15   Arthur Rubinstein  1962-04-06  1962-04-06  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
20   Arthur Rubinstein  1963-01-25  1963-01-30  e1539e24-789b-4f37-9818-d5f595dcf71b  00:06:10.933000  9
101  Klaus Scheibe      None        None        8812d99e-cce0-4251-9263-f412512f46a2  00:06:01.666000  8
100  Wolfgang Lohse     None        None        8812d99e-cce0-4251-9263-f412512f46a2  00:06:01.666000  8
99   Wilhelm Kempff     None        None        8812d99e-cce0-4251-9263-f412512f46a2  00:06:01.666000  8
35   Jenő Jandó         1987-04-21  1987-04-23  88547983-66ab-480b-9b84-8c77cef7d2d9  00:05:17.706000  8
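
The same MBID shows up on several rows because the query returns one row per artist relation, not one per recording. A minimal sketch of collapsing to one row per recording before ranking is given below. It uses a toy frame with shortened, illustrative MBIDs in place of recordings.

```python
import pandas as pd

# Toy stand-in for `recordings`; the MBIDs are shortened for illustration only.
toy = pd.DataFrame({
    'artist': ['Arthur Rubinstein', 'Anthony Salvatore', 'Jenő Jandó'],
    'mbid': ['e1539e24', 'e1539e24', '88547983'],
    'acousticbrainz': [9, 9, 8],
})

# Keep the first row seen for each recording, then rank by submission count.
unique = (toy.drop_duplicates('mbid')
             .sort_values('acousticbrainz', ascending=False))
```

Note that drop_duplicates keeps only the first artist listed for each recording, so the other relations are lost in this view.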

Fetching low-level data on one recording

In [7]:
mbid = '0317b896-ee9a-40a0-8e94-31e13fd723f2'
resp = requests.get('https://acousticbrainz.org/api/v1/{}/low-level'.format(mbid)).json()
In [8]:
resp['tonal']['tuning_frequency']
Out[8]:
446.91595459
In [9]:
print(resp['tonal']['key_key'])
print(resp['tonal']['key_scale'])
print(resp['tonal']['chords_histogram'])
C#
minor
[24.9526939392, 6.45893764496, 4.23867797852, 8.40166473389, 0.769521892071, 2.63655853271, 0.391068488359, 5.70203113556, 0.769521892071, 12.6024980545, 0.176611587405, 0.189226686954, 0, 0.971363723278, 0.126151129603, 1.41289269924, 0.983978807926, 2.207644701, 5.55064964294, 1.18582057953, 3.30515956879, 2.69963407516, 9.99116897583, 4.27652311325]
In [10]:
# http://essentia.upf.edu/documentation/reference/streaming_ChordsDescriptors.html
chords = 'C, Em, G, Bm, D, F#m, A, C#m, E, G#m, B, D#m, F#, A#m, C#, Fm, G#, Cm, D#, Gm, A#, Dm, F, Am'.split(', ')
main_chord = resp['tonal']['key_key']
if resp['tonal']['key_scale']=='minor':
    main_chord += 'm'
idx = chords.index(main_chord)
chords = chords[idx:] + chords[:idx]
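
The cell above appears to assume that the first bin of chords_histogram corresponds to the estimated key, so the list of labels is rotated to start at that chord. The same step can be sketched as a reusable function. The name labels_for_key is chosen here for illustration and is not part of Essentia or AcousticBrainz.

```python
# Chord order copied from the cell above.
CHORDS = ('C, Em, G, Bm, D, F#m, A, C#m, E, G#m, B, D#m, '
          'F#, A#m, C#, Fm, G#, Cm, D#, Gm, A#, Dm, F, Am').split(', ')

def labels_for_key(key, scale, chords=CHORDS):
    """Rotate the chord labels so that the chord of the given key comes first."""
    name = key + ('m' if scale == 'minor' else '')
    idx = chords.index(name)  # raises ValueError for an unknown chord name
    return chords[idx:] + chords[:idx]

labels = labels_for_key('C#', 'minor')
```

For this recording the labels start at C#m and wrap around to end at A, matching the rotated chords list used in the plots.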
In [11]:
iplot(go.Figure(
    data=[go.Bar(x=chords, y=resp['tonal']['chords_histogram'])],
    layout=go.Layout(title="Chords histogram")
))

df = pandas.DataFrame({'chords': chords, 
                       'val': resp['tonal']['chords_histogram']}).sort_values('val', ascending=False)

iplot(go.Figure(
    data=[go.Bar(x=df.chords, y=df.val)],
    layout=go.Layout(title="Chords histogram")
))

It looks like the tonic, dominant and subdominant are the main chords, as expected.

In [12]:
resp['rhythm']['beats_count']
Out[12]:
857
In [13]:
resp['rhythm']['beats_position'][:10]
Out[13]:
[0.49922901392,
 0.96362811327,
 1.39319729805,
 1.79954648018,
 2.20589566231,
 2.61224484444,
 3.01859402657,
 3.42494320869,
 3.83129239082,
 4.2492518425]
In [14]:
import numpy as np
positions = np.array(resp['rhythm']['beats_position'])
cnt = resp['rhythm']['beats_count']
newpos = (positions - positions[0])
ref = np.linspace(0, newpos[-1], cnt)
rubato = 100 * (ref - newpos)/ref
rubato[0] = 0
/home/chrom/.virtualenvs/dataviz-py37/local/lib/python3.7/site-packages/ipykernel_launcher.py:6: RuntimeWarning:

invalid value encountered in true_divide
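
The warning comes from the first element: ref[0] is 0 and so is the numerator, which gives 0/0. A sketch that skips the division wherever the reference is zero is shown below. The function name relative_shift is hypothetical, and the length of the beat list is used in place of beats_count on the assumption that the two agree.

```python
import numpy as np

def relative_shift(beat_positions):
    """Percentage shift of each beat from an evenly spaced reference grid."""
    positions = np.asarray(beat_positions, dtype=float)
    elapsed = positions - positions[0]
    # Evenly spaced reference covering the same time span
    ref = np.linspace(0, elapsed[-1], len(elapsed))
    shift = np.zeros_like(elapsed)
    # Divide only where ref is non-zero; other entries keep their initial 0
    np.divide(100 * (ref - elapsed), ref, out=shift, where=ref != 0)
    return shift

regular = relative_shift([0.5, 1.5, 2.5, 3.5])    # perfectly even beats
uneven = relative_shift([0.5, 1.5, 3.5, 4.5])     # third beat arrives late
```

With evenly spaced beats every shift is zero; in the uneven case a beat that arrives late relative to the grid gets a negative value.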

In [15]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(rubato)
plt.title('Shift from mean value on each beat (%)')
Out[15]:
Text(0.5,1,'Shift from mean value on each beat (%)')

Comparing rubatos

In [16]:
mbid1 = '0317b896-ee9a-40a0-8e94-31e13fd723f2'
mbid2 = 'b11143bf-231e-4017-afb3-b5f38e997bcb'
resp1 = requests.get('https://acousticbrainz.org/api/v1/{}/low-level'.format(mbid1)).json()
resp2 = requests.get('https://acousticbrainz.org/api/v1/{}/low-level'.format(mbid2)).json()
In [17]:
df = pandas.DataFrame({'chords': chords, 
                       'Gilels': resp1['tonal']['chords_histogram'],
                       'Gulda': resp2['tonal']['chords_histogram']})

iplot(go.Figure(
    data=[go.Bar(x=df.chords, y=df.Gilels),
          go.Bar(x=df.chords, y=df.Gulda)],
    layout=go.Layout(title="Chords histogram")
))
In [18]:
def rubato(resp):
    positions = np.array(resp['rhythm']['beats_position'])
    cnt = resp['rhythm']['beats_count']
    newpos = (positions - positions[0])
    ref = np.linspace(0, newpos[-1], cnt)
    shift = 100 * (ref - newpos)/ref
    shift[0] = 0
    return shift

rubato_Gilels = rubato(resp1)
rubato_Gulda = rubato(resp2)
/home/chrom/.virtualenvs/dataviz-py37/local/lib/python3.7/site-packages/ipykernel_launcher.py:6: RuntimeWarning:

invalid value encountered in true_divide

In [19]:
plt.title('Shift from mean value on each beat (%)')
plt.plot(rubato_Gilels)
plt.plot(rubato_Gulda)
Out[19]:
[<matplotlib.lines.Line2D at 0x7f8207dc8fd0>]

Interesting: the two pianists behave very differently at the beginning and become very close by beat 400. So what happens around beat 200 that could explain this?
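
One hedged way to locate where two such curves diverge most is to resample them onto a common axis and take their absolute difference. The sketch below compares by relative position in the piece (0 to 1), not by beat index, which is an assumption made here because two recordings need not contain the same number of detected beats. The name curve_gap is hypothetical.

```python
import numpy as np

def curve_gap(curve_a, curve_b, points=200):
    """Resample two curves onto a common 0..1 axis and return (grid, |a - b|)."""
    grid = np.linspace(0, 1, points)
    a = np.interp(grid, np.linspace(0, 1, len(curve_a)), curve_a)
    b = np.interp(grid, np.linspace(0, 1, len(curve_b)), curve_b)
    return grid, np.abs(a - b)

# Synthetic curves of different lengths that describe the same straight line
grid, gap = curve_gap([0, 2, 4], [0, 1, 2, 3, 4], points=5)
```

Applied to rubato_Gilels and rubato_Gulda, grid[gap.argmax()] would give the relative position of the largest gap, which could then be matched against the score.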

In [20]:
#http://imslp.org/wiki/Piano_Sonata_No.14,_Op.27%20No.2_(Beethoven,_Ludwig_van)