Culture sector pipeline, stage 1

This stage processes the raw funding data into a candidate longlist of funded organisations, broken down by local authority and funding source.

Dependencies:

Downloaded funding data (see get-data.py)
The PYTHONPATH environment variable must include <PROJECT ROOT>/pipelines so that the local config module can be imported.
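
The second dependency can be checked before running the rest of the notebook. The sketch below is a hypothetical helper (path_includes_dir is not part of this project) and assumes only that the directory is called pipelines, as stated above.

```python
import sys
from pathlib import Path


def path_includes_dir(dirname, search_path=None):
    """Return True if any search path entry is a directory named dirname.

    Hypothetical helper for illustration. It defaults to sys.path, which is
    where the entries from PYTHONPATH end up when the interpreter starts.
    """
    entries = sys.path if search_path is None else search_path
    return any(Path(entry).name == dirname for entry in entries if entry)


# Example with an explicit, made-up search path
print(path_includes_dir('pipelines', ['/home/user/project/pipelines', '/usr/lib/python3']))
```

Calling path_includes_dir('pipelines') with no second argument checks the running interpreter instead of the made-up list.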

Setup

Import some libraries and local utility functions. Ensure that PYTHONPATH is set correctly in the environment running this code.

import petl as etl
from config import RAW, FUNDED_ORGS_LIST, la_names

Configure a date parser for timestamps of the form YYYY-MM-DD 00:00:00

date_parser = etl.dateparser('%Y-%m-%d 00:00:00')
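
To make the format string concrete, the sketch below mimics the expected behaviour using only the standard library. It assumes that the parser returned by etl.dateparser yields a date object for matching text. parse_award_date and the sample value are made up for illustration.

```python
from datetime import datetime

FORMAT = '%Y-%m-%d 00:00:00'


def parse_award_date(text):
    """Stand-in for date_parser: parse text in FORMAT and return a date."""
    return datetime.strptime(text.strip(), FORMAT).date()


sample = '2023-04-01 00:00:00'  # made-up value in the expected format
print(parse_award_date(sample).year)  # prints 2023
```

Because the time part of the format is a literal, a timestamp with a non-midnight time does not match and raises a ValueError in this sketch.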

Load data

Arts Council England Investment Programme

ace_investment_programme = (
    etl.fromcsv(RAW / 'arts-council-investment-programme.csv')
    # Map the source column names onto the names used downstream
    .rename({
        'Applicant Name': 'organisation',
        'Type of organisation\n(NPO/IPSO/Transfer)': 'Source',
    })
    # Tag every row with the 2023/26 period
    .addfield('Period', '2023/26')
    .cut('organisation', 'Source', 'Local authority', 'Period')
)

Arts Council England Project Grants

ace_project_grants = (
    etl.fromcsv(RAW / 'arts-council-project-grants.csv')
    # Reduce each award date to its calendar year
    .convert('Award date', lambda d: date_parser(d).year)
    .rename({
        'Recipient': 'organisation',
        'Award date': 'Period',
    })
    # Every row in this file is a project grant
    .addfield('Source', 'Project Grant')
    .cut('organisation', 'Source', 'Local authority', 'Period')
)

Combine data

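Combine all data into a single table, then select just the local authorities in the region. Drop rows whose organisation is recorded as '-' (no name given), strip surrounding whitespace from the remaining names, and finally count the rows for each organisation / Local authority / Source combination, storing the count in a Number field. Period is not part of the grouping key, so it does not appear in the result.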

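data = (
    etl.cat(ace_investment_programme, ace_project_grants)
    # Keep only rows for local authorities in the region
    .selectin('Local authority', la_names)
    # Drop rows where the organisation is recorded as '-'
    .selectnotin('organisation', ['-'])
    # Trim stray whitespace so that names group consistently
    .convert('organisation', lambda x: x.strip())
    # Count rows per organisation / Local authority / Source
    .aggregate(
        ('organisation', 'Local authority', 'Source'),
        len,
        field='Number'
    )
    .sort(['organisation', 'Local authority'])
)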
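
The filter, strip and count steps above can be sketched in plain Python to show the shape of the result. This is not the pipeline itself. The rows, the authority names and the region set are all made up, and Counter stands in for the petl aggregation.

```python
from collections import Counter

# Made-up rows: (organisation, Source, Local authority, Period)
rows = [
    ('Example Theatre ', 'NPO', 'Authority A', '2023/26'),
    ('Example Theatre', 'Project Grant', 'Authority A', 2022),
    ('Example Theatre', 'Project Grant', 'Authority A', 2023),
    ('-', 'Project Grant', 'Authority A', 2023),
    ('Other Gallery', 'Project Grant', 'Authority B', 2023),
]
region = {'Authority A'}  # stands in for la_names

# Keep in-region rows with a named organisation, then count per grouping key
counts = Counter(
    (org.strip(), la, source)
    for org, source, la, _period in rows
    if la in region and org != '-'
)

# One row per organisation / Local authority / Source, sorted by that key
longlist = [key + (n,) for key, n in sorted(counts.items())]
for row in longlist:
    print(row)
```

With this sample, the two project grants for the same organisation collapse into a single row with a count of 2, and the trailing space on the first name does not create a separate group.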

Save the funded organisation list.

data.tocsv(FUNDED_ORGS_LIST)