Skip to content

beta.tpp schema🔗

Available on backends: TPP

This defines all the data (both primary care and externally linked) available in the OpenSAFELY-TPP backend. For more information about this backend, see the SystmOne Primary Care section.

To use this schema in an ehrQL file:
from ehrql.tables.beta.tpp import (
    addresses,
    apcs_cost,
    appointments,
    clinical_events,
    ec_cost,
    emergency_care_attendances,
    hospital_admissions,
    household_memberships_2020,
    isaric_raw,
    medications,
    occupation_on_covid_vaccine_record,
    ons_deaths,
    opa_cost,
    opa_diag,
    opa_proc,
    open_prompt,
    patients,
    practice_registrations,
    sgss_covid_all_tests,
    vaccinations,
)

many rows per patient

addresses🔗

Columns
address_id 🔗 integer
start_date 🔗 date
end_date 🔗 date
address_type 🔗 integer
rural_urban_classification 🔗 integer
imd_rounded 🔗 integer
  • Always >= 0, <= 32800, and a multiple of 100
msoa_code 🔗 string
  • Matches regular expression: E020[0-9]{5}
has_postcode 🔗 boolean
care_home_is_potential_match 🔗 boolean
care_home_requires_nursing 🔗 boolean
care_home_does_not_require_nursing 🔗 boolean
Methods
for_patient_on(date) 🔗

Return each patient's registered address as it was on the supplied date.

Where there are multiple registered addresses we prefer any which have a known postcode (though we never have access to this postcode) as this is used by TPP to cross-reference other data associated with the address, such as the MSOA or index of multiple deprevation.

Where there are multiple of these we prefer the most recently registered address and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the address ID.

View method definition
spanning_addrs = addresses.where(addresses.start_date <= date).except_where(
    addresses.end_date < date
)
ordered_addrs = spanning_addrs.sort_by(
    case(when(addresses.has_postcode).then(1), default=0),
    addresses.start_date,
    addresses.end_date,
    addresses.address_id,
)
return ordered_addrs.last_for_patient()

many rows per patient

apcs_cost🔗

Columns
apcs_ident 🔗 integer

TODO

  • Never NULL
grand_total_payment_mff 🔗 float

TODO

tariff_initial_amount 🔗 float

TODO

tariff_total_payment 🔗 float

TODO

admission_date 🔗 date

TODO

discharge_date 🔗 date

TODO

many rows per patient

appointments🔗

Appointments in primary care.

You can find out more about this table in the short data report. To view it, you will need a login for OpenSAFELY Jobs and the Project Collaborator or Project Developer role for the project. The workspace shows when the code that comprises the report was run; the code itself is in the appointments-short-data-report repository on GitHub.

Tip

Querying this table is similar to using Cohort Extractor's patients.with_gp_consultations function. However, that function filters by the status of the appointment. To achieve a similar result with this table:

appointments.where(
    appointments.status.is_in([
        "Arrived",
        "In Progress",
        "Finished",
        "Visit",
        "Waiting",
        "Patient Walked Out",
    ])
)
Columns
booked_date 🔗 date

The date the appointment was booked

start_date 🔗 date

The date the appointment was due to start

status 🔗 string

The status of the appointment

  • Possible values: Booked, Arrived, Did Not Attend, In Progress, Finished, Requested, Blocked, Visit, Waiting, Cancelled by Patient, Cancelled by Unit, Cancelled by Other Service, No Access Visit, Cancelled Due To Death, Patient Walked Out

many rows per patient

clinical_events🔗

Columns
date 🔗 date
snomedct_code 🔗 SNOMED-CT code
ctv3_code 🔗 CTV3 (Read v3) code
numeric_value 🔗 float

many rows per patient

ec_cost🔗

Columns
ec_ident 🔗 integer

TODO

  • Never NULL
grand_total_payment_mff 🔗 float

TODO

tariff_total_payment 🔗 float

TODO

arrival_date 🔗 date

TODO

ec_decision_to_admit_date 🔗 date

TODO

ec_injury_date 🔗 date

TODO

many rows per patient

emergency_care_attendances🔗

Columns
id 🔗 integer
arrival_date 🔗 date
discharge_destination 🔗 SNOMED-CT code
diagnosis_01 🔗 SNOMED-CT code
diagnosis_02 🔗 SNOMED-CT code
diagnosis_03 🔗 SNOMED-CT code
diagnosis_04 🔗 SNOMED-CT code
diagnosis_05 🔗 SNOMED-CT code
diagnosis_06 🔗 SNOMED-CT code
diagnosis_07 🔗 SNOMED-CT code
diagnosis_08 🔗 SNOMED-CT code
diagnosis_09 🔗 SNOMED-CT code
diagnosis_10 🔗 SNOMED-CT code
diagnosis_11 🔗 SNOMED-CT code
diagnosis_12 🔗 SNOMED-CT code
diagnosis_13 🔗 SNOMED-CT code
diagnosis_14 🔗 SNOMED-CT code
diagnosis_15 🔗 SNOMED-CT code
diagnosis_16 🔗 SNOMED-CT code
diagnosis_17 🔗 SNOMED-CT code
diagnosis_18 🔗 SNOMED-CT code
diagnosis_19 🔗 SNOMED-CT code
diagnosis_20 🔗 SNOMED-CT code
diagnosis_21 🔗 SNOMED-CT code
diagnosis_22 🔗 SNOMED-CT code
diagnosis_23 🔗 SNOMED-CT code
diagnosis_24 🔗 SNOMED-CT code

many rows per patient

hospital_admissions🔗

Columns
id 🔗 integer
admission_date 🔗 date
discharge_date 🔗 date
admission_method 🔗 string
all_diagnoses 🔗 string
patient_classification 🔗 string
days_in_critical_care 🔗 integer
primary_diagnoses 🔗 string

one row per patient

household_memberships_2020🔗

Inferred household membership as of 2020-02-01, as determined by TPP using an as yet undocumented algorithm.

Columns
household_pseudo_id 🔗 integer
household_size 🔗 integer

many rows per patient

isaric_raw🔗

A subset of the ISARIC data.

These columns are deliberately all taken as strings while in a preliminary phase. They will later change to more appropriate data types.

Descriptions taken from: CCP_REDCap_ISARIC_data_dictionary_codebook.pdf

Columns
age 🔗 string

Age

age_factor 🔗 string

TODO

calc_age 🔗 string

Calculated age (comparing date of birth with date of enrolment). May be inaccurate if a date of February 29 is used.

sex 🔗 string

Sex at birth.

ethnic___1 🔗 string

Ethnic group: Arab.

ethnic___2 🔗 string

Ethnic group: Black.

ethnic___3 🔗 string

Ethnic group: East Asian.

ethnic___4 🔗 string

Ethnic group: South Asian.

ethnic___5 🔗 string

Ethnic group: West Asian.

ethnic___6 🔗 string

Ethnic group: Latin American.

ethnic___7 🔗 string

Ethnic group: White.

ethnic___8 🔗 string

Ethnic group: Aboriginal/First Nations.

ethnic___9 🔗 string

Ethnic group: Other.

ethnic___10 🔗 string

Ethnic group: N/A.

covid19_vaccine 🔗 string

Has the patient received a Covid-19 vaccine (open label licenced product)?

covid19_vaccined 🔗 date

Date first vaccine given (Covid-19) if known.

covid19_vaccine2d 🔗 date

Date second vaccine given (Covid-19) if known.

covid19_vaccined_nk 🔗 string

First vaccine given (Covid-19) but date not known.

corona_ieorres 🔗 string

Suspected or proven infection with pathogen of public health interest.

coriona_ieorres2 🔗 string

Proven or high likelihood of infection with pathogen of public health interest.

coriona_ieorres3 🔗 string

Proven infection with pathogen of public health interest.

inflammatory_mss 🔗 string

Adult or child who meets case definition for inflammatory multi-system syndrome (MIS-C/MIS-A).

cestdat 🔗 date

Onset date of first/earliest symptom.

chrincard 🔗 string

Chronic cardiac disease, including congenital heart disease (not hypertension).

  • Possible values: YES, NO, Unknown
hypertension_mhyn 🔗 string

Hypertension (physician diagnosed).

  • Possible values: YES, NO, Unknown
chronicpul_mhyn 🔗 string

Chronic pulmonary disease (not asthma).

  • Possible values: YES, NO, Unknown
asthma_mhyn 🔗 string

Asthma (physician diagnosed).

  • Possible values: YES, NO, Unknown
renal_mhyn 🔗 string

Chronic kidney disease.

  • Possible values: YES, NO, Unknown
mildliver 🔗 string

Mild liver disease.

  • Possible values: YES, NO, Unknown
modliv 🔗 string

Moderate or severe liver disease

  • Possible values: YES, NO, Unknown
chronicneu_mhyn 🔗 string

Chronic neurological disorder.

  • Possible values: YES, NO, Unknown
malignantneo_mhyn 🔗 string

Malignant neoplasm.

  • Possible values: YES, NO, Unknown
chronichaemo_mhyn 🔗 string

Chronic haematologic disease.

  • Possible values: YES, NO, Unknown
aidshiv_mhyn 🔗 string

AIDS/HIV.

  • Possible values: YES, NO, Unknown
obesity_mhyn 🔗 string

Obesity (as defined by clinical staff).

  • Possible values: YES, NO, Unknown
diabetes_type_mhyn 🔗 string

Diabetes and type.

  • Possible values: NO, 1, 2, N/K
diabetescom_mhyn 🔗 string

Diabetes with complications.

  • Possible values: YES, NO, Unknown
diabetes_mhyn 🔗 string

Diabetes without complications.

  • Possible values: YES, NO, Unknown
rheumatologic_mhyn 🔗 string

Rheumatologic disorder.

  • Possible values: YES, NO, Unknown
dementia_mhyn 🔗 string

Dementia.

  • Possible values: YES, NO, Unknown
malnutrition_mhyn 🔗 string

Malnutrition.

  • Possible values: YES, NO, Unknown
smoking_mhyn 🔗 string

Smoking.

  • Possible values: Yes, Never Smoked, Former Smoker, N/K
hostdat 🔗 date

Admission date at this facility.

hooccur 🔗 string

Transfer from other facility?

hostdat_transfer 🔗 date

Admission date at previous facility.

hostdat_transfernk 🔗 string

Admission date at previous facility not known.

readm_cov19 🔗 string

Is the patient being readmitted with Covid-19?

dsstdat 🔗 date

Date of enrolment.

dsstdtc 🔗 date

Outcome date.

many rows per patient

medications🔗

Columns
date 🔗 date
dmd_code 🔗 dm+d code

many rows per patient

occupation_on_covid_vaccine_record🔗

Columns
is_healthcare_worker 🔗 boolean

many rows per patient

ons_deaths🔗

Columns
date 🔗 date

Patient's date of death. Only deaths registered from February 2019 are recorded.

place 🔗 string
  • Possible values: Care Home, Elsewhere, Home, Hospice, Hospital, Other communal establishment
underlying_cause_of_death 🔗 ICD-10 code
cause_of_death_01 🔗 ICD-10 code
cause_of_death_02 🔗 ICD-10 code
cause_of_death_03 🔗 ICD-10 code
cause_of_death_04 🔗 ICD-10 code
cause_of_death_05 🔗 ICD-10 code
cause_of_death_06 🔗 ICD-10 code
cause_of_death_07 🔗 ICD-10 code
cause_of_death_08 🔗 ICD-10 code
cause_of_death_09 🔗 ICD-10 code
cause_of_death_10 🔗 ICD-10 code
cause_of_death_11 🔗 ICD-10 code
cause_of_death_12 🔗 ICD-10 code
cause_of_death_13 🔗 ICD-10 code
cause_of_death_14 🔗 ICD-10 code
cause_of_death_15 🔗 ICD-10 code

many rows per patient

opa_cost🔗

Columns
opa_ident 🔗 integer

TODO

  • Never NULL
tariff_opp 🔗 float

TODO

grand_total_payment_mff 🔗 float

TODO

tariff_total_payment 🔗 float

TODO

appointment_date 🔗 date

TODO

referral_request_received_date 🔗 date

TODO

many rows per patient

opa_diag🔗

Columns
opa_ident 🔗 integer

TODO

  • Never NULL
primary_diagnosis_code 🔗 ICD-10 code

TODO

primary_diagnosis_code_read 🔗 CTV3 (Read v3) code

TODO

secondary_diagnosis_code_1 🔗 ICD-10 code

TODO

secondary_diagnosis_code_1_read 🔗 CTV3 (Read v3) code

TODO

appointment_date 🔗 date

TODO

referral_request_received_date 🔗 date

TODO

many rows per patient

opa_proc🔗

Columns
opa_ident 🔗 integer

TODO

  • Never NULL
primary_procedure_code 🔗 OPCS-4 code

TODO

primary_procedure_code_read 🔗 CTV3 (Read v3) code

TODO

procedure_code_1 🔗 OPCS-4 code

TODO

procedure_code_2_read 🔗 CTV3 (Read v3) code

TODO

appointment_date 🔗 date

TODO

referral_request_received_date 🔗 date

TODO

many rows per patient

open_prompt🔗

This table contains responses to questions from the OpenPROMPT project.

You can find out more about this table in the associated short data report. To view it, you will need a login for Level 4. The workspace shows when the code that comprises the report was run; the code itself is in the airmid-short-data-report repository on GitHub.

Columns
ctv3_code 🔗 CTV3 (Read v3) code

The response to the question, as a CTV3 code. Alternatively, if the question does not admit a CTV3 code as the response, then the question, as a CTV3 code.

  • Never NULL
snomedct_code 🔗 SNOMED-CT code

The response to the question, as a SNOMED CT code. Alternatively, if the question does not admit a SNOMED CT code as the response, then the question, as a SNOMED CT code.

creation_date 🔗 date

The date the survey was administered

  • Never NULL
consultation_date 🔗 date

The response to the question, as a date, if the question admits a date as the response. Alternatively, the date the survey was administered.

  • Never NULL
consultation_id 🔗 integer

The ID of the survey

  • Never NULL
numeric_value 🔗 float

The response to the question, as a number

one row per patient

patients🔗

Patients in primary care.

Representativeness🔗

You can find out more about the representativeness of these data in the OpenSAFELY-TPP backend in:

The OpenSAFELY Collaborative, Colm D. Andrews, Anna Schultze, Helen J. Curtis, William J. Hulme, John Tazare, Stephen J. W. Evans, et al. 2022. "OpenSAFELY: Representativeness of Electronic Health Record Platform OpenSAFELY-TPP Data Compared to the Population of England." Wellcome Open Res 2022, 7:191. https://doi.org/10.12688/wellcomeopenres.18010.1

Orphan records🔗

If a practice becomes aware that a patient has moved house, then the practice deducts, or removes, the patient's records from their register. If the patient doesn't register with a new practice within a given amount of time (normally from four to eight weeks), then the patient's records are permanently deducted and are orphan records. There are roughly 1.6 million orphan records.

Columns
date_of_birth 🔗 date

Patient's date of birth.

  • Always the first day of a month
  • Never NULL
sex 🔗 string

Patient's sex.

  • Possible values: female, male, intersex, unknown
  • Never NULL
date_of_death 🔗 date

Patient's date of death.

Methods
age_on(date) 🔗

Patient's age as an integer, in whole elapsed calendar years, as it would be on the given date.

This method takes no account of whether the patient is alive on the given date. In particular, it may return negative values if the given date is before the patient's date of birth.

View method definition
return (date - patients.date_of_birth).years

many rows per patient

practice_registrations🔗

Each record corresponds to a patient's registration with a practice.

Only patients with a full GMS (General Medical Services) registration are included.

We have registration history for:

  • all patients currently registered at a TPP practice
  • all patients registered at a TPP practice any time from 1 Jan 2009 onwards:
    • who have since de-registered
    • who have since died

A patient can be registered with zero, one, or more than one practices at a given time. For instance, students are often registered with a practice at home and a practice at university.

Columns
start_date 🔗 date

Date patient joined practice.

  • Never NULL
end_date 🔗 date

Date patient left practice.

practice_pseudo_id 🔗 integer

Pseudonymised practice identifier.

  • Never NULL
practice_stp 🔗 string

ONS code of practice's STP (Sustainability and Transformation Partnership). STPs have been replaced by ICBs (Integrated Care Boards), and ICB codes will be available soon.

  • Matches regular expression: E540000[0-9]{2}
practice_nuts1_region_name 🔗 string

Name of the NUTS level 1 region of England to which the practice belongs. For more information see: https://www.ons.gov.uk/methodology/geography/ukgeographies/eurostat

  • Possible values: North East, North West, Yorkshire and The Humber, East Midlands, West Midlands, East, London, South East, South West
Methods
for_patient_on(date) 🔗

Return each patient's practice registration as it was on the supplied date.

Where a patient is registered with multiple practices we prefer the most recent registration and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the practice ID.

View method definition
spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where(
    practice_registrations.end_date < date
)
ordered_regs = spanning_regs.sort_by(
    practice_registrations.start_date,
    practice_registrations.end_date,
    practice_registrations.practice_pseudo_id,
)
return ordered_regs.last_for_patient()

many rows per patient

sgss_covid_all_tests🔗

Columns
specimen_taken_date 🔗 date
is_positive 🔗 boolean

many rows per patient

vaccinations🔗

Columns
vaccination_id 🔗 integer
date 🔗 date
target_disease 🔗 string
product_name 🔗 string