ForgeScore: How One Number Captures Five Domains of Health

How fit are you? The question seems simple. The answer is not — because fitness is not one thing, and no single metric captures it. A look at the science behind composite health scoring.

March 4, 2026·9 min read

•The single-domain problem
•What sports science already knows
•The five domains
•Why orthogonality matters
•When domains talk to each other
•The limits of composite scores
•Readiness scoring in context
•References

Step count does not know about your sleep. Heart rate variability does not know about your nutrition. Your medication adherence log does not know whether you trained today. Each metric lives in its own silo, offering a sharp view of one dimension and complete blindness to the rest. The result is a fragmented picture that can mislead precisely when it matters most: when you need to decide what to do today.

The sports science literature has been circling this problem for over a decade. The consensus is clear: multi-domain monitoring outperforms any single biomarker for predicting performance, recovery status, and health outcomes.^[1] But translating that research into a practical, daily-use framework requires answering harder questions. Which domains? How do they interact? What are the limits? This article examines those questions.

The single-domain problem.

Most health and fitness tracking today operates in a single domain. A workout app tracks sets and reps. A nutrition app counts calories and macros. A sleep tracker reports time in bed and sleep stages. A medication app sends you a reminder and records whether you took your dose. Each tool does its job well in isolation. Your body does not.

Consider a simple scenario. You slept four hours, skipped two meals, missed your morning medication, and then walked into the gym for a heavy squat session. Your workout app will dutifully record your sets. It might even detect that your estimated one-rep max dropped. But it cannot tell you why it dropped, because it has no visibility into the other three domains that contributed to the decline.

This limitation shows up in the data. Saw et al. (2016) conducted a systematic review of athlete monitoring and found that subjective, multi-domain self-reported measures were more sensitive to training-induced changes than commonly used objective measures taken in isolation.^[1] The implication is striking: asking an athlete “how do you feel across several dimensions?” outperformed standalone biomarkers like HRV, cortisol, or CRP at detecting meaningful shifts in readiness and recovery.

The single-domain approach also creates a dangerous form of false confidence. If the only metric you track is green, you assume everything is fine. But a green training score with a red sleep score and a yellow nutrition score is not fine; it is a system under strain that happens to be performing well in one dimension. The signal you need is the one no single domain can provide: the composite picture.

What sports science already knows.

Elite sport has grappled with the multi-domain monitoring problem for decades. The 2017 consensus statement on monitoring athlete training loads explicitly recommended integrating multiple monitoring tools rather than relying on any single measure.^[5] Its authors, Bourdon et al., argued that no individual metric, whether internal load (RPE, heart rate), external load (GPS, power output), or recovery marker (HRV, sleep quality), provides sufficient information on its own to guide training decisions.

Halson (2014) reinforced this in a comprehensive review of training load monitoring, concluding that fatigue is a multi-factorial phenomenon that manifests differently across physiological, psychological, and biochemical domains.^[4] An athlete can be physiologically recovered (normal resting heart rate, adequate HRV) but psychologically depleted (low motivation, poor mood, high perceived effort at submaximal loads). Or the reverse. Monitoring only one domain misses the other.

Kellmann et al. (2018) extended this further in a consensus statement focused specifically on recovery, stating that effective recovery assessment requires multi-domain evaluation spanning physiological, psychological, social, and behavioral factors.^[6] The takeaway from the research is unambiguous: if you want to understand readiness, you need to look at more than one thing.

What is less often discussed is how to operationalize this for non-elite populations. Professional sports teams have staff, lab access, and custom dashboards. The recreational trainee on a health protocol has a phone. Bridging that gap, making multi-domain assessment practical for daily use, is the core design challenge.

The five domains.

If a composite health score is to be meaningful, its constituent domains must satisfy two criteria. First, each domain must capture something genuinely important to health outcomes. Second, the domains must be sufficiently independent of each other that each one adds information the others cannot provide. Based on the sports science literature and clinical evidence, five domains emerge as the minimum viable set for comprehensive health assessment.

Training

Training encompasses structured physical activity: resistance training, cardiovascular exercise, and active recovery. The metrics here include volume (sets, reps, tonnage), frequency (sessions per week), intensity (percentage of max, RPE), and consistency (adherence to a planned program). Training is the most visible domain because it is the one people actively control. But it is also the domain most susceptible to the “more is better” fallacy. Without context from the other four domains, high training volume can be a sign of progress or a sign of overreaching.

The research is clear that training load in isolation is a poor predictor of outcomes. Plews et al. (2013) demonstrated that even HRV, often cited as a recovery metric, is most useful when contextualized alongside training load data, not when used alone.^[3] The training domain tells you what stimulus you applied. It does not tell you whether your body can absorb it.

Volume without context is noise. A 20-set leg day after 8 hours of sleep and 150 grams of protein produces a different adaptation signal than the same 20 sets after 5 hours of sleep and 70 grams. The training domain records the input. Whether that input produces growth or breakdown depends entirely on the other four.

Nutrition

Nutrition covers caloric intake, macronutrient distribution (protein, carbohydrates, fat), micronutrient sufficiency, and hydration. For health-optimizing populations, especially those on medication protocols, nutrition carries additional weight because of drug–nutrient interactions. A GLP-1 agonist that suppresses appetite can silently erode protein intake. Testosterone replacement may increase zinc and magnesium demands. The nutrition domain captures whether the raw materials for recovery, adaptation, and health maintenance are actually arriving.

Nutritional adequacy is also one of the domains most likely to deteriorate silently. Sleep deprivation is felt immediately. A missed workout is noticed. But a gradual protein deficit, 80 grams when you need 130, produces no acute symptom. It manifests weeks later as stalled progress, increased fatigue, and impaired recovery. Without explicit tracking, it is invisible until the damage is done.
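One way to surface a slow deficit like this is to compare a rolling average of intake against the target. The sketch below is a minimal illustration under stated assumptions, not a description of any particular app's method: the function name `rolling_shortfall`, the seven-day window, and the sample numbers are all hypothetical.

```python
from statistics import mean


def rolling_shortfall(daily_grams, target_grams, window=7):
    """Average daily shortfall over the most recent `window` days.

    A positive result means intake has been running below the target.
    """
    recent = daily_grams[-window:]
    if not recent:
        raise ValueError("need at least one day of intake data")
    return target_grams - mean(recent)


# Hypothetical week: about 80 g of protein per day against a 130 g target.
week = [82, 78, 85, 80, 76, 81, 78]
shortfall = rolling_shortfall(week, target_grams=130)
print(f"Average shortfall: {shortfall:.0f} g/day")
```

No single day in that week looks alarming on its own, which is the point: the deficit only becomes visible once the days are averaged.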

Sleep

Sleep encompasses duration, quality (time in deep and REM stages), consistency (regular sleep and wake times), and efficiency (percentage of time in bed actually spent asleep). It is the single domain most strongly associated with recovery capacity, cognitive performance, and long-term health outcomes across the clinical literature.

Consumer wearables have made sleep data more accessible than ever, but the data requires interpretation. Total sleep time alone is a poor indicator: seven hours of fragmented sleep with minimal deep stages may be less restorative than six hours of consolidated, high-quality sleep. The architecture of sleep matters as much as the duration, and a useful composite score must account for both.
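The two ratios mentioned in this section are simple to compute. The following sketch assumes a hypothetical `Night` record with three fields a wearable might report; the field names and the two sample nights are illustrative, and no threshold for "good" sleep is implied.

```python
from dataclasses import dataclass


@dataclass
class Night:
    minutes_in_bed: int
    minutes_asleep: int
    minutes_deep_rem: int  # combined deep + REM minutes, as reported by the device

    def __post_init__(self):
        # Reject records that would make the ratios meaningless.
        if not 0 < self.minutes_asleep <= self.minutes_in_bed:
            raise ValueError("minutes_asleep must be positive and at most minutes_in_bed")
        if not 0 <= self.minutes_deep_rem <= self.minutes_asleep:
            raise ValueError("minutes_deep_rem must be between 0 and minutes_asleep")

    @property
    def efficiency(self) -> float:
        """Percentage of time in bed actually spent asleep."""
        return 100.0 * self.minutes_asleep / self.minutes_in_bed

    @property
    def deep_rem_share(self) -> float:
        """Percentage of sleep time spent in deep or REM stages."""
        return 100.0 * self.minutes_deep_rem / self.minutes_asleep


# Hypothetical nights: longer but fragmented versus shorter but consolidated.
fragmented = Night(minutes_in_bed=480, minutes_asleep=420, minutes_deep_rem=90)
consolidated = Night(minutes_in_bed=380, minutes_asleep=360, minutes_deep_rem=150)

for label, night in (("fragmented", fragmented), ("consolidated", consolidated)):
    print(f"{label}: {night.minutes_asleep / 60:.1f} h asleep, "
          f"efficiency {night.efficiency:.1f}%, "
          f"deep+REM share {night.deep_rem_share:.1f}%")
```

In this made-up comparison the shorter night scores higher on both ratios, which is the pattern a duration-only view would miss.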

Vitals

Vitals include resting heart rate, heart rate variability, blood pressure, body weight trends, body composition, and, when available, lab markers like hematocrit, lipid panels, and metabolic panels. This domain captures the physiological state of the body at rest. It is the closest thing to a hardware diagnostic: not what you did (training) or what you consumed (nutrition), but what your body is actually doing in response.

HRV has received particular attention in the recovery literature. Plews et al. (2013) showed that HRV-guided training, adjusting daily training intensity based on morning HRV readings, was associated with superior endurance adaptations compared to pre-planned training.^[3] But HRV is noisy, affected by hydration, alcohol, stress, and measurement conditions. It is most valuable as one input among many, not as a standalone oracle.

Medication Adherence

For populations on health protocols (testosterone replacement therapy, GLP-1 agonists, thyroid medication, statins, antihypertensives, or any chronic prescription), medication adherence is a distinct domain that the other four cannot capture. Taking your prescribed medication on time, at the correct dose, with appropriate consistency directly affects every other domain. Missed TRT doses alter vitals. Skipped GLP-1 injections change appetite and nutrition patterns. Inconsistent thyroid medication disrupts sleep and energy.

Preliminary research suggests that AI-assisted health insights and gamification strategies may improve medication adherence in chronic illness management, though the evidence base is still developing.^[7] The adherence domain is unique because it functions as both an input (medications affect all other domains) and an outcome (a behavior worth tracking and optimizing in its own right). For anyone on a protocol, a composite health score without an adherence component is incomplete by definition.

Why orthogonality matters.

The five domains described above are not just different; they are orthogonal. In mathematical terms, orthogonal dimensions are independent: knowing the value in one dimension tells you nothing about the value in another. Health domains only approximate that ideal, since they influence one another over time, but each can move in ways the others do not predict. This property is essential for a composite score to be meaningful rather than redundant.

When two inputs track the same underlying signal, a composite built from them counts that signal twice. Orthogonal domains prevent this. You can have excellent training and terrible sleep. You can have perfect nutrition and zero exercise. You can be fully adherent to your medication protocol and still be under-recovering because of a sleep deficit. Each domain captures a genuinely different dimension of health, and the composite score reflects the actual breadth of your status rather than amplifying a single signal.

The practical test of orthogonality: can Domain A be red while Domain B is green? If yes, they are measuring different things. If they always move together, one of them is redundant.
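The red/green test above can be run against logged daily scores, alongside a plain Pearson correlation as a rough redundancy check. This is a sketch under assumptions: scores are taken to be on a 0 to 100 scale, the red and green cut-offs (40 and 70) are hypothetical, and the sample series are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def can_disagree(a, b, red_below=40, green_at=70):
    """True if, on any day, one series is red while the other is green."""
    return any(
        (x < red_below and y >= green_at) or (y < red_below and x >= green_at)
        for x, y in zip(a, b)
    )


# Hypothetical daily scores (0-100) over five days.
training = [80, 75, 85, 30, 78]
sleep = [35, 72, 80, 74, 38]
steps = [50, 60, 70, 80, 90]
active_minutes = [52, 61, 69, 82, 91]

print(can_disagree(training, sleep))         # red/green split occurs on some days
print(can_disagree(steps, active_minutes))   # the two series never split
print(round(pearson(steps, active_minutes), 3))
```

Five days is far too short a window to draw conclusions from; the example only shows the mechanics. A pair that never disagrees and correlates near 1 over a long period is a candidate for merging into one input.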

This is the same principle that Esteva et al. (2019) identified in cross-domain healthcare data: integrating signals from independent data sources produces insights that no single source can generate on its own.^[2] Redundant inputs add noise. Orthogonal inputs add resolution. The distinction determines whether a composite score is genuinely useful or merely a dressed-up version of a single metric.

When domains talk to each other.

The real power of multi-domain assessment is not in the individual scores; it is in the interactions between them. Single-domain monitoring can tell you what happened. Cross-domain analysis can suggest why.

When training output drops, the cause could be muscular fatigue, neural fatigue, caloric deficit, sleep debt, medication non-adherence, or psychological stress. A training-only tracker sees the drop and can only guess at the cause. A multi-domain system sees the drop alongside the context: sleep was 4.5 hours last night, protein intake has been below target for three days, and the last GLP-1 dose was missed. The diagnosis becomes actionable rather than speculative.

•Training + Nutrition: A strength plateau during a caloric deficit that resolves when protein intake increases from 80g to 120g daily. The training data showed stagnation. The nutrition data explained why. Neither domain had the full picture alone.
•Sleep + Vitals: Declining HRV that correlates with reduced deep sleep percentage over two weeks. The vital signs look like overtraining. The sleep data points to a more likely cause: a schedule change, a new medication, or a stressor unrelated to the gym.
•Adherence + Nutrition: A GLP-1 dose taken on Monday suppresses appetite through Wednesday. Protein intake drops below the floor for 72 hours. The adherence log shows compliance. The nutrition log shows the downstream cost. The interaction is invisible to either domain in isolation.
•Vitals + Training + Adherence: Rising hematocrit on TRT, combined with increasing training intensity and consistent testosterone dosing. All three data streams are necessary to assess whether a dose adjustment is warranted. A vitals-only view sees the number. The full picture reveals the clinical decision.
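The idea of attaching context to a training drop can be sketched as a handful of rule checks. Everything here is illustrative: the `DayContext` fields, the thresholds, and the wording of the notes are assumptions, and the output is a list of candidate factors to look into rather than a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class DayContext:
    sleep_hours: float              # last night's sleep duration
    protein_days_below_target: int  # consecutive days under the protein target
    missed_last_dose: bool          # whether the last scheduled dose was missed


def context_for_training_drop(ctx, min_sleep_hours=6.0, max_low_protein_days=2):
    """Collect cross-domain observations that may be relevant to a training drop."""
    notes = []
    if ctx.sleep_hours < min_sleep_hours:
        notes.append(f"sleep was {ctx.sleep_hours:g} h last night")
    if ctx.protein_days_below_target > max_low_protein_days:
        notes.append(
            f"protein below target for {ctx.protein_days_below_target} days"
        )
    if ctx.missed_last_dose:
        notes.append("last scheduled dose was missed")
    return notes


# The scenario from the text: short sleep, three low-protein days, a missed dose.
today = DayContext(sleep_hours=4.5, protein_days_below_target=3, missed_last_dose=True)
for note in context_for_training_drop(today):
    print("-", note)
```

A training-only tracker would return nothing here; the value comes from having the other domains available to check at all.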

These cross-domain patterns are where composite scoring moves beyond a simple average and toward genuine insight. The interactions are not speculative; they are documented in the sports science and clinical literature. Bourdon et al. (2017) specifically recommended that monitoring systems capture these between-domain relationships rather than evaluating metrics in isolation.^[5]

The challenge is presenting this complexity without overwhelming the user. A composite score compresses five domains and their interactions into a single number that can be glanced at in seconds. The detail is available for those who want it, but the top-level signal is immediate: are you trending up, trending down, or holding steady?
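The up, down, or steady signal could be derived in many ways. One simple option, shown here as an assumption rather than as how any specific score works, is to compare the mean of the most recent window with the mean of the window before it and apply a small tolerance band. The name `classify_trend`, the seven-day window, and the two-point tolerance are hypothetical.

```python
from statistics import mean


def classify_trend(scores, window=7, tolerance=2.0):
    """Label the recent window 'up', 'down', or 'steady' versus the prior window."""
    if len(scores) < 2 * window:
        raise ValueError(f"need at least {2 * window} daily scores")
    recent = mean(scores[-window:])
    previous = mean(scores[-2 * window:-window])
    delta = recent - previous
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "steady"


# Hypothetical composite scores: a flat week followed by a better week.
history = [60, 61, 59, 60, 62, 60, 58, 66, 68, 67, 70, 69, 71, 72]
print(classify_trend(history))
```

The tolerance band keeps day-to-day noise from flipping the label, at the cost of reacting more slowly to real changes.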

The limits of composite scores.

Any honest discussion of composite health scoring must address what it cannot do. The limitations are real, and understanding them is essential for interpreting the output correctly.

Consumer wearable data has accuracy limitations. Sleep staging from a wrist-worn device is an approximation, not a polysomnography-grade measurement. HRV readings vary by device, measurement position, and time of day. Step counts can be inflated by arm movements or missed during cycling. The inputs are useful but imperfect, and any composite score inherits those imperfections. The score is only as accurate as the data feeding it.

Weighting is inherently subjective. How much should sleep matter relative to training? Does nutrition outweigh vitals? There is no universal answer. The optimal weighting depends on the individual, their goals, their health status, and their life circumstances. A competitive athlete preparing for a meet may weight training most heavily. A person recovering from surgery may weight sleep and vitals. Any fixed weighting scheme is a compromise, and users should understand which trade-offs were made.
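To make the trade-off concrete, the sketch below applies two weighting profiles to the same set of domain scores. The weights, the profile names, and the scores are invented for illustration; nothing here reflects a validated or recommended scheme.

```python
def weighted_composite(scores, weights):
    """Weighted mean of domain scores; weights need not sum to any fixed total."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * w for d, w in weights.items()) / total_weight


# Hypothetical domain scores on a 0-100 scale.
scores = {"training": 90, "nutrition": 70, "sleep": 50, "vitals": 60, "adherence": 100}

# Hypothetical profiles: a competitor before a meet versus someone recovering from surgery.
competitor = {"training": 35, "nutrition": 20, "sleep": 20, "vitals": 15, "adherence": 10}
post_surgery = {"training": 10, "nutrition": 20, "sleep": 30, "vitals": 30, "adherence": 10}

print(weighted_composite(scores, competitor))    # 74.5
print(weighted_composite(scores, post_surgery))  # 66.0
```

The same day produces two different numbers, which is why a composite score should make its weighting visible to the person reading it.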

Correlation is not causation. A composite score can reveal associations (when sleep drops, training output tends to follow), but it cannot establish causality. The score observes patterns; it does not diagnose. This distinction matters because it determines how the output should be used: as a prompt for investigation, not as a medical conclusion.

A composite health score is not a medical diagnostic tool. It does not replace physician guidance for medication decisions. It is a pattern-recognition aid that surfaces trends and interactions: a dashboard, not a doctor.

Missing data degrades the signal. A five-domain score with only two domains populated is not a five-domain score; it is a two-domain score with three unknowns. The score must communicate its own confidence level: how many domains contributed data today? How recent is that data? A transparent system shows its gaps rather than papering over them with defaults.
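A score that reports its own coverage might look like the sketch below. It assumes an equal-weight average over whichever domains have fresh data, a two-day freshness cut-off, and a hypothetical `Reading` record; all three are illustrative choices, not a described implementation.

```python
from dataclasses import dataclass
from datetime import date

DOMAINS = ("training", "nutrition", "sleep", "vitals", "adherence")


@dataclass
class Reading:
    score: float       # domain score on a 0-100 scale
    recorded_on: date  # day the underlying data was captured


def summarize(readings, today, max_age_days=2):
    """Average the fresh domain scores and report which domains contributed."""
    fresh = {
        domain: reading
        for domain, reading in readings.items()
        if domain in DOMAINS
        and 0 <= (today - reading.recorded_on).days <= max_age_days
    }
    if not fresh:
        return {"score": None, "domains_used": [], "coverage": 0.0}
    return {
        "score": sum(r.score for r in fresh.values()) / len(fresh),
        "domains_used": sorted(fresh),
        "coverage": len(fresh) / len(DOMAINS),
    }


# Hypothetical day: two fresh domains, one stale, two never logged.
today = date(2026, 3, 4)
readings = {
    "sleep": Reading(60, date(2026, 3, 4)),
    "training": Reading(80, date(2026, 3, 3)),
    "nutrition": Reading(70, date(2026, 2, 20)),  # too old to count
}
print(summarize(readings, today))
```

Returning the coverage alongside the score lets the interface show "2 of 5 domains" instead of silently filling the gaps with defaults.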

Readiness scoring in context.

The concept of a “readiness score” has gained traction in consumer fitness technology, but most implementations are narrow. Wearable-based readiness scores typically rely on HRV, resting heart rate, and sleep data: at most three inputs, all from the same device. They are useful, but they are not multi-domain in the sense the sports science literature recommends.

Kellmann et al. (2018) argued that recovery assessment should span physiological, psychological, and behavioral dimensions.^[6] A readiness score that only looks at autonomic nervous system markers (HRV, RHR) captures the physiological dimension but misses everything else. Did you eat enough to support today's planned training? Did you take your prescribed medications on time? Have you been training consistently, or is today a comeback session after a week off? These questions are invisible to a wearable.

The five-domain model addresses this by drawing data from multiple sources: wearable (sleep, vitals), manual logging (training, nutrition), and app-tracked behavior (medication adherence). No single device provides all five domains. The composite score is inherently multi-source, which is both its strength (broader visibility) and its challenge (more inputs to collect and validate).

For populations on health protocols, multi-domain readiness scoring is especially relevant. A person on TRT who sees their readiness drop can investigate whether the cause is training overload, nutritional deficit, poor sleep, unfavorable vital signs, or inconsistent dosing. That kind of cause-by-cause investigation is impossible with a wearable-only readiness score. The protocol context changes the requirements for what “readiness” means, and a score that ignores medication adherence is structurally incomplete for this population.

The future direction is clear from the literature. As Esteva et al. (2019) noted, the integration of heterogeneous health data sources (wearable, self-reported, clinical) produces insights that no single source can generate independently.^[2] The question is not whether multi-domain assessment is better than single-domain. The research settled that. The question is how to make it practical, accurate, and actionable for daily use.

References.

[1] Saw AE, et al. “Monitoring the athlete training response: subjective self-reported measures trump commonly used objective measures.” Br J Sports Med. 2016;50(5):281–291.
[2] Esteva A, et al. “A guide to deep learning in healthcare.” Nature Medicine. 2019;25:24–29.
[3] Plews DJ, et al. “Training Adaptation and Heart Rate Variability in Elite Endurance Athletes.” Int J Sports Physiol Perform. 2013;8(6):688–694.
[4] Halson SL. “Monitoring Training Load to Understand Fatigue in Athletes.” Sports Med. 2014;44(Suppl 2):S139–147.
[5] Bourdon PC, et al. “Monitoring Athlete Training Loads: Consensus Statement.” Int J Sports Physiol Perform. 2017;12(Suppl 2):S2161–S2170.
[6] Kellmann M, et al. “Recovery and Performance in Sport: Consensus Statement.” Int J Sports Physiol Perform. 2018;13(2):240–245.
[7] Fadhil A, Villafiorita A. “An Adaptive Learning with Gamification & My Health Avatar (AGHA) for Chronic Illness Management.” 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE). 2017:137–144.

Medical disclaimer: Composite health scores are not medical diagnostic tools. They are pattern-recognition aids designed to surface trends and interactions across health domains. Consumer wearable data has accuracy limitations: sleep staging, HRV, and other metrics from wrist-worn devices are approximations, not clinical-grade measurements.

A composite score does not replace physician guidance for medication decisions, training prescriptions, or clinical diagnoses. If you are on a health protocol, all dosing and medication decisions should be made in consultation with your prescribing provider.
