Inspired by Popular GitHub Projects: Implementing a Spark-Based Visual Analysis System for Hospital Physical Examination Data
计算机编程果茶熊
August 27, 2025, 13:05

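I. About the Author

💖💖Author: 计算机编程果茶熊

💙💙About me: I spent many years teaching computer science in training programs and worked as a programming instructor, and I enjoy teaching. My areas include Java, WeChat Mini Programs, Python, Golang, Android, and other IT topics. I take on custom project development, code walkthroughs, thesis defense coaching, and documentation writing, and I also know some techniques for reducing text similarity scores. I like to share solutions to problems I run into during development and to talk about technology, so feel free to ask me any code-related questions.

💛💛A note to readers: thank you for your attention and support!

💕💕To get the source code, see the contact at the end of the article: 计算机编程果茶熊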

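II. System Overview

Big data framework: Hadoop + Spark (Hive requires custom modification)

Development languages: Java + Python (both versions are supported)

Database: MySQL

Backend frameworks: SpringBoot (Spring + SpringMVC + Mybatis) + Django (both versions are supported)

Frontend: Vue + Echarts + HTML + CSS + JavaScript + jQuery

The Spark-based hospital physical examination data visualization and analysis system is a big data application platform built for processing and analyzing physical examination data from medical institutions. It uses a Hadoop + Spark architecture as its underlying technology and combines the Python data science ecosystem with Java enterprise development frameworks to form a complete examination data processing pipeline. The frontend uses the Vue + ElementUI + Echarts stack to provide an intuitive visualization interface that supports multidimensional display of, and interaction with, examination data. The core modules cover examination data management, examinee population profiling, multidimensional factor correlation analysis, high-prevalence health issue analysis, and key physiological indicator analysis, and together they handle the storage, cleaning, analysis, and visualization of large-scale examination data. Data preprocessing and feature engineering are done with Spark SQL and Pandas, numerical computation with NumPy, and the results are presented as charts to medical staff and administrators, giving the hospital technical support for digitizing its examination services and making data-driven decisions.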
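The cleaning step mentioned above is not shown in the article. As a rough illustration only, the sketch below applies a record-level rule in plain Python: a record is kept only if every checked field is present, numeric, and within a plausible range. The field names follow the columns used in the code later in the article; the function names and the ranges themselves are assumptions made for this example, not values from the original system.

```python
from typing import Optional

# Illustrative plausibility ranges (assumptions, not taken from the original system).
PLAUSIBLE_RANGES = {
    "age": (0, 120),
    "bmi": (10.0, 60.0),
    "systolic_pressure": (60, 260),
    "diastolic_pressure": (30, 160),
    "blood_sugar": (1.0, 35.0),
}


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if it should be dropped."""
    cleaned = dict(record)
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = cleaned.get(field)
        if value is None:
            return None  # missing measurement
        try:
            value = float(value)
        except (TypeError, ValueError):
            return None  # not numeric
        if not low <= value <= high:
            return None  # outside the plausible range (also rejects NaN)
        cleaned[field] = value
    return cleaned


def clean_records(records):
    """Keep only the records that pass every check."""
    cleaned = (clean_record(record) for record in records)
    return [record for record in cleaned if record is not None]
```

On a Spark DataFrame the same rules would more likely be written as column filters, so this version is mainly useful for unit-testing the rules on a handful of records.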

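III. Spark-Based Hospital Physical Examination Data Visualization and Analysis System: Video Walkthrough

IV. Spark-Based Hospital Physical Examination Data Visualization and Analysis System: Feature Showcase

V. Spark-Based Hospital Physical Examination Data Visualization and Analysis System: Code Showcase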


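from pyspark.sql import SparkSession
# Note: importing sum from pyspark shadows Python's built-in sum in this module.
# month is needed by the seasonal analysis further down.
from pyspark.sql.functions import col, count, avg, sum, when, desc, asc, month
import pandas as pd
import numpy as np
from datetime import datetime
import json

# Create (or reuse) a Spark session with adaptive query execution enabled.
spark = (
    SparkSession.builder
    .appName("HospitalHealthDataAnalysis")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)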



def health_portrait_analysis(exam_data_df):
    """Profile examinees by gender and age group and compute risk rates."""
    # Bucket examinees into four age groups.
    age_group_df = exam_data_df.withColumn("age_group",
        when(col("age") <= 25, "Youth (<=25)")
        .when((col("age") > 25) & (col("age") <= 40), "Young adult (26-40)")
        .when((col("age") > 40) & (col("age") <= 60), "Middle-aged (41-60)")
        .otherwise("Senior (>60)"))
    # Average key indicators per gender and age group.
    gender_age_stats = age_group_df.groupBy("gender", "age_group").agg(
        count("*").alias("total_count"),
        avg("systolic_pressure").alias("avg_systolic"),
        avg("diastolic_pressure").alias("avg_diastolic"),
        avg("blood_sugar").alias("avg_blood_sugar"),
        avg("cholesterol").alias("avg_cholesterol"),
        avg("bmi").alias("avg_bmi")
    ).orderBy("gender", "age_group")
    # Flag risks per examinee. Start from age_group_df (not exam_data_df) so that
    # the age_group column exists for the groupBy below.
    health_risk_df = age_group_df.withColumn("hypertension_risk",
        when((col("systolic_pressure") >= 140) | (col("diastolic_pressure") >= 90), 1).otherwise(0))
    health_risk_df = health_risk_df.withColumn("diabetes_risk",
        when(col("blood_sugar") >= 7.0, 1).otherwise(0))
    health_risk_df = health_risk_df.withColumn("obesity_risk",
        when(col("bmi") >= 28.0, 1).otherwise(0))
    # Count flagged examinees per group.
    risk_summary = health_risk_df.groupBy("gender", "age_group").agg(
        sum("hypertension_risk").alias("hypertension_count"),
        sum("diabetes_risk").alias("diabetes_count"),
        sum("obesity_risk").alias("obesity_count"),
        count("*").alias("total_examinees")
    )
    # Convert counts to percentages with two decimal places.
    risk_percentage = risk_summary.withColumn("hypertension_rate",
        (col("hypertension_count") / col("total_examinees") * 100).cast("decimal(5,2)"))
    risk_percentage = risk_percentage.withColumn("diabetes_rate",
        (col("diabetes_count") / col("total_examinees") * 100).cast("decimal(5,2)"))
    risk_percentage = risk_percentage.withColumn("obesity_rate",
        (col("obesity_count") / col("total_examinees") * 100).cast("decimal(5,2)"))
    # Per-occupation averages, largest occupations first.
    occupation_health_df = exam_data_df.groupBy("occupation").agg(
        count("*").alias("occupation_count"),
        avg("systolic_pressure").alias("avg_systolic"),
        avg("stress_level").alias("avg_stress"),
        avg("exercise_frequency").alias("avg_exercise")
    ).orderBy(desc("occupation_count"))
    # Combine averages and risk rates; collect() returns lists of Row objects.
    final_portrait = gender_age_stats.join(risk_percentage, ["gender", "age_group"], "inner")
    portrait_result = final_portrait.collect()
    return {"demographic_stats": portrait_result, "occupation_analysis": occupation_health_df.collect()}
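The risk flags and rates in health_portrait_analysis can be checked on a few hand-written records before running them on Spark. The sketch below restates the same thresholds (blood pressure 140/90, blood sugar 7.0, BMI 28.0) in plain Python; the function names are hypothetical, and the rounding uses Python's round rather than Spark's decimal cast, so results may differ in borderline cases.

```python
def risk_flags(record: dict) -> dict:
    """Apply the same thresholds as the Spark code to a single record."""
    return {
        "hypertension_risk": int(
            record["systolic_pressure"] >= 140 or record["diastolic_pressure"] >= 90
        ),
        "diabetes_risk": int(record["blood_sugar"] >= 7.0),
        "obesity_risk": int(record["bmi"] >= 28.0),
    }


def risk_rates(records: list) -> dict:
    """Return the percentage of records flagged for each risk."""
    total = len(records)
    if total == 0:
        return {}  # avoid dividing by zero on an empty group
    counts = {}
    for record in records:
        for name, flag in risk_flags(record).items():
            counts[name] = counts.get(name, 0) + flag
    return {
        name.replace("_risk", "_rate"): round(flagged / total * 100, 2)
        for name, flagged in counts.items()
    }
```

Calling risk_rates on the records of one gender and age group gives the figures that the hypertension_rate, diabetes_rate, and obesity_rate columns are meant to hold for that group.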



def multidimensional_correlation_analysis(exam_data_df):
    """Correlate numeric indicators and compare groups by lifestyle, BMI, and age."""
    correlation_features = ["age", "bmi", "systolic_pressure", "diastolic_pressure",
                            "blood_sugar", "cholesterol", "exercise_frequency", "sleep_hours"]
    feature_df = exam_data_df.select(*correlation_features)
    # toPandas() pulls every selected row onto the driver; fine for moderate data sizes.
    pandas_df = feature_df.toPandas()
    correlation_matrix = pandas_df.corr()
    # Walk the upper triangle and keep pairs with |r| > 0.3.
    strong_correlations = []
    for i in range(len(correlation_matrix.columns)):
        for j in range(i + 1, len(correlation_matrix.columns)):
            corr_value = float(correlation_matrix.iloc[i, j])
            if abs(corr_value) > 0.3:
                strong_correlations.append({
                    "feature1": correlation_matrix.columns[i],
                    "feature2": correlation_matrix.columns[j],
                    "correlation": round(corr_value, 4),
                    "strength": "strong positive" if corr_value > 0.5 else "strong negative" if corr_value < -0.5 else "moderate"
                })
    # Averages by exercise frequency and smoking status; skip groups under 10 people.
    lifestyle_health_df = exam_data_df.groupBy("exercise_frequency", "smoking_status").agg(
        avg("systolic_pressure").alias("avg_systolic"),
        avg("cholesterol").alias("avg_cholesterol"),
        avg("bmi").alias("avg_bmi"),
        count("*").alias("group_count")
    ).filter(col("group_count") >= 10)
    # Blood pressure by BMI category (cut-offs 18.5, 24, 28).
    bmi_pressure_analysis = exam_data_df.withColumn("bmi_category",
        when(col("bmi") < 18.5, "Underweight")
        .when((col("bmi") >= 18.5) & (col("bmi") < 24), "Normal")
        .when((col("bmi") >= 24) & (col("bmi") < 28), "Overweight")
        .otherwise("Obese"))
    bmi_pressure_stats = bmi_pressure_analysis.groupBy("bmi_category").agg(
        avg("systolic_pressure").alias("avg_systolic"),
        avg("diastolic_pressure").alias("avg_diastolic"),
        count("*").alias("category_count")
    ).orderBy("bmi_category")
    # Round age down to its decade (e.g. 47 -> 40) and compare by gender.
    age_multifactor_df = exam_data_df.withColumn("age_decade", (col("age") / 10).cast("int") * 10)
    age_factor_analysis = age_multifactor_df.groupBy("age_decade", "gender").agg(
        avg("blood_sugar").alias("avg_blood_sugar"),
        avg("cholesterol").alias("avg_cholesterol"),
        avg("liver_function").alias("avg_liver_function")
    ).orderBy("age_decade", "gender")
    return {
        "correlation_matrix": correlation_matrix.to_dict(),
        "strong_correlations": strong_correlations,
        "lifestyle_analysis": lifestyle_health_df.collect(),
        "bmi_pressure_analysis": bmi_pressure_stats.collect(),
        "age_factor_analysis": age_factor_analysis.collect()
    }
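By default, pandas' corr() computes the Pearson correlation coefficient, and the loop above then labels each pair using the 0.3 and 0.5 cut-offs. The sketch below spells out both steps in plain Python so the labelling can be checked by hand. The function names are hypothetical, and the English labels are only renderings chosen for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return covariance / math.sqrt(var_x * var_y)


def classify(corr_value):
    """Label a coefficient with the 0.3 / 0.5 cut-offs, or None if it is filtered out."""
    if math.isnan(corr_value) or abs(corr_value) <= 0.3:
        return None
    if corr_value > 0.5:
        return "strong positive"
    if corr_value < -0.5:
        return "strong negative"
    return "moderate"
```

For example, classify(0.4) returns "moderate", while a coefficient of 0.1 returns None and would not appear in strong_correlations.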



def high_frequency_health_issues_analysis(exam_data_df):
    """Estimate prevalence, comorbidity, and monthly trends of common health issues."""
    # The groupBy further down needs age_group, which is a derived column;
    # build it here with the same buckets as health_portrait_analysis.
    health_indicators_df = exam_data_df.withColumn("age_group",
        when(col("age") <= 25, "Youth (<=25)")
        .when((col("age") > 25) & (col("age") <= 40), "Young adult (26-40)")
        .when((col("age") > 40) & (col("age") <= 60), "Middle-aged (41-60)")
        .otherwise("Senior (>60)"))
    # Flag each issue as 1 or 0 per examinee.
    health_indicators_df = health_indicators_df.withColumn("hypertension",
        when((col("systolic_pressure") >= 140) | (col("diastolic_pressure") >= 90), 1).otherwise(0))
    health_indicators_df = health_indicators_df.withColumn("hyperglycemia",
        when(col("blood_sugar") >= 6.1, 1).otherwise(0))
    health_indicators_df = health_indicators_df.withColumn("hyperlipidemia",
        when(col("cholesterol") >= 5.7, 1).otherwise(0))
    # Simplified proxy kept from the original code: a high liver_function value counts as fatty liver.
    health_indicators_df = health_indicators_df.withColumn("fatty_liver",
        when(col("liver_function") >= 40, 1).otherwise(0))
    # Each comparison needs its own parentheses because & binds tighter than == and <.
    # "男" (male) and "女" (female) are the gender values as stored in the data.
    health_indicators_df = health_indicators_df.withColumn("anemia",
        when(((col("gender") == "男") & (col("hemoglobin") < 120)) |
             ((col("gender") == "女") & (col("hemoglobin") < 110)), 1).otherwise(0))
    issue_names = ["hypertension", "hyperglycemia", "hyperlipidemia", "fatty_liver", "anemia"]
    total_examinees = exam_data_df.count()
    issue_prevalence = health_indicators_df.agg(
        *[sum(name).alias(f"{name}_cases") for name in issue_names]
    ).collect()[0]
    # Case counts and percentage rates per issue.
    prevalence_rates = {}
    for name in issue_names:
        cases = issue_prevalence[f"{name}_cases"] or 0  # the sum is null on an empty DataFrame
        rate = round(cases / total_examinees * 100, 2) if total_examinees else 0.0
        prevalence_rates[name] = {"cases": cases, "rate": rate}
    # Case counts by age group and gender.
    age_gender_issues = health_indicators_df.groupBy("age_group", "gender").agg(
        sum("hypertension").alias("hypertension_count"),
        sum("hyperglycemia").alias("hyperglycemia_count"),
        sum("hyperlipidemia").alias("hyperlipidemia_count"),
        count("*").alias("group_total")
    )
    # Number of concurrent issues per examinee (0 to 5).
    comorbidity_analysis = health_indicators_df.withColumn("comorbidity_count",
        col("hypertension") + col("hyperglycemia") + col("hyperlipidemia") +
        col("fatty_liver") + col("anemia"))
    comorbidity_stats = comorbidity_analysis.groupBy("comorbidity_count").agg(
        count("*").alias("patient_count")
    ).orderBy("comorbidity_count")
    # Monthly averages, to look for seasonal patterns.
    seasonal_trends_df = exam_data_df.withColumn("exam_month",
        month(col("exam_date")))
    seasonal_analysis = seasonal_trends_df.groupBy("exam_month").agg(
        avg("systolic_pressure").alias("avg_systolic"),
        avg("blood_sugar").alias("avg_blood_sugar"),
        count("*").alias("monthly_exams")
    ).orderBy("exam_month")
    return {
        "prevalence_rates": prevalence_rates,
        "age_gender_distribution": age_gender_issues.collect(),
        "comorbidity_analysis": comorbidity_stats.collect(),
        "seasonal_trends": seasonal_analysis.collect(),
        "total_examinees": total_examinees
    }
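The three functions return lists of Spark Row objects from collect(), and columns cast to decimal(5,2) come back as Python Decimal values; json.dumps does not serialize either as a keyed JSON object or number. The article does not show how results reach the Vue + Echarts frontend, so the following is only a minimal sketch, assuming they are sent as JSON. The helper name to_plain is hypothetical, and it relies only on the asDict() method that Row objects provide.

```python
import json
from decimal import Decimal


def to_plain(value):
    """Recursively convert Row-like objects and Decimals into JSON-friendly types."""
    # Check asDict first: a Spark Row is also a tuple, and we want its field names.
    if hasattr(value, "asDict"):
        return {key: to_plain(item) for key, item in value.asDict().items()}
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if isinstance(value, Decimal):
        return float(value)
    return value


def to_json(result: dict) -> str:
    """Serialize an analysis result dictionary; ensure_ascii=False keeps non-ASCII labels readable."""
    return json.dumps(to_plain(result), ensure_ascii=False)
```

A backend view could then return to_json(health_portrait_analysis(exam_data_df)) as the response body. Date values and NaN entries in the correlation matrix would need extra handling that is not covered here.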


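VI. Spark-Based Hospital Physical Examination Data Visualization and Analysis System: Documentation Showcase

VII. END

💛💛A note to readers: thank you for your attention and support!

💕💕For the source code, contact: 计算机编程果茶熊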