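I'm a humanities-background beginner who is new to network analysis and has never studied any coding. Judging by the bullet comments and comments under many tutorial videos, plenty of people get stuck on one step: how to build the matrix table. Many social network analysis programs are paid, and since I didn't want to be fleeced, I tried using ChatGPT (GPT-4o) to help generate the network matrix. It was my first time using Python, so I ran into many bugs along the way, but GPT helped me debug every one of them, as long as I described the problem precisely and was patient.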
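The code is below. Both scripts expect an Excel file whose 'text' column holds one document per row, already segmented into words separated by spaces: the code splits on whitespace and does no Chinese word segmentation of its own.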
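1. Co-occurrence matrix: install pandas and numpy beforehand (searching Bilibili turns up installation tutorials), plus openpyxl, which pandas uses to read and write .xlsx files. Counter does not need installing: it ships with Python in the standard-library collections module.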
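```python
import pandas as pd
import numpy as np
from collections import Counter

# Input: an Excel file whose 'text' column holds one space-segmented document per row
file_path = 'G:\\A研究\\文化产业政策网络\\政策\\2016-2024.xlsx'
df = pd.read_excel(file_path)
# Drop empty cells and coerce to str so .split() never hits a NaN value
texts = df['text'].dropna().astype(str).tolist()
# Split each document into a list of words on whitespace
tokenized_texts = [text.split() for text in texts]
all_tokens = [token for tokens in tokenized_texts for token in tokens]
word_counts = Counter(all_tokens)
# Keep words that appear at least min_count times; with 1, every word is kept.
# An .xlsx sheet holds at most 16,384 columns (one is used by the index),
# so raise min_count if the vocabulary is larger than that.
min_count = 1
vocab = [word for word, count in word_counts.items() if count >= min_count]
word2idx = {word: idx for idx, word in enumerate(vocab)}
co_occurrence_matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
# Two words co-occur when they are at most window_size positions apart
window_size = 2
for tokens in tokenized_texts:
    for i, word in enumerate(tokens):
        if word in word2idx:
            word_idx = word2idx[word]
            # Scan neighbours i - window_size .. i + window_size, clipped to the document
            for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1)):
                if i != j and tokens[j] in word2idx:
                    co_occurrence_matrix[word_idx][word2idx[tokens[j]]] += 1
co_occurrence_df = pd.DataFrame(co_occurrence_matrix, index=vocab, columns=vocab)
output_path = 'G:\\A研究\\文化产业政策网络\\政策\\co_occurrence_matrix.xlsx'
co_occurrence_df.to_excel(output_path)
print(f"Co-occurrence matrix saved to {output_path}")
```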
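To check what the sliding window actually counts, the loop above can be wrapped in a function and run on a tiny hand-made example. This is a minimal sketch: the function name build_cooccurrence and the two toy documents are hypothetical, not taken from the original policy data.

```python
from collections import Counter

import numpy as np
import pandas as pd


def build_cooccurrence(tokenized_texts, window_size=2, min_count=1):
    """Hypothetical helper: same sliding-window counting as the script above."""
    word_counts = Counter(token for tokens in tokenized_texts for token in tokens)
    vocab = [word for word, count in word_counts.items() if count >= min_count]
    word2idx = {word: idx for idx, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
    for tokens in tokenized_texts:
        for i, word in enumerate(tokens):
            if word not in word2idx:
                continue
            # Count every kept word within window_size positions of position i
            for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1)):
                if i != j and tokens[j] in word2idx:
                    matrix[word2idx[word], word2idx[tokens[j]]] += 1
    return pd.DataFrame(matrix, index=vocab, columns=vocab)


# Two toy "documents", already segmented with spaces (hypothetical data)
docs = ["文化 产业 政策 支持", "文化 产业 发展"]
tokenized = [doc.split() for doc in docs]
matrix_df = build_cooccurrence(tokenized, window_size=2)
print(matrix_df)
```

Because each pair is visited once from each side, the matrix is symmetric. 文化 and 产业 are adjacent in both documents, so their cell is 2, while 文化 and 支持 stay at 0 because they are three positions apart, beyond window_size=2.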
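2. Word frequency: count how often each word occurs and export the table sorted from most to least frequent.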
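```python
import pandas as pd

file_path = 'G:\\A研究\\文化产业政策网络\\政策\\2016-2024.xlsx'
df = pd.read_excel(file_path)
texts = df['text'].dropna().astype(str).tolist()
tokenized_texts = [text.split() for text in texts]
# Flatten all documents into one list of words so this script runs on its own
all_tokens = [token for tokens in tokenized_texts for token in tokens]
# One row per word occurrence; counting rows per word gives its frequency
freq_df = pd.DataFrame(all_tokens, columns=['词语'])  # 词语 = word
freq_df['频次'] = 1  # 频次 = frequency
freq_df = freq_df.groupby('词语').count().reset_index()
freq_df = freq_df.sort_values(by='频次', ascending=False)
print(freq_df)
output_path = 'G:\\A研究\\文化产业政策网络\\政策\\word_frequencies_pandas.xlsx'
freq_df.to_excel(output_path, index=False)
print(f"Word frequency table exported to {output_path}")
```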
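The group-and-count steps above can also be written with pandas.Series.value_counts, which counts each distinct word and sorts by count, highest first. A minimal sketch on hypothetical toy documents, producing the same two-column table (词语, 频次):

```python
import pandas as pd

# Hypothetical pre-segmented documents
docs = ["文化 产业 政策", "文化 产业 发展", "文化 旅游"]
all_tokens = [token for doc in docs for token in doc.split()]

# value_counts returns a Series indexed by word; name the index and turn it into columns
freq_df = (
    pd.Series(all_tokens)
    .value_counts()
    .rename_axis('词语')
    .reset_index(name='频次')
)
print(freq_df)
```

The result can be exported with freq_df.to_excel(output_path, index=False), exactly as in the script above.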
Finally, corrections are welcome from anyone with a computer science background or experience with similar research.