1. English word-frequency count:
Download the lyrics of an English song, or an English article.
song = '''Passion is sweet
Love makes weak
You said you cherised freedom so
You refused to let it go
Follow your faith
Love and hate
never failed to seize the day
Don't give yourself away
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreeeming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the tought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
Trapped in a crowd
Music's loud
I said I loved my freedom too
Now im not so sure i do
All eyes on you
Wings so true
Better quit while your ahead
Now im not so sure i am
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
My soul, my heart
If your near or if your far
My life, my love
You can have it all
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
If you're the rock i'll crush against'''
Replace all separators such as , . ? ! ' : with spaces.
sep = ''',.?!':;"'''
for i in sep:
    song = song.replace(i, " ")  # str.replace returns a new string; reassign it
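As an alternative to the replace loop, str.translate can substitute every separator in a single pass. A minimal sketch (the sample string here is made up for illustration):

```python
# Map every separator character to a space in one pass with str.translate.
sep = ''',.?!':;"'''
table = str.maketrans(sep, " " * len(sep))
text = "Passion is sweet, love makes weak!"
cleaned = text.translate(table)
```

This avoids rescanning the whole string once per separator character.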
Convert all uppercase letters to lowercase and generate the word list.
songList = song.lower().split()
Generate the word-frequency counts.
countdict = {}
songset = set(songList)
for i in songset:
    countdict[i] = songList.count(i)
for i in countdict:
    print(i, countdict[i])
Sort.
dictList = list(countdict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
Exclude function words: pronouns, articles, conjunctions.
delList = {"the", "a", "an"}
# Filter the sorted list itself; rebuilding songset here would not
# affect the counts already stored in dictList.
dictList = [item for item in dictList if item[0] not in delList]
Output the TOP 20 words by frequency.
for i in range(20):
    print(dictList[i])
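The counting, stop-word filtering, and sorting steps above can also be collapsed into collections.Counter, whose most_common method returns the (word, count) pairs already sorted by descending frequency. A sketch with made-up sample tokens:

```python
from collections import Counter

# Sample tokens standing in for songList; stop words are filtered
# out before counting.
words = "oh the night falls oh the night falls oh".split()
stopwords = {"the", "a", "an"}
counts = Counter(w for w in words if w not in stopwords)
top = counts.most_common(20)  # sorted (word, count) pairs
for word, freq in top:
    print(word, freq)
```

Counter walks the list once, so it also avoids the O(n²) cost of calling list.count once per distinct word.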
Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for frequency analysis by reading the file.
Read the lyrics:
f = open("F:/study/大三/大数据/song.txt", "r", encoding="utf-8")
song = f.read()
f.close()
Save the analysis results:
f = open("F:/study/大三/大数据/resulet.txt", "a", encoding="utf-8")
for i in range(20):
    f.write('\n' + dictList[i][0] + " " + str(dictList[i][1]))
f.close()
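Opening files with a with block (equivalent to the open/close pairs above) closes the file automatically even if an exception occurs mid-write. A sketch using stand-in data and a hypothetical output path:

```python
# Stand-in for the sorted (word, count) list built earlier.
dictList = [("dangerous", 4), ("night", 3)]

# "result_sample.txt" is a hypothetical path for illustration.
with open("result_sample.txt", "w", encoding="utf-8") as f:
    for word, count in dictList[:20]:  # slicing never overruns a short list
        f.write(word + " " + str(count) + "\n")
```

Slicing with dictList[:20] also avoids the IndexError the range(20) loop raises when fewer than 20 distinct words remain.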
Experimental results:
2. Chinese word-frequency count:
Download a long Chinese article.
Read the text to be analyzed from the file.
f = open('gzccnews.txt', 'r', encoding='utf-8')
news = f.read()  # read the text; passing the file object itself to jieba would fail
f.close()
Install and use jieba for Chinese word segmentation.
pip install jieba
import jieba
newsList = jieba.lcut(news)  # lcut already returns a list
Generate the word-frequency counts.
Sort.
Exclude function words: pronouns, articles, conjunctions.
Output the TOP 20 words by frequency (or save the results to a file).
import jieba

f = open("F:/study/大三/大数据/中文词频.txt", "r", encoding="utf-8")
str1 = f.read()
f.close()
stringList = list(jieba.cut(str1))
delset = {"，", "。", "：", "“", "”", "？", " ", "；", "！", "、"}
stringset = set(stringList) - delset
countdict = {}
for i in stringset:
    countdict[i] = stringList.count(i)
dictList = list(countdict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
f = open("F:/study/大三/大数据/resulet.txt", "a", encoding="utf-8")
for i in range(20):
    f.write('\n' + dictList[i][0] + " " + str(dictList[i][1]))
f.close()
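One common refinement (an assumption here, not part of the script above) is to also drop single-character tokens, which removes most remaining particles such as 的. Sketched with a stand-in token list so the example runs without jieba installed:

```python
from collections import Counter

# Stand-in token list; in the real script this would be jieba.cut(str1).
tokens = ["新闻", "，", "记者", "新闻", "。", "的", "报道"]
punct = {"，", "。", "：", "“", "”", "？", " ", "；", "！", "、"}
# Keep tokens that are not punctuation and are longer than one character.
counts = Counter(t for t in tokens if t not in punct and len(t) > 1)
for word, freq in counts.most_common(20):
    print(word, freq)
```

The length filter is a heuristic: it also discards meaningful single-character words, so whether to apply it depends on the analysis.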