python如何读取一个大于10G的txt文件
这篇文章给大家分享的是有关python如何读取一个大于10G的txt文件的内容。小编觉得挺实用的,因此分享给大家做个参考,一起跟随小编过来看看吧。
前言
用python 读取一个大于10G 的文件,自己电脑只有8G内存,一运行就报内存溢出:MemoryErrorpython 如何用open函数读取大文件呢?
读取大文件
首先可以自己先制作一个大于10G的txt文件
a=''' 2021-02-0221:33:31,678[django.request:93][base:get_response][WARNING]-NotFound:/http:/123.125.114.144/ 2021-02-0221:33:31,679[django.server:124][basehttp:log_message][WARNING]-"HEADhttp://123.125.114.144/HTTP/1.1"4041678 2021-02-0222:14:04,121[django.server:124][basehttp:log_message][INFO]-code400,messageBadrequestversion('HTTP') 2021-02-0222:14:04,122[django.server:124][basehttp:log_message][WARNING]-"GET../../mnt/custom/ProductDefinitionHTTP"400- 2021-02-0222:16:21,052[django.server:124][basehttp:log_message][INFO]-"GET/api/loginHTTP/1.1"3010 2021-02-0222:16:21,123[django.server:124][basehttp:log_message][INFO]-"GET/api/login/HTTP/1.1"2003876 2021-02-0222:16:21,192[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/img/main_bg.pngHTTP/1.1"2002801 2021-02-0222:16:21,196[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/iconfont/style.cssHTTP/1.1"2001638 2021-02-0222:16:21,229[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/img/bg.jpgHTTP/1.1"200135990 2021-02-0222:16:21,307[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/iconfont/fonts/icomoon.ttf?u4m6fyHTTP/1.1"2006900 2021-02-0222:16:23,525[django.server:124][basehttp:log_message][INFO]-"POST/api/login/HTTP/1.1"3020 2021-02-0222:16:23,618[django.server:124][basehttp:log_message][INFO]-"GET/api/index/HTTP/1.1"20018447 2021-02-0222:16:23,709[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/js/commons.jsHTTP/1.1"20013209 2021-02-0222:16:23,712[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/css/admin.cssHTTP/1.1"20019660 2021-02-0222:16:23,712[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/css/common.cssHTTP/1.1"2001004 2021-02-0222:16:23,714[django.server:124][basehttp:log_message][INFO]-"GET/static/assets/js/app.jsHTTP/1.1"20020844 2021-02-0222:16:26,509[django.server:124][basehttp:log_message][INFO]-"GET/api/report_list/1/HTTP/1.1"20014649 2021-02-0222:16:51,496[django.server:124][basehttp:log_message][INFO]-"GET/api/test_list/1/HTTP/1.1"20024874 2021-02-0222:16:51,721[django.server:124][basehttp:log_message][INFO]-"POST/api/add_case/HTTP/1.1"2000 2021-02-0222:16:59,707[django.server:124][basehttp:log_message][INFO]-"GET/api/test_list/1/HTTP/1.1"20024874 2021-02-0322:16:59,909[django.server:124][basehttp:log_message][INFO]-"POST/api/add_case/HTTP/1.1"2000 2021-02-0322:17:01,306[django.server:124][basehttp:log_message][INFO]-"GET/api/edit_case/1/HTTP/1.1"20036504 2021-02-0322:17:06,265[django.server:124][basehttp:log_message][INFO]-"GET/api/add_project/HTTP/1.1"20017737 2021-02-0322:17:07,825[django.server:124][basehttp:log_message][INFO]-"GET/api/project_list/1/HTTP/1.1"20029789 2021-02-0322:17:13,116[django.server:124][basehttp:log_message][INFO]-"GET/api/add_config/HTTP/1.1"20024816 2021-02-0322:17:19,671[django.server:124][basehttp:log_message][INFO]-"GET/api/config_list/1/HTTP/1.1"20019532 ''' whileTrue: withopen("xxx.log","a",encoding="utf-8")asfp: fp.write(a)
循环写入到 xxx.log 文件,运行 3-5 分钟,pycharm 打开查看文件大小大于 10G
于是我用open函数 直接读取
f=open("xxx.log",'r') print(f.read()) f.close()
抛出内存溢出异常:MemoryError
Traceback (most recent call last):File "D:/2021kecheng06/demo/txt.py", line 35, in <module>print(f.read())MemoryError
运行的时候可以看下自己电脑的内存已经占了100%, cpu高达91% ,不挂掉才怪了!
这种错误的原因在于,read()方法执行操作是一次性的都读入内存中,显然文件大于内存就会报错。
read() 的几种方法
1.read() 方法可以带参数 n, n 是每次读取的大小长度,也就是可以每次读一部分,这样就不会导致内存溢出
f=open("xxx.log",'r') print(f.read(2048)) f.close()
运行结果
2019-10-24 21:33:31,678 [django.request:93] [base:get_response] [WARNING]- Not Found: /http:/123.125.114.144/2019-10-24 21:33:31,679 [django.server:124] [basehttp:log_message] [WARNING]- "HEAD http://123.125.114.144/ HTTP/1.1" 404 16782019-10-24 22:14:04,121 [django.server:124] [basehttp:log_message] [INFO]- code 400, message Bad request version ('HTTP')2019-10-24 22:14:04,122 [django.server:124] [basehttp:log_message] [WARNING]- "GET ../../mnt/custom/ProductDefinition HTTP" 400 -2019-10-24 22:16:21,052 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login HTTP/1.1" 301 02019-10-24 22:16:21,123 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login/ HTTP/1.1" 200 38762019-10-24 22:16:21,192 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/main_bg.png HTTP/1.1" 200 28012019-10-24 22:16:21,196 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/style.css HTTP/1.1" 200 16382019-10-24 22:16:21,229 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/bg.jpg HTTP/1.1" 200 1359902019-10-24 22:16:21,307 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/fonts/icomoon.ttf?u4m6fy HTTP/1.1" 200 69002019-10-24 22:16:23,525 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/login/ HTTP/1.1" 302 02019-10-24 22:16:23,618 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/index/ HTTP/1.1" 200 184472019-10-24 22:16:23,709 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/commons.js HTTP/1.1" 200 132092019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/admin.css HTTP/1.1" 200 196602019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/common.css HTTP/1.1" 200 10042019-10-24 22:16:23,714 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/app.js HTTP/1.1" 200 208442019-10-24 22:16:26,509 [django.server:124] [basehttp:log_message] [I
这样就只读取了2048个字符,全部读取的话,循环读就行
f=open("xxx.log",'r') whileTrue: block=f.read(2048) print(block) ifnotblock: break f.close()
2.readline():每次读取一行,这个方法也不会报错
f=open("xxx.log",'r') whileTrue: line=f.readline() print(line,end="") ifnotline: break f.close()
3.readlines():读取全部的行,生成一个list,通过list来对文件进行处理,显然这种方式依然会造成:MemoyError
真正 Pythonic 的方法
真正 Pythonci 的方法,使用 with 结构打开文件,fp 是一个可迭代对象,可以用 for 遍历读取每行的文件内容
withopen("xxx.log",'r')asfp: forlineinfp: print(line,end="")
yield 生成器读取大文件
前面一篇讲yield 生成器的时候提到读取大文件,函数返回一个可迭代对象,用next()方法读取文件内容
defread_file(fpath): BLOCK_SIZE=1024 withopen(fpath,'rb')asf: whileTrue: block=f.read(BLOCK_SIZE) ifblock: yieldblock else: return if__name__=='__main__': a=read_file("xxx.log") print(a)#generatorobjec print(next(a))#bytes类型 print(next(a).decode("utf-8"))#str
运行结果
<generator object read_file at 0x00000226B3005258>b'\r\n2019-10-24 21:33:31,678 [django.request:93] [base:get_response] [WARNING]- Not Found: /http:/123.125.114.144/\r\n2019-10-24 21:33:31,679 [django.server:124] [basehttp:log_message] [WARNING]- "HEAD http://123.125.114.144/ HTTP/1.1" 404 1678\r\n2019-10-24 22:14:04,121 [django.server:124] [basehttp:log_message] [INFO]- code 400, message Bad request version (\'HTTP\')\r\n2019-10-24 22:14:04,122 [django.server:124] [basehttp:log_message] [WARNING]- "GET ../../mnt/custom/ProductDefinition HTTP" 400 -\r\n2019-10-24 22:16:21,052 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login HTTP/1.1" 301 0\r\n2019-10-24 22:16:21,123 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login/ HTTP/1.1" 200 3876\r\n2019-10-24 22:16:21,192 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/main_bg.png HTTP/1.1" 200 2801\r\n2019-10-24 22:16:21,196 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/style.css HTTP/1.1" 200 1638\r\n2019-10-24 22:16:21,229 [django.server:124] '[basehttp:log_message] [INFO]- "GET /static/assets/img/bg.jpg HTTP/1.1" 200 1359902019-10-24 22:16:21,307 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/fonts/icomoon.ttf?u4m6fy HTTP/1.1" 200 69002019-10-24 22:16:23,525 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/login/ HTTP/1.1" 302 02019-10-24 22:16:23,618 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/index/ HTTP/1.1" 200 184472019-10-24 22:16:23,709 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/commons.js HTTP/1.1" 200 132092019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/admin.css HTTP/1.1" 200 196602019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/common.css HTTP/1.1" 200 10042019-10-24 22:16:23,714 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/app.js HTTP/1.1" 200 208442019-10-24 22:16:26,509 [django.server:124] [basehtt
感谢各位的阅读!关于“python如何读取一个大于10G的txt文件”这篇文章就分享到这里了,希望以上内容可以对大家有一定的帮助,让大家可以学到更多知识,如果觉得文章不错,可以把它分享出去让更多的人看到吧!
推荐阅读
-
Python中怎么动态声明变量赋值
这篇文章将为大家详细讲解有关Python中怎么动态声明变量赋值,文章内容质量较高,因此小编分享给大家做个参考,希望大家阅读完这篇文...
-
python中变量的存储原理是什么
-
Python中怎么引用传递变量赋值
这篇文章将为大家详细讲解有关Python中怎么引用传递变量赋值,文章内容质量较高,因此小编分享给大家做个参考,希望大家阅读完这篇文...
-
python中怎么获取程序执行文件路径
python中怎么获取程序执行文件路径,很多新手对此不是很清楚,为了帮助大家解决这个难题,下面小编将为大家详细讲解,有这方面需求的...
-
Python中如何获取文件系统的使用率
Python中如何获取文件系统的使用率,针对这个问题,这篇文章详细介绍了相对应的分析和解答,希望可以帮助更多想解决这个问题的小伙伴...
-
Python中怎么获取文件的创建和修改时间
这篇文章将为大家详细讲解有关Python中怎么获取文件的创建和修改时间,文章内容质量较高,因此小编分享给大家做个参考,希望大家阅读...
-
python中怎么获取依赖包
今天就跟大家聊聊有关python中怎么获取依赖包,可能很多人都不太了解,为了让大家更加了解,小编给大家总结了以下内容,希望大家根据...
-
python怎么实现批量文件加密功能
-
python中怎么实现threading线程同步
小编给大家分享一下python中怎么实现threading线程同步,希望大家阅读完这篇文章之后都有所收获,下面让我们一起去探讨吧!...
-
python下thread模块创建线程的方法
本篇内容介绍了“python下thread模块创建线程的方法”的有关知识,在实际案例的操作过程中,不少人都会遇到这样的困境,接下来...