How to Read a txt File Larger Than 10 GB in Python

This article shows how to read a txt file larger than 10 GB in Python. I found it quite practical, so I'm sharing it here for reference.

Introduction

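I needed to read a file larger than 10 GB with Python, but my computer has only 8 GB of RAM, so the script failed with MemoryError as soon as it ran. How do you read a large file with Python's open() function?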

Reading a large file

First, create a txt file larger than 10 GB to experiment with:

# A sample of Django server log lines, appended to the file over and over
a = '''
2021-02-02 21:33:31,678 [django.request:93] [base:get_response] [WARNING]- Not Found: /http:/123.125.114.144/
2021-02-02 21:33:31,679 [django.server:124] [basehttp:log_message] [WARNING]- "HEAD http://123.125.114.144/ HTTP/1.1" 404 1678
2021-02-02 22:14:04,121 [django.server:124] [basehttp:log_message] [INFO]- code 400, message Bad request version ('HTTP')
2021-02-02 22:14:04,122 [django.server:124] [basehttp:log_message] [WARNING]- "GET ../../mnt/custom/ProductDefinition HTTP" 400 -
2021-02-02 22:16:21,052 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login HTTP/1.1" 301 0
2021-02-02 22:16:21,123 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login/ HTTP/1.1" 200 3876
2021-02-02 22:16:21,192 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/main_bg.png HTTP/1.1" 200 2801
2021-02-02 22:16:21,196 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/style.css HTTP/1.1" 200 1638
2021-02-02 22:16:21,229 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/bg.jpg HTTP/1.1" 200 135990
2021-02-02 22:16:21,307 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/fonts/icomoon.ttf?u4m6fy HTTP/1.1" 200 6900
2021-02-02 22:16:23,525 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/login/ HTTP/1.1" 302 0
2021-02-02 22:16:23,618 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/index/ HTTP/1.1" 200 18447
2021-02-02 22:16:23,709 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/commons.js HTTP/1.1" 200 13209
2021-02-02 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/admin.css HTTP/1.1" 200 19660
2021-02-02 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/common.css HTTP/1.1" 200 1004
2021-02-02 22:16:23,714 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/app.js HTTP/1.1" 200 20844
2021-02-02 22:16:26,509 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/report_list/1/ HTTP/1.1" 200 14649
2021-02-02 22:16:51,496 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/test_list/1/ HTTP/1.1" 200 24874
2021-02-02 22:16:51,721 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/add_case/ HTTP/1.1" 200 0
2021-02-02 22:16:59,707 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/test_list/1/ HTTP/1.1" 200 24874
2021-02-03 22:16:59,909 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/add_case/ HTTP/1.1" 200 0
2021-02-03 22:17:01,306 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/edit_case/1/ HTTP/1.1" 200 36504
2021-02-03 22:17:06,265 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/add_project/ HTTP/1.1" 200 17737
2021-02-03 22:17:07,825 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/project_list/1/ HTTP/1.1" 200 29789
2021-02-03 22:17:13,116 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/add_config/ HTTP/1.1" 200 24816
2021-02-03 22:17:19,671 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/config_list/1/ HTTP/1.1" 200 19532
'''

# Runs forever on purpose: stop it manually (e.g. Ctrl+C) once the file is big enough.
# The file is opened once instead of being reopened on every iteration.
with open("xxx.log", "a", encoding="utf-8") as fp:
    while True:
        fp.write(a)

This appends to xxx.log in a loop. After running it for 3 to 5 minutes, PyCharm showed the file to be larger than 10 GB.
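The loop above never stops on its own. If you would rather have the script stop at a fixed size, here is a minimal sketch; the function name make_big_file and its parameters are hypothetical, not part of the original post. It opens the file once and tracks how many bytes it has written, using binary mode so the count is exact (in text mode on Windows, each "\n" is written as "\r\n").

```python
import os


def make_big_file(path, text, target_bytes):
    """Append `text` to `path` until the file holds at least `target_bytes` bytes.

    Hypothetical helper: returns the final file size in bytes.
    """
    chunk = text.encode("utf-8")
    if not chunk:
        raise ValueError("text must not be empty")
    # Append mode: start counting from whatever is already in the file
    size = os.path.getsize(path) if os.path.exists(path) else 0
    # Binary mode keeps the byte count exact (no newline translation)
    with open(path, "ab") as fp:
        while size < target_bytes:
            fp.write(chunk)
            size += len(chunk)
    return size
```

For example, make_big_file("xxx.log", a, 10 * 1024 ** 3) would keep appending the sample text a until the file reaches about 10 GiB, then return.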


So I first tried reading it directly with open():

f = open("xxx.log", 'r')
print(f.read())
f.close()

It raises MemoryError:

Traceback (most recent call last):
  File "D:/2021kecheng06/demo/txt.py", line 35, in <module>
    print(f.read())
MemoryError

While it runs, you can watch memory usage climb to 100% and CPU usage reach 91%. No wonder it crashed!


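The cause of this error is that read() with no argument loads the entire file into memory in one go, so a file larger than the available memory is bound to fail.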
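To see this effect without exhausting your own RAM, the sketch below uses the standard tracemalloc module to compare the peak memory of read() against a read(2048) loop. The helper names are hypothetical, and the comparison is meant for a small test file only: calling read_all on the 10 GB file would simply reproduce the MemoryError.

```python
import tracemalloc


def peak_memory(func, *args):
    """Call func(*args) and return the peak memory (in bytes) traced while it ran."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_all(path):
    # read() with no argument builds one string holding the whole file
    with open(path, "r", encoding="utf-8") as f:
        return len(f.read())


def read_chunked(path, size=2048):
    # read(size) returns at most `size` characters per call
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        while True:
            block = f.read(size)
            if not block:
                break
            total += len(block)
    return total
```

On a file of a few megabytes, peak_memory(read_all, path) should come out larger than the file itself, while peak_memory(read_chunked, path) stays far lower; the exact numbers depend on the Python version and platform.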

Several read() methods

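1. read() accepts an optional argument n, the maximum number of characters to read per call (bytes, if the file is opened in binary mode). Reading the file a piece at a time this way avoids running out of memory.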

f = open("xxx.log", 'r')
print(f.read(2048))
f.close()

Output:

2019-10-24 21:33:31,678 [django.request:93] [base:get_response] [WARNING]- Not Found: /http:/123.125.114.144/
2019-10-24 21:33:31,679 [django.server:124] [basehttp:log_message] [WARNING]- "HEAD http://123.125.114.144/ HTTP/1.1" 404 1678
2019-10-24 22:14:04,121 [django.server:124] [basehttp:log_message] [INFO]- code 400, message Bad request version ('HTTP')
2019-10-24 22:14:04,122 [django.server:124] [basehttp:log_message] [WARNING]- "GET ../../mnt/custom/ProductDefinition HTTP" 400 -
2019-10-24 22:16:21,052 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login HTTP/1.1" 301 0
2019-10-24 22:16:21,123 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login/ HTTP/1.1" 200 3876
2019-10-24 22:16:21,192 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/main_bg.png HTTP/1.1" 200 2801
2019-10-24 22:16:21,196 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/style.css HTTP/1.1" 200 1638
2019-10-24 22:16:21,229 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/bg.jpg HTTP/1.1" 200 135990
2019-10-24 22:16:21,307 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/fonts/icomoon.ttf?u4m6fy HTTP/1.1" 200 6900
2019-10-24 22:16:23,525 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/login/ HTTP/1.1" 302 0
2019-10-24 22:16:23,618 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/index/ HTTP/1.1" 200 18447
2019-10-24 22:16:23,709 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/commons.js HTTP/1.1" 200 13209
2019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/admin.css HTTP/1.1" 200 19660
2019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/common.css HTTP/1.1" 200 1004
2019-10-24 22:16:23,714 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/app.js HTTP/1.1" 200 20844
2019-10-24 22:16:26,509 [django.server:124] [basehttp:log_message] [I

This reads only the first 2048 characters. To read the whole file, call read(2048) in a loop:

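f = open("xxx.log", 'r')
while True:
    block = f.read(2048)  # read at most 2048 characters
    if not block:  # an empty string means end of file
        break
    print(block, end="")  # end="" avoids inserting extra newlines between blocks
f.close()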
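The same loop can be packaged as a reusable generator. This sketch (the name iter_blocks is hypothetical) uses the two-argument form iter(callable, sentinel), which keeps calling f.read(block_size) until it returns the empty string, and opens the file with a with statement so it is closed once the generator is exhausted.

```python
from functools import partial


def iter_blocks(path, block_size=2048, encoding="utf-8"):
    """Yield the file's contents as strings of at most block_size characters."""
    with open(path, "r", encoding=encoding) as f:
        # iter(callable, sentinel): call f.read(block_size) until it returns ''
        yield from iter(partial(f.read, block_size), "")
```

Usage: for block in iter_blocks("xxx.log"): print(block, end="")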

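2. readline(): reads one line per call. This also avoids the error, as long as no single line is enormous, because only the current line is held in memory.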

f = open("xxx.log", 'r')

while True:
    line = f.readline()  # one line, including its trailing newline
    if not line:  # '' only at end of file; a blank line is '\n'
        break
    print(line, end="")
f.close()
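readline() also accepts a size limit. If the file might contain a single, extremely long line, passing a limit guarantees that no more than that many characters are held at once; a long line simply comes back in several pieces. A minimal sketch, assuming Python 3.8+ for the := operator (iter_lines_capped and the 1 MiB default are hypothetical choices):

```python
def iter_lines_capped(path, limit=1024 * 1024, encoding="utf-8"):
    """Yield lines one by one; a line longer than `limit` characters
    is yielded in consecutive pieces of at most `limit` characters."""
    with open(path, "r", encoding=encoding) as f:
        # readline(limit) stops at a newline or after `limit` characters;
        # it returns '' only at end of file
        while line := f.readline(limit):
            yield line
```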

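3. readlines(): reads all the lines into a list so you can process the file through that list. Because the whole file still ends up in memory, this approach also causes MemoryError.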
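readlines() does become usable on large files if you pass its hint argument: it then stops reading once the lines collected so far exceed hint characters (bytes in binary mode), so you can work through the file in batches of lines. A sketch under those assumptions (iter_line_batches and the 64 KiB default are hypothetical; Python 3.8+ for :=):

```python
def iter_line_batches(path, hint=64 * 1024, encoding="utf-8"):
    """Yield lists of lines, each list totalling roughly `hint` characters."""
    with open(path, "r", encoding=encoding) as f:
        # readlines(hint) stops after the line that pushes the total past hint,
        # so only one batch is in memory at a time; [] means end of file
        while batch := f.readlines(hint):
            yield batch
```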

The truly Pythonic way

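The truly Pythonic way is to open the file in a with statement. The file object fp is iterable, so a for loop reads the file one line at a time, and the with block closes the file automatically when it ends.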

with open("xxx.log", 'r') as fp:
    for line in fp:
        print(line, end="")
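As an example of real processing rather than printing, the sketch below counts log levels in a file shaped like the sample log above, keeping only one line in memory at a time. The function name and the set of level names in the regular expression are assumptions, not part of the original post.

```python
import re
from collections import Counter

# Assumed level names; matches tags such as "[INFO]" or "[WARNING]"
LEVEL_RE = re.compile(r"\[(DEBUG|INFO|WARNING|ERROR|CRITICAL)\]")


def count_levels(path, encoding="utf-8"):
    """Return a Counter mapping log level -> number of lines with that level."""
    counts = Counter()
    with open(path, "r", encoding=encoding) as fp:
        for line in fp:  # the file object yields one line at a time
            match = LEVEL_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```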

Reading a large file with a yield generator

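In an earlier post on yield generators, I mentioned reading large files with them: the function below returns a generator (an iterable object), and next() fetches the file contents one block at a time.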

def read_file(fpath):
    BLOCK_SIZE = 1024
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block:
                yield block
            else:
                return


if __name__ == '__main__':
    a = read_file("xxx.log")
    print(a)  # generator object
    print(next(a))  # bytes
    print(next(a).decode("utf-8"))  # str
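read_file decodes each 1024-byte block on its own. That works for the ASCII log used here, but with multi-byte UTF-8 text (Chinese, for example, takes 3 bytes per character) a block boundary can split a character, and block.decode("utf-8") would then raise UnicodeDecodeError. A minimal sketch of one fix, using an incremental decoder from the standard codecs module (read_text_blocks is a hypothetical name); opening the file in text mode with encoding="utf-8" avoids the problem as well.

```python
import codecs


def read_text_blocks(fpath, block_size=1024, encoding="utf-8"):
    """Read a file in binary blocks and yield correctly decoded text.

    The incremental decoder holds back the bytes of a character that is
    split across two blocks until the rest of it arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    with open(fpath, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            text = decoder.decode(block)
            if text:  # may be '' if the block held only part of a character
                yield text
    # Flush; raises UnicodeDecodeError if the file ends mid-character
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```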

Output:

<generator object read_file at 0x00000226B3005258>
b'\r\n2019-10-24 21:33:31,678 [django.request:93] [base:get_response] [WARNING]- Not Found: /http:/123.125.114.144/\r\n2019-10-24 21:33:31,679 [django.server:124] [basehttp:log_message] [WARNING]- "HEAD http://123.125.114.144/ HTTP/1.1" 404 1678\r\n2019-10-24 22:14:04,121 [django.server:124] [basehttp:log_message] [INFO]- code 400, message Bad request version (\'HTTP\')\r\n2019-10-24 22:14:04,122 [django.server:124] [basehttp:log_message] [WARNING]- "GET ../../mnt/custom/ProductDefinition HTTP" 400 -\r\n2019-10-24 22:16:21,052 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login HTTP/1.1" 301 0\r\n2019-10-24 22:16:21,123 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/login/ HTTP/1.1" 200 3876\r\n2019-10-24 22:16:21,192 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/img/main_bg.png HTTP/1.1" 200 2801\r\n2019-10-24 22:16:21,196 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/style.css HTTP/1.1" 200 1638\r\n2019-10-24 22:16:21,229 [django.server:124] '
[basehttp:log_message] [INFO]- "GET /static/assets/img/bg.jpg HTTP/1.1" 200 135990
2019-10-24 22:16:21,307 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/iconfont/fonts/icomoon.ttf?u4m6fy HTTP/1.1" 200 6900
2019-10-24 22:16:23,525 [django.server:124] [basehttp:log_message] [INFO]- "POST /api/login/ HTTP/1.1" 302 0
2019-10-24 22:16:23,618 [django.server:124] [basehttp:log_message] [INFO]- "GET /api/index/ HTTP/1.1" 200 18447
2019-10-24 22:16:23,709 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/commons.js HTTP/1.1" 200 13209
2019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/admin.css HTTP/1.1" 200 19660
2019-10-24 22:16:23,712 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/css/common.css HTTP/1.1" 200 1004
2019-10-24 22:16:23,714 [django.server:124] [basehttp:log_message] [INFO]- "GET /static/assets/js/app.js HTTP/1.1" 200 20844
2019-10-24 22:16:26,509 [django.server:124] [basehtt

Thanks for reading! That wraps up "How to Read a txt File Larger Than 10 GB in Python". I hope it helps; if you found it useful, feel free to share it.

Published 2021-05-30 14:08:45