I am working with a very large text file (tsv) around 200 million entries. One of the column is date and records are sorted on date. Now I want to start reading the record from a given date. Currently I was just reading from start which is very slow since I need to read almost 100-150 million records just to reach that record. I was thinking if I can use binary search to speed it up, I can do away in just max 28 extra record reads (log(200 million)). Does python allow to read nth line without caching or reading lines before it?
Aucun commentaire:
Enregistrer un commentaire