Basically I'm trying to run some code (Python 2.7) if the content on a website changes, otherwise wait for a bit and check it again later. I'm thinking of comparing hashes. The problem with this is that if the page changes by a single byte or character, the hash will be different. So, for example, if the page displays the current date, the hash would be different every single time and tell me that the content has been updated. How would you do this? Would you look at the size of the HTML in KB? Would you look at the string length and treat the content as "changed" if, for example, the length has changed by more than 5%? Or is there some kind of hashing algorithm where the hash stays the same if only small parts of the string/content have changed?

About Last-Modified - unfortunately not all servers return this date correctly. I think a better way is to combine the hash and content length solutions. Download the content, create a SHA512 checksum of it, keep it in the db and compare it each time. Check the hash, and if it changed - check the string length.
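The hash-then-length check described above could look roughly like the sketch below. The function names `fingerprint` and `has_changed` are made up for illustration, and the 5% default threshold is just the figure floated in the question, not a recommended value. The code sticks to `hashlib` so it should run on Python 2.7 as well as Python 3.

```python
import hashlib


def fingerprint(content):
    """Return (SHA512 hex digest, length in bytes) for a downloaded page body."""
    # Hypothetical helper: accept text as well as raw bytes.
    if not isinstance(content, bytes):
        content = content.encode("utf-8")
    return hashlib.sha512(content).hexdigest(), len(content)


def has_changed(old, new, threshold=0.05):
    """Decide whether a page counts as changed, given two fingerprints.

    threshold is an assumed relative length change (0.05 means 5%).
    """
    old_hash, old_len = old
    new_hash, new_len = new
    if old_hash == new_hash:
        return False  # identical bytes, nothing to do
    if old_len == 0:
        return new_len > 0  # avoid dividing by zero for an empty old page
    # Hash differs: only report a change if the length moved enough.
    return abs(new_len - old_len) / float(old_len) > threshold
```

In use, you would store the tuple returned by `fingerprint` in the db after each download (for example, the body read via `urllib2.urlopen(url).read()` on Python 2.7) and pass the stored and fresh tuples to `has_changed`. Note that this deliberately ignores edits that leave the length almost unchanged, such as a date string being replaced, so it will also miss real edits of similar length.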