How does antivirus software scan through the whole file system so fast and find potential viruses efficiently?
The antivirus software scans all the existing files in your directories and compares them against signatures of known malware. If a file matches, it is blocked.
If a file is a modified version of an existing virus, its behavioural pattern or content will usually still match an already known virus on some level.
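To make the idea of matching "on some level" concrete, here is a minimal sketch of byte-signature scanning: a file is flagged if any known byte pattern appears anywhere inside it, so a variant that wraps an old, unchanged code fragment in new bytes is still caught. The signature names and patterns below are hypothetical placeholders, not real malware signatures, and real engines use far richer signature formats.

```python
from typing import Dict, List

# Hypothetical signature database: family name -> byte pattern identifying it.
# These patterns are made up for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "Example.FamilyA": bytes.fromhex("deadbeef4141"),
    "Example.FamilyB": b"EVIL_PAYLOAD_MARKER",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path: str) -> List[str]:
    # Reads the whole file into memory and does one search per signature.
    # Production scanners typically stream the file and use multi-pattern
    # search (e.g. Aho-Corasick) to test all signatures in a single pass.
    with open(path, "rb") as f:
        return scan_bytes(f.read())


if __name__ == "__main__":
    # A "variant" with new bytes around the known fragment still matches,
    # because the signature only has to appear somewhere in the file.
    variant = b"\x00" * 100 + b"EVIL_PAYLOAD_MARKER" + b"new junk code"
    print(scan_bytes(variant))         # ['Example.FamilyB']
    print(scan_bytes(b"hello world"))  # []
```

The trade-off is that the longer and more specific a pattern is, the fewer false positives it produces, but the easier it is for a modified variant to avoid it.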
If the malware has not been discovered before (a new virus), the antivirus checks the file for patterns that may indicate whether it is a virus. More often than not, though, these viruses are caught while you are downloading the file itself.
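Checking "for patterns that may indicate" a virus is usually called heuristic analysis. The sketch below is a toy version that combines two common kinds of signals: very high byte entropy (packed or encrypted content, which hides code from plain signature scans) and the presence of Windows API names often associated with process injection. The weights, thresholds, and string list are hypothetical choices for illustration, not values used by any real product.

```python
import math
from collections import Counter

# Hypothetical indicators and thresholds, chosen for illustration only.
SUSPICIOUS_STRINGS = [b"VirtualAllocEx", b"WriteProcessMemory", b"CreateRemoteThread"]
ENTROPY_THRESHOLD = 7.2  # bits per byte; packed/encrypted data sits close to 8.0
SCORE_THRESHOLD = 3


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def heuristic_score(data: bytes) -> int:
    score = 0
    if shannon_entropy(data) > ENTROPY_THRESHOLD:
        score += 2  # content looks packed or encrypted
    # One point per suspicious API name found in the raw bytes.
    score += sum(1 for s in SUSPICIOUS_STRINGS if s in data)
    return score


def looks_suspicious(data: bytes) -> bool:
    return heuristic_score(data) >= SCORE_THRESHOLD


if __name__ == "__main__":
    print(looks_suspicious(b"plain text " * 100))  # False
    print(looks_suspicious(b"VirtualAllocEx WriteProcessMemory CreateRemoteThread"))  # True
```

Because no single signal proves anything, heuristics add up weak evidence and only flag a file past some threshold, which is also why they produce more false positives than exact signature matches.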
Todd
Software Security TechLead
It's funny that you asked this question exactly when I left for a conference this past week; otherwise I would have gotten to this answer sooner!
In broad terms, AV does a number of things:
*SPECIAL NOTE: You may wonder how on earth the AV could scan a huge file and check it for virus signatures so quickly. One way this is done is by using hashes. I can't explain hashes entirely here (look them up if you don't know them), but basically, each known malware file is run through a hashing algorithm such as SHA-256. The algorithm produces a fixed-size number (the hash) from the bytes that make up the file, and for practical purposes that number is unique to those exact bytes. It is very important to note that the hash has absolutely nothing to do with the file name; instead, the entire contents of the file are fed through the algorithm in a single fast pass to produce the hash. This means that a known malware file will always have the same hash unless its bytes change. So a quick way for AV to determine whether a file is bad is to hash it and look that hash up in a database. If the hash is in the bad-file database, the file is flagged as bad without having to go through all of the detailed checks above, which could take more CPU time.
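Here is a minimal sketch of that hash lookup, assuming the "bad file database" is simply an in-memory set of SHA-256 hex digests. The single entry is a stand-in (the SHA-256 of the bytes `b"test"`), not a real malware hash, and the function names are hypothetical. Note that only the file's contents are hashed; the name plays no part.

```python
import hashlib
import os
import tempfile

# Hypothetical "bad file" database: SHA-256 hex digests of known malware.
# The single entry is just the hash of b"test", used as a harmless stand-in.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"test").hexdigest(),
}


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file's contents (not its name) in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_known_bad(path: str) -> bool:
    # Set membership is O(1) on average, so even a huge database is cheap to query.
    return sha256_of_file(path) in KNOWN_BAD_SHA256


if __name__ == "__main__":
    # The file name looks harmless, but only the bytes inside are hashed.
    with tempfile.NamedTemporaryFile(suffix="_innocent_name.txt", delete=False) as tmp:
        tmp.write(b"test")
    print(is_known_bad(tmp.name))  # True
    os.remove(tmp.name)
```

The expensive part is reading the file once; after that, the lookup cost barely depends on how many known hashes are in the database.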
It is important to note that if an executable's hash matches a known bad file, there is a 99.99% chance that it is that bad file. However, a file whose hash doesn't match any database entry is not necessarily clean. It just means that the particular arrangement of bytes in this file differs from the previously submitted samples. So if a virus can change its own bytes as it spreads (so-called polymorphic malware), it will produce a different hash every time and won't be picked up by an AV that only checks hashes.
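The weakness described above is easy to demonstrate: flipping a single bit in a file gives a completely different SHA-256 digest, so a hash-only lookup no longer matches. The byte strings below are harmless stand-ins, and `sha256_hex` is a hypothetical helper name. The last check hints at why byte-pattern signatures still matter: a fragment the mutation did not touch is still present.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes (not real malware): a header, filler, and a "payload" fragment.
original = b"MZ" + b"\x90" * 64 + b"payload"

# Flip one bit inside the filler, as a self-modifying virus might.
mutated = bytearray(original)
mutated[10] ^= 0x01
variant = bytes(mutated)

print(sha256_hex(original) == sha256_hex(variant))  # False: the hash lookup misses
# How many of the 64 hex digits differ between the two digests.
print(sum(a != b for a, b in zip(sha256_hex(original), sha256_hex(variant))))
# The untouched fragment is still there for a byte-pattern signature to find.
print(b"payload" in variant)  # True
```

This is why AV products layer hash lookups, signatures, and heuristics rather than relying on any one of them.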