Limits of Webbkoll

Webbkoll is sometimes considered the ultimate privacy tool: you only have to enter a domain name and click “Check”. Then, you hopefully see three zeros, a green lock and a green umbrella. Everything is fine, isn’t it? No, because Webbkoll (like other online scanning tools) only checks a limited subset of the privacy- and security-relevant configuration.

In this article, we explain why Webbkoll can’t be used as the primary source for judging how privacy-friendly a website is.

Contents

  1. What is Webbkoll?
  2. Webbkoll’s limits
  3. Examples of inconsistent scanning results
  4. External scanning is limited in general
  5. Summary
  6. Sources
  7. Changelog

What is Webbkoll?

Webbkoll is part of dataskydd.net. “Dataskydd” is Swedish and means “data protection”, while “webbkoll” is a combination of “webb” and “koll” which basically means “web check”. Dataskydd.net is a Swedish NGO founded by former Pirate Party EU parliamentarian Amelia Andersdotter.

Webbkoll is a scanning tool. You enter a domain name, and Webbkoll visits the website like a “normal” visitor would. It then logs whether the website uses third-party content or sets cookies, and it records the website’s publicly accessible security configuration. The results are shown prominently, with brief explanations.

Webbkoll’s limits

First of all, Webbkoll scans only the exact URL you entered. Subdomains and other pages of the same domain aren’t scanned. This means that the single page scanned by Webbkoll can be free of cookies, third-party content and so on, while subdomains or other pages actually set cookies, use third-party content etc. To get complete feedback, you must scan every page and all subdomains. Of course, this isn’t really possible when the website uses dynamic content or requires you to be logged in (Webbkoll can’t simulate a login either). We show several examples below.

Secondly, even a website which is rated totally privacy-friendly (lots of zeros and green) can use databases or other servers (e.g. mail servers) which aren’t accessed by your client but by the server which hosts the scanned website. For instance, a database which stores personal data of forum members can be easily accessible to criminals due to weak passwords while the website seems to be absolutely locked down. Webbkoll also doesn’t check whether software used by the website operator (like a CMS) is up-to-date or contains known security vulnerabilities.

Thirdly, Webbkoll can’t check whether personal data collected by the scanned website is lawfully collected. For instance, a form on a website can force you to enter your religion although there is no reason for the website operator to know this data. Furthermore, Webbkoll can’t check whether personal data already stored by the website operator is sufficiently protected against attackers and isn’t illegally processed. For example, website operators can analyze personal data or sell it to third parties while Webbkoll still says that everything is fine.

Fourthly, Webbkoll doesn’t check forms or privacy policies. Forms can contain hidden fields which you can’t see but whose content is transmitted to the server. Besides, private website operators tend to include privacy policies only because they heard somewhere that they need one. These policies are often generated with online tools and adopted unchanged. Webbkoll can’t check privacy policies at all: neither their contents nor their presence or absence.
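
To illustrate the point about hidden fields, here is a minimal sketch that lists the hidden inputs of a form using Python's standard `html.parser` module. The form markup, the field names and the helper `find_hidden_fields` are invented for illustration; this is not something Webbkoll does.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collects (name, value) pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # The type attribute is case-insensitive and may be missing.
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden_fields.append(
                (attributes.get("name"), attributes.get("value"))
            )


def find_hidden_fields(html):
    """Return all hidden form fields found in the given HTML text."""
    finder = HiddenFieldFinder()
    finder.feed(html)
    return finder.hidden_fields


# Hypothetical form markup, invented for this example.
SAMPLE_FORM = """
<form action="/subscribe" method="post">
  <input type="email" name="email">
  <input type="hidden" name="campaign" value="spring-2018">
  <input type="hidden" name="visitor_id" value="a1b2c3">
  <input type="submit" value="Subscribe">
</form>
"""

hidden = find_hidden_fields(SAMPLE_FORM)
for name, value in hidden:
    print(f"hidden field {name!r} = {value!r}")
```

A visitor only sees the e-mail field and the button, yet submitting the form would also transmit the two hidden values.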

Fifthly, Webbkoll doesn’t check for widely known tracking techniques. For instance, there are no checks for JavaScript or CSS, both of which can be used to identify website visitors. Even PHP and CSS alone are sufficient for user tracking. However, since PHP is processed by the server and not by the client, external scans can’t evaluate this.
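
The following sketch shows why server-side tracking is invisible from the outside. Python stands in for PHP here, and the function `serve_stylesheet`, the in-memory `visit_log` and the sample values are assumptions made for this example, not code taken from any real website.

```python
import hashlib

# Server-side state: it never leaves the server, so no external scan can see it.
visit_log = []


def serve_stylesheet(client_ip, user_agent, referring_page):
    """Return the stylesheet and, as a side effect, record who asked for it."""
    # Derive a stable pseudonymous ID from data every request carries.
    visitor_id = hashlib.sha256(
        f"{client_ip}|{user_agent}".encode("utf-8")
    ).hexdigest()[:12]
    visit_log.append({"visitor": visitor_id, "page": referring_page})
    # Every visitor receives the same bytes, so the response reveals nothing.
    return "body { font-family: sans-serif; }\n"


# Two requests from the same (hypothetical) client for two different pages.
first = serve_stylesheet("203.0.113.7", "ExampleBrowser/1.0", "/blog/gdpr-basics/")
second = serve_stylesheet("203.0.113.7", "ExampleBrowser/1.0", "/blog/gdpr-fines/")

print(first == second)
print([entry["page"] for entry in visit_log])
```

The responses are identical, so a scanner has nothing to flag, while the server has linked both page views to the same visitor.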

Lastly, servers can generate lots of log files. These files can contain your IP address (already personal data), your user agent and other information. External scans normally can’t access these files. They can’t even check whether website operators log your visits at all. Using the log files alone, website operators can:

  • see your approximate location (for example, Bratislava in Slovakia)
  • see your client information (for example, you use Chrome 65 on Windows 10)
  • see how often you access their website (for example, mostly in the evening between 7 and 8 p.m. on workdays)
  • see how long you stay on their website (time between first and last access)
  • see your interests (for example, you only read articles about the GDPR)

Of course, website operators can easily combine this data with other data like your comments or personal data collected from you.
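
As a rough sketch of how little effort this takes, the following Python code builds a per-visitor profile from a few access log lines. It assumes the widely used Combined Log Format; the sample lines, IP addresses and paths are made up, and the function `profile_visitors` is hypothetical. Deriving a location from the IP address would additionally require a geolocation database, which is left out here.

```python
import re
from collections import Counter
from datetime import datetime

# Combined Log Format: ip ident user [time] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up log lines for illustration (documentation IP ranges).
SAMPLE_LOG = """\
203.0.113.7 - - [07/May/2018:19:02:11 +0200] "GET /blog/gdpr-basics/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/65.0"
203.0.113.7 - - [07/May/2018:19:41:53 +0200] "GET /blog/gdpr-fines/ HTTP/1.1" 200 4801 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/65.0"
198.51.100.23 - - [08/May/2018:09:15:00 +0200] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0"
"""


def profile_visitors(log_text):
    """Group log lines by IP address and summarize each visitor's behavior."""
    profiles = {}
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that don't follow the assumed format
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        profile = profiles.setdefault(
            match["ip"],
            {"agent": match["agent"], "first": when, "last": when,
             "paths": [], "hours": Counter()},
        )
        profile["first"] = min(profile["first"], when)
        profile["last"] = max(profile["last"], when)
        profile["paths"].append(match["path"])  # hints at interests
        profile["hours"][when.hour] += 1        # hints at usual visiting times
    return profiles


profiles = profile_visitors(SAMPLE_LOG)
for ip, profile in profiles.items():
    stay = profile["last"] - profile["first"]
    print(ip, profile["agent"], profile["paths"], dict(profile["hours"]), stay)
```

None of this requires cookies, JavaScript or third-party content, which is why a scan showing zeros says nothing about it.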

Examples of inconsistent scanning results

During testing, we often found that different pages of the same domain show different results. The following three examples of well-known websites (at least for Czechs) show the differences:

  • Example 1: Observatory by Mozilla
    • Scanning the homepage (https://observatory.mozilla.org/) results in “Secure”, “Referrers partially leaked”, “0 cookies”, “6 third-party requests” and “3 third-parties contacted”.
    • When you enter a website and click “Scan Me”, the URL changes to https://observatory.mozilla.org/analyze/. Scanning this page results in “Insecure”, “Referrers partially leaked”, “0 cookies”, “5 third-party requests” and “2 third-parties contacted”.
    • The second page (analyze) doesn’t embed “fonts.googleapis.com”. That’s the difference here. However, without scanning the second page with Webbkoll, you wouldn’t know this.
  • Example 2: iDNES.cz
    • Scanning the homepage (https://www.idnes.cz/) results in “Secure”, “45 cookies”, “258 third-party requests” and “43 third-parties contacted”.
    • Scanning another website (https://praha.idnes.cz/praha-zpravy.aspx) results in “Secure”, “52 cookies”, “129 third-party requests” and “37 third-parties contacted”.
    • Interestingly, Webbkoll doesn’t show information concerning referrers at the top of the page. Another difference is HSTS, which is active on the homepage and disabled on the second page.
  • Example 3: The Office for Personal Data Protection
    • Scanning the Czech homepage (https://www.uoou.cz) results in “Secure”, “Referrers leaked”, “6 cookies”, “5 third-party requests” and “2 third-parties contacted”.
    • Scanning the English homepage (https://www.uoou.cz/en/) results in “Secure”, “Referrers leaked”, “3 cookies”, “0 third-party requests” and “0 third-parties contacted”.
    • This example shows that even different language versions of the same website can produce different scan results.
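
Such comparisons can be made systematic. The sketch below compares two scan results metric by metric, using the values reported for Example 3. The dictionary layout, the metric names and the helper `diff_scans` are assumptions for illustration; the values were copied by hand from the results above and do not reflect any Webbkoll output format.

```python
def diff_scans(first, second):
    """Return {metric: (first_value, second_value)} for metrics that differ."""
    return {
        metric: (first.get(metric), second.get(metric))
        for metric in sorted(first.keys() | second.keys())
        if first.get(metric) != second.get(metric)
    }


# Values transcribed by hand from Example 3.
scans = {
    "https://www.uoou.cz": {
        "https": "Secure",
        "referrers": "Referrers leaked",
        "cookies": 6,
        "third_party_requests": 5,
        "third_parties": 2,
    },
    "https://www.uoou.cz/en/": {
        "https": "Secure",
        "referrers": "Referrers leaked",
        "cookies": 3,
        "third_party_requests": 0,
        "third_parties": 0,
    },
}

differences = diff_scans(scans["https://www.uoou.cz"], scans["https://www.uoou.cz/en/"])
for metric, (czech, english) in differences.items():
    print(f"{metric}: {czech} (Czech homepage) vs. {english} (English homepage)")
```

Three of the five metrics differ between two pages of the same domain, which a single scan of the homepage would never reveal.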

You can see that Webbkoll shows different results although we didn’t change the domain. Why are these observations important? Because normally you wouldn’t notice that new cookies are set or that there are additional third-party requests. It is possible that you check only the homepage of a domain and find nothing, while all other pages behave totally differently.

External scanning is limited in general

One can argue that there are more scanning tools, like the Observatory by Mozilla, privacyscore.org etc. However, all scanning tools are limited in some way and don’t offer a holistic view of the security or privacy level of a web server. The problem is that most configuration files and most types of processing of personal data remain hidden from external scans. For purely technical reasons, Webbkoll and other tools won’t be able to access or evaluate these files or processes in the future either. Our article “Pros and cons of online assessment tools for web server security” shows the most important pros and cons of these online assessment tools (there are more, of course).

Moreover, you have to keep in mind that privacy isn’t only about one single web server or even website. Imagine a company, where your personal data is printed and accessible by everyone including cleaning staff. Their website could still be fine …

Summary

Results of Webbkoll and other scanning tools are only one of many indicators of how privacy-friendly or secure a website is. On the one hand, these tools don’t include functionality to assess all important aspects of a website. On the other hand, there are technical limitations that only internal scanning can overcome. Webbkoll can tell you that everything is fine while website operators collect your personal data using log files and JavaScript and sell it to third parties. Keep this in mind the next time you read that someone scanned a website using Webbkoll and it showed lots of zeros and green icons.

Sources

Changelog

  • May 23, 2018: Changed introduction due to feedback from Webbkoll developers/operators.
  • May 8, 2018: Added examples to show differences in Webbkoll when analyzing different websites of the same domain.
