Limits of Webbkoll

Webbkoll is sometimes considered the ultimate privacy tool: you only have to enter a domain name and click on “Check”. Then, you hopefully see three green check marks and two zeros. Everything is fine, isn’t it? No, because Webbkoll (like other online scanning tools) only checks a limited subset of privacy- and security-relevant configuration.

In this article, we explain why Webbkoll shouldn’t be used as the primary source for judging how privacy-friendly a website is.

Contents

  1. What is Webbkoll?
  2. Webbkoll’s limits
  3. Examples of inconsistent scanning results
  4. External scanning is limited in general
  5. Summary
  6. Sources
  7. Changelog

What is Webbkoll?

Webbkoll is part of dataskydd.net. “Dataskydd” is Swedish and means “data protection”, while “webbkoll” is a combination of “webb” and “koll” which basically means “web check”. Dataskydd.net is a Swedish NGO founded by former Pirate Party EU parliamentarian Amelia Andersdotter.

Webbkoll is a scanning tool. You enter a domain name, and Webbkoll visits the website like a “normal” visitor would. Then, Webbkoll presents information about the HTTPS configuration, HSTS, CSP, Referrer Policy and other security-relevant HTTP response headers, as well as the use of SRI and localStorage. Moreover, it lists third-party requests and the approximate server location. Additionally, Webbkoll shows its users GDPR-relevant information.

Webbkoll’s limits

First of all, Webbkoll only scans exactly the URL you entered. Subdomains and other pages of the same domain aren’t scanned. This means that the single page scanned by Webbkoll can be free of cookies, third-party content and so on, while subdomains or other pages actually set cookies, embed third-party content etc. To get complete feedback, you would have to scan every page and every subdomain. Of course, this isn’t really possible when the website uses dynamic content or requires you to be logged in (Webbkoll can’t simulate a login either). We show several examples below.

Secondly, even a website that is rated totally privacy-friendly (lots of zeros and green check marks) can rely on databases or other servers (e.g., mail servers) that aren’t accessed by your client but by the server hosting the scanned website. For instance, a database that stores personal data of forum members can be easily accessible to criminals due to weak passwords, while the website itself seems to be absolutely locked down. Webbkoll also doesn’t check whether software used by the website operator (like a CMS) is up-to-date or contains known security vulnerabilities.

Thirdly, Webbkoll can’t check whether personal data collected by the scanned website is lawfully collected. For instance, a form on a website can force you to enter your religion although there is no reason for the website operator to know this data. Furthermore, Webbkoll can’t check whether personal data already stored by the website operator is sufficiently protected against attackers and isn’t illegally processed. For example, website operators can analyze personal data or sell it to third parties while Webbkoll still says that everything is fine.

Fourthly, Webbkoll doesn’t check forms or privacy policies. Forms can contain hidden fields that you can’t see, but whose content is still transmitted to the server. Besides, private website operators tend to include privacy policies only because they heard somewhere that they need one. These policies are often generated with the help of online tools, and people simply adopt them. Webbkoll can’t check privacy policies at all: neither their contents nor their presence or absence.
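To illustrate the hidden fields mentioned above, here is a minimal sketch in Python that lists the hidden input fields of an HTML form. It only uses the standard library. The sample form, the field name “campaign_id” and the function name are hypothetical and only serve as an illustration; this isn’t something Webbkoll does.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect (name, value) pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # The type attribute is case-insensitive in HTML.
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden_fields.append(
                (attributes.get("name"), attributes.get("value"))
            )


def find_hidden_fields(html):
    """Return all hidden input fields found in the given HTML string."""
    finder = HiddenFieldFinder()
    finder.feed(html)
    return finder.hidden_fields


# Hypothetical form: the visitor only sees the e-mail field and the button.
SAMPLE_FORM = """
<form action="/subscribe" method="post">
  <input type="email" name="email">
  <input type="hidden" name="campaign_id" value="spring-42">
  <input type="submit" value="Subscribe">
</form>
"""

print(find_hidden_fields(SAMPLE_FORM))  # [('campaign_id', 'spring-42')]
```

Keep in mind that such a check only covers the HTML as delivered. Fields added later by JavaScript wouldn’t be found this way.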

Fifthly, Webbkoll doesn’t check for widely-known tracking techniques. For instance, it doesn’t analyze what the JavaScript or CSS embedded in a page actually does, although both can be used to clearly identify website visitors. Even PHP and CSS alone are sufficient for user tracking. However, since PHP is processed by the server and not by the client, external scans can’t evaluate this.

Lastly, servers can generate lots of log files. These files can contain your IP address (already personal data), your user agent and other information. External scans normally can’t access these files. They can’t even check whether website operators log your visits at all. Using only the log files, website operators can:

  • see your approximate location (for example, Bratislava in Slovakia)
  • see your client information (for example, you use Chrome 65 on Windows 10)
  • see when and how often you access their website (for example, mostly in the evening between 7 and 8 p.m. on workdays)
  • see how long you stay on their website (time between first and last access)
  • see your interests (for example, you only read articles about the GDPR)

Of course, website operators can easily combine this data with other data like your comments or personal data collected from you.
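The following Python sketch shows how little effort this takes. It assumes that the server writes its access log in the widely used “combined” log format; other servers and configurations log different fields. The two log lines, the IP address (taken from a range reserved for documentation), the paths and the function name are made up for this illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: access log in "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up log entries for illustration.
SAMPLE_LOG = """\
203.0.113.7 - - [08/May/2018:19:02:11 +0200] "GET /blog/gdpr-basics/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/65.0.3325.181"
203.0.113.7 - - [08/May/2018:19:31:40 +0200] "GET /blog/gdpr-fines/ HTTP/1.1" 200 7300 "https://example.org/blog/gdpr-basics/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/65.0.3325.181"
"""


def profile_visitors(log_text):
    """Group user agents, requested paths and access times by IP address."""
    profiles = defaultdict(lambda: {"agents": set(), "paths": [], "times": []})
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that don't follow the assumed format
        profile = profiles[match["ip"]]
        profile["agents"].add(match["agent"])
        profile["paths"].append(match["path"])
        profile["times"].append(
            datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        )
    return profiles


for ip, profile in profile_visitors(SAMPLE_LOG).items():
    print("Visitor:", ip)
    print("Client:", sorted(profile["agents"]))
    print("Pages read:", profile["paths"])
    print("Time on site:", max(profile["times"]) - min(profile["times"]))
```

The sketch derives the client, the pages read and the time between first and last access. An approximate location would additionally require a GeoIP database, which we left out here.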

Examples of inconsistent scanning results

During testing, we often found that different pages of the same domain show different results. We added the following three examples of well-known websites (at least for Czechs) to show you the differences:

Example 1: Observatory by Mozilla

We scanned the following pages:

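| Topic                | Result for page 1a           | Result for page 1b           |
|----------------------|------------------------------|------------------------------|
| HTTPS                | Yes                          | No; insecure                 |
| CSP                  | Good policy                  | Good policy                  |
| Referrer Policy      | Referrers partially leaked   | Referrers partially leaked   |
| Cookies              | 0                            | 0                            |
| Third-party requests | 8 requests to 4 unique hosts | 5 requests to 2 unique hosts |
| Server location      | USA – 34.206.189.101         | USA – 52.20.90.255           |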

While page 1a embeds Google Fonts (fonts.googleapis.com), page 1b doesn’t.

Example 2: iDNES.cz

We scanned the following pages:

| Topic                | Result for page 2a              | Result for page 2b              |
|----------------------|---------------------------------|---------------------------------|
| HTTPS                | Yes                             | Yes                             |
| CSP                  | Not implemented                 | Not implemented                 |
| Referrer Policy      | Referrers leaked                | Referrers leaked                |
| Cookies              | 23 first-party; 131 third-party | 14 first-party; 131 third-party |
| Third-party requests | 318 requests to 95 unique hosts | 371 requests to 96 unique hosts |
| Server location      | CZ – 185.17.117.32              | CZ – 185.17.117.45              |

As of December 2018, page 2b sends the “content-security-policy-report-only” HTTP response header, which isn’t recognized by Webbkoll. Webbkoll reports “Content Security Policy (CSP) header not implemented.” instead. As in example 1, you are silently served by another IP address when you go from page 2a to page 2b, and the results for cookies and third-party requests differ.
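The difference between the two headers matters: a report-only policy is only monitored by the browser and doesn’t block anything, so it is neither “implemented” in the strict sense nor simply absent. The following Python sketch shows one way a check could distinguish the three cases. The function name and the return values are our own choice for this illustration, and this isn’t how Webbkoll works internally.

```python
def classify_csp(headers):
    """Classify the CSP status based on a mapping of HTTP response headers."""
    # HTTP header names are case-insensitive, so we normalize them first.
    names = {name.lower() for name in headers}
    if "content-security-policy" in names:
        return "enforced"
    if "content-security-policy-report-only" in names:
        return "report-only"
    return "missing"


# Hypothetical response headers for illustration.
print(classify_csp({"Content-Security-Policy": "default-src 'self'"}))
# enforced
print(classify_csp({"Content-Security-Policy-Report-Only": "default-src 'self'"}))
# report-only
print(classify_csp({"Strict-Transport-Security": "max-age=31536000"}))
# missing
```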

Example 3: The Office for Personal Data Protection

We scanned the following pages:

| Topic                | Result for page 3a             | Result for page 3b            |
|----------------------|--------------------------------|-------------------------------|
| HTTPS                | Yes                            | Yes                           |
| CSP                  | Not implemented                | Not implemented               |
| Referrer Policy      | Referrers leaked               | Referrers leaked              |
| Cookies              | 8 first-party; 2 third-party   | 7 first-party; 0 third-party  |
| Third-party requests | 12 requests to 6 unique hosts  | 7 requests to 4 unique hosts  |
| Server location      | CZ – 80.95.253.230             | CZ – 80.95.253.230            |

This example shows that even different language versions of the same website can make a difference when you scan them.

You can see that Webbkoll shows different results although we didn’t change the domain. Why are these observations important? Because normally you wouldn’t notice that new cookies are set or that there are additional third-party requests. Imagine the following scenario: You only check the homepage of a website and find nothing, while all other pages of the domain set cookies and embed third-party content.
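If you record the requests of several pages yourself (for example, with the network tab of your browser’s developer tools), you can compare the third-party hosts per page. The following Python sketch shows the idea. All URLs are made up, and the function is a simplification: it treats every host outside the given domain as third-party, while a thorough check would also need the Public Suffix List.

```python
from urllib.parse import urlsplit


def third_party_hosts(first_party_domain, request_urls):
    """Return the set of requested hosts that don't belong to the domain."""
    hosts = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # e.g., relative URLs without a host
        if host == first_party_domain or host.endswith("." + first_party_domain):
            continue  # first-party request (the domain itself or a subdomain)
        hosts.add(host)
    return hosts


# Made-up requests of two pages of the same (hypothetical) domain.
PAGE_A_REQUESTS = [
    "https://example.org/style.css",
    "https://fonts.googleapis.com/css?family=Roboto",
    "https://cdn.example.net/lib.js",
]
PAGE_B_REQUESTS = [
    "https://static.example.org/app.js",
    "https://cdn.example.net/lib.js",
]

hosts_a = third_party_hosts("example.org", PAGE_A_REQUESTS)
hosts_b = third_party_hosts("example.org", PAGE_B_REQUESTS)

# Hosts that are only contacted when you visit page A:
print(hosts_a - hosts_b)  # {'fonts.googleapis.com'}
```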

External scanning is limited in general

One can argue that there are other scanning tools like the Observatory by Mozilla, privacyscore.org etc. However, all scanning tools are limited in some way and don’t offer a holistic view of the security or privacy level of a web server. The problem is that most configuration files and most types of processing of personal data remain hidden from external scans. For purely technical reasons, Webbkoll and other tools won’t be able to access or evaluate these files or processes in the future either. Our article “Pros and cons of online assessment tools for web server security” shows the most important pros and cons of these online assessment tools (there are more, of course).

Moreover, you have to keep in mind that privacy isn’t only about one single web server or even website. Imagine a company where your personal data is printed out and accessible to everyone, including cleaning staff. Their website could still be fine …

Summary

Results of Webbkoll and other scanning tools are only one of many indicators of how privacy-friendly or secure a website is. On the one hand, these tools don’t include functionality to assess all important aspects of a website. On the other hand, there are technical limitations that can only be overcome by internal scanning. Webbkoll can tell you that everything is fine, while website operators collect your personal data using log files and JavaScript and sell it to third parties. Keep this in mind the next time you read that someone scanned a website using Webbkoll and it showed lots of zeros and green check marks.

Sources

Changelog

  • Dec 11, 2018: Updated this article including Webbkoll’s update from 2018-11-30.
  • May 23, 2018: Changed introduction due to feedback of Webbkoll developers/operators.
  • May 8, 2018: Added examples to show differences in Webbkoll when analyzing different websites of the same domain.
