Special rule: www.example.org & m.example.org redirects to example.org #185

Closed
1 task done
spirillen opened this issue Jan 10, 2021 · 18 comments
@spirillen
Contributor

spirillen commented Jan 10, 2021

Is your feature request related to a problem? Please describe.
There is no such thing as ^(www|m)\..*\.tumblr\.com$

  • Check the response status for these cases; in the past they responded with 200

Describe the solution you'd like
We should add a 302 rule

curl -I 'http://www.sensual-kiss.tumblr.com' 'http://m.sensual-kiss.tumblr.com'
HTTP/1.1 302 Found
Server: openresty
Date: Sun, 10 Jan 2021 17:52:39 GMT
Content-Type: text/html; charset=UTF-8
X-Rid: 853406b998b4af2af248c41f442e2565
P3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15552001
Location: https://sensual-kiss.tumblr.com/#_=_
X-UA-Compatible: IE=Edge,chrome=1
X-Cache: MISS from firewall.matrix.lan
X-Cache-Lookup: MISS from firewall.matrix.lan:3128
Via: 1.1 firewall.matrix.lan (squid)
Connection: keep-alive

HTTP/1.1 302 Found
Server: openresty
Date: Sun, 10 Jan 2021 17:52:40 GMT
Content-Type: text/html; charset=UTF-8
X-Rid: 406103b229eb27730826e4000e1c2063
P3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15552001
Location: https://sensual-kiss.tumblr.com/#_=_
X-UA-Compatible: IE=Edge,chrome=1
X-Cache: MISS from firewall.matrix.lan
X-Cache-Lookup: MISS from firewall.matrix.lan:3128
Via: 1.1 firewall.matrix.lan (squid)
Connection: keep-alive

Describe alternatives you've considered
Even better would be

if [ dest == `^(www|m)\..*\.tumblr\.com($|\/.*)` ]
then
    return INVALID
fi
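
The check above could be sketched in Python as follows. This is a minimal sketch, not PyFunceble's actual API: the names `TUMBLR_PREFIXED` and `classify` are hypothetical, and the regex is the one from the pseudocode.

```python
import re
from typing import Optional

# Hypothetical helper, not part of PyFunceble: matches subjects such as
# www.<blog>.tumblr.com or m.<blog>.tumblr.com, with an optional path.
TUMBLR_PREFIXED = re.compile(r"^(www|m)\..*\.tumblr\.com($|/.*)")


def classify(dest: str) -> Optional[str]:
    """Return "INVALID" when dest matches the pattern, otherwise None."""
    return "INVALID" if TUMBLR_PREFIXED.match(dest) else None
```

Note that a plain `www.tumblr.com` does not match, because the pattern requires at least one extra label between the prefix and `tumblr.com`.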

Additional context
Add any other context or screenshots about the feature request here.

@spirillen spirillen changed the title Special rule tumblr.com (Remeber (DRAFT)) Special rule tumblr.com Jan 10, 2021
@funilrys funilrys self-assigned this Jan 16, 2021
@funilrys funilrys added this to the __future__ milestone Jan 16, 2021
@funilrys
Owner

This one is odd ... Who does that ?

@spirillen
Contributor Author

Users who don't know better: Clefspeare13/pornhosts#60

I have actually seen that in other places as well, which is why I suggested it at a global scale. Other times I simply suspect some are using a script to append the m. and www. prefixes in a completely headless way, just to make their lists grow.

@funilrys
Owner

I was thinking about this, and I'm not sure if it's really in the scope of the SPECIAL rule ...

When I created the SPECIAL rule, it was really just to take things UP and DOWN if things are really away or back. It's an extra layer of test.

302 Found is not something I considered as a criterion for taking something DOWN ...

What do you think of that ?

@spirillen
Contributor Author

spirillen commented Oct 10, 2021

I tend to think an HTTP 302 means down, in most cases actually, unless the redirect is part of HSTS (HTTP Strict Transport Security), since the specified target has obviously moved.

Then there is the HUGE exception: redirecting spyware like t.co, bit.ly, etc. They all redirect (I didn't check their response codes, as they are blocked here).

That's why I suggested this as a special rule: check for a fourth-level domain and, if there is one, mark it INVALID. That way we can circumvent the 302 question. We can't use --complements, as that is purely for www versus non-www.

On the other hand, this may be a bigger piece of work... and then again, thinking back on the exact domain, I've seen that the same "rule" could be applied elsewhere.

IF domain-level >= 4
then
    rule is INVALID
fi
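
A minimal Python sketch of this domain-level rule, under a loud assumption: the level is counted naively as the number of dot-separated labels, so multi-label public suffixes such as `co.uk` would be miscounted. The function names are hypothetical and not taken from PyFunceble.

```python
from typing import Optional


def domain_level(subject: str) -> int:
    """Count dot-separated labels, e.g. "www.blog.tumblr.com" has 4."""
    return len(subject.strip(".").split("."))


def classify_by_level(subject: str, threshold: int = 4) -> Optional[str]:
    """Return "INVALID" when the subject has `threshold` labels or more."""
    return "INVALID" if domain_level(subject) >= threshold else None
```

A real rule would probably need a public-suffix list to decide where the registrable domain starts before counting levels.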

The question might then become, is this a module we would like to be able to make special rules based on domain level?

@spirillen
Contributor Author

NB: as a reply to the 302-specific question: 302 + 308 clearly say, don't come back here, there is nothing to see,

image

you need to go to xyz to see anything, while 301 + 307 mean temporarily moved
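
For reference, RFC 9110 defines 301 and 308 as permanent redirects and 302, 303 and 307 as temporary ones, so the standard's grouping is 301 + 308 versus 302 + 307. A minimal sketch using only the standard library's `http.HTTPStatus`; the helper name `redirect_kind` is hypothetical:

```python
from http import HTTPStatus

# Grouping per RFC 9110.
PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}  # 301, 308
TEMPORARY = {
    HTTPStatus.FOUND,               # 302
    HTTPStatus.SEE_OTHER,           # 303
    HTTPStatus.TEMPORARY_REDIRECT,  # 307
}


def redirect_kind(code: int) -> str:
    """Classify a status code as "permanent", "temporary" or "other"."""
    if code in PERMANENT:
        return "permanent"
    if code in TEMPORARY:
        return "temporary"
    return "other"
```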

@funilrys
Owner

I understand, but this will have some consequences. It's actually not INVALID ... It actually redirects to the right domain ... At least that is what the Location header is saying.

The browser can't follow it for some obscure reason but it is actually working as-it-should:

$ curl -IL 'http://www.sensual-kiss.tumblr.com' 
HTTP/1.1 302 Found
Server: openresty
Date: Sun, 10 Oct 2021 10:35:37 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Rid: 0ea88a24be0398a789080c4690f3d87a
P3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15552001
Location: https://sensual-kiss.tumblr.com/#_=_
X-UA-Compatible: IE=Edge,chrome=1

HTTP/2 200 
server: openresty
date: Sun, 10 Oct 2021 10:35:37 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
vary: Accept-Encoding
x-rid: f0347074f015230059675a339d117709
p3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
strict-transport-security: max-age=15552001
x-tumblr-user: sensual-kiss
x-tumblr-pixel-0: https://px.srvcs.tumblr.com/impixu?T=1633862137&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovL3NlbnN1YWwta2lzcy50dW1ibHIuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyJ9&U=KHHNEPCHIJ&K=3c17a9ae01752bbd52f5c333effe64d0e0ba0b7996b712c6147438227d16a98b--https://px.srvcs.tumblr.com/impixu?T=1633862137&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly9zZW5zdWFsLWtpc3MudHVtYmxyLmNvbS8iLCJyZXF0eXBlIjowLCJyb3V0ZSI6Ii8iLCJwb3N0cyI6W3sicG9zdGlkIjoiNjUyNTk5NDY3ODg4MDE3NDA4IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsi
x-tumblr-pixel-1: cG9zdGlkIjoiNjQ2MTgzNjQ2MTE2NjkxOTY4IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsicG9zdGlkIjoiNjQ1NTExODY3MzI1OTIzMzI5IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsicG9zdGlkIjoiNjQ0NTIwNzgzMzk2MzcyNDgwIiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9XX0=&U=JNBPKHMDFF&K=7bdd23e4b63545c561b864f08fb2ef49cc3394bc9338b16d83272730f79d06e6
x-tumblr-pixel: 2
link: <https://64.media.tumblr.com/c734fc3e754e30ec2711f1e34829e448/e35d615ef95041c4-89/s128x128u_c1/5eeae975e3ba6d53334dca994719fbc8a57d7537.png>; rel=icon
x-ua-compatible: IE=Edge,chrome=1

This is another level of SPECIAL rule ...

@spirillen
Contributor Author

This is another level of SPECIAL rule ...

It is, and you should consider whether it is worth the effort, or we might end up in a rule-management hell that is better addressed with other scripts/programs.

It's actually not INVALID ... It actually redirects to the right domain ... At least that is what the Location header is saying.

True, my outcome should have been INACTIVE. Both maintaining a source and generating the output of those extremely huge hosts files would benefit from removing the 302 + 308 records, while either obtaining or keeping the Location target in their sources.

This actually opens a whole new debate about how to handle redirects. We have touched on the topic in the past; maybe it's time to open a new issue/talk on the subject.

The browser can't follow it for some obscure reason

That is because the SSL certificate does not cover fourth-level domains, so you are redirected to an insecure zone, where the browser stops loading the site with a warning.
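
To illustrate why a certificate would not cover the fourth level: a wildcard name only stands in for a single leftmost label and never spans a dot. The sketch below assumes, without having verified it, that the blogs are served with a `*.tumblr.com` wildcard certificate; `wildcard_covers` is a hypothetical helper, not a real TLS library call.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """True if a certificate name such as "*.tumblr.com" covers hostname.

    The "*" is only honoured as the leftmost label and matches exactly
    one non-empty label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels
```

Under that assumption, `sensual-kiss.tumblr.com` is covered while `www.sensual-kiss.tumblr.com` is not, which matches the browser warning described here.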

@spirillen
Contributor Author

This one is odd ... Who does that ?

Let me take a very fresh example...

I duplicated the previous list twice, once adding www. subdomains, and once adding cdn.; resulting in two new lists of the formats: www.websitename.abc and cdn.websitename.abc.
Source: StevenBlack/hosts#1671 (comment) (§2)

@funilrys funilrys reopened this Feb 5, 2022
@funilrys
Owner

Note to self:
The idea is not bad. We should implement this. But subjects should be switched as INACTIVE not INVALID.

@funilrys funilrys removed this from the __future__ milestone Oct 25, 2022
@funilrys funilrys moved this to 📋 Backlog in PyFunceble Backlog Oct 25, 2022
@funilrys funilrys changed the title Special rule tumblr.com Special rule: www.example.org & m.example.org redirects to example.org Oct 25, 2022
@funilrys
Owner

funilrys commented Oct 25, 2022

Side notes on the implementation - itself:

  1. Follow all redirects.
  2. Compare the start domain with the end-domain and switch status accordingly.
    Example:
    • m.example.com -> example.com | Outcome: m.example.com as INACTIVE.
    • m.example.com -> example.org | Outcome: NO Status switch.
    • m.example.com -> a.example.com -> example.com | Outcome: m.example.com as INACTIVE.

This should only apply if the status code is one of the 3XX codes.
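
The comparison described in these notes could be sketched as a pure function over an already-followed redirect chain. This is a hedged illustration, not the implementation that was later merged: `switch_status` and its parameters are hypothetical, and it assumes only the `www.` and `m.` prefixes are considered.

```python
from typing import Optional, Sequence

PREFIXES = ("www.", "m.")  # assumption: only these prefixes trigger the rule


def switch_status(chain: Sequence[str], first_status: int) -> Optional[str]:
    """Decide the status switch for the first hostname of a redirect chain.

    chain        -- hostnames visited, tested subject first, final host last
    first_status -- HTTP status code of the first response
    """
    if not 300 <= first_status < 400 or len(chain) < 2:
        return None
    start, end = chain[0].lower(), chain[-1].lower()
    for prefix in PREFIXES:
        if start.startswith(prefix) and start[len(prefix):] == end:
            return "INACTIVE"
    return None  # no status switch
```

Only the first and last hosts are compared, so intermediate hops such as `a.example.com` do not affect the outcome.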

@funilrys
Owner

funilrys commented Oct 25, 2022

Side notes on the implementation - itself - when URLs are tested:

  1. Follow all redirects.
  2. Compare the start domain with the end-domain and switch status accordingly.
    Example:
    • m.example.com/hello/world -> example.com/hello/world | Outcome: m.example.com/hello/world as INACTIVE.
    • m.example.com/hello/world -> example.com/world/hello | Outcome: NO Status switch.
    • m.example.com/hello/world -> example.org/hello/world | Outcome: NO Status switch.
    • m.example.com/hello/world -> a.example.com/world/hello -> example.com/hello/world | Outcome: m.example.com/hello/world as INACTIVE.
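
For URLs, the same idea with the additional path comparison might look like the sketch below. The function names are hypothetical, the `www.`/`m.` prefix list is an assumption, and a scheme is prepended to scheme-less inputs purely so that `urllib.parse.urlsplit` can separate host and path.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import urlsplit


def _host_and_path(url: str) -> Tuple[str, str]:
    """Split a URL (with or without scheme) into (hostname, path)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return (parts.hostname or "").lower(), parts.path or "/"


def url_switch_status(chain: Sequence[str]) -> Optional[str]:
    """Return "INACTIVE" when the first URL only differs from the last one
    by a leading "www." or "m." and the paths are identical."""
    if len(chain) < 2:
        return None
    start_host, start_path = _host_and_path(chain[0])
    end_host, end_path = _host_and_path(chain[-1])
    if start_path != end_path:
        return None
    for prefix in ("www.", "m."):
        if start_host.startswith(prefix) and start_host[len(prefix):] == end_host:
            return "INACTIVE"
    return None  # no status switch
```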

@spirillen
Contributor Author

To continue https://matrix.to/#/!frMIeLrTTlrGiRMLBM:matrix.org/$gHxSAP8rlCIFF40wKybFK6VceyKc7NPmjj_RxLN7kFQ?via=matrix.org&via=anontier.nl

This is a special rule, but it should be a global one, as it follows the requests to the final destination; all "middlemen" are marked as potentially dead.

#185 (comment)
^(www|m)\..*\.tumblr\.com$

We will remove any useless (m.|www.).domain.ccTLD and only leave potentially ACTIVE records in our ACTIVE/list

You can call this --complements on steroids as it removes any middlemen from the finished result ACTIVE/list

IF domain-level >= 4
then
    rule is INVALID
fi

The question might then become, is this a module we would like to be able to make special rules based on domain level?

The rewrite for this would be:

IF the domain is in some internal db file of domains, then we know that any records matching ^(www|m)\..*\.domain\.ccTLD$ are INVALID; we strip the prefixes and test the records that are left.

An example of such a regex-compliant file could be:

tumblr.com = ^(www|m)\..*\. | !^([0-9a-z]{0,255}[.])?
bit.ly = !^bit.ly

#185 (comment)

It's actually not INVALID ... It actually redirects to the right domain ... At least that is what the Location header is saying.

Yes, but you have no use for the record in any output list, as it is redirecting; you would need the destination, as that would help keep the final lists as small and accurate as possible.

#185 (comment)

The idea is not bad. We should implement this. But subjects should be switched as INACTIVE not INVALID.

That would depend on the domain.... for bit.ly and tumblr.com INVALID is the correct result, while other redirecting devils might be, by default, INACTIVE

@funilrys
Owner

IF the domain is in some internal db file of domains, then we know that any records matching ^(www|m)\..*\.domain\.ccTLD$ are INVALID; we strip the prefixes and test the records that are left.

That is actually another improvement for the mining mechanism...

Here we are only talking about subjects that redirect to their 2ndLD. Example: m.example.org -> example.org and www.example.org -> example.org. And this SPECIAL rule will only be triggered if the given subject starts with "www." or "m.".

All URL shorteners are never triggered by this feature because the tested domain won't match the expected domain.

For example:

  • bit.ly/xyz -> example.org/hello/world --> Never trigger.
  • m.bit.ly/hello -> bit.ly/hello -> example.org/hello --> Nothing changes.
  • www.bit.ly/hello -> bit.ly/hello -> example.com/hello --> Nothing changes.
  • www.bit.ly/ -> bit.ly/ --> Trigger SPECIAL rule. www.bit.ly will be dropped as INACTIVE.

Also note: The path will be compared. If it doesn't match, nothing changes.

@funilrys
Owner

There is a drawback with flagging a subject as INVALID ... A lot of users just drop and permanently delete INVALID subjects and leave PyFunceble to retest all INACTIVE ones ... That's also something we have to keep in mind ...

We are only a few people in the issues section but we are a lot more users than we think 😰...

@spirillen
Contributor Author

spirillen commented Oct 26, 2022

NOTE:

Stumbled on this special domain case

  1. www.subdomain.skyblog.com = NOT supported (https://mypdns.org/my-privacy-dns/matrix/-/issues/980951)
  2. subdomain.skyblog.com = IS supported
  3. *.skyblog.com = Redirects straight to skyrock.com (https://mypdns.org/my-privacy-dns/matrix/-/issues/980951) = we now know skyblog.com is invalid

💭 🤔 Maybe a new result list? That could also help with your comment in #185 (comment) about INVALID, as these are de facto invalid cases and should be attended to by the list owner?

@spirillen
Contributor Author

spirillen commented Nov 22, 2022

UPDATE: About the special rule for tumblr: they have made a change which I have NOT investigated, ONLY observed

teen-make-selfies.tumblr.com
thesweetelite.tumblr.com

These URLs are empty and redirect to the default homepage. I have found about 30 of these today, and they were marked ACTIVE against expectation.

Any chance you (@funilrys) could spend a few minutes on this?

note to self (@spirillen)

tumblr.com: https://mypdns.org/my-privacy-dns/matrix/-/issues/1774

Repository owner moved this from 📋 Backlog to ✅ Done in PyFunceble Backlog Nov 26, 2022
@funilrys
Owner

@spirillen they actually don't redirect to the home page per se. It's all JavaScript. Therefore, the rule should be about the 404 status code.

funilrys added a commit that referenced this issue May 29, 2023
Fixed:
  * Security / Dependency Management: cryptography
    Mitigation of CVE-2023-0286 & CVE-2023-23931 through
    version bump.
  * Fatal Error: When no nameservers are configured or provided by the
    hosting system. (#328)
  * Semantic: git.io (#341)
    URLs with git.io were replaced with other one.
  * New linting issues.

Improved:
  * SPECIAL Rules: weebly.com
    We now take down subdomains that return the 406 status code.
  * SPECIAL Rules: wordpress.com (#321)
    We now recognize subjects that were taken down by WordPress.
  * SPECIAL Rules: internal
    Uniformization of the method for better and quicker development.
  * Dependency Management: sqlalchemy
    We upgraded to SQLAlchemy v2.x+.
  * Converters: internal
    Conversion can now be performed directly without initialization of
    subjects through the convert method.

Removed:
  * Python Support: <=3.7
    We do not test or support any usage of PyFunceble with python<=3.7.

New:
  * Python Support: ~=3.11
    We now test (CI/CD) and support python~=3.11.
  * Testing: pytest (#328)
    pytest can now be used by packagers to test PyFunceble before
    deployment.
  * Database: PostgreSQL
    We now support PostgreSQL as database type.
  * Filesystem: IPs as first-class citizens in plain text outputs (#268)
    From now on, IPs will be stored into the `ips` subdirectory when the
    plain text format is active.
  * SPECIAL Rules: subject-switch (#185 | #185#issuecomment-1290866362)
    We now support the subject switch from any domains.
    For example:

    - m.example.com -> example.com
      Outcome: m.example.com as INACTIVE
    - m.example.com -> example.org
      Outcome: NO status switch.
    - m.example.com -> a.example.com -> example.com
      Outcome: m.example.com as INACTIVE.
    - m.example.com/hello/world -> example.com/hello/world
      Outcome: m.example.com/hello/world as INACTIVE.
  * SPECIAL Rules: changeip (#311)
    When one of the known changeip domains provides `abuse.change.com`
    in the SOA record, the subject will be flagged as INACTIVE.
  * SPECIAL Rules: imgur.com (#319)
    We now flag removed images.
  * SPECIAL Rules: eToxic (#334)
    When a blog from the eToxic infrastructure (known domains) no longer
    exists, we flag it as INACTIVE.

Contributors:
  * @Nilsonfsilva
  * @smed79
  * @spirilln
  * @T145