tickets #36862

closed

Re-enable sync of pontifex logs to langley

Added by jberry almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 2018-06-06
Due date: 2018-07-04
% Done: 0%
Estimated time:
Description

The logs used to be synced to langley, but no longer are.

http://langley.suse.de/pub/pontifex3-opensuse.suse.de/download.opensuse.org/
http://langley.suse.de/pub/pontifex2-opensuse.suse.de/download.opensuse.org/

Can the sync be re-enabled or set up again? This would simplify renewed efforts to analyze the logs for metrics purposes.

It may also be a good chance to merge those directories on langley as it seems a bit odd that running pontifex2 will sync to pontifex3 (more recent logs).

Actions #1

Updated by jberry almost 6 years ago

Also a topic for discussion: adding a vhost on pontifex to serve the access logs for download.o.o, so that metrics.o.o can fetch the daily copies once a day without needing ssh/rsync access to the machine.

Actions #2

Updated by jberry almost 6 years ago

We also need a way to (hopefully only once) move 12 GB of data from the suse.de network to the heroes (metrics.o.o) machine. Is there a tunnel or some such available?

Actions #3

Updated by jberry almost 6 years ago

:(((( Well I started copying the files to my machine. After completion I'll upload them to metrics.o.o.

If no one says otherwise I'll proceed with setting up internal vhost for log access and look into parsing ipv6 logs as well.

Actions #4

Updated by pjessen almost 6 years ago

jberry wrote:

:(((( Well I started copying the files to my machine. After completion I'll upload them to metrics.o.o.

If no one says otherwise I'll proceed with setting up internal vhost for log access and look into parsing ipv6 logs as well.

I see no problem in a vhost for accessing the logs. If that works for you, go ahead.

Actions #5

Updated by jberry almost 6 years ago

Based on the output from ip a, I plan to bind the vhost to the private IP, which should keep the outside world from using it. This works locally, but I have not tried it on pontifex. The second issue is read access for wwwrun to /var/log/apache2. I could change the group to www or give others read permission, but I am not sure whether the former will cause issues, and the latter is likely not ideal. Any thoughts on how to resolve the permission issue?

Actions #6

Updated by jberry almost 6 years ago

I could also use a different port or IP-based access rules, but binding to the private IP seems the most straightforward, although it gives anyone in the heroes network read access to the logs. If that is not acceptable, I'll need to use an IP-based access rule for metrics or some such.

From ip a:

4: private: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:23:09:80 brd ff:ff:ff:ff:ff:ff
    inet 192.168.47.73/24 brd 192.168.47.255 scope global private
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe23:980/64 scope link 
       valid_lft forever preferred_lft forever

My proposed vhost:

<VirtualHost 192.168.47.73:80>
  DocumentRoot "/var/log/apache2"
  ServerName logs.download.opensuse.org

  ErrorLog /dev/null
  CustomLog /dev/null common
</VirtualHost>

With an entry in /etc/hosts on metrics.o.o machine to set logs.download.opensuse.org to 192.168.47.73.

After which a chmod -R o+rX /var/log/apache2/ should do the trick if others do not have issue with this setup.

Actions #7

Updated by jberry almost 6 years ago

If there is a method for making the IP a variable or other fancy polish I would appreciate the insight into setup or assistance.

Actions #8

Updated by pjessen almost 6 years ago

a) using the private IP sounds good.
b) server name - maybe downloadlogs.infra.opensuse.org ? we probably don't want to add a subdomain onto download.o.o.
c) permissions - not sure what the right solution is.
d) ip address in vhosts - it's hardcoded.
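
On the question of making the IP a variable: Apache 2.4 has a Define directive, so one possible approach is to declare the address once and reference it as ${...} in the vhost. The following is only a sketch and has not been tried on pontifex; the variable name LOGS_BIND_IP and the temporary output directory are illustrative assumptions, and the vhost body mirrors the proposal in this ticket.

```shell
#!/bin/sh
# Sketch only: write a vhost file in which the bind address appears once.
# CONF_DIR and LOGS_BIND_IP are hypothetical names chosen for illustration.
conf_dir="${CONF_DIR:-$(mktemp -d)}"

# The quoted 'EOF' stops the shell from expanding ${LOGS_BIND_IP};
# Apache expands it when it reads the config.
cat > "$conf_dir/downloadlogs.conf" <<'EOF'
Define LOGS_BIND_IP 192.168.47.73

<VirtualHost ${LOGS_BIND_IP}:80>
  DocumentRoot /var/log/apache2
  ServerName downloadlogs.infra.opensuse.org

  <Directory /var/log/apache2>
    Options Indexes
    Require all granted
  </Directory>
</VirtualHost>
EOF

echo "wrote $conf_dir/downloadlogs.conf"
```

The address is still fixed in the file, but it lives in one place, so a later change touches a single line.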

Actions #9

Updated by lnussel almost 6 years ago

what about just an rsync module?

Actions #10

Updated by pjessen almost 6 years ago

  • Status changed from New to Feedback

lnussel wrote:

what about just an rsync module?

Yeah, I agree, would be a lot less hassle. Jimmy?

# cat /var/log/apache2/download.opensuse.org/OLD_LOGFILES
old logfiles are stored on langley.suse.de

# host langley.suse.de
Host langley.suse.de not found: 3(NXDOMAIN)

How were the log files transferred previously - pushed out by rsync?

Actions #11

Updated by jberry almost 6 years ago

Apparently, you can type a comment while not logged in, at which point you are redirected to login and the comment is lost...nice...typing for the second time, so likely lower quality.

what about just an rsync module?
Yeah, I agree, would be a lot less hassle. Jimmy?

That still has the same permission issue unless the rsync daemon runs as root? I assume there is a way to dump a single file to stdout, but it is not the default behavior. Metrics.o.o has neither the need nor the space to copy full log files, so the approach used with langley, streaming the files, makes sense. Without a clear benefit (rsync would need to be set up separately from the other modules to be private?), it seems just as easy to use http for both.

How were the log files transferred previously - pushed out by rsync?

Ludwig indicated that this was done before the network split.

Actions #12

Updated by jberry almost 6 years ago

Tomorrow, I will go ahead and run the following:

  • rcapache2 reload (to ensure current config is actually loaded and not an issue, if it is hopefully easy to fix or others can help)
  • chmod o+rX /var/log/apache2/
  • chmod -R o+rX /var/log/apache2/download.opensuse.org/
  • chmod -R o+rX /var/log/apache2/ipv6.download.opensuse.org/
  • write apache vhost config to /etc/apache2/vhosts.d/downloadlogs.conf
    <VirtualHost 192.168.47.73:80>
      DocumentRoot /var/log/apache2
      ServerName downloadlogs.infra.opensuse.org

      <Directory /var/log/apache2>
        Options Indexes
        Require all granted
      </Directory>

      ErrorLog /dev/null
      CustomLog /dev/null common
    </VirtualHost>
  • rcapache2 reload

If someone in a European timezone would rather do this, so that more folks are around to help fix things, that would be great. I verified that this setup works locally on my dev machine to access apache logs.

Actions #13

Updated by jberry almost 6 years ago

Can/Should the DNS entry, downloadlogs.infra.opensuse.org, be added to internal DNS or should I just add to hosts file on metrics.o.o?

Actions #15

Updated by pjessen almost 6 years ago

  • Private changed from Yes to No

jberry wrote:

  • rcapache2 reload (to ensure current config is actually loaded and not an issue, if it is hopefully easy to fix or others can help)
  • chmod o+rX /var/log/apache2/
  • chmod -R o+rX /var/log/apache2/download.opensuse.org/
  • chmod -R o+rX /var/log/apache2/ipv6.download.opensuse.org/

Jimmy, to be honest, I'm a little uneasy with changing those permissions willy-nilly. I would much rather have an rsync solution, or perhaps run a daily extract for you. We could put the extracts in directories accessible to downloadlogs.i.o.o. It would be straightforward to keep e.g. logs from the last 7 days available separately?

Actions #16

Updated by lnussel almost 6 years ago

pjessen wrote:

jberry wrote:

  • rcapache2 reload (to ensure current config is actually loaded and not an issue, if it is hopefully easy to fix or others can help)
  • chmod o+rX /var/log/apache2/
  • chmod -R o+rX /var/log/apache2/download.opensuse.org/
  • chmod -R o+rX /var/log/apache2/ipv6.download.opensuse.org/

Jimmy, to be honest, I'm a little uneasy with changing those permissions willy-nilly. I would much rather have an rsync solution, or perhaps run a daily extract for you. We could put the extracts in directories accessible to downloadlogs.i.o.o. It would be straightforward to keep e.g. logs from the last 7 days available separately?

We could make the directory group readable. Advantage with rsync would be that we can define the group there.

Actions #17

Updated by jberry almost 6 years ago

I'm a little uneasy with changing those permissions willy-nilly.

Not sure I'd call it that, but sure. Everyone with ssh access to the machine presumably has sudo as there is no other reason to access the box. Effectively everyone already has read access. Making the files accessible to another machine without further lock down means that everyone in heroes network has access...which is more than sudoers on pontifex.

I would much rather have an rsync solution

What is the rsync solution? A module would have the same permission issue. A push solution would require extensive disk space on metrics.o.o for no real gain.

perhaps run a daily extract for you. We could put the extracts in directories accessible to downloadlogs.i.o.o. It would be straightforward to keep e.g. logs from the last 7 days available separately?

Sure, if someone will implement it and maintain it. Given the state of this machine I was going for minimal surface area. Just have to make sure metrics.o.o doesn't have any extensive outages. :)

This is also similar to the original request in that it would sync them to another machine which is then 100% accessible.

Actions #18

Updated by jberry almost 6 years ago

We could make the directory group readable. Advantage with rsync would be that we can define the group there.

I imagine this would be the same as chown -R root:www ?

Actions #19

Updated by lnussel almost 6 years ago

Ok, so rsync is a bit tricky on the client side. We'd need a group anyway. /var/log/apache2 is in the package, so we have to override this in /etc/permissions.local. Let's go for the vhost and use the www group then.

Actions #20

Updated by jberry almost 6 years ago

For whatever reason the vhost never seems to be served. I'd have to look at the apache config further to have any idea why. Rsync is still possible if we sync to an in-memory location and then read from there. Holding off until tomorrow.

Actions #21

Updated by pjessen almost 6 years ago

I thought it was working from baloo with a wget:

# wget -S http://downloadlogs.infra.opensuse.org
--2018-06-19 15:44:52-- http://downloadlogs.infra.opensuse.org/
Resolving downloadlogs.infra.opensuse.org (downloadlogs.infra.opensuse.org)... 192.168.47.73
Connecting to downloadlogs.infra.opensuse.org (downloadlogs.infra.opensuse.org)|192.168.47.73|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
^

Actions #22

Updated by pjessen almost 6 years ago

nginx has a 002-download-private.conf which listens on 192.168.47.73:80. It also aliases /logs to /var/log/apache2/. To be continued.

Actions #23

Updated by pjessen almost 6 years ago

# wget -S http://pontifex.infra.opensuse.org/logs/download.opensuse.org/2018/06/download.opensuse.org-20180618-access_log.xz
--2018-06-19 15:58:53-- http://pontifex.infra.opensuse.org/logs/download.opensuse.org/2018/06/download.opensuse.org-20180618-access_log.xz
Resolving pontifex.infra.opensuse.org (pontifex.infra.opensuse.org)... 192.168.47.73
Connecting to pontifex.infra.opensuse.org (pontifex.infra.opensuse.org)|192.168.47.73|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: nginx
Date: Tue, 19 Jun 2018 15:58:53 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
2018-06-19 15:58:53 ERROR 403: Forbidden.

I don't know why I'm getting a 403.

Actions #24

Updated by pjessen almost 6 years ago

That would be because nginx is running as nginx:nginx. It clearly had access to /var/log/apache2 at some point, but not with this config.

Actions #25

Updated by jberry almost 6 years ago

Resolved via:

sudo setfacl -m g:nginx:rx /var/log/apache2

I will point ingest code at http://pontifex.infra.opensuse.org/logs as base and hopefully report back everything is working
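
A minimal sketch of how such streaming access might look from the metrics side, assuming the per-day path layout seen in the wget example in comment #23 and that curl and xz are installed. The function names log_url and stream_log are hypothetical and are not taken from the actual ingest code; whether ipv6.download.opensuse.org follows the same layout is also an assumption.

```shell
#!/bin/sh
# Base URL as stated in the comment above.
BASE_URL="http://pontifex.infra.opensuse.org/logs"

# log_url VHOST YYYYMMDD
# Print the URL of one day's compressed access log (layout assumed from comment #23).
log_url() {
  vhost=$1
  day=$2
  year=$(printf '%s' "$day" | cut -c1-4)
  month=$(printf '%s' "$day" | cut -c5-6)
  printf '%s/%s/%s/%s/%s-%s-access_log.xz\n' \
    "$BASE_URL" "$vhost" "$year" "$month" "$vhost" "$day"
}

# stream_log VHOST YYYYMMDD
# Fetch and decompress to stdout so the full file is never stored locally.
stream_log() {
  curl --fail --silent --show-error "$(log_url "$1" "$2")" | xz --decompress --stdout
}
```

For example, `stream_log download.opensuse.org 20180618 | head` would print the first lines of that day's log without keeping a copy on metrics.o.o.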

Actions #26

Updated by jberry almost 6 years ago

Seems to be working, ingest is running, but metrics.o.o only has two cores so it will take a bit for the last 6 months.

Actions #27

Updated by jberry almost 6 years ago

Resolved minor issues as there are some fun invalid data and log format changes in most recent stuff. Assuming there is no need to move the logs for long-term storage I guess we can mark this complete?

Actions #28

Updated by jberry almost 6 years ago

Otherwise, perhaps it makes more sense to archive the data on another machine/disk in heroes network?

Actions #29

Updated by lnussel almost 6 years ago

  • Due date set to 2018-07-04
  • Status changed from Feedback to New

ok, so the problem to get the logs to metrics is solved. There are two things left to address

  • archiving the logs so they don't stay on download.o.o forever and take space there
  • restrict access to the /logs route to the hosts that actually need it. The logs shouldn't be accessible to all machines in the heroes network.

Actions #30

Updated by pjessen almost 6 years ago

  • Status changed from New to Feedback
  • Priority changed from High to Normal

lnussel wrote:

ok, so the problem to get the logs to metrics is solved. There are two things left to address

  • archiving the logs so they don't stay on download.o.o forever and take space there

What was the previous solution to that? download.o.o has 100 GB for logs at the moment, but it is filling up.

  • restrict access to the /logs route to the hosts that actually need it. The logs shouldn't be accessible to all machines in the heroes network.

According to the nginx config, it was not restricted before. Is it really a problem leaving the logs accessible to all?

Actions #31

Updated by jberry almost 6 years ago

What was the previous solution to that? download.o.o has 100 GB for logs at the moment, but it is filling up.

The previous solution was langley.suse.de, but as discussed, going between networks is troublesome. My suggestion is a separate volume/machine for archival data within the heroes network.

Actions #32

Updated by pjessen almost 6 years ago

Okay, I didn't know langley was being used as an archive.

Well, for apache, we produce about 11 GB of compressed logs per month (approx. 9 GB for IPv4, 1-2 GB for IPv6). I don't know why we are logging them separately.
nginx currently takes up 22 GB.
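
A rough retention estimate from these figures, assuming (this is not stated in the ticket) that the 22 GB of nginx logs sit on the same 100 GB log volume mentioned earlier and that the apache rate stays near 11 GB per month. The function name retention_months is hypothetical.

```shell
#!/bin/sh
# retention_months TOTAL_GB OTHER_GB GB_PER_MONTH
# Whole months of apache logs that fit once the other usage is subtracted.
retention_months() {
  echo $(( ($1 - $2) / $3 ))
}

# (100 - 22) / 11 = 7 whole months
retention_months 100 22 11
```

Under those assumptions only about seven months of apache logs fit on the volume, which is why an archive location is needed.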

Actions #33

Updated by lnussel almost 6 years ago

pjessen wrote:

lnussel wrote:

[...]

  • restrict access to the /logs route to the hosts that actually need it. The logs shouldn't be accessible to all machines in the heroes network.

According to the nginx config, it was not restricted before. Is it really a problem leaving the logs accessible to all?

I am neither responsible for the current nor the previous setup. I do not know if or how the previous network setup limited access to the logs outside of nginx itself. In the current setup access is too lax for my taste.
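
A sketch of how the route could be limited using nginx's allow and deny directives. The address 192.0.2.10 is a documentation placeholder standing in for whichever host needs access (the real address of metrics.o.o is not given in this ticket), the output file name is hypothetical, and the exact location block in 002-download-private.conf is assumed; only the alias target is taken from comment #22.

```shell
#!/bin/sh
# Sketch only: write a location block that admits one client and rejects the rest.
# SNIPPET_DIR and the file name are hypothetical.
snippet_dir="${SNIPPET_DIR:-$(mktemp -d)}"

cat > "$snippet_dir/logs-access.conf" <<'EOF'
location /logs/ {
    alias /var/log/apache2/;
    # Placeholder address; replace with the host(s) that really need the logs.
    allow 192.0.2.10;
    deny all;
}
EOF

echo "wrote $snippet_dir/logs-access.conf"
```

nginx checks the rules in order, so the allowed host matches first and every other client in the network gets a 403.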

Actions #34

Updated by jberry over 5 years ago

  • Status changed from Feedback to Resolved

Using the previous setup for access and dealing with the archive being in a separate network.
