The Blog

I am a computer engineer who sometimes blogs, and this is my space. Welcome :)


GuRuTuX

How Does MySQL Replication Work?

MySQL Binary Log Replication.

MySQL binary log replication is one of the most used features in MySQL-flavoured databases, as it is the simplest way to replicate data changes across several MySQL nodes. And because it is asynchronous, it has little to no impact on the master node(s).

Binary Log

What is the binary log?

The binary log is a set of log files that contain information about data modifications made to a MySQL server instance. It contains all statements that update data. It also contains statements that potentially could have updated it (for example, a DELETE which matched no rows), unless row-based logging is used. Statements are stored in the form of "events" that describe the modifications. The binary log also contains information about how long each statement took that updated data.

What is the Purpose of the Binary log?

  • The binary log has two important purposes:
  1. For replication, the binary log is used on master replication servers as a record of the statements to be sent to slave servers. Many details of binary log format and handling are specific to this purpose. The master server sends the events contained in its binary log to its slaves, which execute those events to make the same data changes that were made on the master. A slave stores events received from the master in its relay log until they can be executed. The relay log has the same format as the binary log.
  2. Certain data recovery operations require use of the binary log. After a backup file has been restored, the events in the binary log that were recorded after the backup was made are re-executed. These events bring databases up to date from the point of the backup.

Types of binary logging:

  • There are three types of binary logging:
  1. Statement-based logging: Events contain SQL statements that produce data changes (inserts, updates, deletes).
  2. Row-based logging: Events describe changes to individual rows.
  3. Mixed logging: Uses statement-based logging by default but switches to row-based logging automatically as necessary.
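To make the difference between the formats a bit more concrete, here is a small Python sketch. It is only a toy model and not MySQL code: `run_statement` is a hypothetical stand-in for a statement such as `UPDATE t SET token = UUID()`, whose result differs every time it runs. Replaying the statement on the replica produces different data, while shipping the changed rows does not, which is the kind of case where mixed logging falls back to row-based events.

```python
import uuid


def run_statement(rows):
    """Hypothetical stand-in for: UPDATE t SET token = UUID()."""
    return [{**row, "token": str(uuid.uuid4())} for row in rows]


initial_rows = [{"id": 1}, {"id": 2}]

# The source executes the statement once.
source_rows = run_statement(initial_rows)

# Statement-based event: the replica receives the SQL text and runs it again,
# so it generates its own, different UUIDs.
statement_replica = run_statement(initial_rows)

# Row-based event: the replica receives the resulting row values and copies them.
row_events = [dict(row) for row in source_rows]
row_replica = [dict(event) for event in row_events]

print("statement-based replica matches source:", statement_replica == source_rows)
print("row-based replica matches source:", row_replica == source_rows)
```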

Is there a tool to explore Binary logging?

mysqlbinlog is a utility that can be used to print binary or relay log contents in readable form.

Replication

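The MySQL replication feature allows one server, the master, to send all changes to another server, the slave, and the slave tries to apply all changes to keep up-to-date with the master. (Newer MySQL releases call these two roles the source and the replica, which are the names used in the thread descriptions below.) Replication works as follows: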

  • Whenever the master's database is modified, the change is written to a file (binary log, or binlog). This is done by the client thread that executed the query that modified the database.
  • Binary log dump thread. The source creates a thread to send the binary log contents to a replica when the replica connects. The binary log dump thread acquires a lock on the source's binary log for reading each event that is to be sent to the replica. As soon as the event has been read, the lock is released, even before the event is sent to the replica.
  • Replication I/O receiver thread. When a START REPLICA statement is issued on a replica server, the replica creates an I/O (receiver) thread, which connects to the source and asks it to send the updates recorded in its binary logs. The replication receiver thread reads the updates that the source's Binlog Dump thread sends (see previous item) and copies them to local files that comprise the replica's relay log.
  • Replication SQL applier thread. The replica creates an SQL (applier) thread to read the relay log that is written by the replication receiver thread and execute the transactions contained in it.
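The flow described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the class and method names (`Source`, `Replica`, `dump_from`, `receive`, `apply`) are made up for this sketch, the "threads" are plain method calls, and there is no networking or locking.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    position: int   # index of the event in the source's binary log
    statement: str  # the change to replay


class Source:
    def __init__(self):
        self.binlog = []

    def execute(self, statement):
        # The client thread that ran the query appends the change to the binlog.
        self.binlog.append(Event(len(self.binlog), statement))

    def dump_from(self, position):
        # Binary log dump thread: read the events the replica has not seen yet.
        return self.binlog[position:]


class Replica:
    def __init__(self):
        self.relay_log = deque()
        self.next_position = 0
        self.applied = []

    def receive(self, source):
        # I/O (receiver) thread: copy events from the source into the relay log.
        for event in source.dump_from(self.next_position):
            self.relay_log.append(event)
            self.next_position = event.position + 1

    def apply(self, max_events=None):
        # SQL (applier) thread: execute events from the relay log in order.
        count = 0
        while self.relay_log and (max_events is None or count < max_events):
            self.applied.append(self.relay_log.popleft().statement)
            count += 1


source = Source()
replica = Replica()
source.execute("INSERT INTO t VALUES (1)")
source.execute("UPDATE t SET id = 2")

replica.receive(source)        # both events are now in the relay log
replica.apply(max_events=1)    # the applier has only caught up with the first one
print(replica.applied, len(replica.relay_log))
```

Receiving and applying are separate steps, so the relay log can hold events that the replica has already downloaded but not yet executed.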

MySQL Is Lying

Something you need to know is that MySQL is a big liar: it always lies about how far the slave lags behind the master. The logical definition of lag is the amount of time the slave needs to reach the data state of the master; however, this is not what a MySQL slave reports. The thread that reports Seconds_Behind_Master (the lag) is the slave's own SQL thread. It calculates the value as the difference between the current machine time and the timestamp of the last event it executed from the relay log, and this does not reflect the actual lag by any means. This is because of the single-threaded nature of the replication threads, and because no calculation is performed on the relay log before it is executed.
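A tiny Python sketch of that calculation, as described above, shows why the number says so little. The timestamps are invented for illustration, and `reported_lag` and `pending_events` are hypothetical helper names, not anything MySQL exposes.

```python
def reported_lag(now, last_executed_event_ts):
    """Lag as described above: current time minus the timestamp of the
    last event executed from the relay log."""
    return now - last_executed_event_ts


def pending_events(source_event_timestamps, last_executed_event_ts):
    """Events written on the master that the slave has not executed yet."""
    return [ts for ts in source_event_timestamps if ts > last_executed_event_ts]


# Hypothetical timeline, in seconds.
source_events = [100, 101, 102, 103, 104, 105]
last_executed = 103
now = 106

lag = reported_lag(now, last_executed)
backlog = pending_events(source_events, last_executed)
print(f"reported lag: {lag}s, events still to apply: {len(backlog)}")
```

The reported value is a plain subtraction of two timestamps. It does not look at how many events are still waiting or how expensive they are to apply, so it cannot tell you how long the slave needs to actually catch up.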

Files in Linux

What is a file in Linux? A file is a collection of data blocks plus an inode, identified by an inode number, which holds the metadata about the file.

What is an inode? An inode is a data structure that is typically pre-allocated when the filesystem is created and contains specific metadata about a file, such as:

  • File type
  • Permissions
  • User ID (owner)
  • Group ID (owner group)
  • Logical file size
  • Last access timestamp
  • Last modification timestamp
  • Last inode change timestamp
  • File deletion time
  • Number of hard links
  • Pointers to the data blocks

Maybe your next question will be: where is the file name? The file name is stored in the data block of the parent directory, where it points to the inode number that holds the metadata of this file.
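If you want to poke at this yourself, the Python standard library exposes most of the inode fields through `os.stat`. The sketch below is a minimal example, assuming a Linux (or other Unix-like) system; the file names `example` and `alias` are arbitrary. It also creates a hard link to show that two names in a directory can point at the same inode.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example")
    with open(path, "w") as handle:
        handle.write("x")

    info = os.stat(path)
    metadata = {
        "inode": info.st_ino,
        "is_regular_file": stat.S_ISREG(info.st_mode),
        "permissions": stat.filemode(info.st_mode),
        "uid": info.st_uid,
        "gid": info.st_gid,
        "size": info.st_size,
        "hard_links": info.st_nlink,
    }

    # The directory is what maps file names to inode numbers.
    entries = {entry.name: entry.inode() for entry in os.scandir(directory)}

    # A hard link is a second name for the same inode.
    alias = os.path.join(directory, "alias")
    os.link(path, alias)
    same_inode = os.stat(alias).st_ino == info.st_ino
    links_after = os.stat(path).st_nlink

print(metadata)
print(entries)
print("alias shares the inode:", same_inode, "| hard links:", links_after)
```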

The file structure in Linux was built to unify file operations regardless of which filesystem the files live on. In Linux, all files on any filesystem are treated the same way, because Linux uses the VFS (“Virtual Filesystem Switch”) to access all filesystem types transparently, without the client application noticing the difference. A VFS can be used to bridge the differences between Windows, classic Mac OS/macOS and Unix filesystems, so that applications can access files on local filesystems of those types without having to know what type of filesystem they are accessing.

For more information about the VFS check this link: https://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/index.html

Let's talk about file size a bit. There are several ways to check the size of a file, as shown below:

# root @ ub05ada39a41857ef4a39
$ ls -lh zeros
-rw-r--r-- 1 root root 1 Sep 30 17:02 zeros

# root @ ub05ada39a41857ef4a39
$ stat zeros
  File: ‘zeros’
  Size: 1          Blocks: 8          IO Block: 4096   regular file
Device: fc01h/64513d    Inode: 7866457    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-09-30 16:51:53.746819123 +0200
Modify: 2017-09-30 17:02:53.407742697 +0200
Change: 2017-09-30 17:02:53.407742697 +0200
 Birth: -

# root @ ub05ada39a41857ef4a39
$ du -sh zeros
4.0K    zeros

You can notice the difference in the reported size of the same file between the “du” command and the “ls” and “stat” commands. The “du” command shows the actual size allocated to the file on the physical storage, which is 4.0K, unlike the “ls” and “stat” commands, which show the logical size of the file (1 byte). The “stat” output also reports Blocks: 8, counted in 512-byte units, which is the same 4096 bytes. So how does this work?
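The same two numbers can be read from Python with `os.stat`: `st_size` is the logical size and `st_blocks` is the number of 512-byte units allocated. This is a small sketch, assuming a Unix-like system; the allocated value depends on the filesystem, so I am not promising you will see exactly 4096 on your machine.

```python
import os
import tempfile


def sizes(path):
    """Return (logical size, allocated size) of a file in bytes."""
    info = os.stat(path)
    logical = info.st_size
    allocated = info.st_blocks * 512  # st_blocks counts 512-byte units
    return logical, allocated


with tempfile.NamedTemporaryFile() as handle:
    handle.write(b"0")          # a single byte, like the "zeros" file above
    handle.flush()
    os.fsync(handle.fileno())   # make sure the data has reached the filesystem
    logical, allocated = sizes(handle.name)

print(f"logical={logical} bytes, allocated={allocated} bytes")
```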

Any filesystem that doesn’t support variable block sizes can never allocate less space than its default block size. Each file points to a single inode, which in turn points to the blocks that contain the file’s data. A file’s inode can point to zero blocks if the file never had any data. For example:

# root @ ub05ada39a41857ef4a39
$ touch noblocks

# root @ ub05ada39a41857ef4a39
$ ls -la noblocks
-rw-r--r-- 1 root root 0 Sep 30 17:35 noblocks

# root @ ub05ada39a41857ef4a39
$ stat noblocks
  File: ‘noblocks’
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: fc01h/64513d    Inode: 7866459    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-09-30 17:35:57.023274029 +0200
Modify: 2017-09-30 17:35:57.023274029 +0200
Change: 2017-09-30 17:35:57.023274029 +0200
 Birth: -

# root @ ub05ada39a41857ef4a39
$ du -sh noblocks
0       noblocks

The file “noblocks” has been created but never had data in it, so its inode doesn’t point to any block. However, if this file contained a single byte, the inode would point to one block, which allocates the default block size of the filesystem. The block size of a filesystem can be found with the command below: blockdev --getbsz /dev/partition

# blockdev --getbsz /dev/sdb1
4096

A single block can’t store the data of more than one file, because the inode cannot identify the length of the data that belongs to each file inside the block; it expects the data in a block to belong to a single file only.
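The arithmetic behind this is just rounding up to whole blocks. Here is a minimal sketch, assuming the 4096-byte block size from the example above and a filesystem that stores file data only in whole blocks; it ignores things like sparse files or small files stored inside the inode. `allocated_bytes` is a name I made up for this sketch.

```python
def allocated_bytes(logical_size, block_size=4096):
    """Space a file takes when its data is stored in whole blocks only."""
    blocks = -(-logical_size // block_size)  # integer ceiling division
    return blocks * block_size


for size in (0, 1, 4096, 4097):
    print(f"{size} bytes of data -> {allocated_bytes(size)} bytes allocated")
```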

IP-Tables Concepts

I will try, as much as I can, to explain IP-Tables concepts in a simple way.

What is IP-Tables?

(My Definition) IP-Tables is a piece of software that filters network traffic (packets).

(Wikipedia’s Definition) iptables is a user space application program that allows a system administrator to configure the tables provided by the Linux kernel firewall (implemented as different Netfilter modules) and the chains and rules it stores.

The Contents of IP-Tables:

| Name | Description |
|---|---|
| Tables | A table is a set of chains designed to perform a specific function. |
| Chains | A chain is a set of rules that are applied to packets that traverse the chain. Every chain has a specific purpose, which will become clear as we go. |
| Rules | A rule is one or more conditions together with a single action. The action of a rule is applied if all the conditions of that rule are met. |
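The relation between rules and chains can be sketched in a few lines of Python. This is a simplified model and not how the kernel implements it: the names `Rule` and `run_chain` are invented for this sketch, every action is treated as final, and a default policy is applied when no rule matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]


@dataclass
class Rule:
    conditions: List[Callable[[Packet], bool]]
    action: str

    def matches(self, packet: Packet) -> bool:
        # The action only applies if every condition of the rule holds.
        return all(condition(packet) for condition in self.conditions)


def run_chain(rules: List[Rule], packet: Packet, policy: str = "ACCEPT") -> str:
    """Walk the chain in order and return the action of the first matching rule."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return policy  # no rule matched, fall back to the chain's default


input_chain = [
    Rule([lambda p: p["protocol"] == "tcp", lambda p: p["dport"] == 22], "ACCEPT"),
    Rule([lambda p: p["protocol"] == "tcp"], "DROP"),
]

verdict_ssh = run_chain(input_chain, {"protocol": "tcp", "dport": 22})
verdict_web = run_chain(input_chain, {"protocol": "tcp", "dport": 80})
verdict_dns = run_chain(input_chain, {"protocol": "udp", "dport": 53})
print(verdict_ssh, verdict_web, verdict_dns)
```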

What is NAT – Network Address Translation?

NAT allows a host or several hosts to share the same IP address, in a way.

How?

Let’s say we have a local network consisting of 5-10 clients. We set their default gateways to point through the NAT server. Normally the packet would simply be forwarded by the gateway machine, but in the case of a NAT server it is a little bit different.

NAT servers translate the source and destination addresses of packets to different addresses, as we already said. The NAT server receives the packet, rewrites the source and/or destination address and then recalculates the checksum of the packet. One of the most common usages of NAT is the SNAT (Source Network Address Translation) function. Basically, this is used in the above example if we can’t afford, or see no real point in having, a real public IP for each and every one of the clients. In that case, we use one of the private IP ranges for our local network (for example, 192.168.1.0/24), and then we turn on SNAT for our local network. SNAT will then turn all 192.168.1.0/24 addresses into its own public IP (for example, 217.115.95.34). This way, there will be 5-10 clients, or many more, using the same shared IP address.
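Here is a rough Python sketch of the bookkeeping a SNAT box needs to do, using the example addresses from the paragraph above. It is a simplified model under my own assumptions: the class `SourceNat`, its method names and the starting port 40000 are invented for this sketch, and the way a real implementation picks ports differs.

```python
import ipaddress


class SourceNat:
    def __init__(self, public_ip, private_network, first_port=40000):
        self.public_ip = public_ip
        self.private_network = ipaddress.ip_network(private_network)
        self.next_port = first_port
        self.outbound = {}  # (private ip, private port) -> public port
        self.inbound = {}   # public port -> (private ip, private port)

    def translate_out(self, src_ip, src_port):
        """Rewrite the source of a packet leaving the local network."""
        if ipaddress.ip_address(src_ip) not in self.private_network:
            return src_ip, src_port  # not one of our clients, leave it alone
        key = (src_ip, src_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_reply(self, dst_port):
        """Find which client a reply to the shared public IP belongs to."""
        return self.inbound.get(dst_port)


nat = SourceNat("217.115.95.34", "192.168.1.0/24")
first = nat.translate_out("192.168.1.10", 51000)
second = nat.translate_out("192.168.1.11", 51000)
reply_target = nat.translate_reply(second[1])
print(first, second, reply_target)
```

Both clients leave with the same public IP. The NAT box remembers the mapping so that replies can be sent back to the right client.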

There is also something called DNAT, which can be extremely helpful when it comes to setting up servers etc. First of all, you can help the greater good when it comes to saving IP space; second, you can easily get a more or less totally impenetrable firewall in between your server and the real server; or you can simply share one IP among several services that run on physically different servers. For example, we may run a small company server farm containing a webserver and an FTP server on the same machine, while there is a physically separate machine containing a couple of different chat services that the employees working from home or on the road can use to keep in touch with the employees that are on-site. We may then run all of these services on the same IP from the outside via DNAT.
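The server farm example could look roughly like this as a lookup table in Python. Everything specific here is made up for illustration: the internal addresses 192.168.1.2 and 192.168.1.3, the choice of ports (80 for web, 21 for FTP, 6667 for a chat service) and the function name `translate_in`. The public address is the example one used earlier in this post.

```python
PUBLIC_IP = "217.115.95.34"  # the single address visible from the outside

# Hypothetical DNAT rules: public port -> (internal server, internal port).
DNAT_RULES = {
    80: ("192.168.1.2", 80),      # webserver
    21: ("192.168.1.2", 21),      # FTP server on the same machine
    6667: ("192.168.1.3", 6667),  # chat service on a separate machine
}


def translate_in(dst_ip, dst_port):
    """Rewrite the destination of a packet arriving at the public IP."""
    if dst_ip == PUBLIC_IP and dst_port in DNAT_RULES:
        return DNAT_RULES[dst_port]
    return dst_ip, dst_port  # no rule for this packet, leave it unchanged


web_target = translate_in(PUBLIC_IP, 80)
chat_target = translate_in(PUBLIC_IP, 6667)
unknown_target = translate_in(PUBLIC_IP, 25)
print(web_target, chat_target, unknown_target)
```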

In Linux, there are actually two separate types of NAT that can be used, either Fast-NAT or Netfilter-NAT. Fast-NAT is implemented inside the IP routing code of the Linux kernel, while Netfilter-NAT is also implemented in the Linux kernel, but inside the netfilter code. Since this article won’t touch the IP routing code too closely, we will pretty much leave it here, except for a few notes. Fast-NAT is generally called by this name since it is much faster than the netfilter NAT code. It doesn’t keep track of connections, and this is both its main pro and con. Connection tracking takes a lot of processor power, and hence it is slower, which is one of the main reasons that Fast-NAT is faster than Netfilter-NAT. As we also said, the bad thing about Fast-NAT is that it doesn’t track connections, which means it will not be able to do SNAT very well for whole networks, nor will it be able to NAT complex protocols such as FTP, IRC and other protocols that Netfilter-NAT is able to handle very well. It is possible, but it will take much, much more work than would be expected from the Netfilter implementation.

Finally, there is a word that is basically a synonym for SNAT: masquerade. In Netfilter, masquerade is pretty much the same as SNAT, with the exception that masquerading will automatically set the new source IP to the default IP address of the outgoing network interface.

IP-Tables Chains:

| Chain | Explanation |
|---|---|
| PREROUTING | Packets enter this chain before a routing decision is made. |
| INPUT | The packet is going to be delivered locally. (N.B.: This has nothing to do with processes having a socket open. Local delivery is controlled by the “local-delivery” routing table: ip route show table local.) |
| FORWARD | All packets that have been routed and were not for local delivery traverse this chain. |
| OUTPUT | Packets sent from the machine itself visit this chain. |
| POSTROUTING | The routing decision has been made. Packets enter this chain just before being handed off to the hardware. |
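Putting the chain descriptions together, the path of a packet can be sketched as a small Python function. This is a simplification I wrote for illustration (the name `chains_for` is made up), and it does not model traffic that a machine sends to itself.

```python
def chains_for(source_is_local, destination_is_local):
    """Return the chains a packet visits, in order, in this simplified model."""
    if source_is_local:
        # Generated by the machine itself and sent out.
        return ["OUTPUT", "POSTROUTING"]
    if destination_is_local:
        # Arrived from the network and is delivered locally.
        return ["PREROUTING", "INPUT"]
    # Arrived from the network and is routed on to another host.
    return ["PREROUTING", "FORWARD", "POSTROUTING"]


incoming = chains_for(source_is_local=False, destination_is_local=True)
forwarded = chains_for(source_is_local=False, destination_is_local=False)
outgoing = chains_for(source_is_local=True, destination_is_local=False)
print(incoming, forwarded, outgoing)
```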

IP-Tables Tables:

| Table | Explanation |
|---|---|
| NAT | The NAT table is used mainly for Network Address Translation. NATed packets get their IP addresses altered according to our rules. Packets in a stream only traverse this table once. We assume that the first packet of a stream is allowed. The rest of the packets in the same stream are automatically NATed or masqueraded etc., and will be subject to the same actions as the first packet. These will, in other words, not go through this table again, but will nevertheless be treated like the first packet in the stream. This is the main reason why you should not do any filtering in this table, which we will discuss at greater length further on. The PREROUTING chain is used to alter packets as soon as they get in to the firewall. The OUTPUT chain is used for altering locally generated packets (i.e., on the firewall) before they get to the routing decision. Finally we have the POSTROUTING chain, which is used to alter packets just as they are about to leave the firewall. |
| MANGLE | This table is used mainly for mangling packets. Among other things, we can change the contents of different packets and that of their headers. Examples of this would be to change the TTL, TOS or MARK. Note that the MARK is not really a change to the packet, but a mark value for the packet is set in kernel space. Other rules or programs might use this mark further along in the firewall to filter or do advanced routing on; tc is one example. The table consists of five built-in chains: the PREROUTING, POSTROUTING, OUTPUT, INPUT and FORWARD chains. PREROUTING is used for altering packets just as they enter the firewall and before they hit the routing decision. POSTROUTING is used to mangle packets just after all routing decisions have been made. OUTPUT is used for altering locally generated packets after they enter the routing decision. INPUT is used to alter packets after they have been routed to the local computer itself, but before the user space application actually sees the data. FORWARD is used to mangle packets after they have hit the first routing decision, but before they actually hit the last routing decision. Note that mangle can’t be used for any kind of Network Address Translation or masquerading; the nat table was made for these kinds of operations. |
| FILTER | The filter table should be used exclusively for filtering packets. For example, we could DROP, LOG, ACCEPT or REJECT packets without problems, as we can in the other tables. There are three chains built in to this table. The first one is named FORWARD and is used on all non-locally generated packets that are not destined for our local host (the firewall, in other words). INPUT is used on all packets that are destined for our local host (the firewall), and OUTPUT is finally used for all locally generated packets. |
| RAW | The raw table is for configuration exemptions. It has two built-in chains: PREROUTING and OUTPUT. |
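As a quick recap, the table descriptions above can be written down as a plain Python dictionary. The nat, mangle and filter entries follow the text; the raw entry (PREROUTING and OUTPUT) comes from the iptables documentation. Treat it as a summary of this post and not as a full reference, since newer kernels may add chains that are not listed here. `tables_with_chain` is just a helper name I picked.

```python
# Built-in chains per table, as described in this post.
TABLE_CHAINS = {
    "raw": ["PREROUTING", "OUTPUT"],
    "mangle": ["PREROUTING", "INPUT", "FORWARD", "OUTPUT", "POSTROUTING"],
    "nat": ["PREROUTING", "OUTPUT", "POSTROUTING"],
    "filter": ["INPUT", "FORWARD", "OUTPUT"],
}


def tables_with_chain(chain):
    """List the tables that have a built-in chain with the given name."""
    return [table for table, chains in TABLE_CHAINS.items() if chain in chains]


print("FORWARD:", tables_with_chain("FORWARD"))
print("PREROUTING:", tables_with_chain("PREROUTING"))
```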

SMTP-Gated

What is SMTP-Gated?

It is a server that can scan, recognize, and block mail containing spam or viruses.

How does it work?

It acts like a proxy, intercepting outgoing SMTP connections and scanning session data on-the-fly. When a message is infected, the SMTP session is terminated.

Features:

- Transparency – is meant to be totally transparent for users, but stone-build for worms 😉
- Message data is intercepted on-the-fly, and scanned just before acknowledged to SMTP server
- Does not break AUTH, PIPELINING or STARTTLS (TLS without scanning)
- Can block messages if AUTH is not used (optionally passing if AUTH is not supported by MSA)
- Can insert source IP (pre-NAT) and ident* into message header
- Can block any mail from infected hosts for defined time
- Logging of MAIL FROM and RCPT TO (plain or as base64-ed MD5)
- Logging of HELO/EHLO hostname
- Can impose some limits on number of SMTP sessions: total, per IP, per ident*
- Can reject connections when load exceeds some limit
- Can skip spam-scanning if load is high
- Executing user script on certain events
- Scanning limited to messages up to configured size
- Can be used to build scanning-farm for one or more routers*
- Logs all connections via syslog
- Has nifty status screen 😉
- Message size limit (since 1.4.16-rc1)
- Outgoing XCLIENT support (since 1.4.16-rc1)
- Conditional content scanning depending on SMTP-AUTH status (since 1.4.16-rc1)
- Regular expression (regex) conditions for HELO/MAIL FROM/RCPT TO (since 1.4.16-rc1)
- SPF checking (since 1.4.16-rc1)

Supports:

### Content scanning:
    - Clam AntiVirus daemon (clamd)
    - mksd – daemonised version of mks_vir
    - SpamAssassin antispam scanning
### Access checking:
    - libpcre for HELO/MAIL FROM/RCPT TO regular expressions (not-)match
    - libspf2 for SPF (tested with debian libspf2 1.2.1)
### Uses various NAT frameworks (for standalone mode), or ident/proxy-helper* for external mode
    - patched ident daemon
    - proxy-helper daemon
    - netfilter framework of Linux
    - ipfw on FreeBSD
    - BSD/pf (packetfilter)
    - BSD/ipfilter