
Web Dev Notes

Web Scraping

Charles

Install the Charles root certificate on the device and configure the device to use Charles as its proxy. Charles can then display the requests and responses, including HTTPS traffic.

Save the captured requests as cURL commands and convert them into scripts.
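As a rough sketch of that conversion, the function below splits a copied cURL command into its method, URL, headers, cookies and body using the standard library's shlex. It only understands a handful of common curl options (-X, -H, -b, -d and its variants) and assumes any other flag takes no value; parse_curl is a hypothetical helper and example.com is a placeholder.

```python
import shlex


def _parse_cookie_string(raw: str) -> dict:
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(";"):
        key, _, value = pair.strip().partition("=")
        if key:
            cookies[key] = value
    return cookies


def parse_curl(command: str) -> dict:
    """Split a copied cURL command into method, URL, headers, cookies and body.

    Only -X/--request, -H/--header, -b/--cookie and -d/--data/--data-raw/
    --data-binary are understood; any other flag is assumed to take no value.
    """
    tokens = shlex.split(command)  # respects the shell quoting curl commands use
    if not tokens or tokens[0] != "curl":
        raise ValueError("not a curl command")
    parts = {"method": None, "url": None, "headers": {}, "cookies": {}, "data": None}
    takes_value = {"-X", "--request", "-H", "--header", "-b", "--cookie",
                   "-d", "--data", "--data-raw", "--data-binary"}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in takes_value:
            value = tokens[i + 1]
            if tok in ("-X", "--request"):
                parts["method"] = value.upper()
            elif tok in ("-H", "--header"):
                name, _, val = value.partition(":")
                if name.strip().lower() == "cookie":
                    # Cookies copied as a header are split out separately
                    parts["cookies"].update(_parse_cookie_string(val))
                else:
                    parts["headers"][name.strip()] = val.strip()
            elif tok in ("-b", "--cookie"):
                parts["cookies"].update(_parse_cookie_string(value))
            else:
                parts["data"] = value
            i += 2
        elif tok.startswith("-"):
            i += 1  # e.g. --compressed, assumed to take no value
        else:
            parts["url"] = tok
            i += 1
    if parts["method"] is None:
        # curl switches to POST when a body is given without -X
        parts["method"] = "POST" if parts["data"] is not None else "GET"
    return parts


cmd = ("curl -H 'User-Agent: Mozilla/5.0' -H 'Cookie: session=abc; lang=en' "
       "--compressed 'https://example.com/api/items?page=1'")
parts = parse_curl(cmd)
print(parts["method"], parts["url"])  # GET https://example.com/api/items?page=1
```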

Pause recording and save the session so that you capture only the traffic you want.

Python

A captured request is usually a GET or POST request. It typically contains a URL, cookies and headers, plus optional query parameters or a request body. Separate these parts and interpret each of them.
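Once separated, the parts can be reassembled into a request. Here is a minimal standard-library sketch using urllib; build_request and the example URL and values are hypothetical, and the request is only built, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, method="GET", params=None, headers=None, cookies=None, json_body=None):
    """Reassemble the separated parts of a captured request (nothing is sent).

    Query parameters go into the URL, cookies into a single Cookie header,
    and a JSON body into the request data.
    """
    if params:
        url = f"{url}?{urlencode(params)}"
    headers = dict(headers or {})
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    data = None
    if json_body is not None:
        data = json.dumps(json_body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


req = build_request(
    "https://example.com/api/items",  # placeholder URL
    method="POST",
    params={"page": 2},
    headers={"User-Agent": "Mozilla/5.0"},
    cookies={"session": "abc"},
    json_body={"category": "books"},
)
print(req.get_method(), req.full_url)  # POST https://example.com/api/items?page=2
```

To actually send it, pass req to urllib.request.urlopen. The third-party requests library accepts the same parts directly through its params=, headers=, cookies= and json= arguments.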

The response body is usually JSON. Use json.loads to convert it into a dictionary (or a list, if the top-level value is an array). A small script that prints the JSON with indentation makes its structure much easier to read.
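A small helper along those lines, using only the standard json module; the response body below is made up.

```python
import json


def load_and_show(text: str):
    """Parse a JSON response body and print it indented for reading."""
    data = json.loads(text)  # objects -> dict, arrays -> list, true/false -> True/False
    print(json.dumps(data, indent=2, ensure_ascii=False, sort_keys=True))
    return data


# A made-up response body, shaped like a typical listing API
body = '{"has_more": false, "items": [{"id": 1, "title": "Caf\\u00e9"}, {"id": 2, "title": "Tea"}]}'
data = load_and_show(body)
print(data["items"][0]["title"])  # Café
```

For a quick look without writing any code, python -m json.tool response.json pretty-prints a saved response from the command line.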

In an open-source project, remember to keep cookies in environment variables rather than committing them to the repository.
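A minimal sketch of reading the cookies at runtime. SCRAPER_COOKIE is a hypothetical variable name, assumed to hold the raw Cookie header value copied from Charles.

```python
import os


def cookies_from_env(var_name: str = "SCRAPER_COOKIE") -> dict:
    """Read a raw Cookie header value from an environment variable.

    SCRAPER_COOKIE is a hypothetical name. Set it outside the code, e.g.
    export SCRAPER_COOKIE='session=abc; lang=en', so it never ends up in git.
    """
    raw = os.environ.get(var_name)
    if not raw:
        raise RuntimeError(f"environment variable {var_name} is not set")
    cookies = {}
    for pair in raw.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies
```

The returned dictionary can be passed wherever the scraper expects cookies, while the secret values themselves stay in the shell or deployment environment.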

Scraping can be done in stages: first collect the IDs of the items, then request the details of each item.
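The staged approach can be written as a small driver that takes the two API calls as functions. This is a hedged sketch: list_page and fetch_detail stand for the site-specific requests (stubbed here with fake data), and treating an empty page as the end of the listing is an assumption about the API.

```python
import time
from typing import Callable, Dict, List


def scrape_in_stages(
    list_page: Callable[[int], List[str]],
    fetch_detail: Callable[[str], dict],
    max_pages: int = 100,
    delay: float = 0.0,
) -> Dict[str, dict]:
    """Stage 1: walk the listing pages and collect item IDs.
    Stage 2: fetch the details of every collected ID."""
    ids: List[str] = []
    for page in range(1, max_pages + 1):
        page_ids = list_page(page)
        if not page_ids:  # an empty page is taken to mean "no more results"
            break
        ids.extend(page_ids)

    details: Dict[str, dict] = {}
    for item_id in dict.fromkeys(ids):  # drop duplicates, keep first-seen order
        details[item_id] = fetch_detail(item_id)
        time.sleep(delay)  # optional pause between requests
    return details


# Hypothetical stand-ins for the two real API calls
FAKE_PAGES = {1: ["a1", "a2"], 2: ["a2", "a3"]}


def fake_list_page(page: int) -> List[str]:
    return FAKE_PAGES.get(page, [])


def fake_fetch_detail(item_id: str) -> dict:
    return {"id": item_id, "title": f"Item {item_id}"}


result = scrape_in_stages(fake_list_page, fake_fetch_detail)
print(list(result))  # ['a1', 'a2', 'a3']
```

Keeping the stages separate also makes it easy to save the ID list first and resume the detail requests later if the script is interrupted.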

Python Package Structure

  • Package Initialization: A directory containing an `__init__.py` file is a regular Python package, so its modules can be imported by other Python scripts. (Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package.) The code in `__init__.py` runs when the package is first imported, which makes it the place for package-level setup or initialization.

  • Enabling Relative Imports: Modules inside a package can import each other with relative imports such as `from . import helpers` or `from .helpers import greet`. The leading dots are resolved against the importing module's package, not against the file system: `.` is the current package and `..` is its parent package. Relative imports therefore only work when the module is imported as part of its package (e.g. `python -m mypkg.main` for a hypothetical package `mypkg`), not when the file is run directly as a script.
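A throwaway package built in a temporary directory shows both points: the code in `__init__.py` runs on the first import, and main.py reaches its sibling helpers.py through a relative import. demo_pkg and its modules are hypothetical names used only for this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Lay out a hypothetical package on disk:
#   demo_pkg/__init__.py   package initialization code
#   demo_pkg/helpers.py    a sibling module
#   demo_pkg/main.py       imports from the package with relative imports
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("print('initializing demo_pkg')\nVERSION = '0.1'\n")
(pkg / "helpers.py").write_text("def greet(name):\n    return f'hello, {name}'\n")
(pkg / "main.py").write_text(
    "from . import VERSION\n"          # '.' resolves to the demo_pkg package
    "from .helpers import greet\n"     # sibling module in the same package
    "MESSAGE = greet('package') + ' v' + VERSION\n"
)

sys.path.insert(0, str(root))  # make the temporary directory importable
importlib.invalidate_caches()
main = importlib.import_module("demo_pkg.main")  # runs __init__.py first
print(main.MESSAGE)
# prints "initializing demo_pkg", then "hello, package v0.1"
```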

Flask

Flask is a lightweight web framework that can serve as the backend of a Python web app.

Project Structure

  • app.py: The main file of the app.
  • templates/: The HTML templates, which Flask looks for in this folder by default.
  • static/: Static files, such as CSS and JavaScript, served under the /static/ URL.

Flask is built around routes, which map URLs to Python view functions. A minimal app file looks like this:

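from flask import Flask, render_template

# Create the Flask application object
app = Flask(__name__)

# The view function for the '/' route renders templates/index.html
@app.route('/')
def index():
    return render_template('index.html')

# Start the development server when the file is run directly (python app.py)
if __name__ == '__main__':
    app.run()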

Once the app is started (for example with flask run), it listens on a port (5000 by default) and handles incoming requests, calling the view function whose route matches the requested URL.
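The routing idea can be illustrated with a toy, standard-library-only imitation of the @app.route decorator. TinyApp is hypothetical and is not how Flask works internally (Flask delegates URL matching to Werkzeug's routing); it only shows how a decorator can register view functions in a table that incoming paths are matched against.

```python
from typing import Callable, Dict, Tuple


class TinyApp:
    """Toy stand-in for Flask's routing idea, not Flask's implementation."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[], str]] = {}

    def route(self, path: str):
        # Decorator factory: @app.route('/') registers the function under '/'
        def decorator(view: Callable[[], str]) -> Callable[[], str]:
            self.routes[path] = view
            return view
        return decorator

    def handle(self, path: str) -> Tuple[int, str]:
        # Look up the view for the requested path, like a matched route
        view = self.routes.get(path)
        if view is None:
            return 404, "Not Found"
        return 200, view()


app = TinyApp()


@app.route('/')
def index():
    return "<h1>Home</h1>"


print(app.handle('/'))         # (200, '<h1>Home</h1>')
print(app.handle('/missing'))  # (404, 'Not Found')
```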

Jinja2

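Jinja2 is a template engine for Python. It allows you to embed variables, loops and conditions in HTML files.

Flask uses Jinja2 to render templates. Expressions are written as `{{ variable }}`, and statements as `{% ... %}`.

Jinja2 also supports `{% for %}` loops and `{% if %}` conditions, as well as `{% extends %}` (template inheritance) and `{% import %}` (importing macros from other templates) to reuse code.

The `url_for` function generates the URL for a route or a static file, e.g. `url_for('index')` or `url_for('static', filename='style.css')`. It is available both in Python code and inside templates.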

Server

Gunicorn

flask run starts Flask's built-in development server, which offers conveniences for development such as automatic reloading and an interactive debugger (enabled with --debug). It is not suitable for production use.

Production servers provide better performance (e.g. handling concurrent requests with multiple worker processes) and security. Gunicorn is a popular choice. Here is an example systemd service file for Gunicorn, /etc/systemd/system/myapp.service:

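[Unit]
Description=Gunicorn instance to serve myapp
After=network.target

[Service]
# Run as ubuntu; the www-data group is what lets Nginx reach the socket
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/fairarchive/flask
# Use the binaries from the project's virtualenv
Environment="PATH=/home/ubuntu/fairarchive/venv/bin"
# 3 worker processes listening on a Unix socket (app.sock in WorkingDirectory);
# -m 007 sets the umask so that the owner and group get full access to the socket
ExecStart=/home/ubuntu/fairarchive/venv/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 app:app

[Install]
# Start at boot once enabled with systemctl enable
WantedBy=multi-user.target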
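In the ExecStart line, app:app tells Gunicorn to import the module app (app.py) and use its attribute app, which must be a WSGI callable; a Flask object is one. The sketch below uses only the standard library to show a bare WSGI callable and to call it once with a fake request environment, roughly what a Gunicorn worker does for each request. The names app and call_once are for illustration only.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable: the same interface a Flask object exposes to Gunicorn."""
    body = f"path={environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(wsgi_app, path="/"):
    """Invoke a WSGI app once with a minimal fake request (no server involved)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = wsgi_app(environ, start_response)
    return captured["status"], b"".join(chunks)


print(call_once(app, "/hello"))  # ('200 OK', b'path=/hello')
```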

Nginx

Gunicorn creates the app.sock Unix socket, which Nginx connects to. Nginx acts as a reverse proxy: it serves static files directly and forwards dynamic requests to Gunicorn. It also improves security (SSL/TLS) and performance (e.g. compression).

Installing Nginx from the distribution's package (e.g. with apt on Ubuntu) also registers it as a system service. Here is a site configuration file at /etc/nginx/sites-available/app:

server {
    listen 80;
    server_name 175.178.187.192 fairarchive.icu www.fairarchive.icu;

    # Forward dynamic requests to Gunicorn through its Unix socket
    location / {
        proxy_pass http://unix:/home/ubuntu/fairarchive/flask/app.sock;
        # Pass the original host, client IP and scheme on to the app
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Serve static files directly from disk, bypassing Gunicorn
    location /static/ {
        alias /home/ubuntu/fairarchive/flask/static/;
    }
}

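The configuration file should be linked into the sites-enabled directory: sudo ln -s /etc/nginx/sites-available/app /etc/nginx/sites-enabled. Then check the configuration with sudo nginx -t and apply it with sudo systemctl reload nginx.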

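Remember to adjust the permissions of the app.sock file and its parent directories, because Gunicorn and Nginx typically run as different users (here ubuntu and www-data). Nginx needs access to the socket itself and execute permission on every parent directory, such as /home/ubuntu.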
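To find which parent directories block Nginx, a simplified standard-library check can walk up from the socket path and report directories that lack the execute ("search") bit for others. It assumes the Nginx user is neither the owner nor a group member of those directories and ignores ACLs; dirs_blocking_others is a hypothetical helper.

```python
import os
import stat
from pathlib import Path


def dirs_blocking_others(socket_path: str) -> list:
    """Return the parent directories of socket_path that lack the "others" execute bit.

    Simplified check: assumes the Nginx user is neither the owner nor in the
    group of those directories, and ignores ACLs. All parents must exist.
    """
    blocked = []
    for parent in Path(socket_path).resolve().parents:
        if not os.stat(parent).st_mode & stat.S_IXOTH:
            blocked.append(str(parent))
    return blocked
```

On the server it would be called with the socket path from the Nginx configuration, e.g. dirs_blocking_others('/home/ubuntu/fairarchive/flask/app.sock'); one common fix for a listed directory is sudo chmod o+x on it.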

Database

SQLite

SQLite is a lightweight, serverless database suited to small projects. The entire database lives in a single file, which makes it easy to use and deploy. It works best when only one process writes at a time: SQLite allows many concurrent readers but only one writer at a time.
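Python's standard library ships an SQLite driver, sqlite3, so no server or extra package is needed. The sketch below creates a table, upserts rows and reads them back; the item table and its columns are hypothetical, and ":memory:" stands in for a database file path.

```python
import sqlite3

# ":memory:" keeps this demo self-contained; a real app would pass a file path
# (e.g. a hypothetical "archive.db"), which SQLite creates on first use
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS item (id TEXT PRIMARY KEY, title TEXT NOT NULL)")

rows = [("a1", "First"), ("a2", "Second"), ("a1", "First (edited)")]
with conn:  # commits the transaction on success, rolls back on an exception
    # INSERT OR REPLACE overwrites an existing row with the same primary key
    conn.executemany("INSERT OR REPLACE INTO item (id, title) VALUES (?, ?)", rows)

stored = conn.execute("SELECT id, title FROM item ORDER BY id").fetchall()
conn.close()
print(stored)  # [('a1', 'First (edited)'), ('a2', 'Second')]
```

The ? placeholders let sqlite3 quote the values itself, which is safer than formatting scraped strings into the SQL text.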

Crontab

cron is a time-based job scheduler in Unix-like operating systems. Jobs are listed in a crontab (edited with crontab -e), where each line schedules a command to run at a specific time or at a regular interval.

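This crontab entry runs the database update script every day at 00:00, 08:00 and 16:00: 0 0,8,16 * * * /bin/bash -c 'cd ~/fairarchive && source venv/bin/activate && python3 scripts/db_update.py'. The five fields are minute, hour, day of month, month and day of week. The command is wrapped in /bin/bash -c because cron runs jobs with /bin/sh by default, which does not provide source.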
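To make the schedule syntax concrete, here is a small sketch that expands the minute and hour fields of a cron expression into the times of day it fires. It supports only *, lists, ranges and steps, ignores the day, month and weekday fields, and is not a full cron parser; expand_field and daily_run_times are hypothetical helpers.

```python
def expand_field(field: str, low: int, high: int) -> list:
    """Expand one cron field (e.g. '*', '0,8,16', '1-5', '*/15') into its values.

    Supports only '*', comma lists, ranges and steps; names such as 'mon'
    and shortcuts such as '@daily' are not handled.
    """
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = high if step_text else start  # '5/15' means "from 5, every 15"
        values.update(range(start, end + 1, step))
    return sorted(v for v in values if low <= v <= high)


def daily_run_times(expr: str) -> list:
    """List HH:MM run times from the minute and hour fields of a cron expression."""
    minute, hour = expr.split()[:2]
    return [f"{h:02d}:{m:02d}"
            for h in expand_field(hour, 0, 23)
            for m in expand_field(minute, 0, 59)]


print(daily_run_times("0 0,8,16 * * *"))  # ['00:00', '08:00', '16:00']
```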

SQLAlchemy

SQLAlchemy is a comprehensive Python toolkit for working with relational databases. Its Object-Relational Mapper (ORM) provides a high-level interface that lets developers manage database operations through Python objects instead of writing SQL queries directly.

At its core, SQLAlchemy generates SQL from Python expressions. Note that SQLAlchemy is not a database itself: it sits between your code and a database driver (such as Python's built-in sqlite3 module) and simplifies database interactions.

  • Engine & Session: The engine manages database connections (through a connection pool), while the session tracks ORM objects and groups their changes into transactions. Together they establish and control every interaction with the database.
  • Model Classes: To use the ORM, define model classes that mirror the structure of database tables; these classes are the main way to interact with the data. For an existing database, SQLAlchemy can also load the table definitions through a process known as reflection, and its automap extension can generate model classes from them.
  • Relationships: Associations between tables are represented in model classes with the relationship() function, which lets you navigate from an object to its related objects, for example from an author to their works.
  • Querying: SQLAlchemy 2.0 streamlines querying around the select() function. Chaining methods such as where() and order_by() onto it builds queries that closely resemble SQL. session.scalars(stmt) returns the results as model objects, and session.scalar(stmt) returns a single result (the first column of the first row).

Beginners should work through the official SQLAlchemy Unified Tutorial, which introduces these components step by step with practical examples.

This post is licensed under CC BY 4.0 by the author.