# snto (Swedish National Test Organizer)

snto organizes and extracts tasks from PDFs of the Swedish National Tests. It
includes tools to extract task data, build an SQLite database, and serve a web
page for filtering tasks across multiple national tests.

## Contents

- [Files](#files)
- [Usage](#usage)
- [Requirements](#requirements)
- [Installation](#installation)
- [Extract.dat Format](#extractdat-format)
- [Developers](#developers)
- [License](#license)
- [Legal Note](#legal-note)

## Files

The project consists of the following files:

1. `extract.dat`: Rules for extracting data and tasks from PDFs of the Swedish
   National Tests.
2. `extract.sh`: Script that applies the rules in `extract.dat` to cut out task
   images from the PDFs with ImageMagick. It then builds an SQLite database
   (`npo.db`) containing tasks, task images, and filterable tags/keywords.
3. `serve.py`: Simple web server written in Python. It serves a web page for
   filtering tasks from multiple national tests using `npo.db`. Clicking a task
   image shows further images of correct solutions with assessment guidelines,
   which are also cut out from the PDFs.

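The README does not document the layout of `npo.db` or how `serve.py` queries it. The following is only a minimal sketch of how tag filtering over such a database *could* work. The table names, the column names and the `tag` query parameter are all hypothetical. The sketch uses the standard-library `sqlite3` and `urllib.parse` modules listed under Requirements. It keeps only the tasks that carry every requested tag.

```python
import sqlite3
from urllib.parse import parse_qs

# HYPOTHETICAL schema: the real layout of npo.db is not documented in this
# README, so these table and column names are illustrative only.
SCHEMA = """
CREATE TABLE tasks (
    id       INTEGER PRIMARY KEY,
    course   TEXT NOT NULL,
    semester TEXT NOT NULL,
    number   TEXT NOT NULL
);
CREATE TABLE tags (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    tag     TEXT NOT NULL
);
"""


def filter_tasks(conn: sqlite3.Connection, query_string: str) -> list:
    """Return (course, semester, number) for tasks carrying every requested tag.

    Tags are read from a query string such as "tag=algebra&tag=ekvation";
    the parameter name "tag" is an assumption.
    """
    tags = sorted(set(parse_qs(query_string).get("tag", [])))
    if not tags:
        # No filter given: list every task.
        return conn.execute(
            "SELECT course, semester, number FROM tasks ORDER BY id"
        ).fetchall()
    placeholders = ",".join("?" * len(tags))
    # Keep only tasks whose count of distinct matching tags equals the
    # number of requested tags, i.e. tasks that have all of them.
    sql = f"""
        SELECT t.course, t.semester, t.number
        FROM tasks AS t JOIN tags AS g ON g.task_id = t.id
        WHERE g.tag IN ({placeholders})
        GROUP BY t.id
        HAVING COUNT(DISTINCT g.tag) = ?
        ORDER BY t.id
    """
    return conn.execute(sql, (*tags, len(tags))).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, not real snto data.
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?, ?, ?)",
        [(1, "Ma1c", "vt2019", "5"), (2, "Ma1c", "ht2019", "12")],
    )
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(1, "algebra"), (1, "ekvation"), (2, "algebra")],
    )
    print(filter_tasks(conn, "tag=algebra&tag=ekvation"))
    # [('Ma1c', 'vt2019', '5')]
```

The sketch binds the tags as `?` parameters instead of formatting them into the SQL. This keeps user-supplied query strings from altering the query.
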
## Usage

1. **Data Extraction:**
   ```
   ./extract.sh
   ```

2. **Serve Web Page:**
   ```
   ./serve.py
   ```

Visit [http://localhost:8080](http://localhost:8080) in your browser to
interact with the web page.

## Requirements

- [ImageMagick](https://imagemagick.org/)
- Python 3.x (with the standard-library modules sqlite3, base64, http.server,
  and urllib.parse)
- SQLite
- md5sum (make sure it is available on your system; on most Linux systems it
  is provided by GNU coreutils)

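The following is a small pre-flight check that reports which of the requirements above are missing. It is a sketch under two assumptions. The first is that `extract.sh` calls ImageMagick as either `magick` (ImageMagick 7) or `convert` (ImageMagick 6). The second is that "SQLite" means the `sqlite3` command-line shell. The README does not say which ImageMagick binary the script uses.

```python
import importlib.util
import shutil

# Standard-library modules the Requirements list names for Python.
PY_MODULES = ["sqlite3", "base64", "http.server", "urllib.parse"]

# Command-line tools to look for on PATH. ImageMagick 7 installs `magick`,
# ImageMagick 6 installs `convert`; which one extract.sh invokes is an
# assumption here, so either one counts. `sqlite3` is assumed to be the
# SQLite command-line shell.
TOOLS = {
    "ImageMagick": ["magick", "convert"],
    "SQLite": ["sqlite3"],
    "md5sum": ["md5sum"],
}


def missing_requirements() -> list:
    """Return a human-readable list of requirements that were not found."""
    missing = []
    for module in PY_MODULES:
        if importlib.util.find_spec(module) is None:
            missing.append(f"Python module {module}")
    for label, commands in TOOLS.items():
        # shutil.which returns None when the command is not on PATH.
        if not any(shutil.which(cmd) for cmd in commands):
            missing.append(label)
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    print("missing: " + ", ".join(missing) if missing else "all requirements found")
```
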
## Installation

1. Clone the repository:
   ```
   git clone https://noxz.tech/git/snto.git
   ```

2. Navigate to the project directory:
   ```
   cd snto
   ```

3. Install dependencies:
   ```
   # Install ImageMagick, Python (including sqlite3, base64, http.server and
   # urllib.parse), SQLite, and md5sum using your system's package manager
   ```

## Extract.dat Format

Each rule in the `extract.dat` file follows the format:

`<PDF-url>|<course>|<semester>|<task-number>|<comma,separated,tags>|<cropbox-1>|...|<cropbox-n>`

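A rule can be split into its fields as shown below. This is a minimal sketch, not the parser `extract.sh` actually uses. It assumes that no field, the URL included, contains a literal `|`. The first five fields are fixed, and everything after them is the list of cropboxes.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    pdf_url: str
    course: str
    semester: str
    task_number: str
    tags: list[str]
    cropboxes: list[str]


def parse_rule(line: str) -> Rule:
    """Parse one extract.dat rule into its fields.

    Assumes no field (including the URL) contains a literal '|'.
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) < 6:
        # Five fixed fields plus at least one cropbox are required.
        raise ValueError(f"expected at least 6 fields, got {len(fields)}: {line!r}")
    url, course, semester, number, tag_field, *cropboxes = fields
    tags = [tag.strip() for tag in tag_field.split(",") if tag.strip()]
    return Rule(url, course, semester, number, tags, cropboxes)


if __name__ == "__main__":
    # Illustrative rule with a placeholder URL, not an entry from extract.dat.
    rule = parse_rule(
        "https://example.com/np.pdf|Ma1c|vt2019|5|algebra,ekvation|0.1300x346+175+377"
    )
    print(rule.tags)       # ['algebra', 'ekvation']
    print(rule.cropboxes)  # ['0.1300x346+175+377']
```
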
Each cropbox has the following format (used by ImageMagick):

`<pageindex>.<width>x<height>+<x-coord>+<y-coord>`

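A cropbox splits into a page index and a standard ImageMagick geometry (`WxH+X+Y`). The sketch below builds a crop command from it, with several assumptions. The page index is taken to be zero-based, like ImageMagick's `file.pdf[n]` page selector. The binary is assumed to be `convert` (ImageMagick 6). Pages are rasterized at 200 ppi, matching the Developers notes. The command is only constructed here, not executed, and the real `extract.sh` may differ.

```python
import re
import shlex

# <pageindex>.<width>x<height>+<x-coord>+<y-coord>
CROPBOX = re.compile(r"^(\d+)\.(\d+)x(\d+)\+(\d+)\+(\d+)$")


def parse_cropbox(box: str) -> tuple[int, str]:
    """Split a cropbox into (page index, ImageMagick geometry string)."""
    m = CROPBOX.match(box)
    if m is None:
        raise ValueError(f"malformed cropbox: {box!r}")
    page, width, height, x, y = m.groups()
    return int(page), f"{width}x{height}+{x}+{y}"


def crop_command(pdf: str, box: str, out: str) -> list[str]:
    """Build (but do not run) an ImageMagick crop command for one cropbox.

    Assumptions: `convert` is the binary, pages are rasterized at 200 ppi,
    and the page index is zero-based like ImageMagick's [n] selector.
    """
    page, geometry = parse_cropbox(box)
    return [
        "convert",
        "-density", "200",   # rasterize the PDF page at 200 ppi
        f"{pdf}[{page}]",    # select a single page of the PDF
        "-crop", geometry,   # cut out the task region
        "+repage",           # drop the virtual-canvas offset left by -crop
        out,
    ]


if __name__ == "__main__":
    print(shlex.join(crop_command("np.pdf", "2.1300x346+175+377", "task.png")))
    # convert -density 200 'np.pdf[2]' -crop 1300x346+175+377 +repage task.png
```
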
## Developers

The cropbox coordinates are based on pages rasterized at 200 ppi. The top edge
(y-coord) is placed 23 pixels above the upper limit of the task. The bottom
edge is placed 23 pixels below the baseline of the task's lowest line of text.
The x-coordinate is always 175 and the width always 1300.

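The rule above determines the whole cropbox from two measured y-values plus the page index. With `top` the task's upper limit and `baseline` the lowest baseline, the y-coordinate is `top - 23`. The height is `(baseline + 23) - (top - 23) = baseline - top + 46`. The helper below, `make_cropbox`, is hypothetical and not part of the repository. It assumes the two values have already been measured in pixels at 200 ppi.

```python
# Constants from the Developers notes; all values are pixels at 200 ppi.
X_COORD = 175
WIDTH = 1300
MARGIN = 23


def make_cropbox(page: int, task_top: int, last_baseline: int) -> str:
    """Build a cropbox string from the measured task top and lowest baseline.

    task_top:      y of the upper limit of the task (pixels at 200 ppi)
    last_baseline: y of the baseline of the task's lowest line of text
    """
    if last_baseline <= task_top:
        raise ValueError("baseline must lie below the task's upper limit")
    top = task_top - MARGIN          # 23 px above the task
    bottom = last_baseline + MARGIN  # 23 px below the lowest baseline
    return f"{page}.{WIDTH}x{bottom - top}+{X_COORD}+{top}"


if __name__ == "__main__":
    print(make_cropbox(2, 400, 700))  # 2.1300x346+175+377
```
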
## License

This project is licensed under the [GNU General Public License v3.0](./LICENSE).

## Legal Note

The PDFs referenced in the `extract.dat` file are copyrighted material and are
therefore not included in the published code. The cutouts from the PDFs are
generated only when the `extract.sh` script is executed. The content of these
PDFs may be subject to legal restrictions. Publishing or distributing it without
proper authorization could be illegal.

Each user of this codebase should exercise caution and comply with all relevant
copyright and intellectual property laws in their jurisdiction. The
responsibility for ensuring legal use of the extracted content rests solely
with the user.

The authors and contributors of this project are not liable for any misuse or
unauthorized distribution of the extracted materials. Users are strongly
advised to seek legal advice if they are uncertain about the legality of using
or sharing the extracted content.