commit: 4a9110a5671e916edfc2caf8c6d46f252e8f01a0
parent: 2256ac335c261548ed2504d3135d74a639a9d1fc
author: Chris Noxz <chris@noxz.tech>
date: Fri, 8 Dec 2023 16:38:05 +0100
add README
A | README | 109 | ++++++++++++++++++++ |
1 file changed, 109 insertions(+)
diff --git a/README b/README
@@ -0,0 +1,109 @@
+# snto (Swedish National Test Organizer)
+
+snto is a project designed to organize and extract information from PDFs of
+Swedish National Tests. It includes a set of tools to extract data, create a
+SQLite database, and serve a web page for filtering tasks from multiple
+national tests.
+
+## Contents
+
+- [Files](#files)
+- [Usage](#usage)
+- [Requirements](#requirements)
+- [Installation](#installation)
+- [Extract.dat Format](#extractdat-format)
+- [Developers](#developers)
+- [License](#license)
+- [Legal Note](#legal-note)
+
+## Files
+
+The project consists of the following files:
+
+1. `extract.dat`: Rules for extracting data/tasks from PDFs of the Swedish
+ national tests.
+2. `extract.sh`: Script that uses `extract.dat` to extract cutouts from PDFs
+ with ImageMagick and creates an SQLite database (`npo.db`) containing tasks,
+ task images, and filterable tags/keywords.
+3. `serve.py`: Simple web server written in Python that serves a web page for
+ filtering tasks from multiple national tests using `npo.db`. Clicking a task
+ image shows further images with correct solutions and assessment
+ guidelines, which are also cutouts from the PDFs.
+
+## Usage
+
+1. **Data Extraction:**
+ ```
+ ./extract.sh
+ ```
+
+2. **Serve Web Page:**
+ ```
+ ./serve.py
+ ```
+
+Visit [http://localhost:8080](http://localhost:8080) in your browser to
+interact with the web page.
+
+## Requirements
+
+- [ImageMagick](https://imagemagick.org/)
+- Python 3.x (standard-library modules sqlite3, base64, http.server and urllib.parse)
+- SQLite
+- md5sum (part of GNU coreutils on most Linux systems; install it if missing)
+
+## Installation
+
+1. Clone the repository:
+ ```
+ git clone https://noxz.tech/git/snto.git
+ ```
+
+2. Navigate to the project directory:
+ ```
+ cd snto
+ ```
+
+3. Install dependencies:
+ ```
+ # Install ImageMagick, Python (including: sqlite3, base64, http.server and
+ # urllib.parse), SQLite, and md5sum as per your system requirements
+ ```
+
+## Extract.dat Format
+
+Each rule in the `extract.dat` file follows the format:
+<PDF-url>|<course>|<semester>|<task-number>|<comma,separated,tags>|<cropbox-1>|...|<cropbox-n>
+
+Each cropbox has the following format (used by ImageMagick):
+<pageindex>.<width>x<height>+<x-coord>+<y-coord>
+
+## Developers
+
+The cropbox instructions are based on images scaled to 200 ppi. The top
+(y-coord) is the upper limit of the task minus 23 pixels, and the bottom is the
+baseline of the lowest text of the task plus 23 pixels. The x-coordinate is
+always 175 and the width is always 1300.
+
+## License
+
+This project is licensed under the [GNU General Public License v3.0](./LICENSE).
+
+## Legal Note
+
+The PDFs referenced in the `extract.dat` file are copyrighted materials, and as
+such, they are not included in the published code. The cutouts from the PDFs
+are generated dynamically only when the `extract.sh` script is executed. Note
+that the content of these PDFs may be subject to legal restrictions, and
+publishing or distributing it without proper authorization could be
+illegal.
+
+Each user of this codebase should exercise caution and comply with all relevant
+copyright and intellectual property laws in their jurisdiction. The
+responsibility for ensuring legal use of the extracted content rests solely
+with the user.
+
+The authors and contributors of this project are not liable for any misuse or
+unauthorized distribution of the extracted materials. Users are strongly
+advised to seek legal advice if they have any uncertainties regarding the
+legality of using or sharing the extracted content.
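
The rule and cropbox formats described under "Extract.dat Format", together with the margin convention under "Developers", can be illustrated with a short Python sketch. This is not taken from `extract.sh`, which may parse the file differently. The names `Cropbox`, `Rule`, `parse_cropbox`, `parse_rule` and `cropbox_for_task` are hypothetical, and the sketch assumes that all cropbox numbers are non-negative integers and that no field contains a `|` character.

```python
import re
from dataclasses import dataclass

# <pageindex>.<width>x<height>+<x-coord>+<y-coord>, integers assumed.
CROPBOX_RE = re.compile(r"^(\d+)\.(\d+)x(\d+)\+(\d+)\+(\d+)$")


@dataclass(frozen=True)
class Cropbox:
    page: int
    width: int
    height: int
    x: int
    y: int


@dataclass(frozen=True)
class Rule:
    url: str
    course: str
    semester: str
    task: str
    tags: tuple
    cropboxes: tuple


def parse_cropbox(text):
    """Parse one cropbox field into its five integer components."""
    match = CROPBOX_RE.match(text.strip())
    if match is None:
        raise ValueError(f"malformed cropbox: {text!r}")
    return Cropbox(*map(int, match.groups()))


def parse_rule(line):
    """Split one rule line on '|' into five fixed fields plus >= 1 cropbox."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    url, course, semester, task, tags, *boxes = fields
    return Rule(
        url=url,
        course=course,
        semester=semester,
        task=task,
        tags=tuple(tag for tag in tags.split(",") if tag),
        cropboxes=tuple(parse_cropbox(box) for box in boxes),
    )


def cropbox_for_task(page, top, baseline, margin=23, x=175, width=1300):
    """Build a cropbox string from the measured top edge and lowest baseline.

    Applies the README convention: 23 pixels above the task's upper limit,
    23 pixels below the lowest baseline, fixed x offset and width.
    """
    y = top - margin
    height = (baseline + margin) - y
    return f"{page}.{width}x{height}+{x}+{y}"
```

With a made-up rule line such as `https://example.org/test.pdf|ma1c|vt2015|7|algebra,equations|3.1300x446+175+377`, `parse_rule` yields the tags `("algebra", "equations")` and a single `Cropbox`. The URL, course and semester values here are invented for illustration only. Calling `cropbox_for_task(3, 400, 800)` produces the same cropbox string, `3.1300x446+175+377`.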