Friday, May 29, 2020

Converting a PDF file to Wiki format

sudo apt install pandoc
sudo apt-get install poppler-utils

PDF --> HTML
mkdir kimenet
pdftohtml -s -p -fmt png -nodrm "file.pdf" "kimenet/file.html"

Run pdftohtml -h to see all available parameters.
The parameters used in the command above are:
  • -s puts all of the output into a single HTML document (excluding the outline).
  • -p attempts to replace internal PDF links with HTML links.
  • -fmt controls the output format of images; png and jpg are valid options.
  • -nodrm ignores digital rights management (DRM) restrictions on the PDF.
  • -i ignores images. I didn't use it, but it is worth mentioning because in some cases it can massively speed up the conversion.

Alternative method: Poppler pdftotext

pdftotext -htmlmeta "file.pdf" "file.html"

 Replace "file.pdf" with the name of the file you want to parse and "file.html" with the name of the HTML file you want to write the text output to.
 The `-htmlmeta` option creates an HTML version of the text in your PDF. (This is much less fancy than the previous command: it only puts the text in `pre` tags.) You should then see an HTML file in your directory, which you can open to check the results. Depending on the formatting of your source PDF, Poppler's effectiveness varies. Run `pdftotext -h` for information on other options that may improve or worsen your results.

Pandoc: HTML --> MediaWiki

 pandoc file.html -f html -t mediawiki -s -o file.txt
  • -f The input format.
  • -t The output format.
  • -s Standalone: adds a header and footer to the document, rather than producing a document fragment.
  • -o The name of the output file.
See the Pandoc user guide.
You may run into an error with Pandoc, presumably caused by the file being too large. I ran into this error; some fixes can be found here.
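
The two conversion steps above can be chained in a small script. This is only a minimal sketch under assumptions: the function names `build_commands` and `convert` and the `output` directory are made up for illustration, and only the pdftohtml and pandoc flags are taken from the commands above. Depending on the Poppler version and flags, pdftohtml may write the HTML under a slightly different file name, so check the output directory and adjust the path if needed.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir="output"):
    """Return the pdftohtml and pandoc command lines for one PDF (hypothetical helper)."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    html = out / (pdf.stem + ".html")  # assumed name of the HTML written by pdftohtml
    wiki = out / (pdf.stem + ".txt")   # MediaWiki markup written by pandoc
    pdftohtml_cmd = ["pdftohtml", "-s", "-p", "-fmt", "png", "-nodrm", str(pdf), str(html)]
    pandoc_cmd = ["pandoc", str(html), "-f", "html", "-t", "mediawiki", "-s", "-o", str(wiki)]
    return [pdftohtml_cmd, pandoc_cmd]


def convert(pdf_path, out_dir="output"):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in build_commands("file.pdf"):
        print(" ".join(cmd))
```

Calling `convert("file.pdf")` would run the real tools, so both pdftohtml and pandoc must be installed for that part to work.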

Optional: cleaning up bad encoding

Depending on your PDF encoding, you may find strange Unicode characters in your HTML output. This step cleans up that output as accurately as possible. ftfy stands for "fixes text for you"; it is a Python library with a command-line interface, and we'll use the command line to clean our files. This step is performed before running Pandoc.

Installing ftfy:
git clone https://github.com/LuminosoInsight/python-ftfy.git
cd python-ftfy
sudo python setup.py install
Or, if your system has pip, run pip install ftfy. Note that ftfy 5.0 (the most recent version at the time of writing) and later require Python 3. I used Python 2.x with ftfy 4.1.1 for this answer. In the same directory as your HTML file, run the following command:
 ftfy -o file_clean.html --preserve-entities file.html
Optionally, you may include the --guess option to have ftfy guess your encoding, or --encoding if you know your encoding. This may produce better results.
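
To see what kind of damage this step repairs, here is a minimal standard-library sketch of the simplest case: UTF-8 text that was wrongly decoded as Latin-1. It is not ftfy's algorithm (ftfy detects and handles many more cases), and the function name `fix_simple_mojibake` is made up for illustration.

```python
def fix_simple_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as Latin-1, if possible."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or the text is already clean): leave it unchanged.
        return text


original = "Következő"
# Simulate the damage: UTF-8 bytes read back with the wrong codec.
garbled = original.encode("utf-8").decode("latin-1")

print(garbled != original)                      # True
print(fix_simple_mojibake(garbled) == original)  # True
print(fix_simple_mojibake(original) == original)  # True
```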

Thursday, May 28, 2020

Resolving the 502 Bad Gateway Error in Nginx, Ubuntu 16.04 - 20.04 upgrade and PHP5 - PHP7 upgrade

PHP 7.2 Ubuntu 20.04 502 Bad gateway Error message

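Set the PHP-FPM socket path correctly in the nginx site configuration, and in any file under /etc/nginx/snippets that sets fastcgi_pass:
sudo nano /etc/nginx/sites-available/default
grep -rn fastcgi_pass /etc/nginx/snippets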

BAD: fastcgi_pass unix:/var/run/php/php7.0-fpm.sock;
GOOD: fastcgi_pass unix:/var/run/php/php7.2-fpm.sock;
---


sudo nano /etc/php/7.2/fpm/pool.d/www.conf

change
listen = 127.0.0.1:9000
to
listen = /var/run/php/php7.2-fpm.sock
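
The 502 error usually means that the socket nginx passes requests to is not the one PHP-FPM listens on, so the two settings have to match. The following is a rough sketch of a consistency check, assuming you have already read the two configuration files into strings; the function names are hypothetical, and the comparison is a plain string match that ignores includes and symlinks.

```python
import re


def nginx_sockets(nginx_conf_text):
    """Return the Unix socket paths used by uncommented fastcgi_pass directives."""
    pattern = r"^\s*fastcgi_pass\s+unix:([^;\s]+)\s*;"
    return re.findall(pattern, nginx_conf_text, flags=re.MULTILINE)


def fpm_listen(pool_conf_text):
    """Return the value of the uncommented 'listen' directive in a pool file, or None."""
    match = re.search(r"^\s*listen\s*=\s*(\S+)", pool_conf_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def mismatched_sockets(nginx_conf_text, pool_conf_text):
    """List the nginx sockets that differ from the PHP-FPM listen value."""
    listen = fpm_listen(pool_conf_text)
    return [sock for sock in nginx_sockets(nginx_conf_text) if sock != listen]


# Sample snippets modelled on the BAD/GOOD lines in this post.
nginx_conf = """
location ~ \\.php$ {
    fastcgi_pass unix:/var/run/php/php7.0-fpm.sock;
}
"""
pool_conf = """
listen = /var/run/php/php7.2-fpm.sock
listen.owner = www-data
"""

print(mismatched_sockets(nginx_conf, pool_conf))
```

An empty list means that every fastcgi_pass socket found in the nginx text equals the PHP-FPM listen value.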


sudo apt-get -y install php7.2 php7.2-mysql php7.2-fpm php-fpm

sudo chown :www-data /var/run/php/php7.2-fpm.sock

sudo apt install php-mysql

Reading package lists... Done
Building dependency tree     
Reading state information... Done
The following additional packages will be installed:
  php7.2-mysql
The following NEW packages will be installed:
  php-mysql php7.2-mysql
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 125 kB of archives.
After this operation, 432 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://us.archive.ubuntu.com/ubuntu bionic-updates/main i386 php7.2-mysql i386 7.2.24-0ubuntu0.18.04.6 [123 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu bionic/main i386 php-mysql all 1:7.2+60ubuntu1 [2,004 B]
Fetched 125 kB in 0s (266 kB/s)     
Selecting previously unselected package php7.2-mysql.
(Reading database ... 86153 files and directories currently installed.)
Preparing to unpack .../php7.2-mysql_7.2.24-0ubuntu0.18.04.6_i386.deb ...
Unpacking php7.2-mysql (7.2.24-0ubuntu0.18.04.6) ...
Selecting previously unselected package php-mysql.
Preparing to unpack .../php-mysql_1%3a7.2+60ubuntu1_all.deb ...
Unpacking php-mysql (1:7.2+60ubuntu1) ...
Setting up php7.2-mysql (7.2.24-0ubuntu0.18.04.6) ...

Creating config file /etc/php/7.2/mods-available/mysqlnd.ini with new version

Creating config file /etc/php/7.2/mods-available/mysqli.ini with new version

Creating config file /etc/php/7.2/mods-available/pdo_mysql.ini with new version
Setting up php-mysql (1:7.2+60ubuntu1) ...
Processing triggers for libapache2-mod-php7.2 (7.2.24-0ubuntu0.18.04.6) ...
Processing triggers for php7.2-fpm (7.2.24-0ubuntu0.18.04.6) ...
NOTICE: Not enabling PHP 7.2 FPM by default.
NOTICE: To enable PHP 7.2 FPM in Apache2 do:
NOTICE: a2enmod proxy_fcgi setenvif
NOTICE: a2enconf php7.2-fpm
NOTICE: You are seeing this message because you have apache2 package installed.



sudo service php7.2-fpm restart
sudo service php-fpm restart
sudo service nginx restart


Check /var/log/nginx/error.log if something is still not working.

Tuesday, May 26, 2020

Syntax errors in translated files - Letter to poeditor.com

I used POEditor with Google Translate to translate open source software: the LearnPress and Give-WP plugins.

I paid $8, yet the output is full of syntax errors: special characters such as & and $ are mangled, and some important lines starting with # are omitted (see e.g. the "#, php-format" lines in the LearnPress WP plugin PO file).

Examples from Give-WP translation - full source below:

1. Unwanted character conversion - and worse, bad output due to extra space:
msgid "Next »"
msgstr "Következő & raquo;"

2. Extra spaces added that break variables:
msgid "Edit Donor: %1$s %2$s"
msgstr "Adományozó szerkesztése:% 1 $ s% 2 $ s"

3. Wrong character encoding:
msgid "Before - %s‎10"
msgstr "Előtt -% s &# x200e; 10"

etc...

Original:
https://www.pastefs.com/pid/211223

Translated, but with syntax errors:
https://www.pastefs.com/pid/211222

The Transifex translation portal also rejects the file you generate as input because of these syntax errors.

Can you fix these errors?
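
Errors like the ones in examples 1 to 3 can be caught automatically before uploading a file. Below is a minimal heuristic sketch, not a full PO validator: the names `check_entry` and `broken_entities` are made up, the placeholder pattern covers only a small subset of printf formats (%s, %d, %f and their positional forms such as %1$s), and the entity check can give false positives on text such as "Tom & Jerry;". GNU gettext's msgfmt --check is the more thorough tool.

```python
import re

# Simplified printf placeholders: %s, %d, %f and positional forms like %1$s.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sdf]")
# Anything that looks like an HTML entity, allowing stray whitespace inside it.
ENTITY_LIKE = re.compile(r"&\s*#?\s*\w+\s*;")


def broken_entities(text):
    """Return entity-like fragments that contain whitespace, e.g. '& raquo;'."""
    return [m for m in ENTITY_LIKE.findall(text) if any(c.isspace() for c in m)]


def check_entry(msgid, msgstr):
    """Return a list of problems found in one translated entry."""
    problems = []
    if sorted(PLACEHOLDER.findall(msgid)) != sorted(PLACEHOLDER.findall(msgstr)):
        problems.append("placeholder mismatch")
    if broken_entities(msgstr):
        problems.append("broken HTML entity")
    return problems


# The entries quoted in the letter above.
print(check_entry("Edit Donor: %1$s %2$s", "Adományozó szerkesztése:% 1 $ s% 2 $ s"))
print(check_entry("Next »", "Következő & raquo;"))
```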