po: stop stripping non-translated strings from po files

We previously adopted a minimization technique for po files which
stripped source locations and non-translated msgids in order to save
space in the git repos and have saner commit diffs.

At this time it is not possible to integrate with Weblate while having
non-translated msgids stripped, as Weblate will immediately add them
back again.

By keeping all non-translated msgids, our .po files are about twice the
size, at 37 MB vs the minimized 18 MB. This is still far better than the
original po/ directory, which was 109 MB. We're saving 38 MB by still
omitting source file locations, and another 34 MB by dropping all
languages which are 100% untranslated (109 - 38 - 34 = 37).

Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Daniel P. Berrangé 2020-05-18 17:49:03 +01:00
parent 737cadff77
commit 35d68db5b1
3 changed files with 1 additions and 58 deletions

@@ -66,7 +66,6 @@ EXTRA_DIST = \
 scripts/header-ifdef.py \
 scripts/hvsupport.py \
 scripts/hyperv_wmi_generator.py \
-scripts/minimize-po.py \
 scripts/mock-noinline.py \
 scripts/prohibit-duplicate-header.py \
 scripts/reformat-news.py \

@@ -48,9 +48,7 @@ update-po: $(POTFILE)
 echo "Minimizing $$lang content" && \
 $(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
 $$lang.po $(POTFILE) | \
-$(SED) $(SED_PO_FIXUP_ARGS) | \
-$(RUNUTF8) $(PYTHON) $(top_srcdir)/scripts/minimize-po.py > \
-$(srcdir)/$$lang.po-t && \
+$(SED) $(SED_PO_FIXUP_ARGS) > $(srcdir)/$$lang.po-t && \
 mv $$lang.po-t $$lang.po
 done

@@ -1,54 +0,0 @@
-#!/usr/bin/env python3
-#
-# Copyright (C) 2018-2019 Red Hat, Inc.
-#
-# This library is free software; you can redistribute it and/or
-# modify it under the terms of the GNU Lesser General Public
-# License as published by the Free Software Foundation; either
-# version 2.1 of the License, or (at your option) any later version.
-#
-# This library is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# Lesser General Public License for more details.
-#
-# You should have received a copy of the GNU Lesser General Public
-# License along with this library. If not, see
-# <http://www.gnu.org/licenses/>.
-import re
-import sys
-block = []
-msgstr = False
-empty = False
-unused = False
-fuzzy = False
-for line in sys.stdin:
-    if line.isspace():
-        if not empty and not unused and not fuzzy:
-            for b in block:
-                print(b, end='')
-        block = []
-        msgstr = False
-        fuzzy = False
-        block.append(line)
-    else:
-        if line.startswith("msgstr"):
-            msgstr = True
-            empty = True
-        if line[0] == '#' and "fuzzy" in line:
-            fuzzy = True
-        if line.startswith("#~ msgstr"):
-            unused = True
-        if msgstr and re.search(r'".+"', line):
-            empty = False
-        block.append(line)
-if not empty and not unused and not fuzzy:
-    for b in block:
-        print(b, end='')