📒 Actions Speak Louder Than Words!

Nenez9595 (Part 3)

Posted: Jul 6, 2021 | Reading time: 4 min

Assalamualaikum! In part 2, the previous article, I said I would use a personal VPS for the cloning work… Phew, it turns out it still takes a long time, even when using a remote server to fetch from nenez9595.blogspot.com:

$ time httrack -q -%i -iC2 nenez9595.blogspot.com -O "/home/robbi/httrack" -n -%P -N0 -s2 -p7 -D -a -K0 -c10 -%k -A25000 -%c10 -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -%s -%u
Mirror launched on Sun, 04 Jul 2021 10:31:42 by HTTrack Website Copier/3.49-2+libhtsjava.so.2 [XR&CO'2014]
mirroring nenez9595.blogspot.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* with the wizard help..
* https://79d206c1-a-62cb3a1a-s-sites.googlegroups.com/site/123funjokes4all/creationjdate.js?attachauth=ANoY7cpRQ9lcY_QaSXG51nMX9B6Rh_yEWa4uCVOfi1W9oEmCvOMBxPW60ISSTXsw7lQTaG0oph901yfgGh6K21rTHkbku0Kxa5qhD9xP1kTaaL7Cmq18Op6QboJBPIL0H9d97548/9515: nenez9595.blogspot.com/search/label/Emak dan abah -  u will find them there if you want else .. go n find their faces elsewhere%2F got u%3F?updated-max=2012-04-17T02:59:00-07:00&max-results=20&start=20&by-date=false (65752 byPANIC! : Too many URLs : >99999 [3031]d-max=2016-07-10T14:51:00-07:00&max-results=3&reverse-paginate=true&start=102&by-date=false (77312 bytes) - OK
Done.
Thanks for using HTTrack!

real    1839m37.861s
user    45m21.864s
sys     5m44.017s

Then I moved (mv) it into my git folder and tried to upload it, but ran into GitHub's file size limit:

$ git push
Enumerating objects: 105557, done.
Counting objects: 100% (105557/105557), done.
Compressing objects: 100% (97826/97826), done.
Writing objects: 100% (105556/105556), 1.79 GiB | 8.88 MiB/s, done.
Total 105556 (delta 96309), reused 2 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (96309/96309), done.
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 8098efb9e77359e435cccb71f3f68514e9b63c36b06203a32f540e0907c6835e
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File hts-cache/new.txt is 281.40 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File hts-cache/new.zip is 1721.38 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/RobbiNespu/nenez9595.blogspot.com.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/RobbiNespu/nenez9595.blogspot.com.git'

Hmm.. setting up LFS would probably solve this, but I kept thinking that if I later deploy on GitHub Pages I'd run into too many limitations, so it's better to migrate straight to Bitbucket or GitLab.
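For what it's worth, both rejected files sit under hts-cache/, which is HTTrack's own cache directory rather than part of the mirrored pages. Here is a minimal sketch for spotting files over a size limit before pushing. The helper name `find_oversized` is made up for illustration, and the 100 MiB default is only an assumption based on the limit quoted in the error above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files larger than a limit (in MiB),
# skipping Git's own metadata directory.
find_oversized() {
  local dir="${1:-.}" limit_mib="${2:-100}"
  # -prune stops find from descending into .git; -size +NM means "more than N MiB".
  find "$dir" -path '*/.git' -prune -o -type f -size "+${limit_mib}M" -print
}

# Example (run from the repository root):
#   find_oversized . 100
```

If only the cache files show up, adding `hts-cache/` to `.gitignore` might be enough, though files that are already committed would also have to be removed from the history before a push goes through.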

So I created a repository workspace on GitLab and committed everything there. It was stored in the remote source repository nicely, with no file size issues.

Next, I wrote a .yml file for the CI/CD process so that these static HTML files would be publicly available through GitLab Pages. Boom! Another issue:

Running with gitlab-runner 14.0.1 (c1edb478)
  on docker-auto-scale 72989761
  feature flags: FF_SKIP_DOCKER_MACHINE_PROVISION_ON_CREATION_FAILURE:true
Preparing the "docker+machine" executor 00:15
Using Docker executor with image alpine:latest ...
Pulling docker image alpine:latest ...
Using docker image sha256:d4ff818577bc193b309b355b02ebc9220427090057b54a59e73b79bdfe139b83 for alpine:latest with digest alpine@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 ...
Preparing environment 00:01
Running on runner-72989761-project-27918780-concurrent-0 via runner-72989761-srm-1625504483-7949ba0d...
Getting source from Git repository
$ eval "$CI_PRE_CLONE_SCRIPT"
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/nenez9595/nenez9595.gitlab.io/.git/
Created fresh repository.
Checking out 26301a63 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:d4ff818577bc193b309b355b02ebc9220427090057b54a59e73b79bdfe139b83 for alpine:latest with digest alpine@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 ...
$ echo 'Nothing to do...'
Nothing to do...
Uploading artifacts for successful job
Uploading artifacts...
public: found 105564 matching files and directories 
ERROR: Uploading artifacts as "archive" to coordinator... too large archive  id=1400570193 responseStatus=413 Request Entity Too Large status=413 token=8RyUGyNS
FATAL: too large                                   
Cleaning up file based variables 00:01
ERROR: Job failed: exit code 1

It failed at the very last stage, the part that uploads the artifacts. So I checked the size of the folder to see how big it was (I didn't check at all earlier when I hit the GitHub issue) and which files were so large:

$ du -sh public/
6.8G    public/
$ find . -printf '%s %p\n'| sort -nr | head -10 | grep -v ".git"
3731456 ./public/nenez9595.blogspot.com
1037750 ./public/4.bp.blogspot.com/-eSv8FTdg_sE/WHczxmty8vI/AAAAAAAABEc/XpiSmC2Bw3AAqVIJBrplkASnPGepF8uWACLcB/s1600/Screenshot_2017-01-12-15-41-58.png
710617 ./public/1.bp.blogspot.com/-otXaWMUt6HA/UVacLqRsuaI/AAAAAAAAAJg/SQlf2OXknzY/s1600/Photo 0321.jpg
677596 ./public/4.bp.blogspot.com/-HdD4RIYIGIo/UV6DwKGTXKI/AAAAAAAAAMI/yQKh3iHdTJI/s1600/02042011155.jpg
636023 ./public/4.bp.blogspot.com/-zzD7B3BKwyk/UV6D6hwUhfI/AAAAAAAAAMY/J1QSfwnCiAg/s1600/02042011154.jpg
636023 ./public/4.bp.blogspot.com/-E-2E-Jd7KTE/UV6DzqG5taI/AAAAAAAAAMQ/Jh72UHEVGCQ/s1600/02042011154.jpg
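A side note on the command above: `grep -v ".git"` runs after `head -10`, so any excluded lines are taken out of the ten (which would explain why only six rows are shown), and directory entries are counted too (the first row appears to be a directory). A small sketch that filters first and only looks at regular files follows. The function name `largest_files` is hypothetical, and like the original command it assumes GNU find for `-printf`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N largest regular files as "<bytes> <path>",
# excluding .git before the list is truncated.
largest_files() {
  local dir="${1:-.}" count="${2:-10}"
  find "$dir" -path '*/.git' -prune -o -type f -printf '%s %p\n' |
    sort -nr |
    head -n "$count"
}

# Example:
#   largest_files public 10
```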

The assets really are huge in total, but when I sorted to see which files were the big ones.. it turns out no single file is actually big. What I did notice is a lot of search<random chars here>.html files. I looked inside and there was nothing important in them, so it's fine to delete all of these junk files.

$ ls -la | grep -v 'search*.html' | wc
  88467  796196 5829860

$ find . -name 'search*.html' -type f -delete
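Before running a delete like the one above, it may be worth previewing how many files the pattern matches and how much space they take. Note that `grep -v 'search*.html'` treats the pattern as a regular expression and counts the lines that do not match, so it is not a direct count of the search pages. A rough sketch, using the same `-name` glob as the delete (the helper name `pattern_usage` is made up, and GNU find is assumed for `-printf`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many regular files match a name glob
# and their combined size in bytes.
pattern_usage() {
  local dir="$1" pattern="$2"
  find "$dir" -type f -name "$pattern" -printf '%s\n' |
    awk '{ n++; bytes += $1 } END { printf "%d files, %.0f bytes\n", n, bytes }'
}

# Example:
#   pattern_usage . 'search*.html'
```

Because it reuses the exact glob passed to `find ... -delete`, the count should correspond to what would be removed.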

I committed the changes and waited for the CI/CD pipeline to run. Alhamdulillah, it succeeded.

The mirror copy can be accessed at nenez9595.gitlab.io/nenez9595.blogspot.com/index.html, but the pages don't render properly and some of the JavaScript and CSS doesn't work well.. hmmm 🥲

Hish, this won't do. Am I really going to stop here? So I plan to improve the scraper script I mentioned in part 2. I also need to change the UI; I'll make it look more professional and cleaner. I could also list the papers, books, and anything else suitable that the deceased had published or released as a pre-print 🤔

I'll continue this when I have some free time. Right now I'm a bit busy with the Park N Shop Hongkong project; maybe from the end of this year (2021) into next year (2022) I won't be in Malaysia because I have to fly over there. With the pandemic going on, who knows how that will go. Eh, now I'm chatting about myself.. haha.. ok bye! Wait for part 4 🤤
