New access services in HbbTV based on a deep learning approach for media content analysis

Silvia Uribe; Alberto Belmonte; Francisco Moreno; Álvaro Llorente; Juan Pedro López; Federico Álvarez

doi:10.1017/S0890060419000350

Published online by Cambridge University Press: 04 December 2019


Silvia Uribe*, Alberto Belmonte, Francisco Moreno, Álvaro Llorente, Juan Pedro López and Federico Álvarez
Affiliation (all authors): Grupo de Aplicación de Telecomunicaciones Visuales, ETSIT, Universidad Politécnica de Madrid, Madrid, Spain
*: Author for correspondence: Silvia Uribe, E-mail: [email protected]


Abstract

Universal access on equal terms to audiovisual content is key to the full inclusion of people with disabilities in activities of daily life. Although this challenge for the current Information Society has been recognized, it has not yet been met efficiently, because current access solutions are mainly based on the traditional television standard and on other non-automated, high-cost approaches. The arrival of new technologies in the hybrid television environment, together with the application of artificial intelligence techniques to the content, will enable the deployment of innovative solutions that enhance the user experience for all. This paper presents a set of image enhancement tools based on a combination of deep learning and computer vision algorithms. These tools provide automatic descriptive information about the media content, based on face detection for magnification and on character identification. This information is then fused to provide a customizable description of the visual content, with the aim of improving the accessibility of the content through an efficient, reduced-cost solution for all.
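
As an illustration of the face-based magnification idea mentioned in the abstract, the following minimal sketch computes which region of a video frame could be enlarged so that a detected face stays centred on screen. It is not the authors' implementation: the `Box` type, the `magnification_region` function, the default `zoom` factor and the assumption that a face detector has already returned a pixel bounding box are all hypothetical choices made for this example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def magnification_region(face: Box, frame_w: int, frame_h: int,
                         zoom: float = 2.0) -> Box:
    """Return the frame region that, scaled by `zoom`, fills the screen.

    The region is centred on the face where possible and shifted
    inwards when the face lies close to a frame border.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")

    # Size of the region to enlarge: the frame size divided by the zoom factor.
    crop_w = max(1, round(frame_w / zoom))
    crop_h = max(1, round(frame_h / zoom))

    # Centre of the detected face.
    cx = face.x + face.w / 2
    cy = face.y + face.h / 2

    # Centre the region on the face, then clamp it to the frame bounds.
    left = int(min(max(cx - crop_w / 2, 0), frame_w - crop_w))
    top = int(min(max(cy - crop_h / 2, 0), frame_h - crop_h))
    return Box(left, top, crop_w, crop_h)


if __name__ == "__main__":
    # Assumed example: a face detected in a 1920x1080 frame.
    print(magnification_region(Box(900, 400, 120, 160), 1920, 1080))
```

Clamping the region to the frame keeps the magnified view free of empty borders, at the cost of the face no longer being exactly centred when it appears near an edge.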

Keywords

Computer vision; deep learning; face detection; media accessibility

Type: Research Article
Information: AI EDAM, Volume 33, Special Issue 4: Intelligent Interaction Design, November 2019, pp. 399–415

DOI: https://doi.org/10.1017/S0890060419000350
Copyright: © Cambridge University Press 2019


