A Ten-Year Outlook for Vision Pro

Apple has unveiled the Vision Pro, ten years in the making. Whatever else you might say, in terms of design, conceptual innovation, and hardware-software integration, there is nothing comparable on the market right now, let alone anything to benchmark it against.

It is a first-generation product, and you cannot write it off just because a few aspects fall short. Much of it can be refined, or even reinvented, once the supply chain and production scale catch up. Third-party apps, battery life, weight, price (which is to say cost), even the slightly unsettling outward eye display: all of these will become more and more acceptable over time.

At the same time, when I look at a product I tend to watch the things that will not change. Which brings me to the one thing I am unsure about: its positioning. From the look of it, Apple wants to build a productivity tool, not an entertainment device. The design is full of deliberate choices pointing that way: you can still interact with the people around you, you control it with hands, eyes, and voice rather than controllers, each eye gets more than 4K resolution (the launch stressed that reading text will not tire you out), and so on.

So what will not easily change? One, a day has 24 hours. Two, the discomfort of perching glasses on your nose or strapping goggles onto your face.

Human production revolves around day and night. People generally produce during the day (adults working, kids studying), averaging eight hours or more. Throughout history, however the tools have changed (hoe, pen and paper, or computer), that working duration has not changed much, and whatever new tool we switch to in the future, it will not change much either.

I am nearsighted, and honestly, after a full day of wearing glasses my nose and ears get pretty tired. I also wear goggles when skiing, and by the end of a day my face is uncomfortable too. Hanging something on a person's face for eight hours is a huge challenge, and if it is every day, then today's prescription glasses, at roughly 20 grams, are basically the limit of tolerable comfort. That tolerance limit will not change much over time either.

So the ultimate challenge is this: if the goal is a face-worn productivity tool that the general public can "tolerate", it has to weigh under 20 grams. At the current pace of hardware, software, and battery progress, shrinking the Vision Pro shown today down to 20 grams is not something I think even Apple can pull off within ten years. And even if they did, it would only be a tolerable product, which is not very Apple.

In five or ten years we will know whether there is a hole in this line of reasoning, namely that Apple never needed to build such a product in the first place. Or perhaps another entertainment device is waiting in the wings, and the Vision Pro is merely there to clear the technical hurdles.

A Programming Paradigm Shift, and a Few Other Things

How will human-computer interaction change over the next three to five years?

Conclusion first: basically, it becomes Jarvis from Iron Man.

The older generation struggles with computers and never learns how to use certain apps, and there is really only one reason: with today's machines (all the systems, software, and interfaces), people must first learn how to operate them. To get a result, there is exactly one correct way to operate the machine, the same for every user; if you have not learned it, you do not get the result. But if the one doing the work for you is Jarvis, even an elderly person gets things done, because his interface is open, flexible, and human; all it cares about is meaning and intent.

It follows that the recently hyped, highly paid job of "prompt engineer" will be a flash in the pan, because today's Jarvis is still babbling its first words. When a small child first starts communicating with the world, there is a stretch of time when only her parents understand her, and you need a parent acting as prompt engineer for others to communicate with her effectively. But as the child grows up, that stretch never lasts long.

Let me continue on human-computer interaction from another angle. To communicate with a machine, you could never just speak human; machines have never understood human language. This "not understanding" has two layers. The first is obvious: you have to hold the "conversation" in an arcane programming language, the human issuing predefined instructions on one side, the machine executing them on the other, either completing the instruction or returning a predefined error message. The second layer is worth chewing on: after a machine "hears" what a human says, it never offers its own "understanding" of it. In other words, the machine's response is deterministic. You could naively sprinkle in randomness, but that only makes the result nondeterministic, which is not the same as relevant. Even when you interact with a baby using simple language, nothing needs to be predefined, and her reaction to what she hears is hardly ever uniquely determined. This is why interacting with a machine has never felt as seamless as interacting with a person.

But large language models are different: you can speak to them in plain human language, and they actually understand it.

For software engineers, that brings a paradigm shift. I do not mean the shift of having Copilot or ChatGPT write code for you; that much is a given. I mean you should treat the machine in front of you as a person, or at least as Iron Man's Jarvis.

For example, say you are writing a computer referee for Junqi (Chinese military chess). The old you would first translate the capture rules into a pile of if...else and switch...case code, then train a piece-recognition model on a large library of labeled piece images, and use it to connect the camera to your judging code. Now you just let GPT4 look at the two pieces and ask it who captures whom. Done. (I am still on the waitlist for OpenAI's image API; once I get access I will try whether this actually works.)
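If the image API does open up, I imagine the call would look roughly like this. This is only a sketch, not something I have been able to run yet: the model name and the image payload format are assumptions, and board_snapshot.jpg is a stand-in for whatever the camera captures.

import base64
from openai import OpenAI  # assumes the official openai Python SDK

client = OpenAI()

# Read the camera frame showing the two pieces that just met.
with open("board_snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ask the model to referee directly: no hand-written capture rules,
# no custom piece-recognition model. "gpt-4o" is a placeholder name.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "These two Junqi pieces just met. According to the "
                     "rules, which one captures which? Answer in one sentence."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)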

That was the paradigm shift in how programmers communicate with machines. Next, the shift in how machines communicate with each other, by which I mean code calling other APIs.

APIs have always required every parameter to be defined watertight. That was unavoidable, because the thing on the other end was a machine. If the other end were a person, all of that would be not just unnecessary but redundant, and sometimes outright infeasible.

This week I built a Bing Chat knock-off; of course, Bing Chat has far more features than my app. At one point in the logic, I needed to feed the conversation so far to GPT4 and have it come up, on its own, with five questions the user might want to ask next, so I could turn them into five buttons for one-tap follow-ups.

My initial prompt was this:

Based on our discussion above, can you suggest five follow-up questions for me to ask in order to dive deeper into this topic? Answer with a JSON array of strings.

It worked on the first try. While marveling at how good GPT4 is, I tried again, and it failed.

It replied with "sure thing, ...". Talk about gilding the lily! So I added a line:

nothing else. no explaination needed just the array.

Another try, and sure enough, no more chatter. Then I tried asking questions in Chinese, and it sometimes returned five questions in English. My system message already contained the sentence below, but clearly that was not explicit enough. (Not entirely its fault, since I insisted on writing the prompts in English.)

Keep using the same language as my first question. Do not switch to a different language.

So I added one more line:

respond in the same language as your last answer.

With that, the bug of randomly switching back to English was gone. But it still errored out occasionally, because the output was not valid JSON, for example ["asdf" "qwer" "tuio"] instead of the correct ["asdf","qwer","tuio"].

So I added an emphasis on well-formed JSON and threw in an example, making it a one-shot prompt. In the end my prompt looked like this:

Based on our discussion above, can you suggest five follow-up questions for me to ask in order to dive deeper into this topic? Answer with a JSON array of strings, nothing else. no explaination needed just the array. make sure it is a well formed JSON array. Example: ["What is your favorite color?", "What is your favorite food?"]. respond in the same language as your last answer.
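For reference, here is roughly how that prompt slots into the app, with a defensive parse in case the array still comes back malformed. The function and variable names are illustrative, not the app's actual code:

import json
from openai import OpenAI  # assumes the official openai Python SDK

client = OpenAI()

FOLLOW_UP_PROMPT = (
    "Based on our discussion above, can you suggest five follow-up "
    "questions for me to ask in order to dive deeper into this topic? "
    "Answer with a JSON array of strings, nothing else. no explaination "
    "needed just the array. make sure it is a well formed JSON array. "
    'Example: ["What is your favorite color?", "What is your favorite '
    'food?"]. respond in the same language as your last answer.'
)

def suggest_follow_ups(conversation):
    # conversation is the chat history so far: a list of
    # {"role": ..., "content": ...} messages, system message included.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=conversation + [{"role": "user", "content": FOLLOW_UP_PROMPT}],
    )
    text = resp.choices[0].message.content
    try:
        questions = json.loads(text)
    except json.JSONDecodeError:
        # Occasionally the array is still not valid JSON; show no
        # suggestion buttons rather than crashing the UI.
        return []
    return [q for q in questions if isinstance(q, str)][:5]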

After all that fiddling, my takeaway is this: in machine-to-machine interaction involving large language models, falling back on the old mindset means even something as simple as a JSON payload takes this much wrangling, which is clearly wasted effort. True machine-to-machine interaction, when both sides are large language models, should happen directly in language rather than through formats like JSON. Think of GPT4 generating a prompt to call MidJourney directly to draw an image, or AutoGPT assigning subtasks to itself.

Every API provider today offers REST or GraphQL, but going forward I suspect the most important thing to offer will be a natural language interface, because the caller upstream is quite likely a large language model. By the same logic, OpenAI's current plugin strategy is probably just a transitional product, since the vast majority of APIs on the market today are not large language model APIs.
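A toy sketch of what such a natural language endpoint could look like, assuming a hypothetical Flask service whose entire contract is "send me a sentence"; the LLM-backed routing is stubbed out here:

from flask import Flask, jsonify, request

app = Flask(__name__)

def interpret_and_dispatch(utterance):
    # Stub: in a real service, a large language model would read the
    # sentence, decide which internal function to call and with what
    # arguments, then compose a reply.
    return f"(pretend an LLM handled: {utterance!r})"

@app.route("/nl", methods=["POST"])
def natural_language_endpoint():
    # No rigid parameter schema for an upstream LLM to get wrong:
    # a plain-language request in, an answer out.
    utterance = request.get_json()["request"]
    return jsonify({"answer": interpret_and_dispatch(utterance)})

if __name__ == "__main__":
    app.run()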

In this new era, every backend is worth rewriting and every application is worth reinventing. Plenty of people worry that programmers will lose their jobs, but until Thanos is defeated, the Iron Men still have to live on, ha.

Troubleshoot ssh – Permission denied (publickey) error

I tried to ssh into my dev box:

ssh my_dev_box_host_name

But I got this error:

Permission denied (publickey).

Then I turned on debugging:

ssh -vvv my_dev_box_host_name

In the output, I saw these lines:

...
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/wyehearsay/.ssh/id_rsa RSA SHA256:FzMfrbORgYEtcIaWTg2iZOBctxYeNZ9bz/vFxLLtefw agent
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
...

So the public key was offered, but the server rejected it for some reason.
Note: the same public/private key pair works from another machine to ssh into the same dev box.
Then I logged in to the dev box (via the working machine) and tailed auth.log:

less +F /var/log/auth.log

At the same time, I tried again to ssh from the problematic machine, and noticed this log line:

Invalid user xyz from 10.11.50.126 port 50310

Problem solved. On the problematic machine, my macOS username is a different one (xyz), while the user defined in Chef for the dev box is abc. To fix it, I just need to

ssh abc@my_dev_box_host_name

Alternatively, put the username in ~/.ssh/config:

Host my_dev_box_host_name
    User abc

Done.

[Notes] GraphQL vs. REST: What you didn’t know

The focus with GraphQL is more on how data is queried and less on how resources are modeled.

To start with, a GraphQL query could map to many different resolving functions, any of which could fail. As a result, a response could be partially successful and partially failed at the same time.
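To illustrate, a GraphQL response can carry a data object and an errors array at the same time. A minimal sketch of handling that in Python, assuming a hypothetical /graphql endpoint with two resolver-backed fields:

import requests

# Hypothetical endpoint; the query touches two fields, each resolved by
# its own function on the server.
resp = requests.post(
    "https://api.example.com/graphql",
    json={"query": "{ viewer { name } billing { balance } }"},
).json()

# A single response can succeed and fail at once: "data" holds whatever
# the successful resolvers returned, "errors" lists the ones that failed.
if resp.get("errors"):
    print("some resolvers failed:", resp["errors"])
print("partial data:", resp.get("data"))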

This vulnerability exists with other servers as well, but in the case of a GraphQL server, your API schema may expose potentially complex and expensive query patterns that could bring down your system easily.

[Notes] The No Code Delusion

Many businesses fail attempting to do digital transformation to access these benefits. The downside of trying to make this jump is that suddenly you’re becoming, at least in part, a software development company. Surprise: most companies are not good at this! A software environment is one of infinite possibility because most everything is achievable, with enough resource (time, money, people).

But the representation of the logic doesn’t reduce the fundamental complexity of the thing that it describes. In the same way I can write “two” and “2” and mean the same thing, there are many ways of writing out business logic.

That means no code is only feasible when the end result involves no logic, or only simple logic. But if heavy logic is not involved, then it is not the kind of software system the company is so desperate for in the first place, and it will not escape being democratized anyway.

As an example, you can define extremely complex software in Salesforce Cloud without having to write a single line of code. It’s a mix of visual programming, basic rule setting and configuration.

With “no code”, it tends to be difficult or impossible to have a non-production environment. Even if you do have one, accurately copying changes over from one to the other is non-trivial. Salesforce has some excellent tooling available to make this work, and even in that environment it’s extremely difficult to do.

There are many tools which, while not “no code” per se, also allow users to produce more technical output. My favourite example is Looker, the business intelligence tool, but there are many such in different niches. As an aside on Looker: I find it extremely interesting that a lot of the model development in that environment happens in plain text, using regular software development tooling. I think this is one of the reasons it has ended up being successful.

[Notes] Responsibilities that are mistakenly seen as Scrum Master work

https://link.medium.com/LSCnr0TjA0

As you can see, the Scrum Master is NOT the person to update people on the progress. This is for the Product Owner and the Development Team to do, and they make use of the artifacts and the Sprint Review to achieve this. This makes other types of progress reporting redundant.

Extra reporting may be the single most visible sign of a team that is not truly agile.

[Notes] 10 Signs You Don’t Do Continuous Delivery

My takeaway is that all those things created over the decades to smooth out the process of pushing code once in a “long” while become evils in the CD world.

By “long”, it could be a day, a week, a month, or a year.

Those things include release management, feature branches, dedicated deployment team, village exercise, etc.

The right way to deal with those evils is to undo them.

All the planning, meetings, approvals, tickets, politics, and general BS associated with managing “releases” will actually slow you down or kill your business in today’s world.

The go-to-market team can do whatever “release” they want, but the engineering team should not care. In that case, a “release” is no more than a summary of what has already gone live.

[Notes] Strategies for implementing user authentication in serverless applications

https://serverless.com/blog/strategies-implementing-user-authentication-serverless-applications/

Typically, you would store session data in either Redis or Memcached. But for Serverless projects, it makes sense to use hosted datastores instead—Amazon ElastiCache or DynamoDB, Google Cloud Datastore, etc.
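For instance, a minimal sketch of keeping sessions in DynamoDB from a Lambda function, assuming a hypothetical sessions table keyed by session_id with TTL enabled on the expires_at attribute:

import time
import boto3

# Hypothetical table: partition key "session_id", TTL on "expires_at".
sessions = boto3.resource("dynamodb").Table("sessions")

def save_session(session_id, user_id, ttl_seconds=3600):
    # DynamoDB expires the item automatically via the TTL attribute.
    sessions.put_item(Item={
        "session_id": session_id,
        "user_id": user_id,
        "expires_at": int(time.time()) + ttl_seconds,
    })

def load_session(session_id):
    # Returns None when the session does not exist (or has expired).
    return sessions.get_item(Key={"session_id": session_id}).get("Item")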

AWS Lambda offers a convenient way to perform authentication outside of your core functions. With API Gateway’s Custom Authorizers, you can specify a separate Lambda function that is only going to take care of authenticating your users.
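A minimal sketch of such a custom authorizer, with the actual token check stubbed out; the handler only has to return an IAM policy that allows or denies the call:

def is_token_valid(token):
    # Stub: verify a JWT, look up an API key, call your identity provider, etc.
    return token == "allow-me"

def handler(event, context):
    # For a TOKEN authorizer, API Gateway passes the caller's token and
    # the ARN of the method being invoked.
    token = event.get("authorizationToken", "")
    effect = "Allow" if is_token_valid(token) else "Deny"
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }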

Still a lot of heavy lifting. The whole point of going serverless is to focus more on coding the business logic. 😦